Platform Computing Corp.

Non-Shared File Systems

About Directories and Files

LSF is designed for networks where all hosts have shared file systems, and files have the same names on all hosts.

LSF includes support for copying user data to the execution host before running a batch job, and for copying results back after the job executes.

In networks where the file systems are not shared, this copying support gives remote jobs access to local data.

Supported file systems

UNIX

On UNIX systems, LSF supports the following shared file systems:

Windows

On Windows, directories containing LSF files can be shared among hosts from a Windows server machine.

Non-shared directories and files

LSF is usually used in networks with shared file space. When shared file space is not available, LSF can copy needed files to the execution host before running the job, and copy result files back to the submission host after the job completes. See Remote File Access for more information.

Some networks do not share files between hosts. LSF can still be used on these networks, with reduced fault tolerance. See Using LSF with Non-Shared File Systems for information about using LSF in a network without a shared file system.

Using LSF with Non-Shared File Systems

LSF installation

To install LSF on a cluster without shared file systems, follow the complete installation procedure on every host to install all the binaries, man pages, and configuration files.

Configuration files

After you have installed LSF on every host, you must update the configuration files on all hosts so that they contain the complete cluster configuration. Configuration files must be the same on all hosts.

Master host

You must choose one host to act as the LSF master host. LSF configuration files and working directories must be installed on this host, and the master host must be listed first in lsf.cluster.cluster_name.

You can use the LSF_MASTER_LIST parameter in lsf.conf to define which hosts are candidates for election as the master host. In some cases, this may improve performance.

For Windows password authentication in a non-shared file system environment, you must define the parameter LSF_MASTER_LIST in lsf.conf so that jobs will run with correct permissions. If you do not define this parameter, LSF assumes that the cluster uses a shared file system environment.
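
As an illustration, a minimal lsf.conf fragment might look like the following. The host names hosta, hostb, and hostc are hypothetical; substitute the candidate master hosts of your own cluster.

```shell
# lsf.conf (fragment)
# Hypothetical host names: candidate master hosts, separated by spaces.
LSF_MASTER_LIST="hosta hostb hostc"
```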

Fault tolerance

Some fault tolerance can be introduced by choosing more than one host as a possible master host, and using NFS to mount the LSF working directory on only these hosts. All the possible master hosts must be listed first in lsf.cluster.cluster_name. As long as one of these hosts is available, LSF continues to operate.

Remote File Access

Using LSF with non-shared file space

LSF is usually used in networks with shared file space. When shared file space is not available, use the bsub -f command to have LSF copy needed files to the execution host before running the job, and copy result files back to the submission host after the job completes.

LSF attempts to run a job in the directory where the bsub command was invoked. If the execution directory is under the user's home directory, sbatchd looks for the path relative to the user's home directory. This handles some common configurations, such as cross-mounting user home directories with the /net automount option.

If the directory is not available on the execution host, the job is run in /tmp. Any files created by the batch job, including the standard output and error files created by the -o and -e options to bsub, are left on the execution host.
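
The directory selection described above can be sketched as a small shell function. This is only an approximation for illustration: pick_exec_dir is a hypothetical name, the order of the checks is an assumption, and the real logic inside sbatchd is not shown in this document.

```shell
#!/bin/sh
# Sketch only: mimics the directory selection described in the text.
# pick_exec_dir is hypothetical and is not an LSF command.
# Arguments: 1) directory where bsub was run on the submission host
#            2) user's home directory on the submission host
#            3) user's home directory on the execution host
pick_exec_dir() {
    submit_cwd=$1 submit_home=$2 exec_home=$3

    # 1. Use the same path name if it exists on this host.
    if [ -d "$submit_cwd" ]; then
        printf '%s\n' "$submit_cwd"
        return 0
    fi

    # 2. If the directory was under the user's home directory, try the
    #    same relative path under the home directory on this host.
    case $submit_cwd in
        "$submit_home"/*)
            rel=${submit_cwd#"$submit_home"/}
            if [ -d "$exec_home/$rel" ]; then
                printf '%s\n' "$exec_home/$rel"
                return 0
            fi
            ;;
    esac

    # 3. Otherwise fall back to /tmp.
    printf '%s\n' /tmp
}
```

Run on an execution host, the function prints the directory in which a job submitted from the given location would be expected to start under these assumptions.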

LSF provides support for moving user data from the submission host to the execution host before executing a batch job, and from the execution host back to the submitting host after the job completes. The file operations are specified with the -f option to bsub.

LSF uses the lsrcp command to transfer files. lsrcp contacts RES on the remote host to perform file transfer. If RES is not available, the UNIX rcp command is used. See File Transfer Mechanism (lsrcp) for more information.

bsub -f

The -f "[local_file operator [remote_file]]" option to the bsub command copies a file between the submission host and the execution host. To specify multiple files, repeat the -f option.

local_file

File name on the submission host

remote_file

File name on the execution host

The files local_file and remote_file can be absolute or relative file path names. You must specify at least one file name. When remote_file is not specified, it is assumed to be the same as local_file. Including local_file without the operator results in a syntax error.

operator

Operation to perform on the file. The operator must be surrounded by white space.

Valid values for operator are:

>

local_file on the submission host is copied to remote_file on the execution host before job execution. remote_file is overwritten if it exists.

<

remote_file on the execution host is copied to local_file on the submission host after the job completes. local_file is overwritten if it exists.

<<

remote_file is appended to local_file after the job completes. local_file is created if it does not exist.

><, <>

Equivalent to performing the > and then the < operation. The file local_file is copied to remote_file before the job executes, and remote_file is copied back, overwriting local_file, after the job completes. <> is the same as ><.

If the submission and execution hosts have different directory structures, you must ensure that the directory where remote_file and local_file will be placed exists. LSF tries to change the directory to the same path name as the directory where the bsub command was run. If this directory does not exist, the job is run in your home directory on the execution host.

You should specify remote_file as a file name with no path when running in non-shared file systems; this places the file in the job's current working directory on the execution host. This way the job will work correctly even if the directory where the bsub command is run does not exist on the execution host. Be careful not to overwrite an existing file in your home directory.
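
The syntax rules above can be summarized in a small shell helper. This is a sketch and not part of LSF: f_spec is a hypothetical function that only checks the operator and prints the specification string you would pass, in quotes, to bsub -f. It never calls bsub.

```shell
#!/bin/sh
# Hypothetical helper (not an LSF command): build one bsub -f specification.
# Usage: f_spec local_file operator [remote_file]
f_spec() {
    local_file=$1 operator=$2 remote_file=${3:-}

    if [ -z "$local_file" ]; then
        echo "f_spec: local_file is required" >&2
        return 1
    fi

    # Only the operators listed in this section are accepted. A missing
    # operator is rejected, matching the syntax error described above.
    case $operator in
        '>'|'<'|'<<'|'><'|'<>') ;;
        *) echo "f_spec: invalid or missing operator '$operator'" >&2
           return 1 ;;
    esac

    # The operator is separated from the file names by white space.
    # When remote_file is omitted, LSF uses the same name as local_file.
    if [ -n "$remote_file" ]; then
        printf '%s %s %s\n' "$local_file" "$operator" "$remote_file"
    else
        printf '%s %s\n' "$local_file" "$operator"
    fi
}
```

For example, f_spec /data/data3 ">" data3 prints /data/data3 > data3, which is the string used with the first -f option in the example later in this section. Note that remote_file is given without a path, as recommended for non-shared file systems.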

bsub -i

If the input file specified with bsub -i is not found on the execution host, the file is copied from the submission host using the LSF remote file access facility and is removed from the execution host after the job finishes.

bsub -o and bsub -e

The output files specified with the -o and -e arguments to bsub are created on the execution host, and are not copied back to the submission host by default. You can use the remote file access facility to copy these files back to the submission host if they are not on a shared file system.

For example, the following command stores the job output in the job_out file and copies the file back to the submission host:

bsub -o job_out -f "job_out <" myjob

Example

To submit myjob to LSF, with input taken from the file /data/data3 and the output copied back to /data/out3, run the command:

bsub -f "/data/data3 > data3" -f "/data/out3 < out3" myjob data3 out3

To run the job batch_update, which updates the batch_data file in place, you need to copy the file to the execution host before the job runs and copy it back after the job completes:

bsub -f "batch_data <>" batch_update batch_data

File Transfer Mechanism (lsrcp)

The LSF remote file access mechanism (bsub -f) uses lsrcp to process the file transfer. The lsrcp command tries to connect to RES on the submission host to handle the file transfer.

See Remote File Access for more information about using bsub -f.

Limitations to lsrcp

Because LSF client hosts do not run RES, jobs that are submitted from client hosts should only specify bsub -f if rcp is allowed. You must set up the permissions for rcp if account mapping is used.

File transfer using lsrcp is not supported in the following contexts:

See Authorization options for more information.

Workarounds

In these situations, use the following workarounds:

rcp on UNIX

If lsrcp cannot contact RES on the submission host, it attempts to use rcp to copy the file. You must set up the /etc/hosts.equiv or $HOME/.rhosts file in order to use rcp.

See the rcp(1) and rsh(1) man pages for more information on using the rcp command.

Custom file transfer mechanism

You can replace lsrcp with your own file transfer mechanism as long as it supports the same syntax as lsrcp. This might be done to take advantage of a faster interconnection network, or to overcome limitations with the existing lsrcp. sbatchd looks for the lsrcp executable in the LSF_BINDIR directory as specified in the lsf.conf file.
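
A replacement could be as simple as a wrapper that hands the copy to another tool. The following is a dry-run sketch under stated assumptions: it assumes the caller uses the form lsrcp [-a] source_file target_file with remote files written as host:path (check the lsrcp man page for your release), it uses scp as the substitute transport, and lsrcp_to_scp is a hypothetical function name. It prints the scp command instead of running it.

```shell
#!/bin/sh
# Sketch of the argument handling for a custom lsrcp replacement.
# Assumption: invocation is "lsrcp [-a] source_file target_file".
# lsrcp_to_scp is hypothetical; it prints the command it would run.
lsrcp_to_scp() {
    # scp cannot append to an existing file, so append mode is rejected
    # here. A real replacement would need another way to handle it.
    if [ "$1" = "-a" ]; then
        echo "append mode (-a) is not supported by this sketch" >&2
        return 2
    fi

    if [ "$#" -ne 2 ]; then
        echo "usage: lsrcp [-a] source_file target_file" >&2
        return 2
    fi

    # -p preserves modification times and modes; -q disables the
    # progress meter.
    printf 'scp -p -q %s %s\n' "$1" "$2"
}
```

To turn the sketch into a working replacement, the printf line would be changed to run scp directly, and the script would be installed under the name lsrcp in LSF_BINDIR. Because this sketch rejects append mode, it would not be suitable for jobs that rely on the << operator of bsub -f.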


Platform Computing Inc.
www.platform.com