This chapter describes some techniques you can use to determine the cause of a problem within an LSF desktop support environment. It also describes how the LSF batch commands behave in an LSF desktop support environment, and how each of the LSF scheduling policies affects LSF desktop support jobs.
- Desktop Server Stops Dispatching Jobs
- The Desktop Client Stops Working
- Writing LSF desktop support logs to a single directory
- Debugging the Desktop Client
- Debugging a Desktop Application
- Recovering from a Power Outage
- Job Blocked at the Desktop Server with Many File Transfers
- LSF Policies in LSF Desktop Support
Desktop Server Stops Dispatching Jobs
Follow this procedure if you determine that the desktop server is not dispatching jobs.
If jobs are not being dispatched:
- Is the desktop server running?
- Are Apache and Tomcat running?
- Run ACH_TOP/etc/lsfac_daemons stop to shut down the desktop server.
- Shut down and restart Apache and Tomcat.
- Start the desktop server.
- Are jobs being dispatched now?
With EGO management of LSF desktop support services enabled, you should not use the command lsfac_daemons to start Apache or Tomcat services. Instead, you should use the egosh command to start these services.
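As a minimal sketch of the EGO-managed case, the helper below wraps the egosh service start subcommand. The function name start_ego_service is hypothetical, and the service name is site-specific and not given in this chapter; egosh service list shows the names registered in your cluster. You may also need to log on with egosh user logon first.

```shell
#!/bin/sh
# Sketch only: start a desktop support service through EGO instead of lsfac_daemons.
# start_ego_service is a hypothetical helper; the service name is an assumption
# and must be taken from the output of "egosh service list" at your site.
start_ego_service() {
    service_name=$1
    if [ -z "$service_name" ]; then
        echo "usage: start_ego_service service_name" >&2
        return 2
    fi
    # Ask EGO to start the named service.
    egosh service start "$service_name"
}
```

For example, after finding the Apache or Tomcat service name with egosh service list, call start_ego_service with that name.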
The Desktop Client Stops Working
Follow this procedure if you determine that a desktop client stops running jobs.
If a desktop client is not working:
1. Look at http://LSF desktop support_host/servlet/StatsViewer to see when the desktop client last requested a job or returned status. Has the desktop client logged in within the last polling interval?
2. Look at the Job Status page at http://host_name/servlet/StatsViewer. Are there jobs waiting to be run?
   - No. There are no jobs to run.
   - Yes. Continue with step 3.
3. Can you ping the desktop server from the desktop client?
   - No. There is a problem with the network.
   - Yes. Continue with step 4.
4. Restart the desktop client by shutting down the service SED and starting it again. Does the desktop client start to run a job?
Writing LSF desktop support logs to a single directory
You can make it easier to find the Tomcat and Apache log files by choosing to write these files to the directory that contains log files for other LSF desktop support services.
You must restart Apache and Tomcat after changing the configuration.
Configure the Apache log files:
| Configuration file | Parameter and syntax | Behavior |
| --- | --- | --- |
| apache/conf/httpd.conf | ErrorLog path_name/error_log.host_name | |
| | CustomLog path_name/access_log.host_name common | |
Configure the Tomcat log files:
Configure the Tomcat shell script:
Configure the LSF desktop support log directory:
| Configuration file | Syntax | Behavior |
| --- | --- | --- |
| AC_TOP/wscache.conf | wscache log path_name | |
Debugging the Desktop Client
You can place a desktop client in debug mode to log all debug information regarding the desktop client. The desktop client (SED) records connection error and debug messages created during job execution in the Windows application event service. LSF desktop support administrators can easily retrieve these error messages using the remote event viewer or a terminal services session.
LSF desktop support does not write any messages containing passwords or other authentication information to the Windows event service.
To set a desktop client to debug mode:
Debugging a Desktop Application
LSF desktop support traps stdout and stderr messages in files specified by the bsub -e, -eo, -o, and -oo options.
Set a maximum log file size:
To prevent verbose applications from generating large log files, you can set a maximum log file size.
| Configuration file | Parameter and syntax | Default | Behavior |
| --- | --- | --- | --- |
| SEDConfig.xml | <SEDMaxOutputLogSize>file_size</SEDMaxOutputLogSize> | Not defined | The file size is unlimited |
Recovering from a Power Outage
Under normal circumstances, even after a power outage, LSF desktop support jobs should continue to run. Follow the procedures listed here to ensure resumption of normal processing.
After a power outage:
Restart the desktop server. In most cases, jobs will continue to run normally within LSF desktop support.
If the desktop server cannot start:
- Check the file sbatchd.log.clustername (in the directory specified in LSF_LOGDIR) to see if an event record is corrupted. If an event record is corrupted, the log points to the corrupted line number in either $ACH_TOP/work/.clustername.sbd/med.events or $ACH_TOP/work/.clustername.sbd/sbd.events.
- If an event record is corrupted, contact Platform Technical Support for assistance.
If a job seems to be "stuck":
- Issue the bkill command to kill the job if you do not want the job to continue. Otherwise, issue the brequeue command to redispatch the job.
- If bkill does not work:
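The choice between killing and redispatching a stuck job can be sketched as a small shell helper. This is an illustration only: handle_stuck_job is a hypothetical function name, and it simply passes a job ID to the bkill or brequeue command named in the first step above.

```shell
#!/bin/sh
# Sketch only: kill or requeue a job that appears to be stuck.
# handle_stuck_job is a hypothetical helper; bkill and brequeue are the
# LSF commands named in this section.
handle_stuck_job() {
    job_id=$1
    action=$2    # "kill" to stop the job, "requeue" to redispatch it
    case "$action" in
        kill)    bkill "$job_id" ;;
        requeue) brequeue "$job_id" ;;
        *)
            echo "usage: handle_stuck_job job_ID kill|requeue" >&2
            return 2
            ;;
    esac
}
```

For example, handle_stuck_job 1234 requeue redispatches job 1234, where 1234 stands for a job ID reported by bjobs.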
Job Blocked at the Desktop Server with Many File Transfers
LSF desktop support supports a maximum of 32 file transfer requests with the -f option in the bsub command. Specifying more than 32 file transfer requests can cause abnormal behavior.
Use zip and unzip commands to reduce the number of file transfer requests.
In the following example, myjob.exe requires a total of 66 file transfers: 33 to copy files to the desktop client, and 33 to copy the results from the desktop client.
    bsub -f "data1 > data1" ... -f "data33 > data33" -f "result1 < result1" ... -f "result33 < result33" myjob.exe
- Zip the data files together into one file. For example:
    zip data.zip data1 data2 data3 ... data33
- Create a job wrapper that unzips the data files, runs the executable and zips the results. For example, the wrapper myjob.bat might look like this:
    unzip data.zip
    myjob.exe
    zip result.zip result1 result2 result3 ... result33
- Submit the job, transferring the data files, the wrapper and the executable to the desktop client, and transferring the zipped results file back from the desktop client. For example:
    bsub -f "data.zip > data.zip" -f "myjob.bat > myjob.bat" -f "myjob.exe > myjob.exe" -f "result.zip < result.zip" myjob.bat
- When the job is completed, unzip the result file:
    unzip result.zip
If you do not have zip and unzip on your system, you can get them from the Internet, and install them on each desktop client as required.
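Because exceeding the limit can cause abnormal behavior, it may help to count the -f requests before submitting. The following is a minimal sketch of such a pre-submission check; count_transfers and check_transfers are hypothetical helper names, not LSF commands, and the check only counts arguments that are exactly -f.

```shell
#!/bin/sh
# Sketch only: count the -f file transfer requests in a planned bsub option
# list and fail when the count exceeds the documented maximum of 32.
# count_transfers and check_transfers are hypothetical helpers, not LSF commands.

MAX_TRANSFERS=32

# Print the number of arguments that are exactly "-f".
count_transfers() {
    n=0
    for arg in "$@"; do
        if [ "$arg" = "-f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Return 0 when the option list is within the limit, 1 otherwise.
check_transfers() {
    total=$(count_transfers "$@")
    if [ "$total" -gt "$MAX_TRANSFERS" ]; then
        echo "too many file transfers: $total (maximum $MAX_TRANSFERS)" >&2
        return 1
    fi
    return 0
}
```

Pass only the bsub options to check_transfers, not the job command itself, so that a -f argument belonging to the job is not counted. For example, check_transfers -f "data.zip > data.zip" -f "result.zip < result.zip" succeeds, and the bsub command can then be run with the same options.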
LSF Policies in LSF Desktop Support
Because LSF desktop support runs jobs on desktop clients rather than directly on LSF hosts, some LSF scheduling policies and commands behave differently or are unsupported in LSF desktop support.
Batch commands
| Command | Supported in LSF desktop support? | Description |
| --- | --- | --- |
| bacct | Partially | The data is kept, but items such as queue time are misleading, since they denote only the time before being dispatched to MED. |
| bbot | Yes | Jobs that are pending can be moved using bbot. Jobs that are running cannot. |
| bchkpnt | No | |
| bclusters | Yes | LSF desktop support does not affect this command. |
| bhist | Yes | Only displays some queue and dispatch data. The desktop client that takes the job is logged in bhist. |
| bhosts | Yes | Note that this is only "near" real-time data. |
| bhpart | Not applicable | |
| bjobs | Yes | The host listed is the MED host. The desktop client that actually takes the job is logged in bjobs using bpost. |
| bkill | Yes | Signals may be sent, but they do not make any sense, since signals are not supported. |
| bmgroup | Not applicable | |
| bmig | Yes | bmig forces LSF desktop support to terminate a job, which is requeued onto another MED host or the same MED host. |
| bmod | Not applicable | None of the post-dispatch options are supported. You can change any parameter provided the job has not been dispatched to LSF desktop support yet. |
| bparams | Yes | |
| bpeek | No | You cannot peek at the output of a job running on a desktop client. |
| bpost | Yes | You can bpost and bread to LSF desktop support jobs. |
| bqueues | Partially | bqueues displays information about LSF queues, but once a job is dispatched to an MED, it is queued on the Web server. The Web server queue is not displayed in bqueues. |
| bread | Yes | |
| brequeue | Yes | brequeue forces LSF to kill and reschedule an LSF desktop support job. |
| brestart | No | |
| bresume | No | You cannot suspend or resume an LSF desktop support job. |
| bstatus | Yes | bstatus is part of the bpost, bread command set and is supported for individual LSF desktop support jobs. |
| bstop | No | You cannot suspend or resume an LSF desktop support job. |
| bsub | Yes | See bsub options. |
| bswitch | Yes | bswitch can change jobs that have not been dispatched to an MED to another queue. bswitch does not operate on any chunk job that has been dispatched (same as LSF). |
| btop | Yes | Works on all pending jobs. |
| bugroup | Yes | |
| busers | Yes | |
| ls* | Partially | The ls commands work only in LSF, not in LSF desktop support. |
| x* | Yes | Graphical interfaces work in accordance with their respective command line batch commands listed above. |
bsub options
LSF features
Date Modified: January 29, 2009
Platform Computing: www.platform.com
Platform Support: support@platform.com
Platform Information Development: doc@platform.com
Copyright © 1994-2009 Platform Computing Corporation. All rights reserved.