Platform LSF makes use of SGI cpusets to enforce processor limits for LSF jobs. When a job is submitted, LSF creates a cpuset and attaches it to the job before the job starts running. After the job finishes, LSF deallocates the cpuset. If no host meets the CPU requirements, the job remains pending until processors become available to allocate the cpuset.
- About SGI cpusets
- Configuring LSF with SGI Cpusets
- Using LSF with SGI Cpusets
- Using SGI Comprehensive System Accounting facility (CSA)
- Using SGI User Limits Database (ULDB--IRIX only)
- SGI Job Container and Process Aggregate Support
About SGI cpusets
An SGI cpuset is a named set of CPUs. The processes attached to a cpuset can only run on the CPUs belonging to that cpuset.
Dynamic cpusets
Jobs are attached to a cpuset dynamically created by LSF. The cpuset is deleted when the job finishes or exits. If not specified, the default cpuset type is dynamic.
Static cpusets
Jobs are attached to a static cpuset specified by users at job submission. This cpuset is not deleted when the job finishes or exits. Specifying a cpuset name at job submission implies that the cpuset type is static. If the static cpuset does not exist, the job remains pending until LSF detects a static cpuset with the specified name.
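For example, the first submission below requests a dynamic cpuset built from the listed CPUs, and the second attaches the job to an existing static cpuset named MYSET (both commands reappear in the Examples later in this chapter):
bsub -n 8 -extsched "CPUSET[CPUSET_TYPE=dynamic;CPU_LIST=1, 5, 7-12;]" myjob
bsub -n 8 -extsched "CPUSET[CPUSET_NAME=MYSET]" myjob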
System architecture
How LSF uses cpusets
On systems running IRIX 6.5.24 and up, or SGI Altix and AMD64 (x86-64) systems running ProPack 3.0 and up, cpusets can be created and deallocated dynamically out of available machine resources. The cpuset provides not only containment, so that a job requiring a specific number of CPUs runs only on those CPUs, but also reservation, so that the required number of CPUs is guaranteed to be available only for the job they are allocated to.
LSF can be configured to make use of SGI cpusets to enforce processor limits for LSF jobs. When a job is submitted, LSF creates a cpuset and attaches it to the job when the job is scheduled. After the job finishes, LSF deallocates the cpuset. If no host meets the CPU requirements, the job remains pending until processors become available to allocate the cpuset.
Assumptions and limitations
- When LSF selects cpuset jobs to preempt, specialized preemption preferences, such as MINI_JOB and LEAST_RUN_TIME in the PREEMPT_FOR parameter in lsb.params, are ignored when slot preemption is required.
- Preemptable queue preference is not supported.
- When using cpusets, LSF schedules jobs based on the number of slots assigned to the hosts instead of the number of CPUs. The lsb.params parameter setting PARALLEL_SCHED_BY_SLOTS=N has no effect.
- Before upgrading from a previous version, clusters must be drained of all running jobs (especially cpuset hosts).
- The new cpuset integration cannot coexist with the old integration within the same cluster.
- Under the MultiCluster lease model, both clusters must use the same version of the cpuset integration.
- LSF supports up to ProPack 6.0.
- LSF will not create a cpuset on hosts of different ProPack versions.
- Because backfill and slot reservation are based on an entire host, they may not work correctly if your cluster contains hosts that use both static and dynamic cpusets or multiple static cpusets.
- Jobs submitted to a chunk job queue are not chunked together, but run as individual LSF jobs inside a dynamic cpuset.
- Job pre-execution programs run within the job cpuset, since they are part of the job. By default, post-execution programs run outside of the job cpuset. If JOB_INCLUDE_POSTPROC=Y is specified in lsb.applications, post-execution processing is not attached to the job cpuset, and Platform LSF does not release the cpuset until post-execution processing has finished.
- Suspended jobs (for example, jobs suspended with bstop) release their cpusets.
- SGI Altix Linux ProPack versions 4 and lower do not support memory migration; you must define RESUME_OPTION=ORIG_CPUS to force LSF to recreate the original cpuset when LSF resumes a job.
- SGI Altix Linux ProPack 5 supports memory migration and does not require additional configuration to enable this feature. If you submit and then suspend a job using a dynamic cpuset, LSF will create a new dynamic cpuset when the job resumes. The memory pages for the job are migrated to the new cpuset as required.
- SGI Altix Linux ProPack 3 only supports CPUSET_OPTIONS=CPUSET_MEMORY_LOCAL. If the cpuset job runs on an Altix host, other cpuset attributes are ignored.
- SGI Altix Linux ProPack 4 and ProPack 5 do not support CPUSET_OPTIONS=CPUSET_MEMORY_MANDATORY or CPUSET_OPTIONS=CPUSET_CPU_EXCLUSIVE attributes. If the cpuset job runs on an Altix host, the cpusets created on the Altix system will have their memory usage restricted to the memory nodes containing the CPUs assigned to the cpuset. The CPUSET_MEMORY_MANDATORY and CPUSET_CPU_EXCLUSIVE attributes are ignored.
- SGI Altix Linux ProPack 4 and ProPack 5 static cpuset definitions must include both the cpus and the memory nodes on which the cpus reside. The memory node assignments should be non-exclusive, which allows other cpusets to use the same nodes. With non-exclusive assignment of memory nodes, the allocation of cpus will succeed even if the cpuset definition does not correctly map cpus to memory nodes.
PAM on IRIX cannot launch parallel processes within cpusets.
For PAM jobs on Altix, the SGI Array Services daemon arrayd must be running, and AUTHENTICATION must be set to NONE in the SGI array services authentication file /usr/lib/array/arrayd.auth (comment out the AUTHENTICATION NOREMOTE method and uncomment the AUTHENTICATION NONE method). To run a multihost MPI application, you must also enable rsh without a password prompt between hosts:
- The remote host must be defined in the arrayd configuration.
- Configure .rhosts so that rsh does not require a password (see the example below).
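For example, a minimal $HOME/.rhosts on each execution host lists the peer hosts and the job owner. The host and user names here are illustrative:
hostA user1
hostB user1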
For more information about SGI Array Services, see SGI Job Container and Process Aggregate Support.
For more information about PAM jobs, see SGI Vendor MPI Support.
The administrator must use brun -c to force a cpuset job to run. If the job is forced to run on non-cpuset hosts, or if any host in the host list specified with -m is not a cpuset host, -extsched cpuset options are ignored and the job runs with no cpusets allocated.
If the job is forced to run on a cpuset host:
- For dynamic cpusets: LSF allocates a dynamic cpuset without any cpuset options and runs the job inside the dynamic cpuset.
- For static cpusets: LSF runs the job in the static cpuset. If the specified static cpuset does not exist, the job is requeued.
Jobs running in a cpuset cannot be resized.
Configuring LSF with SGI Cpusets
Automatic configuration at installation and upgrade
During installation and upgrade, lsfinstall adds the schmod_cpuset external scheduler plugin module name to the PluginModule section of lsb.modules:
Begin PluginModule
SCH_PLUGIN      RB_PLUGIN    SCH_DISABLE_PHASES
schmod_default  ()           ()
schmod_cpuset   ()           ()
End PluginModule
The schmod_cpuset plugin name must be configured after the standard LSF plugin names in the PluginModule list.
For upgrade, lsfinstall comments out the schmod_topology external scheduler plugin name in the PluginModule section of lsb.modules.
During installation and upgrade, lsfinstall sets the following parameters in lsf.conf:
- LSF_ENABLE_EXTSCHEDULER=Y
LSF uses an external scheduler for cpuset allocation.
- LSB_CPUSET_BESTCPUS=Y
LSF schedules jobs based on the shortest CPU radius in the processor topology using a best-fit algorithm for cpuset allocation.
LSF_IRIX_BESTCPUS is obsolete.
- LSB_SHORT_HOSTLIST=1
Displays an abbreviated list of hosts in bjobs and bhist for a parallel job where multiple processes of a job are running on a host. Multiple processes are displayed in the format processes*hostA.
For upgrade, lsfinstall comments out the following obsolete parameters in lsf.conf, and sets the corresponding RLA configuration:
- LSF_TOPD_PORT=port_number, replaced by LSB_RLA_PORT=port_number, using the same value as LSF_TOPD_PORT.
Where port_number is the TCP port used for communication between the LSF topology adapter (RLA) and sbatchd. The default port number is 6883.
- LSF_TOPD_WORKDIR=directory parameter, replaced by LSB_RLA_WORKDIR=directory parameter, using the same value as LSF_TOPD_WORKDIR.
Where directory is the location of the status files for RLA. Allows RLA to recover its original state when it restarts. When RLA first starts, it creates the directory defined by LSB_RLA_WORKDIR if it does not exist, then creates subdirectories for each host.
During installation and upgrade, lsfinstall defines the cpuset Boolean resource in lsf.shared:
Begin Resource
RESOURCENAME  TYPE     INTERVAL  INCREASING  DESCRIPTION
...
cpuset        Boolean  ()        ()          (cpuset host)
...
End Resource
You should add the cpuset resource name under the RESOURCES column of the Host section of lsf.cluster.cluster_name. Hosts without the cpuset resource specified are not considered for scheduling cpuset jobs.
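For example, a Host section entry that tags hostA as a cpuset host might look like the following sketch; the host name, model, type, and load threshold columns are illustrative placeholders:
Begin Host
HOSTNAME  model  type  server  r1m  mem  swp  RESOURCES
hostA     !      !     1       3.5  ()   ()   (cpuset)
End Host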
For each cpuset host, hostsetup adds the cpuset Boolean resource to the Host section of lsf.cluster.cluster_name.
See the Platform LSF Configuration Reference for information about the lsb.modules, lsf.conf, lsf.shared, and lsf.cluster.cluster_name files.
Optional configuration
- In some pre-defined LSF queues, such as normal, the default MEMLIMIT is set to 5000 (5 MB). However, if ULDB is enabled (LSF_ULDB_DOMAIN is defined), MEMLIMIT should be set greater than 8000.
- MANDATORY_EXTSCHED=CPUSET[cpuset_options]
Sets required cpuset properties for the queue. MANDATORY_EXTSCHED options override -extsched options used at job submission.
- DEFAULT_EXTSCHED=CPUSET[cpuset_options]
Sets default cpuset properties for the queue if the -extsched option is not used at job submission. -extsched options override the options set in DEFAULT_EXTSCHED.
See Specifying cpuset properties for jobs for more information about external scheduler options for setting cpuset properties.
- LSB_RLA_UPDATE=seconds
Specifies how often the LSF scheduler refreshes cpuset information from RLA.
The default is 600 seconds.
- LSB_RLA_WORKDIR=directory parameter, where directory is the location of the status files for RLA. Allows RLA to recover its original state when it restarts. When RLA first starts, it creates the directory defined by LSB_RLA_WORKDIR if it does not exist, then creates subdirectories for each host.
You should avoid using /tmp or any other directory that is automatically cleaned up by the system. Unless your installation has restrictions on the LSB_SHAREDIR directory, you should use the default:
LSB_SHAREDIR/cluster_name/rla_workdir
You should not use a CXFS file system for LSB_RLA_WORKDIR.
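For example, a sketch of these two RLA parameters in lsf.conf; the interval and directory values here are illustrative choices, not defaults:
LSB_RLA_UPDATE=300
LSB_RLA_WORKDIR=/shared/lsf/cluster1/rla_workdir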
- LSF_PIM_SLEEPTIME_UPDATE=Y
On Altix hosts, use this parameter to improve job throughput and reduce a job's start time if there are many jobs running simultaneously on a host. This parameter reduces communication traffic between sbatchd and PIM on the same host.
When this parameter is defined:
- sbatchd does not query PIM immediately when it needs information; it queries PIM only every LSF_PIM_SLEEPTIME seconds.
- sbatchd may be intermittently unable to retrieve process information for jobs whose run time is shorter than LSF_PIM_SLEEPTIME.
- It may take longer to view resource usage with bjobs -l.
By default, Linux sets the maximum file descriptor limit to 1024. This value is too small for jobs using more than 200 processes. To avoid MPI job failure, specify a larger file descriptor limit. For example:
# /etc/init.d/lsf stop
# ulimit -n 16384
# /etc/init.d/lsf start
Any host with more than 200 CPUs should start the LSF daemons with the larger file descriptor limit. SGI Altix already starts the arrayd daemon with the same ulimit setting, so that MPI jobs run outside LSF can start as well.
See the Platform LSF Configuration Reference for information about the lsb.queues and lsf.conf files.
Resources for dynamic and static cpusets
If your environment uses both static and dynamic cpusets, or you have more than one static cpuset configured, you must configure decreasing numeric resources to represent the cpuset count, and use -R "rusage" in job submission. This allows preemption, and also lets you control the number of jobs running on static and dynamic cpusets or on each static cpuset.
- Edit lsf.shared and configure resources for static cpusets and non-static cpusets. For example:
Begin Resource
RESOURCENAME  TYPE     INTERVAL  INCREASING  DESCRIPTION  # Keywords
dcpus         Numeric  ()        N
scpus         Numeric  ()        N
End Resource
Where:
- dcpus is the number of CPUs outside static cpusets (that is, the total number of CPUs minus the number of CPUs in static cpusets).
- scpus is the number of CPUs in static cpusets. For static cpusets, configure a separate resource for each static cpuset. You should use the cpuset name as the resource name.
The names dcpus and scpus can be any names you like.
- Edit lsf.cluster.cluster_name to map the resources to hosts. For example:
Begin ResourceMap
RESOURCENAME  LOCATION
dcpus         (4@[hosta])   # total cpus - cpus in static cpusets
scpus         (8@[hostc])   # static cpusets
End ResourceMap
- Edit lsb.params and configure your cpuset resources as preemptable. For example:
Begin Parameters
...
PREEMPTABLE_RESOURCES = scpus dcpus
End Parameters
- Edit lsb.hosts and set MXJ greater than or equal to the total number of CPUs in static and dynamic cpusets you have configured resources for.
Use the following commands to verify your configuration:
bhosts -s
RESOURCE  TOTAL  RESERVED  LOCATION
dcpus     4.0    0.0       hosta
scpus     8.0    0.0       hosta

lshosts -s
RESOURCE  VALUE  LOCATION
dcpus     4      hosta
scpus     8      hosta

bhosts
HOST_NAME  STATUS  JL/U  MAX  NJOBS  RUN  SSUSP  USUSP  RSV
hosta      ok      -     -    1      1    0      0      0
To use preemption on systems running IRIX or TRIX versions earlier than 6.5.24, use cpusetscript as the job suspend action in lsb.queues:
Begin Queue
...
JOB_CONTROLS = SUSPEND[cpusetscript]
...
End Queue
To enable checkpointing before the job is migrated by the cpusetscript, specify the CHKPNT=chkpnt_dir parameter in the configuration of the preemptable queue.
You must use -R "rusage" in job submission. This allows preemption, and also lets you control the number of jobs running on static and dynamic cpusets or on each static cpuset.
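For example, a job that should run in the static cpuset MYSET and reserve all four of its slots against the scpus resource configured above might be submitted as follows:
bsub -n 4 -R "rusage[scpus=4]" -extsched "CPUSET[CPUSET_NAME=MYSET]" myjob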
Configuring default and mandatory cpuset options
Use the DEFAULT_EXTSCHED and MANDATORY_EXTSCHED queue parameters in lsb.queues to configure default and mandatory cpuset options.
Use keywords SGI_CPUSET[] or CPUSET[] to identify the external scheduler parameters. The keyword SGI_CPUSET[] is deprecated. The keyword CPUSET[] is preferred.
DEFAULT_EXTSCHED=[SGI_]CPUSET[cpuset_options]
Specifies default cpuset external scheduling options for the queue.
-extsched options on the bsub command are merged with DEFAULT_EXTSCHED options, and -extsched options override any conflicting queue-level options set by DEFAULT_EXTSCHED.
For example, if the queue specifies:
DEFAULT_EXTSCHED=CPUSET[CPUSET_OPTIONS=CPUSET_CPU_EXCLUSIVE]
and a job is submitted with:
-extsched "CPUSET[CPUSET_TYPE=dynamic;CPU_LIST=1,5,7-12; CPUSET_OPTIONS=CPUSET_MEMORY_LOCAL]"LSF uses the resulting external scheduler options for scheduling:
CPUSET[CPUSET_TYPE=dynamic;CPU_LIST=1, 5, 7-12; CPUSET_OPTIONS=CPUSET_CPU_EXCLUSIVE CPUSET_MEMORY_LOCAL]
DEFAULT_EXTSCHED can be used in combination with MANDATORY_EXTSCHED in the same queue. For example, if the job specifies:
-extsched "CPUSET[CPU_LIST=1,5,7-12;MAX_CPU_PER_NODE=4]"and the queue specifies:
Begin Queue
...
DEFAULT_EXTSCHED=CPUSET[CPUSET_OPTIONS=CPUSET_CPU_EXCLUSIVE]
MANDATORY_EXTSCHED=CPUSET[CPUSET_TYPE=dynamic;MAX_CPU_PER_NODE=2]
...
End Queue
LSF uses the resulting external scheduler options for scheduling:
CPUSET[CPUSET_TYPE=dynamic;MAX_CPU_PER_NODE=2;CPU_LIST=1, 5, 7-12;CPUSET_OPTIONS=CPUSET_CPU_EXCLUSIVE]
If cpuset options are set in DEFAULT_EXTSCHED, and you do not want to specify values for these options, use the keyword with no value in the -extsched option of bsub. For example, if DEFAULT_EXTSCHED=CPUSET[MAX_RADIUS=2], and you do not want to specify any radius option at all, use -extsched "CPUSET[MAX_RADIUS=]".
.See Specifying cpuset properties for jobs for more information about external scheduling options.
MANDATORY_EXTSCHED=[SGI_]CPUSET[cpuset_options]
Specifies mandatory cpuset external scheduling options for the queue.
-extsched options on the bsub command are merged with MANDATORY_EXTSCHED options, and MANDATORY_EXTSCHED options override any conflicting job-level options set by -extsched.
For example, if the queue specifies:
MANDATORY_EXTSCHED=CPUSET[CPUSET_TYPE=dynamic;MAX_CPU_PER_NODE=2]
and a job is submitted with:
-extsched "CPUSET[MAX_CPU_PER_NODE=4;CPU_LIST=1,5,7-12;]"
LSF uses the resulting external scheduler options for scheduling:
CPUSET[CPUSET_TYPE=dynamic;MAX_CPU_PER_NODE=2;CPU_LIST=1, 5, 7-12]
MANDATORY_EXTSCHED can be used in combination with DEFAULT_EXTSCHED in the same queue. For example, if the job specifies:
-extsched "CPUSET[CPU_LIST=1,5,7-12;MAX_CPU_PER_NODE=4]"and the queue specifies:
Begin Queue
...
DEFAULT_EXTSCHED=CPUSET[CPUSET_OPTIONS=CPUSET_CPU_EXCLUSIVE]
MANDATORY_EXTSCHED=CPUSET[CPUSET_TYPE=dynamic;MAX_CPU_PER_NODE=2]
...
End Queue
LSF uses the resulting external scheduler options for scheduling:
CPUSET[CPUSET_TYPE=dynamic;MAX_CPU_PER_NODE=2;CPU_LIST=1, 5, 7-12;CPUSET_OPTIONS=CPUSET_CPU_EXCLUSIVE]
If you want to prevent users from setting certain cpuset options in the -extsched option of bsub, use the keyword with no value. For example, if the job is submitted with -extsched "CPUSET[MAX_RADIUS=2]", use MANDATORY_EXTSCHED=CPUSET[MAX_RADIUS=] to override this setting.
to override this setting.See Specifying cpuset properties for jobs for more information about external scheduling options.
The options set by -extsched can be combined with the queue-level MANDATORY_EXTSCHED or DEFAULT_EXTSCHED parameters. If -extsched and MANDATORY_EXTSCHED set the same option, the MANDATORY_EXTSCHED setting is used. If -extsched and DEFAULT_EXTSCHED set the same option, the -extsched setting is used.
Topology scheduling options are applied in the following order of priority, from highest to lowest:
- Queue-level MANDATORY_EXTSCHED options override ...
- Job-level -ext options, which override ...
- Queue-level DEFAULT_EXTSCHED options
For example, if the queue specifies:
DEFAULT_EXTSCHED=CPUSET[MAX_CPU_PER_NODE=2]
and the job is submitted with:
bsub -n 4 -ext "CPUSET[MAX_CPU_PER_NODE=1]" myjob
The cpuset option in the job submission overrides the DEFAULT_EXTSCHED, so the job will run in a cpuset allocated with a maximum of 1 CPU per node, honoring the job-level MAX_CPU_PER_NODE option.
If the queue specifies:
MANDATORY_EXTSCHED=CPUSET[MAX_CPU_PER_NODE=2]
and the job is submitted with:
bsub -n 4 -ext "CPUSET[MAX_CPU_PER_NODE=1]" myjob
The job will run in a cpuset allocated with a maximum of 2 CPUs per node, honoring the MAX_CPU_PER_NODE option in the queue.
Using LSF with SGI Cpusets
Specifying cpuset properties for jobs
To specify cpuset properties for LSF jobs, use:
- The -extsched option of bsub.
- DEFAULT_EXTSCHED or MANDATORY_EXTSCHED, or both, in the queue definition (lsb.queues).
If a job is submitted with the -extsched option, LSF submits the job with a hold, then resumes the job before dispatching it to give LSF time to attach the -extsched options. The job starts on the first execution host.
For more information about job operations, see Administering Platform LSF.
For more information about bsub, see the Platform LSF Command Reference.
-ext[sched] "[SGI_]CPUSET[cpuset_options]"
Specifies a list of CPUs and cpuset attributes used by LSF to allocate a cpuset for the job.
You can abbreviate the -extsched option to -ext. Use keywords SGI_CPUSET[] or CPUSET[] to identify the external scheduler parameters. The keyword SGI_CPUSET[] is deprecated. The keyword CPUSET[] is preferred.
where cpuset_options are:
CPUSET_TYPE=static | dynamic | none;
Specifies the type of cpuset to be allocated.
If you specify none, no cpuset is allocated, you cannot specify any other cpuset options, and the job runs outside of any cpuset.
CPUSET_NAME=name;
name is the name of a static cpuset. If you specify CPUSET_TYPE=static, you must provide a cpuset name. If you specify a cpuset name but a CPUSET_TYPE that is not static, the job is rejected.
MAX_RADIUS=radius;
radius is the maximum cpuset radius the job can accept. If the radius requirement cannot be satisfied, the job remains pending.
MAX_RADIUS implies that the job cannot span multiple hosts. LSF puts each cpuset host into its own group to enforce this when MAX_RADIUS is specified.
RESUME_OPTION=ORIG_CPUS;
Specifies how LSF should recreate a cpuset when a job is resumed.
By default, LSF tries to create the original cpuset when a job resumes. If this fails, LSF tries to create a new cpuset based on the default memory option.
- ORIG_CPUS specifies that the job must be run on the original cpuset when it resumes. If this fails, the job remains suspended.
Because memory migration is not supported on Altix for ProPack versions 4 or lower, you must define RESUME_OPTION=ORIG_CPUS to force LSF to recreate the original cpuset when LSF resumes a job.
CPU_LIST=cpu_ID_list;
cpu_ID_list is a list of CPU IDs separated by commas. The CPU ID is a positive integer or a range of integers. If incorrect CPU IDs are specified, the job remains pending until the specified CPUs are available.
You must specify at least as many CPU IDs as the number of CPUs the job requires (bsub -n). If you specify more CPU IDs than the job requests, LSF selects the best CPUs from the list.
CPUSET_OPTIONS=option_list;
option_list is a list of cpuset attributes joined by a pipe (|). If incorrect cpuset attributes are specified, the job is rejected. See Cpuset attributes for supported cpuset options.
MAX_CPU_PER_NODE=max_num_cpus;
max_num_cpus is the maximum number of CPUs on any one node that will be used by this job. Cannot be used with the NODE_EX option.
MEM_LIST=mem_node_list;
(Altix ProPack 4 and ProPack 5) mem_node_list is a list of memory node IDs separated by commas. The memory node ID is a positive integer or a range of integers. For example:
"CPUSET[MEM_LIST=0,1-2]"Incorrect memory node IDs or unavailable memory nodes are ignored when LSF allocates the cpuset.
NODE_EX=Y | N;
Allocates whole nodes for the cpuset job. This option cannot be used with the MAX_CPU_PER_NODE option.
When a job is submitted using -extsched, LSF creates a cpuset with the specified CPUs and cpuset attributes and attaches it to the processes of the job. The job is then scheduled and dispatched.
Running jobs on specific CPUs
The CPUs available for your jobs may have specific features you need to take advantage of (for example, some CPUs may have more memory, and others have a faster processor). You can partition your machines to use specific CPUs for your jobs, but the cpusets for your jobs cannot cross hosts, and you must run multiple operating systems.
You can create static cpusets with the particular CPUs your jobs need, but you cannot control the specific CPUs in the cpuset that the job actually uses.
A better solution is to use the CPU_LIST external scheduler option to request specific CPUs for your jobs. LSF can choose the best set of CPUs from the CPU list to create a cpuset for the job. The best cpuset is the one with the smallest CPU radius that meets the CPU requirements of the job. CPU radius is determined by the processor topology of the system and is expressed in terms of the number of router hops between CPUs.
To make job submission easier, you should define queues with the specific CPU_LIST requirements. Set CPU_LIST in MANDATORY_EXTSCHED or DEFAULT_EXTSCHED option in your queue definitions in lsb.queues.
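For example, a queue that always confines its jobs to a specific set of CPUs might look like the following sketch; the queue name and CPU IDs are illustrative:
Begin Queue
QUEUE_NAME         = fastcpus
DESCRIPTION        = jobs restricted to CPUs 8-15
MANDATORY_EXTSCHED = CPUSET[CPU_LIST=8-15]
End Queue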
CPU_LIST is interpreted as a list of possible CPU selections, not a strict requirement. For example, if you submit a job with the -R "span[ptile]" option:
bsub -R "span[ptile=1]" -ext "CPUSET[CPU_LIST=1,3]" -n2 ...
the following combinations of CPUs are possible:
CPUs on host 1   CPUs on host 2
1                1
1                3
3                1
3                3
Cpuset attributes
The following cpuset attributes are supported in the list of cpuset options specified by CPUSET_OPTIONS:
- CPUSET_CPU_EXCLUSIVE--defines a restricted cpuset
- CPUSET_MEMORY_LOCAL--threads assigned to the cpuset attempt to assign memory only from nodes within the cpuset. Overrides the MEM_LIST cpuset option.
- CPUSET_MEMORY_EXCLUSIVE--threads not assigned to the cpuset do not use memory from within the cpuset unless no memory outside the cpuset is available
- CPUSET_MEMORY_KERNEL_AVOID--kernel attempts to avoid allocating memory from nodes contained in this cpuset
- CPUSET_MEMORY_MANDATORY--kernel limits all memory allocations to nodes contained in this cpuset
- CPUSET_POLICY_PAGE--Causes the kernel to page user pages to the swap file to free physical memory on the nodes contained in this cpuset. This is the default policy if no other policy is specified. Requires CPUSET_MEMORY_MANDATORY.
- CPUSET_POLICY_KILL--The kernel attempts to free as much space as possible from kernel heaps, but will not page user pages to the swap file. Requires CPUSET_MEMORY_MANDATORY.
See the SGI resource administration documentation and the man pages for the cpuset command for information about these cpuset attributes.
- SGI Altix Linux ProPack versions 4 and lower do not support memory migration; you must define RESUME_OPTION=ORIG_CPUS to force LSF to recreate the original cpuset when LSF resumes a job.
- SGI Altix Linux ProPack 5 supports memory migration and does not require additional configuration to enable this feature. If you submit and then suspend a job using a dynamic cpuset, LSF will create a new dynamic cpuset when the job resumes. The memory pages for the job are migrated to the new cpuset as required.
- SGI Altix Linux ProPack 3 only supports CPUSET_OPTIONS=CPUSET_MEMORY_LOCAL. If the cpuset job runs on an Altix host, other cpuset attributes are ignored.
- SGI Altix Linux ProPack 4 and ProPack 5 do not support CPUSET_OPTIONS=CPUSET_MEMORY_MANDATORY or CPUSET_OPTIONS=CPUSET_CPU_EXCLUSIVE attributes. If the cpuset job runs on an Altix host, the cpusets created on the Altix system will have their memory usage restricted to the memory nodes containing the CPUs assigned to the cpuset. The CPUSET_MEMORY_MANDATORY and CPUSET_CPU_EXCLUSIVE attributes are ignored.
Restrictions on CPUSET_MEMORY_MANDATORY
- CPUSET_OPTIONS=CPUSET_MEMORY_MANDATORY implies node-level allocation
- CPUSET_OPTIONS=CPUSET_MEMORY_MANDATORY cannot be used together with MAX_CPU_PER_NODE=max_num_cpus
Restrictions on CPUSET_CPU_EXCLUSIVE
The scheduler will not use CPU 0 when determining an allocation on IRIX or TRIX. You must not include CPU 0 in the list of CPUs specified by CPU_LIST.
MPI_DSM_MUSTRUN environment variable
You should not use the MPI_DSM_MUSTRUN=ON environment variable. If a job is suspended through preemption, LSF can ensure that cpusets are recreated with the same CPUs, but it cannot ensure that a certain task will run on a specific CPU. Jobs running with MPI_DSM_MUSTRUN cannot migrate to a different part of the machine. MPI_DSM_MUSTRUN also interferes with job checkpointing.
Including memory nodes in the allocation (Altix ProPack 4 and ProPack 5)
When you specify a list of memory node IDs with the cpuset external scheduler option MEM_LIST, LSF creates a cpuset for the job that includes the memory nodes specified by MEM_LIST in addition to the local memory attached to the CPUs allocated for the cpuset. For example, if
"CPUSET[MEM_LIST=30-40]"
, and a 2-CPU parallel job is scheduled to run on CPU 0-1 (physically located on node 0), the job is able to use memory on node 0 and nodes 30-40.Unavailable memory nodes listed in MEM_LIST are ignored when LSF allocates the cpuset. For example, a 4-CPU job across two hosts (hostA and hostB) that specifies MEM_LIST=1 allocates 2 CPUs on each host. The job is scheduled as follows:
If hostB only has 2 CPUs, only node 0 is available, and the job will only use the memory on node 0.
MEM_LIST is only available for dynamic cpuset jobs at both the queue level and the command level.
CPUSET_MEMORY_LOCAL
When MEM_LIST and CPUSET_OPTIONS=CPUSET_MEMORY_LOCAL are both specified for the job, the root cpuset nodes are included as the memory nodes for the cpuset. MEM_LIST is ignored; CPUSET_MEMORY_LOCAL overrides MEM_LIST.
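For example, in the following submission the job's memory use is restricted to the nodes of its allocated CPUs, and the MEM_LIST option is ignored:
bsub -n 4 -extsched "CPUSET[MEM_LIST=1-2;CPUSET_OPTIONS=CPUSET_MEMORY_LOCAL]" myjob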
CPU radius and processor topology
If LSB_CPUSET_BESTCPUS is set in lsf.conf, LSF can choose the best set of CPUs that can create a cpuset. The best cpuset is the one with the smallest CPU radius that meets the CPU requirements of the job. CPU radius is determined by the processor topology of the system and is expressed in terms of the number of router hops between CPUs.
For better performance, CPUs connected by metarouters are given relatively high weights so that they are the last to be allocated.
Best-fit and first-fit CPU list
By default, LSB_CPUSET_BESTCPUS=Y is set in lsf.conf. LSF applies a best-fit algorithm to select the best CPUs available for the cpuset.
For example, the following command creates an exclusive cpuset with the 8 best CPUs if available:
bsub -n 8 -extsched "CPUSET[CPUSET_OPTIONS=CPUSET_CPU_EXCLUSIVE]" myjob
If LSB_CPUSET_BESTCPUS is not set in lsf.conf, LSF builds a CPU list on a first-fit basis; in this example, the first 8 available CPUs are used.
Maximum radius for dynamic cpusets
Use the MAX_RADIUS cpuset external scheduler option to specify the maximum radius for dynamic cpuset allocation. If LSF cannot allocate a cpuset with radius less than or equal to MAX_RADIUS, the job remains pending.
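For example, the following submission accepts only a dynamic cpuset with radius 2 or less, and stays pending otherwise:
bsub -n 4 -extsched "CPUSET[MAX_RADIUS=2]" myjob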
MAX_RADIUS implies that the job cannot span multiple hosts. LSF puts each cpuset host into its own group to enforce this when MAX_RADIUS is specified.
How the best CPUs are selected
LSF selects the CPUs with the smallest CPU radius that meet the CPU requirements of the job, as described under CPU radius and processor topology above.
Allocating cpusets on multiple hosts (Altix only)
On SGI Altix systems, if a single host cannot satisfy the cpuset requirements for the job, LSF will try to allocate cpusets on multiple hosts, and the parallel job will be launched within the cpuset.
If you define the external scheduler option CPUSET[CPUSET_TYPE=none], no cpusets are allocated and the job is dispatched and run outside of any cpuset.
Spanning multiple hosts is not supported on TRIX. Platform HPC creates cpusets on a single host (or on the first host in the allocation).
LSB_HOST_CPUSETS environment variable
After dynamic cpusets are allocated and before the job starts running, LSF sets the LSB_HOST_CPUSETS environment variable. LSB_HOST_CPUSETS has the following format:
number_hosts host1_name cpuset1_name host2_name cpuset2_name ...
For example, if hostA and hostB have 2 CPUs, and hostC has 4 CPUs, cpuset 1-0 is created on hostA, hostB, and hostC, and LSB_HOST_CPUSETS is set to:
3 hostA 1-0 hostB 1-0 hostC 1-0
LSB_HOST_CPUSETS is only set for jobs that allocate dynamic cpusets.
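A job script can read this variable to discover its cpusets. A minimal sh sketch, assuming only the documented format above:
#!/bin/sh
# LSB_HOST_CPUSETS format: number_hosts host1 cpuset1 host2 cpuset2 ...
set -- $LSB_HOST_CPUSETS
nhosts=$1
shift
while [ $# -ge 2 ]; do
    echo "cpuset $2 allocated on host $1"
    shift 2
done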
LSB_CPUSET_DEDICATED environment variable
When a static or dynamic cpuset is allocated, LSF sets the LSB_CPUSET_DEDICATED environment variable. For CPUSET_TYPE=none, LSB_CPUSET_DEDICATED is not set.
The LSB_CPUSET_DEDICATED variable is set by LSF as follows:
- For CPUSET_TYPE=dynamic cpusets, LSB_CPUSET_DEDICATED=YES. This implies MPI_DISTRIBUTE=ON to get good NUMA placement in MPI jobs. The cpusets assigned to this job are not intended to be shared with other jobs or other users.
- For CPUSET_TYPE=static cpusets, LSB_CPUSET_DEDICATED=NO. Static cpusets are typically used to run a number of jobs concurrently. The cpusets assigned to this job are intended to be shared with other jobs, or it is unknown whether the cpusets assigned are intended to be shared.
How cpuset jobs are suspended and resumed
When a cpuset job is suspended (for example, with bstop), job processes are moved out of the cpuset and the job cpuset is destroyed. LSF keeps track of which processes belong to the cpuset; when the job is resumed, LSF attempts to recreate a job cpuset and binds the job processes to it.
When a job is resumed, regardless of how it was suspended, the RESUME_OPTION is honored. If RESUME_OPTION=ORIG_CPUS then LSF first tries to get the original CPUs from the same nodes as the original cpuset in order to use the same memory. If this does not get enough CPUs to resume the job, LSF tries to get any CPUs in an effort to get the job resumed.
SGI Altix Linux ProPack 5 supports memory migration and does not require additional configuration to enable this feature. If you submit and then suspend a job using a dynamic cpuset, LSF will create a new dynamic cpuset when the job resumes. The memory pages for the job are migrated to the new cpuset as required.
For example, assume a host with 2 nodes and 2 CPUs per node (4 CPUs in total), and a job running within a cpuset that contains CPU 1. When the job is suspended, its processes are moved out of the cpuset and the cpuset is destroyed. When the job is resumed, the RESUME_OPTION parameter determines which CPUs are used to recreate the cpuset:
- If RESUME_OPTION=ORIG_CPUS, only CPUs from the same nodes originally used are selected.
- If RESUME_OPTION is not ORIG_CPUS, LSF first attempts to use CPUs from the original nodes to minimize memory latency. If this is not possible, any free CPUs from the host are considered.
If the job originally had a cpuset containing CPU 1, the eligible CPUs when the job is resumed are:
RESUME_OPTION   Eligible CPUs
ORIG_CPUS       0, 1
not ORIG_CPUS   0, 1, 2, 3
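For example, to require that a resumed job is recreated only on its original CPUs (and otherwise remains suspended), submit it with:
bsub -n 4 -extsched "CPUSET[RESUME_OPTION=ORIG_CPUS]" myjob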
Viewing cpuset information for your jobs
The bacct -l, bjobs -l, and bhist -l commands display the following information for jobs:
- CPUSET_TYPE=static | dynamic | none
- NHOSTS=number
- HOST=host_name
- CPUSET_NAME=cpuset_name
- NCPUS=num_cpus--the number of actual CPUs in the cpuset; can be greater than the number of slots
bjobs -l 221
Job <221>, User <user1>, Project <default>, Status <DONE>, Queue <normal>, Command <myjob>
Thu Dec 15 14:19:54 2009: Submitted from host <hostA>, CWD <$HOME>, 2 Processors Requested;
Thu Dec 15 14:19:57 2009: Started on 2 Hosts/Processors <2*hostA>, Execution Home </home/user1>, Execution CWD </home/user1>;
Thu Dec 15 14:19:57 2009: CPUSET_TYPE=dynamic;NHOSTS=1;HOST=hostA;CPUSET_NAME=/reg62@221;NCPUS=2;
Thu Dec 15 14:20:03 2009: Done successfully. The CPU time used is 0.0 seconds.

SCHEDULING PARAMETERS:
           r15s  r1m  r15m  ut  pg  io  ls  it  tmp  swp  mem
loadSched   -    -    -     -   -   -   -   -   -    -    -
loadStop    -    -    -     -   -   -   -   -   -    -    -

EXTERNAL MESSAGES:
MSG_ID  FROM  POST_TIME     MESSAGE              ATTACHMENT
0       -     -             -                    -
1       -     -             -                    -
2       root  Dec 15 14:19  JID=0x118f; ASH=0x0  N

bhist -l 221
Job <221>, User <user1>, Project <default>, Command <myjob>
Thu Dec 15 14:19:54 2009: Submitted from host <hostA>, to Queue <normal>, CWD <$HOME>, 2 Processors Requested;
Thu Dec 15 14:19:57 2009: Dispatched to 2 Hosts/Processors <2*hostA>;
Thu Dec 15 14:19:57 2009: CPUSET_TYPE=dynamic;NHOSTS=1;HOST=hostA;CPUSET_NAME=/reg62@221;NCPUS=2;
Thu Dec 15 14:19:57 2009: Starting (Pid 4495);
Thu Dec 15 14:19:57 2009: External Message "JID=0x118f; ASH=0x0" was posted from "root" to message box 2;
Thu Dec 15 14:20:01 2009: Running with execution home </home/user1>, Execution CWD </home/user1>, Execution Pid <4495>;
Thu Dec 15 14:20:01 2009: Done successfully. The CPU time used is 0.0 seconds;
Thu Dec 15 14:20:03 2009: Post job process done successfully;

Summary of time in seconds spent in various states by Thu Dec 15 14:20:03
PEND  PSUSP  RUN  USUSP  SSUSP  UNKWN  TOTAL
3     0      4    0      0      0      7

bacct -l 221
Accounting information about jobs that are:
- submitted by all users.
- accounted on all projects.
- completed normally or exited
- executed on all hosts.
- submitted to all queues.
- accounted on all service classes.
------------------------------------------------------------------------------
Job <221>, User <user1>, Project <default>, Status <DONE>, Queue <normal>, Command <myjob>
Thu Dec 15 14:19:54 2009: Submitted from host <hostA>, CWD <$HOME>;
Thu Dec 15 14:19:57 2009: Dispatched to 2 Hosts/Processors <2*hostA>;
Thu Dec 15 14:19:57 2009: CPUSET_TYPE=dynamic;NHOSTS=1;HOST=hostA;CPUSET_NAME=/reg62@221;NCPUS=2;
Thu Dec 15 14:20:01 2009: Completed <done>.

Accounting information about this job:
CPU_T  WAIT  TURNAROUND  STATUS  HOG_FACTOR  MEM  SWAP
0.03   3     7           done    0.0042      0K   0K
------------------------------------------------------------------------------
SUMMARY: ( time unit: second )
Total number of done jobs: 1      Total number of exited jobs: 0
Total CPU time consumed: 0.0      Average CPU time consumed: 0.0
Maximum CPU time of a job: 0.0    Minimum CPU time of a job: 0.0
Total wait time in queues: 3.0    Average wait time in queue: 3.0
Maximum wait time in queue: 3.0   Minimum wait time in queue: 3.0
Average turnaround time: 7 (seconds/job)
Maximum turnaround time: 7        Minimum turnaround time: 7
Average hog factor of a job: 0.00 ( cpu time / turnaround time )
Maximum hog factor of a job: 0.00 Minimum hog factor of a job: 0.00
Use brlainfo to display topology information for a cpuset host. It displays:
- Cpuset host name
- Cpuset host type
- Total number of CPUs
- Free CPUs
- Total number of nodes
- Free CPUs per node
- Available CPUs with a given radius
- List of static cpusets
brlainfo
HOSTNAME  CPUSET_OS  NCPUS  NFREECPUS  NNODES  NCPU/NODE  NSTATIC_CPUSETS
hostA     SGI_TRIX   2      2          1       2          0
hostB     PROPACK_4  4      4          2       2          0
hostC     PROPACK_4  4      3          2       2          0

brlainfo -l
HOST: hostC
CPUSET_OS  NCPUS  NFREECPUS  NNODES  NCPU/NODE  NSTATIC_CPUSETS
PROPACK_4  4      3          2       2          0
FREE CPU LIST: 0-2
NFREECPUS ON EACH NODE: 2/0,1/1
STATIC CPUSETS: NO STATIC CPUSETS
CPU_RADIUS: 2,3,3,3,3,3,3,3

Examples
- Specify a dynamic cpuset:
bsub -n 8 -extsched "CPUSET[CPUSET_TYPE=dynamic;CPU_LIST=1, 5, 7-12;]" myjob
If CPUSET_TYPE is not specified, the default cpuset type is dynamic:
bsub -R "span[hosts=1]" -n 8 -extsched "CPUSET[CPU_LIST=1, 5, 7-12;]" myjob
Jobs are attached to a cpuset dynamically created by LSF. The cpuset is deleted when the job finishes or exits.
- Specify a list of CPUs for an exclusive cpuset:
bsub -n 8 -extsched "CPUSET[CPU_LIST=1, 5, 7-12; CPUSET_OPTIONS=CPUSET_CPU_EXCLUSIVE|CPUSET_MEMORY_LOCAL]" myjob
The job myjob will succeed if CPUs 1, 5, 7, 8, 9, 10, 11, and 12 are available.
- Specify a static cpuset:
bsub -n 8 -extsched "CPUSET[CPUSET_TYPE=static; CPUSET_NAME=MYSET]" myjob
Specifying a cpuset name implies that the cpuset type is static:
bsub -n 8 -extsched "CPUSET[CPUSET_NAME=MYSET]" myjob
Jobs are attached to a static cpuset specified by users at job submission. This cpuset is not deleted when the job finishes or exits.
- Run a job without using any cpuset:
bsub -n 8 -extsched "CPUSET[CPUSET_TYPE=none]" myjob
Using preemption
- Jobs requesting static cpusets:
bsub -n 4 -q low -R "rusage[scpus=4]" -extsched "CPUSET[CPUSET_NAME=MYSET]" sleep 1000
bsub -n 4 -q low -R "rusage[scpus=4]" -extsched "CPUSET[CPUSET_NAME=MYSET]" sleep 1000
After these two jobs start running, submit a job to a high-priority queue:
bsub -n 4 -q high -R "rusage[scpus=4]" -extsched "CPUSET[CPUSET_NAME=MYSET]" sleep 1000
The most recent job running on the low-priority queue (job 102) is preempted by the job submitted to the high-priority queue (job 103):
bjobs
JOBID  USER   STAT   QUEUE  FROM_HOST  EXEC_HOST  JOB_NAME   SUBMIT_TIME
103    user1  RUN    high   hosta      4*hosta    *eep 1000  Jan 22 08:24
101    user1  RUN    low    hosta      4*hosta    *eep 1000  Jan 22 08:23
102    user1  SSUSP  low    hosta      4*hosta    *eep 1000  Jan 22 08:23

bhosts -s
RESOURCE  TOTAL  RESERVED  LOCATION
dcpus     4.0    0.0       hosta
scpus     0.0    8.0       hosta
- Jobs requesting dynamic cpusets:
bsub -q high -R "rusage[dcpus=1]" -n 3 -extsched "CPUSET[CPU_LIST=1,2,3]" sleep 1000

bhosts -s
RESOURCE  TOTAL  RESERVED  LOCATION
dcpus     3.0    1.0       hosta
scpus     8.0    0.0       hosta
Using SGI Comprehensive System Accounting facility (CSA)
The SGI Comprehensive System Accounting facility (CSA) provides data for collecting per-process resource usage, monitoring disk usage, and chargeback to specific login accounts. If CSA is enabled on your system, LSF writes records for LSF jobs to CSA.
SGI CSA writes an accounting record for each process in the pacct file, which is usually located in the /var/adm/acct/day directory. SGI system administrators then use the csabuild command to organize and present the records on a job-by-job basis.
For each job running on the SGI system, LSF writes an accounting record to CSA when the job starts and when the job finishes. LSF daemon accounting in CSA starts and stops with the LSF daemon.
See the SGI resource administration documentation for information about CSA.
Setting up SGI CSA
- Set the following parameters in /etc/csa.conf to on:
- Run the csaswitch command to turn on the configuration changes in /etc/csa.conf.
See the SGI resource administration documentation for information about the csaswitch command.
Information written to the pacct file
LSF writes the following records to the pacct file when a job starts and when it exits:
- Job record type (job start or job exit)
- Current system clock time
- Service provider (LSF)
- Submission time of the job (at job start only)
- User ID of the job owner
- Array Session Handle (ASH) of the job (not available on Altix)
- SGI job container ID (PAGG job ID on Altix)
- SGI project ID (not available on Altix)
- LSF job name if it exists
- Submission host name
- LSF queue name
- LSF external job ID
- LSF job array index
- LSF job exit code (at job exit only)
- NCPUS--number of CPUs the LSF job has been using
Viewing LSF job information recorded in CSA
Use the SGI csaedit command to see the ASCII content of the pacct file. For example:
# csaedit -P /var/csa/day/pacct -A
For each LSF job, you should see two lines similar to the following:
37 Raw-Workld-Mgmt user1 0x19ac91ee000064f2 0x0000000000000000 0
   REQID=1771 ARRAYID=0 PROV=LSF START=Jun 4 15:52:01 ENTER=Jun 4 15:51:49
   TYPE=INIT SUBTYPE=START MACH=hostA REQ=myjob QUE=normal
...
39 Raw-Workld-Mgmt user1 0x19ac91ee000064f2 0x0000000000000000 0
   REQID=1771 ARRAYID=0 PROV=LSF START=Jun 4 16:09:14
   TYPE=TERM SUBTYPE=EXIT MACH=hostA REQ=myjob QUE=normal
The REQID is the LSF job ID (1771).
See the SGI resource administration documentation for information about the csaedit command.
Using SGI User Limits Database (ULDB--IRIX only)
The SGI user limits database (ULDB) allows user-specific limits for jobs. If no ULDB is defined, job limits are the same for all jobs. If you use ULDB, you can configure LSF so that jobs submitted to a host with the SGI job limits package installed are subject to the job limits configured in the ULDB.
Set LSF_ULDB_DOMAIN=domain_name in lsf.conf to specify the name of the LSF domain in the ULDB domain directive. A domain definition named domain_name must be configured in the jlimit.in input file.
The ULDB contains job limit information that system administrators use to control access to a host on a per-user basis. The job limits in the ULDB override the system default values for both job limits and process limits. When a ULDB domain is configured, the limits are enforced as SGI job limits.
If the ULDB domain specified in LSF_ULDB_DOMAIN is not valid or does not exist, LSF uses the limits defined in the domain named batch. If the batch domain does not exist, the system default limits are set.
Next, LSF resource usage limits are enforced for the SGI job under which the LSF job is running. LSF limits override the corresponding SGI job limits. The ULDB limits are used for any LSF limits that are not defined. If the job reaches the SGI job limits, the action defined in the SGI system is used.
SGI job limits in the ULDB apply only to batch jobs.
You can also define resource limits (rlimits) in the ULDB domain. One advantage to defining rlimits in ULDB as opposed to in LSF is that rlimits can be defined per user and per domain in ULDB, whereas in LSF, limits are enforced per queue or per job.
See the SGI resource administration documentation for information about configuring ULDB domains in the jlimit.in file.
SGI ULDB is not supported on Altix systems, so no process aggregate (PAGG) job-level resource limits are enforced for jobs running on Altix. Other operating system and LSF resource usage limits are still enforced.
LSF resource usage limits controlled by ULDB job limits
- PROCESSLIMIT--Corresponds to SGI JLIMIT_NUMPROC; fork(2) fails, but the existing processes continue to run
- MEMLIMIT--Corresponds to JLIMIT_RSS; resident pages above the limit become prime swap candidates
- DATALIMIT--Corresponds to JLIMIT_DATA; malloc(3) calls in the job fail with errno set to ENOMEM
- CPULIMIT--Corresponds to JLIMIT_CPU; a SIGXCPU signal is sent to the job, then after the grace period expires, SIGINT, SIGTERM, and SIGKILL are sent
- FILELIMIT--No corresponding limit; use process limit RLIMIT_FSIZE
- STACKLIMIT--No corresponding limit; use process limit RLIMIT_STACK
- CORELIMIT--No corresponding limit; use process limit RLIMIT_CORE
- SWAPLIMIT--Corresponds to JLIMIT_VMEM; use process limit RLIMIT_VMEM
Increasing the default MEMLIMIT for ULDB
In some pre-defined LSF queues, such as normal, the default MEMLIMIT is set to 5000 (5 MB). However, if ULDB is enabled (LSF_ULDB_DOMAIN is defined), MEMLIMIT should be set greater than 8000 in lsb.queues.
Example ULDB domain configuration
The following steps enable the ULDB domain LSF for user user1:
You can set the LSF_ULDB_DOMAIN to include more than one domain. For example:
LSF_ULDB_DOMAIN="lsf:batch:system"
- Configure the domain directive LSF in the jlimit.in file:
domain <LSF> {    # domain for LSF
    jlimit_numproc_cur = unlimited
    jlimit_numproc_max = unlimited    # JLIMIT_NUMPROC
    jlimit_nofile_cur = unlimited
    jlimit_nofile_max = unlimited     # JLIMIT_NOFILE
    jlimit_rss_cur = unlimited
    jlimit_rss_max = unlimited        # JLIMIT_RSS
    jlimit_vmem_cur = 128M
    jlimit_vmem_max = 256M            # JLIMIT_VMEM
    jlimit_data_cur = unlimited
    jlimit_data_max = unlimited       # JLIMIT_DATA
    jlimit_cpu_cur = 80
    jlimit_cpu_max = 160              # JLIMIT_CPU
}
- Configure the user limit directive for user1 in the jlimit.in file:
user user1 {
    LSF {
        jlimit_data_cur = 128M
        jlimit_data_max = 256M
    }
}
- Use the IRIX genlimits command to create the user limits database:
genlimits -l -v
SGI Job Container and Process Aggregate Support
An SGI job contains all processes created in a login session, including array sessions and session leaders. Job limits set in ULDB are applied to SGI jobs either at creation time or through the lifetime of the job. Job limits can also be reset on a job during its lifetime.
SGI IRIX job containers
If SGI Job Limits is installed, LSF creates a job container when starting a job, uses the job container to signal all processes in the job, and uses the SGI job ID to collect job resource usage for a job.
If LSF_ULDB_DOMAIN is defined in lsf.conf, ULDB job limits are applied to the job.
The SGI job ID is also used for kernel-level checkpointing.
SGI Altix Process Aggregates (PAGG)
Similar to an SGI job container, a process aggregate (PAGG) is a collection of processes. A child process in a PAGG inherits membership, or attachment, to the same process aggregate containers as the parent process. When a process inherits membership, the process aggregate containers are updated for the new process member. When a process exits, the process leaves the set of process members and the aggregate containers are updated again.
Since SGI ULDB is not supported on Altix systems, no PAGG job-level resource limits are enforced for jobs running on Altix. Other operating system level and LSF resource limits are still enforced.
Viewing SGI job ID and Array Session Handle (ASH)
Use bjobs and bhist to display the SGI job ID and Array Session Handle.
On Altix systems, the array session handle is not available. It is displayed as ASH=0x0.
bjobs -l 640
Job <640>, User <user1>, Project <default>, Status <RUN>, Queue <normal>, Command <pam -mpi -auto_place myjob>
Tue Jan 20 12:37:18 2009: Submitted from host <hostA>, CWD <$HOME>, 2 Processors Requested;
Tue Jan 20 12:37:29 2009: Started on 2 Hosts/Processors <2*hostA>, Execution Home </home/user1>, Execution CWD </home/user1>;
Tue Jan 20 12:37:29 2009: CPUSET_TYPE=dynamic;NHOSTS=1;ALLOCINFO=hostA 640-0;
Tue Jan 20 12:38:22 2009: Resource usage collected. MEM: 1 Mbytes; SWAP: 5 Mbytes; NTHREAD: 1
                          PGID: 5020232; PIDs: 5020232

SCHEDULING PARAMETERS:
           r15s  r1m  r15m  ut  pg  io  ls  it  tmp  swp  mem
loadSched   -    -    -     -   -   -   -   -   -    -    -
loadStop    -    -    -     -   -   -   -   -   -    -    -

EXTERNAL MESSAGES:
MSG_ID  FROM  POST_TIME     MESSAGE                              ATTACHMENT
0       -     -             -                                    -
1       -     -             -                                    -
2       root  Jan 20 12:41  JID=0x2bc0000000001f7a; ASH=0x2bc0f  N

bhist -l 640
Job <640>, User <user1>, Project <default>, Command <pam -mpi -auto_place myjob>
Sat Oct 19 14:52:14 2009: Submitted from host <hostA>, to Queue <normal>, CWD <$HOME>, Requested Resources <unclas>;
Sat Oct 19 14:52:22 2009: Dispatched to <hostA>;
Sat Oct 19 14:52:22 2009: CPUSET_TYPE=none;NHOSTS=1;ALLOCINFO=hostA;
Sat Oct 19 14:52:23 2009: Starting (Pid 5020232);
Sat Oct 19 14:52:23 2009: Running with execution home </home/user1>, Execution CWD </home/user1>, Execution Pid <5020232>;
Sat Oct 19 14:53:22 2009: External Message "JID=0x2bc0000000001f7a; ASH=0x2bc0f" was posted from "root" to message box 2;

Summary of time in seconds spent in various states by Sat Oct 19 14:54:00
PEND  PSUSP  RUN  USUSP  SSUSP  UNKWN  TOTAL
8     0      98   0      0      0      106
Date Modified: January 10, 2011
Platform Computing: www.platform.com
Platform Support: support@platform.com
Platform Information Development: doc@platform.com
Copyright © 1994-2011 Platform Computing Corporation. All rights reserved.