
IBM LoadLeveler for AIX 5L: Using and Administering


Routing jobs to NQS machines

Users can submit NQS scripts to LoadLeveler and have them routed to a machine outside of the LoadLeveler cluster that runs NQS. LoadLeveler supports COSMIC NQS version 2.0 and other versions of NQS that support the same commands and options and produce similar output for those commands.

The following diagram illustrates a typical environment that allows users to have their jobs routed to machines outside of LoadLeveler for processing:

Figure 18. Environment illustrating jobs being routed to NQS machines.


As the diagram illustrates, machines A, B, and C are members of the LoadLeveler cluster. Machine A runs the central manager, machine B runs both LoadLeveler and NQS, and machine C is a third member of the cluster. Machine D is outside of the cluster and is running NQS.

When a user submits a job to LoadLeveler, machine A, which runs the central manager, schedules the job to machine B. LoadLeveler on machine B then routes the job to machine D using NQS. Keep this diagram in mind as you continue to read this chapter.


[ Top of Page | Previous Page | Next Page | Table of Contents | Index ]


Understanding the LoadLeveler job object model

The ll_get_data subroutine of the data access API allows you to access the LoadLeveler job model. The LoadLeveler job model consists of objects that have attributes and connections to other objects. An attribute is a characteristic of the object and generally has a primitive data type (such as integer, float, or character). The job name, submission time and job priority are examples of attributes.

Objects are connected to one or more other objects via relationships. An object can be connected to other objects through more than one relationship, or through the same relationship. For example, a Job object is connected to a Credential object and to Step objects through two different relationships. A Job object can be connected to more than one Step object through the same relationship of "having a Step." When an object is connected through different relationships, different specifications are used to retrieve the appropriate object.

When an object is connected to more than one object through the same relationship, there are Count, GetFirst, and GetNext specifications associated with the relationship. The Count operation returns the number of connections. You must use the GetFirst operation to initialize access to the first such connected object. You must use the GetNext operation to get the remaining objects in succession. You cannot use GetNext after the last object has been retrieved.
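The Count/GetFirst/GetNext access pattern can be modeled with a short self-contained sketch. The classes and method names below are ours, purely for illustration; the real interface is the C subroutine ll_get_data with LoadLeveler's own specification names:

```python
# Toy model of the Count/GetFirst/GetNext pattern described above.
# Not LoadLeveler code: real programs call ll_get_data with specifications
# such as LL_JobGetFirstStep rather than methods on a Job class.

class Step:
    def __init__(self, step_id):
        self.id = step_id

class Job:
    def __init__(self, steps):
        self._steps = list(steps)
        self._cursor = 0

    def count(self):                  # "Count": number of connections
        return len(self._steps)

    def get_first(self):              # "GetFirst": initialize access
        self._cursor = 0
        return self.get_next()

    def get_next(self):               # "GetNext": returns None after the last object
        if self._cursor >= len(self._steps):
            return None
        step = self._steps[self._cursor]
        self._cursor += 1
        return step

job = Job([Step("0.step1"), Step("0.step2")])
seen = []
step = job.get_first()
while step is not None:               # GetNext cannot be used past the last object
    seen.append(step.id)
    step = job.get_next()
assert len(seen) == job.count()       # the walk visits every connected Step
```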

You can use the ll_get_data subroutine to access both attributes and connected objects. See ll_get_data subroutine for more information.

The root of the job model is the Job object, as shown in Figure 26. The job is queried for information about the number of steps it contains and the time it was submitted. The job is connected to a single Credential object and one or more Step objects. Elements for these objects can be obtained from the job.

You can query the Credential object to obtain the ID and group of the submitter of the job.

The Step object represents one executable unit of the job (all the tasks that are executed together). It contains information about the execution state of the step, messages generated during execution of the step, the number of nodes in the step, the number of unique machines the step is running on, the time the step was dispatched, the execution priority of the step, the unique identifier given to the step by LoadLeveler, the class of the step and the number of processes running for the step (task instances). The Step is connected to one or more Switch Table objects, one or more Machine objects and one or more Node objects. The list of Machines represents all of the hosts where one or more nodes of the step are running. If two or more nodes are running on the same host, the Machine object for the host occurs only once in the step's Machine list. The Step object is connected to one Switch Table object for each of the protocols (MPI and/or LAPI) used by the Step.

Each Node object manages a set of executables that share common requirements and preferences. The Node can be queried for the number of tasks it manages, and is connected to one or more Task objects.

Figure 26. LoadLeveler job object model

View figure.

The Task object represents one or more copies of the same executable. The Task object can be queried for the executable, the executable arguments, and the number of instances of the executable.

Table 15 describes the specifications and elements available when you use the ll_get_data subroutine. Each specification name describes the object you need to specify and the attribute returned. For example, the specification LL_JobGetFirstStep includes the object you need to specify (LL_Job) and the value returned (GetFirstStep).

This table is sorted alphabetically by object; within each object the specifications are also sorted alphabetically.

When using the 2.1 release API of ll_get_data, you must use the new 2.1 release keywords. For instance, you cannot use the min_processors and max_processors keywords from the 1.3.0 release with the 2.1 release API of ll_get_data. You must use the new keyword, node.
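In a job command file, the change looks roughly like this (a sketch; the values are illustrative):

```
# 1.3.0 release style (no longer valid with the 2.1 release):
# @ min_processors = 4
# @ max_processors = 8

# 2.1 release style, using the node keyword:
# @ node = 4,8
```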





What is LoadLeveler?

LoadLeveler is a job management system that allows users to run more jobs in less time by matching the jobs' processing needs with the available resources. LoadLeveler schedules jobs, and provides functions for building, submitting, and processing jobs quickly and efficiently in a dynamic environment.

Figure 1 shows the different environments to which LoadLeveler can schedule jobs. Together, these environments comprise the LoadLeveler cluster. An environment can include heterogeneous clusters, dedicated nodes, and the RISC System/6000(R) Scalable POWERparallel(R) System (SP).

Figure 1. Example of a LoadLeveler configuration


In addition, LoadLeveler can schedule jobs written for NQS to run on machines outside of the LoadLeveler cluster. As Figure 1 also illustrates, a LoadLeveler cluster can include submit-only machines, which allow users to have access to a limited number of LoadLeveler features. This type of machine is further discussed in Roles of machines.




Step 8: Manage a job's status using control expressions

You can control running jobs by using five control functions as Boolean expressions in the configuration file. These functions are useful primarily for serial jobs. You define the expressions, using normal C conventions, with the following functions:

START
SUSPEND
CONTINUE
VACATE
KILL

The expressions are evaluated for each job running on a machine using both the job and machine attributes. Some jobs running on a machine may be suspended while others are allowed to continue.

The START expression is evaluated twice: once to see if the machine can accept jobs to run, and a second time to see if the specific job can be run on the machine. The other expressions are evaluated after the jobs have been dispatched and, in some cases, are already running.

When evaluating the START expression to determine if the machine can accept jobs, Class != { "Z" } evaluates to true only if Z is not in the class definition. This means that if two different classes are defined on a machine, Class != { "Z" } (where Z is one of the defined classes) always evaluates to false when specified in the START expression, and therefore the machine will not be considered for starting jobs.

START: expression that evaluates to T or F (true or false)
Determines whether a machine can run a LoadLeveler job. When the expression evaluates to T, LoadLeveler considers dispatching a job to the machine.

When you use a START expression that is based on the CPU load average, the negotiator may evaluate the expression as F even though the load average indicates the machine is Idle. This is because the negotiator adds a compensating factor to the startd machine's load average every time the negotiator assigns a job. For more information, see the NEGOTIATOR_INTERVAL keyword.

SUSPEND: expression that evaluates to T or F (true or false)
Determines whether running jobs should be suspended. When T, LoadLeveler temporarily suspends jobs currently running on the machine. Suspended LoadLeveler jobs will either be continued or vacated. This keyword is not supported for parallel jobs.

CONTINUE: expression that evaluates to T or F (true or false)
Determines whether suspended jobs should continue execution. When T, suspended LoadLeveler jobs resume execution on the machine.

VACATE: expression that evaluates to T or F (true or false)
Determines whether suspended jobs should be vacated. When T, suspended LoadLeveler jobs are removed from the machine and placed back into the queue (provided you specify restart=yes in the job command file). If a checkpoint was taken, the job restarts from the checkpoint. Otherwise, the job restarts from the beginning.

KILL: expression that evaluates to T or F (true or false)
Determines whether or not vacated jobs should be sent the SIGKILL signal and replaced in the queue. It is used to remove a job that is taking too long to vacate. When T, vacated LoadLeveler jobs are removed from the machine with no attempt to take checkpoints.

Typically, machine load average, keyboard activity, time intervals, and job class are used within these various expressions to dynamically control job execution.
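For example, a configuration file might combine these functions as follows. This is a hypothetical sketch: the thresholds are illustrative, and the variables (LoadAvg, KeyboardIdle, CurrentTime, EnteredCurrentState) are assumed to be available as machine variables in your configuration.

```
MINUTE = 60
START:    (LoadAvg <= 0.5) && (KeyboardIdle > 15 * $(MINUTE))
SUSPEND:  (LoadAvg > 2.0)
CONTINUE: (LoadAvg <= 0.5)
VACATE:   $(SUSPEND) && ((CurrentTime - EnteredCurrentState) > (10 * $(MINUTE)))
KILL:     $(VACATE) && ((CurrentTime - EnteredCurrentState) > (5 * $(MINUTE)))
```

Here a machine starts jobs only when lightly loaded and idle, suspends them under load, and escalates from suspend to vacate to kill as time in the current state accumulates.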

How control expressions affect jobs

After LoadLeveler selects a job for execution, the job can be in any of several states. Figure 35 shows how the control expressions can affect the state a job is in. The rectangles represent job or daemon states, and the diamonds represent the control expressions.

Figure 35. How control expressions affect jobs


Criteria used to determine when a LoadLeveler job will enter Start, Suspend, Continue, Vacate, and Kill states are defined in the LoadLeveler configuration files and may be different for each machine in the cluster. They may be modified to meet local requirements.





Routing jobs to NQS machines

The following procedure details how to set up your system for routing jobs to machines running NQS.

Assume Figure 36 depicts your environment. You have three machines in the cluster named A, B, and C. Outside of the cluster, you have machine D running NQS.

Figure 36. Environment illustrating jobs being routed to NQS machines.






Gang scheduling concepts

The mechanism that the Gang Scheduler uses to perform both the coordination and the synchronization is the gang scheduling matrix, where columns represent the scheduling slots that run jobs and rows represent the blocks of time available to run jobs, called time-slices. The intersection of a row and a column specifies the single job step that is to be running on that node's processor at a given time.
Note: Even though the Gang Scheduler can schedule to the individual processors on a node, you cannot bind a job process to a specific CPU.

Also present in the matrix is synchronization information and the size of the time-slice, set by the GANG_MATRIX_TIME_SLICE keyword in the configuration file. As it considers nodes to run a job, the Negotiator requires that each node can run the job step at the same time. That is, all scheduling slots that the Negotiator assigns to the job step will have nothing running in the same row in the Gang scheduling matrix.

As the number of time-slices for a node under Gang Scheduling increases, the percentage of total time that any one job receives decreases, so it is desirable to prevent the number of rows in the gang scheduling matrix from increasing indefinitely. The total number of jobs that can be concurrently dispatched to a node is still constrained by the MAX_STARTERS keyword which, along with the number of jobs to be allowed to run in the same time-slice (presumably on separate CPUs on the node), effectively limits the number of rows in the gang scheduling matrix. There is an internal LoadLeveler limit of eight unique rows in the Gang Scheduling Matrix. The number of jobs that can run on a node (and the number of rows in the matrix) is also limited by several other factors:

  • Job requirements
  • Real memory available
  • Memory needed by applications
  • Limits set by system administration
  • Attributes set by system administration

For an illustration of how keywords can affect the Gang matrix, see Figure 37.

As the scheduler builds the matrix, each matrix element is considered a scheduling slot occupied by a job task. The entire job occupies some number of scheduling slots across a time-slice. This allows parallel tasks to context switch in and out of a set of processors simultaneously. The number of scheduling slots occupied by a job depends on two factors:

  • Number of parallel tasks required
  • Proportion of execution time relative to other jobs

Figure 37. Effect of keywords on a Gang matrix subset.

  1. This illustration represents one Gang matrix node subset
  2. GANG_MATRIX_NODE_SUBSET_SIZE: Sets the minimum number of nodes in a Gang matrix subset
  3. Px: Represents an individual node processor
  4. tx: Represents an individual time-slice
  5. execution_factor: Defines the number of time-slices a job step receives
  6. GANG_MATRIX_TIME_SLICE: Defines the time-slice duration (all time-slices must be the same duration)
  7. max_smp_tasks: Defines the maximum number of simultaneous tasks possible in a time-slice for a single node.
    • The default value for this keyword is the number of CPUs and the maximum value is 128
    • For nodes 1 and 3, max_smp_tasks = 4
    • For nodes 2 and 4, max_smp_tasks = 2
    • The number of tasks possible may differ from the number of tasks actually running
  8. The number of time-slices (rows) a node can support is limited to the value of MAX_STARTERS divided by the max_smp_tasks value for that node
    • A Gang matrix supports the largest number of time-slices required up to a maximum of eight unique time-slices
    • MAX_STARTERS defines the maximum number of tasks that can run on a node
      • For nodes 1 and 3, MAX_STARTERS = 20
      • For nodes 2 and 4, MAX_STARTERS = 10
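The row arithmetic in items 7 and 8 can be sketched as follows. The function and variable names are ours for illustration, not LoadLeveler keywords; the values mirror the nodes in Figure 37.

```python
# Illustrative model of the row limit described above: the number of
# time-slices (rows) a node can support is MAX_STARTERS divided by the
# node's max_smp_tasks, capped by the internal limit of 8 unique rows.

GANG_MATRIX_ROW_LIMIT = 8   # internal LoadLeveler limit stated above

def matrix_rows(max_starters, max_smp_tasks):
    return min(max_starters // max_smp_tasks, GANG_MATRIX_ROW_LIMIT)

# Nodes 1 and 3 in Figure 37: MAX_STARTERS = 20, max_smp_tasks = 4
assert matrix_rows(20, 4) == 5
# Nodes 2 and 4: MAX_STARTERS = 10, max_smp_tasks = 2
assert matrix_rows(10, 2) == 5
# A large ratio is still capped at 8 unique time-slices
assert matrix_rows(64, 2) == 8
```

Note that both node pairs support five rows, so a matrix holding all four nodes needs five time-slices, comfortably under the limit of eight.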







Network job management and job scheduling systems

A network job management and job scheduling system, such as LoadLeveler, is a software program that schedules and manages jobs that you submit to one or more machines under its control. LoadLeveler accepts jobs that users submit and reviews the job requirements. LoadLeveler then examines the machines under its control to determine which machines are best suited to run each job.

Job definition

LoadLeveler schedules your jobs on one or more machines for processing. The definition of a job, in this context, is a set of job steps. For each job step, you can specify a different executable (the executable is the part of the job that gets processed). You can use LoadLeveler to submit jobs which are made up of one or more job steps, where each job step depends upon the completion status of a previous job step. For example, Figure 2 illustrates a stream of job steps:

Figure 2. LoadLeveler job steps


Each of these job steps is defined in a single job command file. A job command file specifies the name of the job, as well as the job steps that you want to submit, and can contain other LoadLeveler statements.

LoadLeveler tries to execute each of your job steps on a machine that has enough resources to support executing and checkpointing each step. If your job command file has multiple job steps, the job steps will not necessarily run on the same machine, unless you explicitly request that they do.
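As a sketch, a job command file with two dependent job steps might look like this (the executable paths and names are illustrative; the dependency keyword makes step2 run only if step1 completes with exit status 0):

```
# @ job_name   = sample_job
# @ step_name  = step1
# @ executable = /u/user1/prepare_data
# @ output     = $(job_name).$(step_name).out
# @ error      = $(job_name).$(step_name).err
# @ queue
#
# @ step_name  = step2
# @ dependency = (step1 == 0)
# @ executable = /u/user1/analyze_data
# @ output     = $(job_name).$(step_name).out
# @ error      = $(job_name).$(step_name).err
# @ queue
```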

You can submit batch jobs to LoadLeveler for scheduling. Batch jobs run in the background and generally do not require any input from the user. Batch jobs can either be serial or parallel. A serial job runs on a single machine. A parallel job is a program designed to execute as a number of individual, but related, processes on one or more of your system's nodes. When executed, these related processes can communicate with each other (through message passing or shared memory) to exchange data or synchronize their execution.

LoadLeveler will execute two different types of parallel jobs:

job_type = PVM
job_type = parallel

With a job_type of PVM, LoadLeveler supports a PVM API to allocate nodes and launch tasks. With a job_type of parallel, LoadLeveler interacts with Parallel Operating Environment (POE) to allocate nodes, assign tasks to nodes, and launch tasks.

Machine definition

In order for LoadLeveler to schedule a job on a machine, the machine must be a valid member of the LoadLeveler cluster. A cluster is the combination of all of the different types of machines that use LoadLeveler. The following types of machines can comprise a LoadLeveler cluster:

  • RISC System/6000 (and compatible hardware running AIX)
  • SP System

To make a machine a member of the LoadLeveler cluster, the administrator has to install the LoadLeveler software onto the machine and identify the central manager (described in Roles of machines). Once a machine becomes a valid member of the cluster, LoadLeveler can schedule jobs to it.

Roles of machines

Each machine in the LoadLeveler cluster performs one or more roles in scheduling jobs. These roles are described below:

  • Scheduling Machine: When a job is submitted, it gets placed in a queue managed by a scheduling machine. This machine contacts another machine that serves as the central manager for the entire LoadLeveler cluster. (This role is described below.) The scheduling machine asks the central manager to find a machine that can run the job, and also keeps persistent information about the job. Some scheduling machines are known as public scheduling machines, meaning that any LoadLeveler user can access them. These machines schedule jobs submitted from submit-only machines, which are described below.
  • Central Manager Machine: The role of the Central Manager is to examine the job's requirements and find one or more machines in the LoadLeveler cluster that will run the job. Once it finds the machine(s), it notifies the scheduling machine.
  • Executing Machine: The machine that runs the job is known as the executing machine.
  • Submitting Machine: This type of machine is known as a submit-only machine. It participates in the LoadLeveler cluster on a limited basis. Although the name implies that users of these machines can only submit jobs, they can also query and cancel jobs. Users of these machines also have their own Graphical User Interface (GUI) that provides them with the submit-only subset of functions. The submit-only machine feature allows workstations that are not part of the LoadLeveler cluster to submit jobs to the cluster.

Keep in mind that one machine can assume multiple roles.

Machine availability

There may be times when some of the machines in the LoadLeveler cluster are not available to process jobs; for instance, when the owners of the machines have decided to make them unavailable. This ability of LoadLeveler to allow users to restrict the use of their machines provides flexibility and control over the resources.

Machine owners can make their personal workstations available to other LoadLeveler users in several ways. For example, you can specify that:

  • The machine will always be available
  • The machine will be available only between certain hours
  • The machine will be available when the keyboard and mouse are not being used interactively.

Owners can also specify that their personal workstations never be made available to other LoadLeveler users.


[ Top of Page | Previous Page | Next Page | Table of Contents | Index ]

IBM LoadLeveler for AIX 5L: Using and Administering

How LoadLeveler schedules jobs

When a user submits a job, LoadLeveler examines the job command file to determine what resources the job will need. LoadLeveler determines which machine, or group of machines, is best suited to provide these resources, then LoadLeveler dispatches the job to the appropriate machine(s). To aid this process, LoadLeveler uses queues. A job queue is a list of jobs that are waiting to be processed. When a user submits a job to LoadLeveler, the job is entered into an internal database, which resides on one of the machines in the LoadLeveler cluster, until it is ready to be dispatched to run on another machine, as shown in Figure 3.

Figure 3. Job queues

View figure.

Once LoadLeveler examines a job to determine its required resources, the job is dispatched to a machine to be processed. Arrows 2 and 3 indicate that the job can be dispatched either to one machine or, in the case of parallel jobs, to multiple machines. Once the job reaches the executing machine, the job runs.

Jobs do not necessarily get dispatched to machines in the cluster on a first-come, first-served basis. Instead, LoadLeveler examines the requirements and characteristics of the job and the availability of machines, and then determines the best time for the job to be dispatched.

LoadLeveler also uses job classes to schedule jobs to run on machines. A job class is a classification to which a job can belong. For example, short running jobs may belong to a job class called short_jobs. Similarly, jobs that are only allowed to run on the weekends may belong to a class called weekend. The system administrator can define these job classes and select the users that are authorized to submit jobs of these classes. For more information on job classes, see Step 3: Specify class stanzas.

You can specify which types of jobs will run on a machine by specifying the type(s) of job classes the machine will support. For more information, see Step 1: Specify machine stanzas.
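As a sketch of what a class definition might look like in the administration file, a stanza along these lines could define the short_jobs class mentioned above (the limit and user names here are hypothetical, not taken from this document):

```
short_jobs: type = class
            class_comment = "short running jobs"
            wall_clock_limit = 00:10:00
            include_users = userA userB
```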

LoadLeveler also examines a job's priority in order to determine when to schedule the job on a machine. A priority of a job is used to determine its position among a list of all jobs waiting to be dispatched. For more information on job priority, see Setting and changing the priority of a job.


[ Top of Page | Previous Page | Next Page | Table of Contents | Index ]

IBM LoadLeveler for AIX 5L: Using and Administering

The LoadLeveler job cycle

Figure 4 illustrates the information flow through the LoadLeveler cluster:

Figure 4. High-level job flow

View figure.

The managing machine in a LoadLeveler cluster is known as the central manager. There are also machines that act as schedulers, and machines that serve as the executing machines. The arrows in Figure 4 illustrate the following:

  • Arrow 1 indicates that a job has been submitted to LoadLeveler.
  • Arrow 2 indicates that the scheduling machine contacts the central manager to inform it that a job has been submitted, and to find out if a machine exists that matches the job requirements.
  • Arrow 3 indicates that the central manager checks to determine if a machine exists that is capable of running the job. Once a machine is found, the central manager informs the scheduling machine which machine is available.
  • Arrow 4 indicates that the scheduling machine contacts the executing machine and provides it with information regarding the job.

Figure 4 is broken down into the following more detailed diagrams illustrating how LoadLeveler processes a job.

  1. Submit a LoadLeveler job:

    Figure 5. Job is submitted to LoadLeveler

    View figure.

    Figure 5 illustrates that the schedd daemon runs on the scheduling machine. This machine can also have the startd daemon running on it. The negotiator daemon resides on the central manager machine. The arrows in Figure 5 illustrate the following:

    • Arrow 1 indicates that a job has been submitted to the scheduling machine.
    • Arrow 2 indicates that the schedd daemon, on the scheduling machine, stores all of the relevant job information on local disk.
    • Arrow 3 indicates that the schedd daemon sends job description information to the negotiator daemon.
  2. Permit to run:

    Figure 6. LoadLeveler authorizes the job

    View figure.

    In Figure 6, arrow 4 indicates that the negotiator daemon authorizes the schedd daemon to begin taking steps to run the job. This authorization is called a permit to run. Once this is done, the job is considered Pending or Starting. (See LoadLeveler job states for more information.)

  3. Prepare to run:

    Figure 7. LoadLeveler prepares to run the job


    View figure.

    In Figure 7, arrow 5 illustrates that the schedd daemon contacts the startd daemon on the executing machine and requests that it start the job. The executing machine can either be a local machine (the machine from which the job was submitted) or a remote machine (another machine in the cluster).

  4. Initiate job:

    Figure 8. LoadLeveler starts the job

    View figure.

    The arrows in Figure 8 illustrate the following:

    • The two arrows numbered 6 indicate that the startd daemon on the executing machine spawns a starter process and awaits more work.
    • The two arrows numbered 7 indicate that the schedd daemon sends the starter process the job information and the executable.
    • Arrow 8 indicates that the schedd daemon notifies the negotiator daemon that the job has been started and the negotiator daemon marks the job as Running. (See LoadLeveler job states for more information.)

    The starter forks and executes the user's job, and the starter parent waits for the child to complete.

  5. Complete job:

    Figure 9. LoadLeveler completes the job

    View figure.

    The arrows in Figure 9 illustrate the following:

    • The arrows numbered 9 indicate that when the job completes, the starter process notifies the startd daemon, and the startd daemon notifies the schedd daemon.
    • Arrow 10 indicates that the schedd daemon examines the information it has received and forwards it to the negotiator daemon.

LoadLeveler job states

As LoadLeveler processes a job, the job moves through various states. Possible job states are listed in Table 2 and detailed in the appendix under Job states. For more information about the daemons controlling these job states see Daemons.


Table 2. Job states
Job state Abbreviation Details on page:
Canceled CA ***
Checkpointing CK ***
Completed C ***
Complete Pending CP ***
Deferred D ***
Idle I ***
Not Queued NQ ***
Not Run NR ***
Pending P ***
Preempted E ***
Preempt Pending EP ***
Rejected X ***
Reject Pending XP ***
Removed RM ***
Remove Pending RP ***
Resume Pending MP ***
Running R ***
Starting ST ***
System Hold S ***
User & System Hold HS ***
Terminated TX ***
User Hold H ***
Vacated V ***
Vacate Pending VP ***
Note: Job states that include "Pending," such as Complete Pending and Vacate Pending, are intermediate, temporary states.


[ Top of Page | Previous Page | Next Page | Table of Contents | Index ]

IBM LoadLeveler for AIX 5L: Using and Administering

PVM 3.3.11+ (SP2MPI architecture)

Figure 48 shows a sample job command file for PVM 3.3.11+ (SP2MPI architecture). Before using PVM, users should contact their administrator to determine which PVM architecture has been installed. The SP2MPI architecture version should be used when users require that their jobs run in user space.

Figure 48. Sample PVM 3.3.11+ (SP2MPI Architecture) job command file

#!/bin/ksh
# @ job_type      = parallel
# @ class         = PVM3
# @ requirements  = (Adapter == "hps_user")
# @ output = my_PVM_program.$(cluster).$(process).out
# @ error  = my_PVM_program.$(cluster).$(process).err
# @ node = 3,3
# @ queue
 
# Set PVM daemon and starter path dictated by LoadLeveler administrator
starter_path=/home/userid/loadl/pvm3/bin/SP2MPI
daemon_path=/home/userid/loadl/pvm3/lib/SP2MPI
 
# Export "MP_EUILIB" before starting PVM3 (default is "ip")
export MP_EUILIB=us
echo MP_EUILIB=$MP_EUILIB
 
# Clean up old PVM log and daemon files belonging to user
filelog=/tmp/pvml.`id | awk -F'=' '{print $2}' | awk -F'(' '{print $1}'`
filedaemon=/tmp/pvmd.`id | awk -F'=' '{print $2}' | awk -F'(' '{print $1}'`
rm -f $filelog > /dev/null
rm -f $filedaemon > /dev/null
 
# Start PVM daemon in background
$daemon_path/pvmd3 &
echo "pvm background pid=$!"
echo "Sleep 2 seconds"
sleep 2
echo "PVM daemon started"
 
# Start parallel executable
llnode_cnt=`echo "$LOADL_PROCESSOR_LIST" | awk '{print NF}'`
actual_cnt=`expr $llnode_cnt - 1`
$starter_path/starter -n $actual_cnt /home/userid/my_PVM_program
echo "Parallel executable starting"
 
# Check processes running and halt PVM daemon
echo "ps -a" | /home/userid/loadl/pvm3/lib/SP2MPI/pvm
echo "Halt PVM daemon"
echo "halt" | /home/userid/loadl/pvm3/lib/SP2MPI/pvm
wait
echo "PVM daemon completed"

Note the following requirements for PVM 3.3.11+ (SP2MPI architecture) jobs:

  • The job must have job_type = parallel.
  • You must specify one more processor than you actually need to run the parallel job. PVM spawns an additional task to relay messages to and from the PVM daemon; parallel tasks cannot communicate with the PVM daemon directly. The additional task will be spawned on the last processor in the LOADL_PROCESSOR_LIST. For more information on this environment variable set by LoadLeveler, see Obtaining allocated host names.
  • You must use the PVM daemon and starter path dictated by the LoadLeveler administrator. The parallel_path keyword is ignored.
  • You must export MP_EUILIB as us when running in user space over the switch. MP_PROCS, MP_RMPOOL and MP_HOSTFILE are ignored when running under LoadLeveler.
  • You should clean up any temporary PVM log or daemon files before starting the PVM daemon.
  • You must start the PVM daemon in the job script, and you must start it in the background ($daemon_path/pvmd3 &).
  • You must compile your parallel program following the PVM guidelines for PVM 3.3.11+ (SP2MPI architecture).
  • You must start the parallel executable through the PVM starter program. The PVM starter program has no relationship to the LoadLeveler starter daemon.
  • You must specify the parallel executable as an argument to the PVM starter program.
  • You must specify the actual number of parallel tasks to the PVM starter program. This number must be one less than the number of processors allocated through LoadLeveler.
  • You must halt the PVM daemon when the PVM starter program completes.
  • You can invoke the PVM starter program only once.
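The task-count arithmetic from the sample script can be checked in isolation. LOADL_PROCESSOR_LIST is set by LoadLeveler at run time; the value below is a stand-in for a three-node allocation:

```shell
# Stand-in for the value LoadLeveler would set for a three-node job
LOADL_PROCESSOR_LIST="node01 node02 node03"

# Count the host names in the list
llnode_cnt=`echo "$LOADL_PROCESSOR_LIST" | awk '{print NF}'`

# Reserve one processor for the PVM message-relay task
actual_cnt=`expr $llnode_cnt - 1`
echo "$actual_cnt"   # prints 2
```

The starter is then invoked with `-n $actual_cnt`, one fewer task than the number of allocated processors.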

Sequence of events in a PVM 3.3.11+ job

This example demonstrates the sequence of events that occur when you submit the sample job command file shown in Figure 48.

Figure 49 illustrates the following:

  • From the job command file, (1) the PVM daemon, pvmd3, and (2) the PVM starter are started under the LoadLeveler starter. The PVM starter tells the PVM daemon to start two tasks (my_PVM_program).
  • (3) The PVM daemon starts the POE Partition Manager, which in turn (4) starts the POE daemons, (represented as pvmd2) on all three nodes.
  • (5) The POE daemons (pvmd2) start the parallel tasks, my_PVM_program, on all nodes under the LoadLeveler starter. The last parallel task, my_PVM_program on Node 3, is the additional task which relays messages between the PVM daemon and the parallel tasks.

Figure 49. Sequence of events in a PVM 3.3.11+ job

View figure.


[ Top of Page | Previous Page | Next Page | Table of Contents | Index ]

IBM LoadLeveler for AIX 5L: Using and Administering

The LoadLeveler main window

LoadLeveler's main window has three sub-windows, titled Jobs, Machines, and Messages, as shown in Figure 10. Each of these sub-windows has its own menu bar.

Figure 10. Main window of the LoadLeveler GUI

View figure.

The menu bar on the Jobs window relates to actions you can perform on jobs. The menu bar on the Machines window relates to actions you can perform on machines. Similarly, the menu bar on the Messages window displays actions you can perform related to LoadLeveler generated messages.

When you select an item from a menu bar, a pull-down menu appears. You can select an item from the pull-down menu to carry out an action or to bring up another pull-down menu originating from the first one.


[ Top of Page | Previous Page | Next Page | Table of Contents | Index ]

IBM LoadLeveler for AIX 5L: Using and Administering

Understanding striping

For a job to successfully run using the striped adapter method, there must be a common communication path among the nodes and adapters on the system. This communication path between different nodes and adapters is called the communication fabric.
View figure.
Consider these sample scenarios using the network configuration as indicated in the preceding figure:

  • If a three node job requests a striped adapter, it will be dispatched to Node 1, Node 2 and Node 4 where it can communicate on Network B. It cannot run on Node 3 because that node only has a common communication path with Node 2, namely Network A.
  • If a three node job requests css0, it will not be run because there are not enough connected adapters on css0 to run the job. Notice that adapter A on Node 1 and adapter A on Node 4 are both at fault.
  • If a three node job requests striped IP and some but not all of the nodes have multi-linked addresses, the job will only be dispatched to the nodes that have the multi-link addresses.

As you can see from these scenarios, LoadLeveler will find enough nodes on the same fabric to run the job. If enough nodes in the fabric cannot be found, no communication can take place and the job will not run.


[ Top of Page | Previous Page | Next Page | Table of Contents | Index ]

LoadLeveler 3.1 documentation addenda; January 2003


Contents

AIX Large Page support added to LoadLeveler

Specifying a time limit for the negotiator cycle

Specifying UNIX domain socket file locations

Additional Checkpoint/Restart Restrictions

Specifying policy for enforcing resources

New query flag for Data Access API

Modifications to APIs to support preemption under API scheduler

Process Tracking Kernel Extension changes

New Config File Keyword PREEMPTION_SUPPORT


AIX Large Page support added to LoadLeveler

In AIX 5.1D and subsequent releases, a feature called Technical Large Page Support is available. Technical Large Page Support involves the selective use of large virtual and physical memory pages to back private data segments of a process. When specified, the user process heap, the main program BSS, and the main program data areas are backed by large pages. LoadLeveler users can take advantage of this AIX feature and enable Large Page support for their jobs.

Note for administrators:
Although large pages are supported when running jobs under LoadLeveler, running the LoadLeveler daemons with large pages enabled is not supported. If you attempt to start LoadLeveler on a node using the llctl start or llctl recycle commands while the AIX environment variable LDR_CNTRL=LARGE_PAGE_DATA=Y or LDR_CNTRL=LARGE_PAGE_DATA=M is set, LoadLeveler will not start and the LoadL_master daemon will report an error.

For more information about LDR_CNTRL, please refer to the AIX documentation.

If some of the machines in your LoadLeveler cluster are configured to exploit the Large Page feature, and if you want LoadLeveler to provide support for large pages, then the information contained in the items listed below is needed for effective use of this feature:

  1. VM_IMAGE_ALGORITHM keyword setting

    The LoadLeveler configuration keyword VM_IMAGE_ALGORITHM should be set to the value FREE_PAGING_SPACE_PLUS_FREE_REAL_MEMORY. This keyword specifies which algorithm the Central Manager uses to decide whether a machine has enough virtual memory to meet the requirement of the image_size keyword of a job step.

    The default setting of VM_IMAGE_ALGORITHM is FREE_PAGING_SPACE. Changing the algorithm to FREE_PAGING_SPACE_PLUS_FREE_REAL_MEMORY allows LoadLeveler to consider both the free "regular" memory and the free Large Page memory when deciding if a machine in the cluster has enough virtual memory to run a job step.

     

  2. New LoadLeveler job command file keyword: large_page

    The job command file keyword large_page is used to inform LoadLeveler that a job step requires Large Page support from AIX. The syntax of this keyword is:

    # @ large_page = <value>
    Where <value> can be:
    Y : Use Large Page memory if available, otherwise use regular memory
    M : Use of Large Page memory is mandatory
    N : Do not use Large Page memory (default value)

     

  3. llq and llsummary enhancements

    The "llq -l" and "llsummary -l" commands have been enhanced to display information associated with the large_page keyword for LoadLeveler jobs. The output listings of these commands contain lines similar to the following:

    System Priority: -37
    Notifications: Complete
    Virtual Image Size: 500.000 mb
    Large Page: Y
    Checkpointable: no

     

  4. llstatus enhancement

    The "llstatus -l" command has been enhanced to show information for total memory and free memory for both Large Page memory and regular memory. Below is a fragment of a representative "llstatus -l" command output:

    Max_Starters = 50
    Total Memory = 3583 mb
    Memory = 1536 mb
    FreeRealMemory = 1113 mb
    LargePageSize = 16.000 mb
    LargePageMemory = 2.000 gb
    FreeLargePageMemory = 2.000 gb

    In the above listing, Total Memory refers to the sum of regular and Large Page memory. Memory and FreeRealMemory refer to regular and free regular memory.

     

  5. New requirements and preferences variables: TotalMemory and LargePageMemory

    Two new LoadLeveler variables, TotalMemory and LargePageMemory, have been added. These variables are supported by the requirements and preferences expressions.

    In the following sample job command file, the person submitting this job to LoadLeveler has requested that the job be run on a machine that has at least 1500 MB of Large Page memory configured and that machines having total memory (regular and Large Page) greater than 2800 MB are preferred.

    #!/bin/ksh
    # @ step_name = test_batch_job
    # @ class = small
    # @ restart = no
    # @ arguments = arg_01 arg_02 arg_3
    # @ environment = env_001=0001; LDR_CNTRL=LARGE_PAGE_DATA=Y; env_002=0002
    # @ requirements = (LargePageMemory > 1500)
    # @ preferences = (TotalMemory > 2800)
    # @ executable = test_program
    # @ cpu_limit = 120
    # @ large_page = Y
    # @ image_size = 512000
    # @ input = /dev/null
    # @ output = job1.$(Host).$(Cluster).$(Process).out
    # @ error = job1.$(Host).$(Cluster).$(Process).err
    # @ queue

    Notes:

    1. The units for LargePageMemory and TotalMemory are megabytes and these variables are 64-bit integers.
    2. The environment variable LDR_CNTRL is an AIX variable. LDR_CNTRL=LARGE_PAGE_DATA takes the values Y (Large Page use optional) and M (Large Page use mandatory); when it is undefined (the default), Large Page memory is not used.

      For more information about LDR_CNTRL, please refer to the AIX documentation.

    3. The person submitting this job has requested that Large Page memory should be used to run the test_program if it is available.

     

  6. ll_get_data() enhancements

    The function ll_get_data() of the LoadLeveler API has been enhanced so that Large Page information of machines can be accessed by the specifications:

    • LL_MachineLargePageSize64
    • LL_MachineLargePageCount64
    • LL_MachineLargePageFree64

    The large_page information associated with job steps can be accessed by the specification LL_StepLargePage.

     

  7. Workload Manager

    The AIX Workload Manager (WLM) program does not support Large Page memory. The information associated with memory statistics in the outputs of commands such as "llq -w <job_id>" is not meaningful and should be ignored.

     


Specifying a time limit for the negotiator cycle

The Configuration File keyword, NEGOTIATOR_CYCLE_TIME_LIMIT has been added to LoadLeveler using the form:

NEGOTIATOR_CYCLE_TIME_LIMIT = number

In this expression, "number" specifies the maximum time (in seconds) that LoadLeveler will allow the negotiator cycle to continue. After the specified number of seconds, the negotiator cycle ends, even if there are additional jobs to be considered for dispatch; those jobs are considered in the next negotiator cycle.

The NEGOTIATOR_CYCLE_TIME_LIMIT keyword applies only to the BACKFILL and GANG schedulers. The number specified must be a positive integer value, or zero. If a keyword value is not specified, or if the value specified is zero, the negotiator cycle will be unlimited. Prior to the introduction of the NEGOTIATOR_CYCLE_TIME_LIMIT keyword, the negotiator cycle functioned with an unlimited time frame.
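For example, a configuration file entry capping each negotiator cycle at 30 seconds (the value is illustrative) would look like:

```
NEGOTIATOR_CYCLE_TIME_LIMIT = 30
```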


Specifying UNIX domain socket file locations

The Configuration File keyword, COMM has been added to LoadLeveler using the form:

COMM = directory (default is /tmp)

In this expression, "directory" specifies a local directory where LoadLeveler keeps special files used for UNIX domain sockets for communicating among LoadLeveler daemons running on the same machine. This keyword allows the administrator to choose a different filesystem than /tmp for these important files.

Note:
If you change the COMM option, you must stop and restart LoadLeveler using llctl.
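For example, to keep the socket files in a dedicated local directory rather than /tmp (the path shown is hypothetical):

```
COMM = /var/loadl/sockets
```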

Additional Checkpoint/Restart Restrictions

The following checkpoint/restart restrictions are in addition to the restrictions listed in the Using and Administering Guide:

  1. SP Switch Communications Adapter - Type 6-9 (Microchannel TB3 adapters) are not supported.
  2. The only DCE function that will be supported with Checkpoint/Restart will be DCE credential forwarding by LoadLeveler, using the DCE_AUTHENTICATION_PAIR configuration keyword, for the sole purpose of DFS access by the application. No other DCE function will be supported with Checkpoint/Restart.
  3. A set of processes is not checkpointable if any of the processes is running a setgid program when a checkpoint occurs.

Specifying policy for enforcing resources

The Configuration File keyword, ENFORCE_RESOURCE_POLICY has been added to LoadLeveler using the form:

ENFORCE_RESOURCE_POLICY = hard | soft | shares

Where:

hard
indicates that WLM classes will be created with hard limits representing the percentage of the step's requested resources relative to the total machine resources.
soft
indicates that WLM classes will be created with soft limits representing the percentage of the step's requested resources relative to the total machine resources.
shares
indicates that WLM classes will be created with a resource share representing the step's requested resources. This is the default if neither hard nor soft is specified.

This keyword is ignored if ENFORCE_RESOURCE_USAGE is not set.

The configuration keyword can be specified in the LoadL_config.local file to have a different policy per machine.
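For example, a LoadL_config.local entry selecting hard limits on one particular machine might look like:

```
ENFORCE_RESOURCE_POLICY = hard
```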


New query flag for Data Access API

ll_set_request subroutine

  • Parameters
    • query_flags
    • When query_type (in ll_query) is JOBS, query_flags can be the following:

      QUERY_PROCID
      Query by process id.

    • object_filter
    • Specifies search criteria. The value you specify for object_filter is related to the value you specify for query_flags:

      • If you specify QUERY_PROCID, the object_filter must contain a list with a single process id.

      The last entry in the object_filter array must be NULL.

    • data_filter
    • Filters the data returned from the object you query. The value you specify for data_filter is related to the value you specify for query_type:

      • If you specify JOBS and query_flags QUERY_PROCID, you must always specify ALL_DATA.

  • Description
  • The QUERY_PROCID flag should not be used in combination with any other query_flags.

    ll_get_objs subroutine

    • Parameters
      • query_daemon

        The following indicates which daemons respond to which query flags. When query_type (in ll_query) is JOBS, the query_flags (in ll_set_request) listed in the left-hand column are responded to by the daemons listed in the right-hand column:

        QUERY_PROCID              startd (LL_STARTD)


    Modifications to APIs to support preemption under API scheduler

The ll_preempt() and ll_start_job() APIs have been modified to support preemption under the API scheduler.

  • ll_preempt(): In addition to the existing PREEMPT_STEP and RESUME_STEP options, a new option SYSTEM_PREEMPT_STEP has been added to the LL_preempt_op enum. This option can be passed to ll_preempt() to system-preempt a step.
  • ll_start_job(): In addition to its primary use of starting an idle step, this API can also be used under the API scheduler to restart a preempted (system-preempted or user-unpreempted) step. Since, in the restart case, the central manager already knows which nodes the step is running on, there is no need to fill in the nodeList element of the LL_start_job_info structure. LL_start_job_info::nodeList should be set to NULL to avoid possible memory errors.
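The restart path can be sketched as follows. This is an untested sketch against llapi.h: the step_id initialization is elided because its exact form depends on the StepId definition in your llapi.h, and the version macro name is an assumption; the point is that nodeList stays NULL on a restart:

```c
#include <string.h>
#include "llapi.h"

/* Hypothetical helper: restart a system-preempted step under the
   API scheduler. Field names are per llapi.h; verify against your
   installed header before use. */
int restart_preempted_step(LL_start_job_info *info)
{
    /* For a restart, the central manager already knows which nodes
       the step runs on, so nodeList must be NULL; filling it in can
       cause memory errors. */
    info->nodeList = NULL;
    return ll_start_job(info);
}
```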

  • Process Tracking Kernel Extension changes

    The Process Tracking Kernel Extension has been changed so that it is never unloaded. LoadLeveler uses the kernel extension only when process tracking is enabled. If process tracking is enabled and the LoadLeveler kernel extension is already loaded when a new kernel extension is installed, a reboot will be required; otherwise, the kernel extension will detect a different version level and startd will exit.


    New Config File Keyword PREEMPTION_SUPPORT

    PREEMPTION_SUPPORT = full | none

    The value of this keyword determines whether preemption is enabled for a cluster. A value of FULL means preemption is supported; NONE means it is not. The default is FULL for the GANG scheduler and NONE for all other schedulers. If this keyword is set to FULL, LoadLeveler checks at start time that other conditions, such as MACHINE_AUTHENTICATE = TRUE and PROCESS_TRACKING = TRUE, are met, and preemption requests will be accepted by the negotiator. If this keyword is set to NONE, a request to preempt a job will be rejected by the negotiator. One purpose of the keyword is to allow non-threaded MPI jobs to be scheduled to run under the API scheduler by setting its value to NONE.
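For example, the non-threaded-MPI case described above might be configured with a LoadL_config fragment like the following (a sketch; SCHEDULER_TYPE = API is assumed to be the keyword selecting the API scheduler, per the rest of this release):

```
# LoadL_config: run the API scheduler with preemption disabled so
# non-threaded MPI jobs can be scheduled. The negotiator will reject
# any request to preempt a job.
SCHEDULER_TYPE     = API
PREEMPTION_SUPPORT = none
```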



    IBM LoadLeveler for AIX (5765-E69), Version 3 Release 1 Publications
