WebSphere Extended Deployment Compute Grid, Version 6.1
             Operating Systems: AIX, HP-UX, Linux, Solaris, Windows, z/OS


The parallel job manager (PJM)

The parallel job manager (PJM) provides a facility and framework for submitting and managing transactional batch jobs that run as a coordinated collection of independent parallel sub-jobs.

PJM architecture and programming model

The following image summarizes the PJM architecture and shows where the SPIs are called:


PJM architecture

Sequence of a parallel job

The following image shows the order of events in a parallel job:


Sequence of a parallel job

PJM system application and parallel jobs

The PJM is an Enterprise JavaBeans (EJB) application that monitors and manages parallel sub-jobs. Parallel sub-jobs are batch Java 2 Platform, Enterprise Edition (J2EE) applications. The PJM itself is a one-step batch J2EE job. It does not process batch data streams; instead, it submits or restarts sub-jobs under the control of step properties that identify the sub-job in the job repository and the number of sub-jobs to process.

A parallel job is composed of a top-level job that runs the ParallelJobManager application, and a set of sub-jobs that run the actual business logic. Sub-jobs run the same job definition, but each with potentially distinct inputs. All sub-jobs are managed together as a single logical job.

Separate xJCL definitions are required for the top-level job and for the sub-jobs. All sub-jobs run using the same xJCL definition; each sub-job instance can be parameterized with distinct substitution properties.
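As a rough illustration of per-sub-job parameterization, the following sketch renders one shared definition once per sub-job, each time with a different set of substitution properties. It uses Python's string.Template only because its ${name} placeholders resemble the substitution notation. The template text and the partition and inputFile property names are hypothetical; they are not real xJCL and not part of any Compute Grid API.

```python
from string import Template

# Hypothetical stand-in for a shared sub-job definition; this is not real xJCL.
SUBJOB_TEMPLATE = Template("partition=${partition} input=${inputFile}")


def render_subjobs(input_files):
    """Render the shared template once per sub-job, each with its own properties."""
    rendered = []
    for index, input_file in enumerate(input_files):
        # One set of substitution properties per sub-job instance (names assumed).
        props = {"partition": str(index), "inputFile": input_file}
        # substitute() raises KeyError if a placeholder has no matching property.
        rendered.append(SUBJOB_TEMPLATE.substitute(props))
    return rendered


definitions = render_subjobs(["part-a.dat", "part-b.dat"])
```

Every rendered entry comes from the same template, which mirrors the rule that all sub-jobs share one definition and differ only in their property values.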

A logical transaction is a unit of work demarcation that spans the running of a parallel job. Its lifecycle corresponds to the combined lifecycle of the parallel job’s sub-jobs. An extension mechanism enables customization so that application-managed resources can be controlled in this unit of work scope for commit and rollback purposes.

Job naming convention

The job name of a parallel job is specified in the xJCL that defines the top-level job. The top-level job xJCL can originate from the file system or from the Compute Grid job repository. The job name can be parameterized using standard substitution property notation: <job name="${jobname}" … >.

A job ID is formed by concatenating a job name and a system-generated sequence number, separated by a colon. The job name of a sub-job is the job ID of its parallel job. The xJCL for a sub-job can originate only from the Compute Grid job repository. Sub-job xJCL must supply the job name as a substitution property: <job name="${jobname}" … >.
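The naming rules above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the job name PayrollBatch, and the sequence numbers are made up, and the exact format of the system-generated sequence number (for example, any zero padding) is not specified here. The resulting sub-job ID is an inference from the stated rules rather than documented output.

```python
def make_job_id(job_name, sequence_number):
    """Join a job name and a sequence number with a colon."""
    return f"{job_name}:{sequence_number}"


def split_job_id(job_id):
    """Split a job ID into (job name, sequence number).

    Splits on the last colon, because a sub-job's job name is itself
    a job ID and therefore already contains a colon.
    """
    job_name, _, sequence = job_id.rpartition(":")
    return job_name, sequence


# Sequence numbers are invented here; the system generates the real ones.
parent_id = make_job_id("PayrollBatch", 101)  # job ID of the top-level job
subjob_name = parent_id                       # sub-job name is the parent job ID
subjob_id = make_job_id(subjob_name, 102)     # job ID of one sub-job
```

Splitting on the last colon rather than the first keeps the parent job ID intact when a sub-job ID is taken apart.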

PJM job management

The top-level job submits the sub-jobs and monitors their completion. The end state of the top-level job depends on the outcome of the sub-jobs as follows:
  1. If all sub-jobs complete in the ended state, that is, successfully, then the top-level job completes in the ended state.
  2. If any sub-job completes in the restartable state and no sub-job has ended in the failed state, then the top-level job completes in the restartable state.
  3. If any sub-job completes in the failed state, then the top-level job completes in the failed state.

If the top-level job and its sub-jobs are in the restartable state, restart only the top-level job. If any sub-job is restarted manually, the top-level job does not process the logical transaction properly.
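The end-state rules above amount to a simple precedence: failed outranks restartable, which outranks ended. The following sketch expresses that precedence. Representing states as lowercase strings and the function name top_level_end_state are assumptions made for illustration; the PJM does not expose such a function.

```python
ENDED = "ended"
RESTARTABLE = "restartable"
FAILED = "failed"


def top_level_end_state(subjob_states):
    """Derive the top-level job end state from the final sub-job states."""
    states = list(subjob_states)
    if not states:
        raise ValueError("at least one sub-job state is required")
    # Any failed sub-job makes the top-level job fail.
    if FAILED in states:
        return FAILED
    # Otherwise, any restartable sub-job makes the top-level job restartable.
    if RESTARTABLE in states:
        return RESTARTABLE
    # Only when every sub-job ended successfully does the top-level job end.
    if all(state == ENDED for state in states):
        return ENDED
    raise ValueError("unexpected sub-job state in: %r" % (states,))


result = top_level_end_state([ENDED, RESTARTABLE, ENDED])
```

Checking for the failed state first is what makes the second rule's condition, that no sub-job has failed, hold automatically by the time the restartable check runs.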



Related concepts
Managing Compute Grid jobs and their environment
Other considerations when running the parallel job manager
Related tasks
Installing and configuring the parallel job manager (PJM)
Sample application for the parallel job manager
Related reference
System Programming Interfaces (SPI) and properties
parallelJobManager.py script

Last updated: Oct 30, 2009 6:22:31 PM EDT
http://publib.boulder.ibm.com/infocenter/wxdinfo/v6r1/index.jsp?topic=/com.ibm.websphere.gridmgr.doc/info/scheduler/ccgparallel.html