The parallel job manager (PJM) provides a facility and framework for submitting and managing transactional batch jobs that execute as a coordinated collection of independent parallel sub-jobs.
The following two images depict the PJM architecture and the sequence of a parallel job. First, a top-level job is submitted to the job scheduler, which determines that it is a parallel job and dispatches it to the PJM. The PJM invokes the parameterizer SPI to help divide the job into sub-jobs. The PJM then invokes the LogicalTX synchronization SPI to indicate the beginning of the logical transaction. The PJM then uses the sub-job xJCL stored in the repository to submit the sub-jobs to the job scheduler. The job scheduler then dispatches the sub-jobs to the batch container endpoints for execution. After the PJM detects that the sub-jobs have begun running, it invokes the life cycle SPI. Unlike the other SPI invocations, this invocation carries no context information. Next, the batch container runs the sub-job. When a checkpoint is taken, the sub-job collector SPI is invoked. This SPI collects relevant state information about the sub-job, and the data is sent to the sub-job analyzer SPI for interpretation. After all sub-jobs reach a final state, the beforeCompletion and afterCompletion methods of the synchronization SPI are invoked. The analyzer SPI is also invoked to calculate the return code of the job.
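The sequence described above can be traced with a small simulation. This is only a sketch: the class name, method name, and event strings are hypothetical and are not the product SPI. It also assumes one checkpoint per sub-job and a single life cycle invocation, and it places the final analyzer call after the synchronization calls, an ordering that the description does not state explicitly.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of the event order; none of these names belong to the product SPI.
class ParallelJobSequenceSketch {

    /** Returns the ordered list of events for a parallel job with the given number of sub-jobs. */
    static List<String> run(int subJobCount) {
        List<String> events = new ArrayList<>();
        events.add("job scheduler: dispatch top-level job to PJM");
        events.add("parameterizer: divide job into " + subJobCount + " sub-jobs");
        events.add("synchronization: begin logical transaction");
        for (int i = 1; i <= subJobCount; i++) {
            events.add("PJM: submit sub-job " + i + " to job scheduler");
            events.add("job scheduler: dispatch sub-job " + i + " to batch container endpoint");
        }
        // Assumption: modeled as a single invocation once the sub-jobs are running.
        events.add("life cycle: sub-jobs running (no context information)");
        for (int i = 1; i <= subJobCount; i++) {
            // Assumption: exactly one checkpoint per sub-job; real sub-jobs may take many.
            events.add("collector: gather checkpoint state from sub-job " + i);
            events.add("analyzer: interpret state from sub-job " + i);
        }
        events.add("synchronization: beforeCompletion");
        events.add("synchronization: afterCompletion");
        events.add("analyzer: calculate job return code");
        return events;
    }

    public static void main(String[] args) {
        run(2).forEach(System.out::println);
    }
}
```

Running the sketch with two sub-jobs prints one line per event, which makes it easy to compare against the sequence image.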
The following image summarizes the PJM architecture and shows where the SPIs are called:
The following image shows the order of events in a parallel job:
The PJM is an Enterprise JavaBeans (EJB) application that monitors and manages parallel sub-jobs. Parallel sub-jobs are batch Java 2 Platform, Enterprise Edition (J2EE) applications. The PJM itself is a one-step batch J2EE job. The PJM does not process batch data streams. Instead, it submits or restarts sub-jobs under the control of step properties, which identify the sub-job in the job repository and specify the count of sub-jobs to process.
A parallel job is composed of a top-level job that runs the ParallelJobManager application, and a set of sub-jobs that run the actual business logic. Sub-jobs run the same job definition, but each with potentially distinct inputs. All sub-jobs are managed together as a single logical job.
Separate xJCL definitions are required for the top-level job and for the sub-jobs. All sub-jobs run using the same xJCL definition; each sub-job instance can be parameterized with distinct substitution properties.
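To illustrate how one shared definition can yield distinct sub-job instances, the following sketch resolves ${name} placeholders against a per-instance property map. It is not the Compute Grid implementation: the class, the template string (a simplified fragment, not complete xJCL), and the property names other than jobname are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; shows the idea of substitution properties only.
class XjclSubstitutionSketch {

    // Matches ${propertyName} and captures the property name.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Replaces each ${name} in the template with its value; unknown names are left as-is. */
    static String substitute(String template, Map<String, String> props) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = props.get(m.group(1));
            String replacement = (value != null) ? value : m.group(0);
            // quoteReplacement keeps '$' and '\' in values from being treated specially.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Simplified, hypothetical fragment shared by every sub-job.
        String template = "<job name=\"${jobname}\" input=\"${inputFile}\">";

        for (int i = 1; i <= 2; i++) {
            Map<String, String> props = new HashMap<>();
            props.put("jobname", "PostingsJob:12");          // hypothetical value
            props.put("inputFile", "partition-" + i + ".dat"); // hypothetical property
            System.out.println(substitute(template, props));
        }
    }
}
```

Each loop iteration stands for one sub-job instance: the template is identical, and only the property values differ.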
A logical transaction is a unit of work demarcation that spans the running of a parallel job. Its lifecycle corresponds to the combined lifecycle of the parallel job’s sub-jobs. An extension mechanism enables customization so that application-managed resources can be controlled in this unit of work scope for commit and rollback purposes.
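As a rough illustration of controlling an application-managed resource within the logical transaction scope, the sketch below stages results and publishes them only if the unit of work commits. The interface and classes are hypothetical and do not reproduce the product's synchronization SPI; in particular, the boolean outcome parameter on afterCompletion is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback shape; NOT the product's synchronization SPI.
interface LogicalTxCallback {
    void begin();
    void beforeCompletion();
    void afterCompletion(boolean committed); // assumption: outcome passed as a boolean
}

// Application-managed resource: records are staged, then published only on commit.
class StagedResults implements LogicalTxCallback {
    private final List<String> staged = new ArrayList<>();
    private final List<String> published = new ArrayList<>();

    public void begin() {
        staged.clear();
    }

    void stage(String record) {
        staged.add(record);
    }

    public void beforeCompletion() {
        // A real resource might flush buffers here; nothing to do in this sketch.
    }

    public void afterCompletion(boolean committed) {
        if (committed) {
            published.addAll(staged); // commit: make staged work visible
        }
        staged.clear();               // on rollback the staged work is simply discarded
    }

    List<String> published() {
        return published;
    }
}

class LogicalTxSketch {
    public static void main(String[] args) {
        StagedResults results = new StagedResults();
        results.begin();
        results.stage("sub-job 1 output");
        results.stage("sub-job 2 output");
        results.beforeCompletion();
        results.afterCompletion(true);
        System.out.println(results.published());
    }
}
```

The design choice here is that nothing becomes visible until the combined lifecycle of all sub-jobs has ended, which mirrors the commit and rollback scope described above.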
The job name of a parallel job is specified in the xJCL that defines the top-level job. The top-level job xJCL can originate from the file system or from the Compute Grid job repository. The job name can be parameterized using standard substitution property notation: <job name="${jobname}" … >.
A job ID is formed by concatenating a job name with a system-generated sequence number, separated by a colon. The job name of a sub-job is the parallel job’s job ID. The xJCL for a sub-job can originate only from the Compute Grid job repository. Sub-job xJCL must supply the job name substitution property: <job name="${jobname}" … >.
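The naming rules can be made concrete with a short sketch. The job name and sequence numbers below are hypothetical, and the plain decimal rendering of the sequence number is an assumption; the actual system-generated format may differ.

```java
// Hypothetical helper illustrating the naming rules; not part of the product API.
class JobIdSketch {

    /** Job ID = job name, a colon, then the sequence number. */
    static String jobId(String jobName, long sequenceNumber) {
        // Assumption: the sequence number is rendered as a plain decimal.
        return jobName + ":" + sequenceNumber;
    }

    /** The job name of a sub-job is the job ID of its parallel (top-level) job. */
    static String subJobName(String parallelJobId) {
        return parallelJobId;
    }

    public static void main(String[] args) {
        String parentId = jobId("PostingsJob", 12);            // top-level job ID
        String subId = jobId(subJobName(parentId), 13);         // sub-job ID built from the parent ID
        System.out.println(parentId);
        System.out.println(subId);
    }
}
```

Because the sub-job's job name is itself a job ID, a sub-job ID under these assumptions contains two colon-separated sequence numbers, which makes the parent job recognizable from the sub-job ID.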