User-level checkpointable jobs

To build a user-level checkpointable job, re-link your application object files (.o files) with the LSF checkpoint startup routine and library. LSF supplies a set of replacement linkers that call the standard linkers on your platform with the correct options to build a checkpointable application. LSF provides the following components:

  • libckpt.a, the checkpoint library

  • ckpt_crt0.o, the checkpoint startup routine

  • ckpt_ld, the checkpoint linker for C language applications

  • ckpt_ld_f, the checkpoint linker for Fortran applications
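As a minimal sketch of the re-linking workflow described above, the following shell function compiles a C source file to an object file and then re-links it with ckpt_ld. The names build_ckpt_job and run and the DRY_RUN variable are illustrative and are not part of LSF. The sketch also assumes that cc is the native C compiler on your platform and that ckpt_ld is on your PATH; some platforms may need extra compiler or linker options.

```shell
# run CMD...: print the command, then execute it unless DRY_RUN=1.
run() {
    printf '%s\n' "$*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

# build_ckpt_job SRC: hypothetical helper, not an LSF command.
# SRC is a C source file such as my_job.c.
build_ckpt_job() {
    src=$1
    base=$(basename "$src" .c)          # my_job.c -> my_job

    # Step 1: compile only, producing $base.o in the current directory.
    # Assumption: cc is the native C compiler on this platform.
    run cc -c "$src" || return 1

    # Step 2: re-link the object file with the checkpoint linker, which
    # links in the checkpoint startup routine and checkpoint library.
    run ckpt_ld -o "$base" "$base.o"
}
```

Running the function with DRY_RUN=1 set prints the two commands without executing them, which is a convenient way to check the command lines before building.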

Library

The checkpoint library replaces low-level system calls such as open(), close(), and dup(), and contains the signal handlers and routines that implement checkpointing internally.

Startup routine

The startup routine replaces the language-level module that calls main(). It sets the checkpoint signal handler and initializes the internal data structures used to record job information.

Linkers

The checkpoint linkers are used to re-link your application with the checkpoint library and startup routine. They are shell scripts that call the standard linkers on your operating system with the correct options. The scripts are designed to use the native compilers on most platforms. Use ckpt_ld for C language applications and ckpt_ld_f for Fortran applications. The following compilers are supported by the ckpt_ld replacement linker:

Operating System    Compiler
----------------    --------
AIX                 cc
HP-UX               c89
IRIX 6.2            cc (see the note below the table)
OSF1                cc
Solaris             cc (SUN C compiler) and gcc
SunOS               gcc

For IRIX 6.2 you need to use cc with the -non_shared -mips2 -32 compiler options, and ckpt_ld with -mips2 -32 linker options. For example, to compile and link my_job.c:

% cc -c my_job.c -non_shared -mips2 -32

% ckpt_ld -o my_job my_job.o -mips2 -32
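A build script that runs on several platforms could encode the compiler table above as a lookup. The following is only a sketch: ckpt_compiler is a hypothetical helper, not an LSF command, and it expects the operating system name exactly as it is written in the table. How a script obtains that name on a given host is left open here.

```shell
# ckpt_compiler OS: print the C compiler(s) listed for OS in the table.
# Hypothetical helper for illustration; OS must match the table entry.
ckpt_compiler() {
    case "$1" in
        AIX)        echo "cc" ;;
        HP-UX)      echo "c89" ;;
        # IRIX 6.2 also needs -mips2 -32 passed to ckpt_ld at link time.
        "IRIX 6.2") echo "cc -non_shared -mips2 -32" ;;
        OSF1)       echo "cc" ;;
        Solaris)    echo "cc gcc" ;;    # either compiler is listed
        SunOS)      echo "gcc" ;;
        *)          echo "not listed: $1" >&2
                    return 1 ;;
    esac
}
```

For example, CC=$(ckpt_compiler HP-UX) sets CC to c89, and the function returns a non-zero status for an operating system that does not appear in the table.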