If BACKFILL is configured in a queue, and a run limit is specified with -W on bsub or with RUNLIMIT in the queue, backfill jobs can use the memory accumulated by other jobs' reservations, as long as each backfill job can finish before the predicted start time of the jobs holding the reservation.
Unlike slot reservation, which only applies to parallel jobs, backfill on memory applies to sequential and parallel jobs.
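The admission rule described above can be sketched as a small predicate. This is a simplified model, not the LSF scheduler: it assumes a single reservation, measures time in minutes from now, and treats a job that ends exactly at the predicted start time as finishing in time (an assumption). `Reservation` and `can_backfill` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reservation:
    """Memory held by a pending job (hypothetical model, not an LSF API)."""
    reserved_mem_mb: int
    predicted_start_min: int  # minutes from now until the reserving job is expected to start


def can_backfill(run_limit_min: Optional[int], mem_mb: int, res: Reservation) -> bool:
    """Return True if a job may borrow the reserved memory.

    The job needs a run limit (-W or RUNLIMIT) so its end time is known, must
    end no later than the predicted start time (boundary is an assumption),
    and must fit in the reserved memory.
    """
    if run_limit_min is None:
        return False  # no run limit: the end time is unknown
    return run_limit_min <= res.predicted_start_min and mem_mb <= res.reserved_mem_mb


res = Reservation(reserved_mem_mb=200, predicted_start_min=300)
print(can_backfill(100, 50, res))  # True
print(can_backfill(400, 50, res))  # False: would still run when the reserving job starts
```

The sketch checks only the reserved pool; a real backfill job could also use memory that nobody has reserved.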
In the following sequential example, the first three jobs each require 400 MB of memory and run for 300 minutes.
Job 1 starts running, using 400 MB of memory and one job slot.
Submitting a second job (job 2) with the same requirements gets the same result.
Submitting a third job (job 3) with the same requirements reserves one job slot and all free memory, if the amount of free memory is between 20 MB and 200 MB (some free memory may be used by the operating system or other software).
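The 20 MB to 200 MB range follows from simple accounting. The sketch below assumes a host with about 1000 MB of memory (a hypothetical figure) and a variable amount of memory used outside LSF; `memory_reserved_by_job3` is a hypothetical helper, not an LSF interface.

```python
def memory_reserved_by_job3(host_mem_mb, os_overhead_mb, job_mem_mb=400, running_jobs=2):
    """Return the memory (MB) job 3 reserves in the sequential example.

    Simplified, hypothetical model: jobs 1 and 2 each hold job_mem_mb, and
    os_overhead_mb is used by the operating system or software outside LSF.
    """
    free_mb = host_mem_mb - os_overhead_mb - running_jobs * job_mem_mb
    if free_mb >= job_mem_mb:
        return 0  # job 3 would simply start, so nothing is reserved
    return max(free_mb, 0)  # otherwise job 3 reserves all remaining free memory


print(memory_reserved_by_job3(1000, 0))    # 200
print(memory_reserved_by_job3(1000, 180))  # 20
```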
Job 4 keeps pending because the free memory is reserved by job 3 and job 4 runs longer than jobs 1 and 2, so it cannot finish before job 3 is predicted to start.
Job 5 starts running. It uses one free slot and the memory reserved by job 3. If the job does not finish within its 100-minute run limit, LSF kills it automatically.
Job 6 keeps pending, with no resource reservation, because the memory reserved by job 3 is not enough to satisfy its memory requirement.
Job 7 starts running. LSF assumes it does not require any memory, and enough job slots are free for it to run.
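The outcomes for jobs 4 to 7 can be reproduced with a toy scheduler. This is a sketch under stated assumptions, not LSF's algorithm: the host has 10 job slots, job 3 holds 200 MB (the upper end of the range), and job 3 is predicted to start at minute 300. Only job 5's 100-minute run limit comes from the text; every other run limit and memory request below is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    run_limit_min: int  # -W value in minutes
    mem_mb: int         # memory request; 0 means no memory requirement


# Hypothetical state once job 3 holds its reservation (all values are assumptions).
FREE_SLOTS = 7         # 10-slot host minus jobs 1 and 2 minus job 3's reserved slot
RESERVED_MEM_MB = 200  # memory held by job 3
JOB3_START_MIN = 300   # job 3 is predicted to start when jobs 1 and 2 finish


def schedule(jobs):
    """Decide run / backfill / pend for each job in submission order."""
    free_slots = FREE_SLOTS
    reserved_mem = RESERVED_MEM_MB
    decisions = {}
    for job in jobs:
        if free_slots < 1:
            decisions[job.name] = "pend"
        elif job.mem_mb == 0:
            # No memory needed: a free slot is enough.
            decisions[job.name] = "run"
            free_slots -= 1
        elif job.run_limit_min <= JOB3_START_MIN and job.mem_mb <= reserved_mem:
            # Ends before job 3 starts and fits in job 3's reserved memory.
            decisions[job.name] = "backfill"
            free_slots -= 1
            reserved_mem -= job.mem_mb
        else:
            decisions[job.name] = "pend"
    return decisions


jobs = [
    Job("job4", run_limit_min=400, mem_mb=50),   # hypothetical request
    Job("job5", run_limit_min=100, mem_mb=50),   # memory request hypothetical
    Job("job6", run_limit_min=100, mem_mb=300),  # hypothetical: more than job 3 reserved
    Job("job7", run_limit_min=100, mem_mb=0),    # no memory requirement
]
print(schedule(jobs))
# {'job4': 'pend', 'job5': 'backfill', 'job6': 'pend', 'job7': 'run'}
```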
Each process of a parallel job requires 100 MB of memory, and each parallel job needs 4 CPUs. The first two of the following parallel jobs run for 300 minutes.
Job 1 starts running, using 4 slots and 400 MB of memory.
Submitting a second job (job 2) with the same requirements gets the same result.
Submitting a third job (job 3) with the same requirements reserves 2 slots and all 200 MB of available memory, assuming no other applications are running outside of LSF.
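Job 3's reservation can be checked with a little arithmetic. The sketch assumes a host with 10 job slots and 1000 MB of memory, totals implied by (not stated in) the example: jobs 1 and 2 hold 8 slots and 800 MB, which leaves 2 slots and 200 MB. `parallel_reservation` is a hypothetical helper.

```python
def parallel_reservation(host_slots, host_mem_mb, n_running, procs_per_job=4, mem_per_proc_mb=100):
    """Return (slots, MB) reserved by the next parallel job, or (0, 0) if it can start.

    Hypothetical model: every job uses procs_per_job slots and
    mem_per_proc_mb per process, and nothing outside LSF uses memory.
    """
    free_slots = host_slots - n_running * procs_per_job
    free_mem = host_mem_mb - n_running * procs_per_job * mem_per_proc_mb
    need_slots = procs_per_job
    need_mem = procs_per_job * mem_per_proc_mb
    if free_slots >= need_slots and free_mem >= need_mem:
        return (0, 0)  # enough resources: the job starts instead of reserving
    # Otherwise reserve whatever is free, up to what the job needs.
    return (min(free_slots, need_slots), min(free_mem, need_mem))


print(parallel_reservation(10, 1000, n_running=2))  # (2, 200)
```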
Job 4 keeps pending because all available memory is already reserved by job 3, and job 4 runs longer than jobs 1 and 2, so it cannot finish before job 3 is predicted to start and no backfill happens.
Job 5 starts running. It can backfill the slot and memory reserved by job 3. If the job does not finish within its 100-minute run limit, LSF kills it automatically.
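In the parallel case the backfill job borrows reserved slots as well as reserved memory. The sketch below models that borrowing for jobs 4 and 5; it assumes job 3 holds 2 slots and 200 MB and is predicted to start at minute 300, and it uses hypothetical requests of 1 slot and 50 MB for both jobs (only job 5's 100-minute run limit comes from the text). `Reservation` and `try_backfill` are hypothetical names, not LSF interfaces.

```python
from dataclasses import dataclass


@dataclass
class Reservation:
    slots: int      # slots held by the pending job
    mem_mb: int     # memory held by the pending job
    start_min: int  # predicted start time of the pending job, in minutes from now


def try_backfill(res, run_limit_min, slots, mem_mb):
    """Let a job borrow reserved slots and memory if it ends before res.start_min."""
    if run_limit_min > res.start_min:
        return False  # would still be running when the reserving job should start
    if slots > res.slots or mem_mb > res.mem_mb:
        return False  # not enough reserved capacity to borrow
    res.slots -= slots
    res.mem_mb -= mem_mb
    return True


job3 = Reservation(slots=2, mem_mb=200, start_min=300)
print(try_backfill(job3, run_limit_min=400, slots=1, mem_mb=50))  # False (job 4)
print(try_backfill(job3, run_limit_min=100, slots=1, mem_mb=50))  # True (job 5)
print(job3)  # Reservation(slots=1, mem_mb=150, start_min=300)
```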