Slurm advanced guide

Job array

Job arrays offer a mechanism for submitting and managing collections of similar jobs quickly and easily; arrays with millions of tasks can be submitted in milliseconds (subject to configured size limits). All tasks must have the same initial options (e.g. size, time limit, etc.); however, some of these options can be changed after the job has begun execution with the scontrol command, specifying either the JobID of the whole array or an individual ArrayJobID.

Full documentation: https://slurm.schedmd.com/job_array.html

Possible array specifications:

#SBATCH --array=1,2,4,8     # explicit list of task IDs
#SBATCH --array=0-100:5     # step of 5: 0, 5, 10, 15, ..., 100
#SBATCH --array=1-50000%200 # at most 200 tasks running at a time

Example 1: Deal with a lot of files

[Animation: file_array]

#!/bin/bash
#SBATCH --array=0-29  # 30 jobs for 30 fastq files

INPUTS=(../fastqc/*.fq.gz)
srun fastqc ${INPUTS[$SLURM_ARRAY_TASK_ID]}
  • The process is represented in the figure below:

[Figure 1]
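As a quick sanity check, the indexing above can be simulated locally without a cluster. The file names below are invented for the demonstration; on the cluster, Slurm sets SLURM_ARRAY_TASK_ID for each task itself:

```shell
#!/bin/bash
# Simulate three of the thirty array tasks. Slurm would set
# SLURM_ARRAY_TASK_ID per task; here we loop over a few values by hand.
INPUTS=(sample_A.fq.gz sample_B.fq.gz sample_C.fq.gz)  # stand-ins for ../fastqc/*.fq.gz
for SLURM_ARRAY_TASK_ID in 0 1 2; do
    echo "task ${SLURM_ARRAY_TASK_ID} processes ${INPUTS[$SLURM_ARRAY_TASK_ID]}"
done
```

Each task sees only its own SLURM_ARRAY_TASK_ID, so the bash array lookup gives every task a different file.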

Example 2: Deal with a lot of parameters

[Animation: param_array]

#!/bin/bash
#SBATCH --array=0-4  # 5 jobs for 5 different parameters

PARAMS=(1 3 5 7 9)
srun a_software ${PARAMS[$SLURM_ARRAY_TASK_ID]} a_file.ext
  • The process is represented in the figure below:

[Figure 2]
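The same mechanism can be simulated locally: each task picks its own parameter from the bash array via its task ID (here the loop stands in for Slurm setting SLURM_ARRAY_TASK_ID):

```shell
#!/bin/bash
# Simulate the five array tasks of the parameter sweep.
PARAMS=(1 3 5 7 9)
for SLURM_ARRAY_TASK_ID in 0 1 2 3 4; do  # Slurm would set this per task
    echo "task ${SLURM_ARRAY_TASK_ID} runs with parameter ${PARAMS[$SLURM_ARRAY_TASK_ID]}"
done
```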

Job dependency

If you want Slurm to wait for one job to finish before launching another, you can use the sbatch flag --dependency.

For example, multiqc depends on the fastqc results:

fastqc.sh

#!/bin/bash
#SBATCH --array=0-29  # 30 jobs

INPUTS=(../fastqc/*.fq.gz)
fastqc ${INPUTS[$SLURM_ARRAY_TASK_ID]}

$ sbatch fastqc.sh
Submitted batch job 3161045

multiqc.sh

#!/bin/bash
multiqc .

$ sbatch --dependency=afterok:3161045 multiqc.sh
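Typing the job ID by hand does not scale. A small sketch of how the ID can be captured in a script; the sbatch output is simulated here so the snippet runs anywhere:

```shell
#!/bin/bash
# Simulated output of `sbatch fastqc.sh`; on the cluster you would run
# submit_output=$(sbatch fastqc.sh) instead.
submit_output="Submitted batch job 3161045"
jid=${submit_output##* }   # keep the last word of the line: the job ID
echo "sbatch --dependency=afterok:${jid} multiqc.sh"
```

In practice, `sbatch --parsable` prints only the job ID, so the chain reduces to `jid=$(sbatch --parsable fastqc.sh)` followed by `sbatch --dependency=afterok:$jid multiqc.sh`.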

ntasks

Within a job, you can launch multiple tasks. Each task runs in parallel with the resources you specified. This is controlled by the following Slurm parameter:

  • --ntasks/-n: number of tasks (default is 1)

If we want to use the srun command:

$ srun hostname
cpu-node-1
$ srun -n 2 hostname
cpu-node-1
cpu-node-1
$ srun --nodes 2 -n 2 hostname
cpu-node-1
cpu-node-2

In this example, --ntasks (-n) 2 makes Slurm launch the command as two parallel tasks; both may land on the same node unless --nodes 2 forces them onto two different nodes.

If we want to use the sbatch command:

test.sbatch:

#!/bin/bash
#SBATCH --ntasks=3
#SBATCH --nodes=3

srun hostname

If we run sbatch test.sbatch, we will obtain something like:

cpu-node130
cpu-node131
cpu-node132

In this example, --ntasks=3 makes the single srun call launch three parallel tasks, here one per node because of --nodes=3.

MPI

In order to use MPI, you must request a number of tasks with --ntasks. If you use Open MPI, think of --ntasks as the number of available slots; the mpirun command will automatically use all of them. The simplest way to launch MPI jobs with Slurm is salloc.

For example, with Open MPI:

$ salloc -n 2 mpirun hostname
salloc: Granted job allocation 5230859
cpu-node132
cpu-node133
salloc: Relinquishing job allocation 5230859
salloc: Job allocation 5230859 has been revoked

This launches 2 tasks in parallel on 2 nodes.

If you want to use MPI in an sbatch script:

mpi.sbatch:

#!/bin/bash
#SBATCH --ntasks=3
#SBATCH --nodes=3

mpirun hostname

If we run sbatch mpi.sbatch, we will obtain something like:

cpu-node132
cpu-node133
cpu-node134