Slurm user guide
SLURM is the queue manager used on the RPBS cluster. You must use SLURM to submit jobs to the cluster.
Slurm account specification
Slurm account information is based upon four parameters: "cluster", "user", "account" and "partition", which together form what is referred to as an association. Depending on your lab or scientific projects, you will have access to different partitions and accounts. Partitions refer to a group of nodes characterized by their entity, and accounts refer to the scientific projects you are involved in. The ipop-up partition is accessible to the whole iPOP-UP community.
You can check your user's associations with:
sacctmgr show user $USER withassoc
It is good practice to always specify the partition and the account that will be used when launching your jobs. You can also set your default project account with:
sacctmgr update user $USER set defaultaccount=<project-name>
You don't need to specify the cluster, as there is presently only one cluster to submit jobs to (production).
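For example, a minimal submission that sets both the partition and the account explicitly; this is only a sketch, and the partition and project names are placeholders to adapt to your own associations:
# Interactive job step on the ipop-up partition, charged to your project account
srun --partition=ipop-up --account=<project-name> hostname
# Equivalent header lines inside a batch script
#SBATCH --partition=ipop-up
#SBATCH --account=<project-name>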
SLURM partitions and nodes
The RPBS cluster is organized into several SLURM partitions. Each partition gathers a set of compute nodes that have similar usage.
To view all partitions available on the cluster, run:
sinfo
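If you only want to see one partition, sinfo accepts a -p option; the partition name below is just an example:
# Show only the ipop-up partition
sinfo -p ipop-up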
The most common states are:
- IDLE: The node is not allocated to any jobs and is available for use.
- MIXED: The node has some of its resources ALLOCATED while others are IDLE.
- DRAINED: The node is unavailable for use per system administrator request.
For a complete list of all possible states, see this page.
To view the nodes individually, with their details and state, run:
sinfo -Nl
Submit jobs to the cluster
You can use multiple commands to submit a job to the cluster:
- srun: to run jobs interactively in real time
- salloc: to run jobs interactively when resources are available
- sbatch: to submit a batch job when resources are available
Submit a job using srun
To learn more about the srun command, see the official documentation.
Usage
The job will start immediately after you execute the srun command. The outputs are returned to the terminal. You have to wait until the job has terminated before starting a new job. This works with ANY command.
Example:
srun hostname
Example if an interaction is needed:
module load r
srun --mem 20GB --pty R
In this example:
- --pty: keeps the interaction possible by running the task in pseudo-terminal mode
- --mem 20GB: allocates 20GB of memory to your job instead of the 2GB by default
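Another common interactive pattern is to open a shell on a compute node; a sketch, where the resource values are examples:
# Interactive bash session with 4 CPUs and 8GB of RAM
srun --cpus-per-task 4 --mem 8GB --pty bash
# Leave the compute node when done
exit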
Submit a job using salloc
To learn more about the salloc command, see the official documentation.
Usage
The job starts when resources are available. The outputs are returned to the terminal. This works with ANY command.
Example:
salloc hostname
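A more typical salloc workflow, sketched here with example resource values, is to reserve resources first and then run job steps inside the allocation (depending on the cluster configuration, salloc opens a shell once the allocation is granted):
# Reserve 4 CPUs and 8GB of RAM
salloc --cpus-per-task 4 --mem 8GB
# Inside the allocation, launch job steps with srun
srun hostname
# Release the allocation when done
exit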
Submit a job using sbatch
To learn more about the sbatch command, see the official documentation.
Usage
The job starts when resources are available. The command only returns the job ID. The outputs are sent to file(s). This works ONLY with scripts. The batch script may be given to sbatch through a file name on the command line, or, if no file name is specified, sbatch will read in a script from standard input.
Batch script rules
The script can contain srun commands; each srun is a job step. The script must start with a shebang (#!) followed by the path of the interpreter:
#!/bin/bash
#!/usr/bin/env python
The execution parameters can be set:
- At runtime, on the sbatch command line:
sbatch --mem=40GB bowtie2.sbatch
- Or within the script bowtie2.sbatch itself:
#!/bin/bash
#
#SBATCH --mem 40GB
srun bowtie2 -x hg19 -1 sample_R1.fq.gz -2 sample_R2.fq.gz -S sample_hg19.sam
sbatch bowtie2.sbatch
The script can contain SLURM options placed just after the shebang but before the script commands, introduced by #SBATCH.
Note: the #SBATCH syntax is important and doesn't contain any ! (unlike the shebang).
Advice: we recommend setting as many parameters as you can in the script, to keep track of your execution parameters for future submissions.
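Putting these rules together, here is a sketch of a more complete batch script; the job name, partition, account and resource values are placeholders to adapt:
#!/bin/bash
#
#SBATCH --job-name=bowtie2_hg19
#SBATCH --partition=ipop-up
#SBATCH --account=<project-name>
#SBATCH --cpus-per-task=8
#SBATCH --mem=40GB
srun bowtie2 -x hg19 -1 sample_R1.fq.gz -2 sample_R2.fq.gz -S sample_hg19.sam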
Check running job information
List a user's current jobs:
squeue -u <username>
The different statuses are:
- PENDING: PD
- RUNNING: R
- COMPLETED: CD
If your job is no longer displayed, it has finished (either successfully or with an error).
List a user's running jobs:
squeue -u <username> -t RUNNING
List a user's pending jobs:
squeue -u <username> -t PENDING
View accounting information for all of a user's jobs for the current day:
sacct --format=JobID,JobName,User,Submit,ReqCPUS,ReqMem,Start,NodeList,State,CPUTime,MaxVMSize%15 -u <username>
View accounting information for all of a user's jobs for the last 2 days (it's worth an alias):
sacct -a -S $(date --date='2 days ago' +%Y-%m-%dT%H:%M) --format=JobID,JobName,User%15,Partition,ReqCPUS,ReqMem,State,Start,End,CPUTime,MaxVMSize -u <username>
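Since this command is worth an alias, here is a sketch you could add to your ~/.bashrc (the alias name is arbitrary):
# Accounting summary of your jobs over the last 2 days
alias sacct2d='sacct -a -S $(date --date="2 days ago" +%Y-%m-%dT%H:%M) --format=JobID,JobName,User%15,Partition,ReqCPUS,ReqMem,State,Start,End,CPUTime,MaxVMSize -u $USER'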
List detailed job information:
scontrol show -dd jobid=<jobid>
Manage jobs
To cancel/stop a job:
scancel <jobid>
To cancel all jobs for a user:
scancel -u <username>
To cancel all pending jobs for a user:
scancel -t PENDING -u <username>
Execution parameters
These parameters are common to the commands srun
and sbatch
.
Parameters for log
#!/bin/bash
#SBATCH -o slurm.%N.%j.out # STDOUT file with the Node name and the Job ID
#SBATCH -e slurm.%N.%j.err # STDERR file with the Node name and the Job ID
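Other filename patterns exist; for instance %x inserts the job name, as sketched below (the job name is an example):
#SBATCH --job-name=bowtie2_hg19
#SBATCH -o %x.%j.out # STDOUT file with the Job name and the Job ID
#SBATCH -e %x.%j.err # STDERR file with the Job name and the Job ID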
Useful parameters
| srun | salloc | sbatch | Description |
|---|---|---|---|
| -p | -p | --partition | Request a specific partition for the resource allocation, e.g. rpbs, ipop-up, cmpli. |
| --mem | --mem | --mem | Minimum amount of RAM (e.g. 50G for 50GB of RAM); 2GB by default. |
| -c | -c | --cpus-per-task | Number of CPUs required (per task); 1 by default. |
| -w | -w | --nodelist | Request a specific list of hosts (e.g. cpu-node[130-132]). |
| -n | -n | --ntasks | Number of tasks to run in this job (e.g. MPI jobs); 1 by default. |
| --exclusive | --exclusive | --exclusive | Get an entire node for your job. Note that if you don't specify -c or --mem, you will have an entire node but only use 1 CPU and 2GB of RAM. |
| --gres=gpu:1 | --gres=gpu:1 | --gres=gpu:1 | Request 1 GPU for this job. |
Remarks:
- You can use the variable $SLURM_MEM_PER_NODE in the command line to keep the software settings in sync with the allocated resources.
- You can use the variable $SLURM_CPUS_PER_TASK in the command line to avoid mismatches between the allocated resources and the job (see the sketch after this list).
- Many more parameters exist in SLURM; make sure to check the official documentation.
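As an illustration of these variables, a sketch of a job step that reuses the allocated CPU count (the bowtie2 command mirrors the earlier example and is only illustrative):
#!/bin/bash
#
#SBATCH --cpus-per-task=8
#SBATCH --mem=40GB
# Pass the allocated CPU count to the aligner so the software settings match the allocation
srun bowtie2 -p "$SLURM_CPUS_PER_TASK" -x hg19 -1 sample_R1.fq.gz -2 sample_R2.fq.gz -S sample_hg19.sam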
Use the NVIDIA A100 GPU
NVIDIA A100s are really powerful cards. In order to use them at their full potential, we enabled Multi-Instance GPU (MIG) on these cards.
MIG allows the SLURM user to request only a part of the GPU, called a slice. There are 7 slices available per GPU (7G). We created slices of 1G, 2G, 4G and 7G.
You can access them by specifying them within your gres parameter.
# Get a full A100
--gres=gpu:a100_7g.80gb:1
# Get 4 slices
--gres=gpu:a100_4g.40gb:1
# Get 2 slices
--gres=gpu:a100_2g.20gb:1
# Get 1 slice
--gres=gpu:a100_1g.20gb:1
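For example, a sketch of an interactive session on a single 1G slice (the CPU and memory values are examples; remember to also specify the partition and account that give you access to the GPU nodes):
# Interactive shell with one 1G slice of an A100
srun --gres=gpu:a100_1g.20gb:1 --cpus-per-task 4 --mem 16GB --pty bash
# Inside the session, check the GPU visible to your job
nvidia-smi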