Slurm user guide

SLURM is the queue manager used on the RPBS cluster. You must use SLURM to submit jobs to the cluster.

Slurm account specification

Slurm account information is based on four parameters: "cluster", "user", "account" and "partition", which together form what is referred to as an association. Depending on your lab or scientific projects, you will have access to different partitions and accounts. Partitions refer to groups of nodes characterized by the entity they belong to, and accounts refer to the scientific projects you are involved in. The ipop-up partition is accessible to the whole iPOP-UP community.

You can check your own associations with :

sacctmgr show user $USER withassoc

It is good practice to always specify the partition and the account that will be used when launching your jobs. You can also set your default project account with :

sacctmgr update user $USER set defaultaccount=<project-name>

You don't need to specify the cluster as there is presently only one cluster to submit jobs to (production).
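
For example, when submitting a batch script (sbatch is described below), you can name both the partition and the account explicitly; the script name and values here are only illustrative:

sbatch --partition=ipop-up --account=<project-name> my_script.sbatch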

SLURM partitions and nodes

The RPBS cluster is organized into several SLURM partitions. Each partition gathers a set of compute nodes that have similar usage.

To view all partitions available on the cluster run :

sinfo
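
You can also restrict the output to a single partition with the -p option, for instance (partition name given as an example):

sinfo -p ipop-up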

The most common states are :

  • IDLE : The node is not allocated to any jobs and is available for use.

  • MIXED : The node has some of its resources ALLOCATED while others are IDLE.

  • DRAINED : The node is unavailable for use per system administrator request.

For a complete list of all possible states, see this page

To view detailed information about each node, run :

sinfo -Nl
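
If you only want the nodes that are currently available, sinfo can also filter by state; a minimal example using the standard state filter:

sinfo -N --states=idle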

Submit jobs to the cluster

You can use multiple commands to submit a job to the cluster:

  • srun : to run jobs interactively in real time

  • salloc : to run jobs interactively when resources are available

  • sbatch : to submit a batch job when resources are available

Submit a job using srun

To learn more about the srun command, see the official documentation

Usage

The job will start immediately after you execute the srun command. The outputs are returned to the terminal. You have to wait until the job has terminated before starting a new job. This works with ANY command.

Example:

srun hostname

Example when interaction is needed:

module load r
srun --mem 20GB --pty R

In this example:

  • --pty: keeps the session interactive by allocating a pseudo-terminal
  • --mem 20GB: allocates 20 GB of memory to your job instead of the default 2 GB
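
The same pattern can be used to get an interactive shell on a compute node; the resource values below are only an illustration:

srun --mem 4GB --cpus-per-task 2 --pty bash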

Submit a job using salloc

To learn more about the salloc command, see the official documentation

Usage

The job starts when resources are available. The outputs are returned to the terminal. This works with ANY command.

Example:

salloc hostname
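
In practice, salloc is often used to reserve resources first and then launch job steps inside the allocation; a minimal sketch, with illustrative resource values:

salloc --mem 4GB --cpus-per-task 2   # reserve resources; a shell holding the allocation is opened
srun hostname                        # job step executed on the allocated resources
exit                                 # release the allocation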

Submit a job using sbatch

To learn more about the sbatch command, see the official documentation

Usage

The job starts when resources are available. The command only returns the job id. The outputs are sent to file(s). This works ONLY with scripts. The batch script may be given to sbatch through a file name on the command line, or if no file name is specified, sbatch will read in a script from standard input.
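
For example, a short script can be passed to sbatch directly on standard input (a minimal sketch):

sbatch << 'EOF'
#!/bin/bash
#SBATCH --mem 1GB
srun hostname
EOF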

Batch script rules

The script can contain srun commands. Each srun is a job step. The script must start with a shebang (#!) followed by the path of the interpreter:

#!/bin/bash
#!/usr/bin/env python

The execution parameters can be set:

At runtime, on the sbatch command line:

sbatch --mem=40GB bowtie2.sbatch

Or within the script bowtie2.sbatch itself:

#!/bin/bash
#
#SBATCH --mem 40GB
srun bowtie2 -x hg19 -1 sample_R1.fq.gz -2 sample_R2.fq.gz -S sample_hg19.sam

which is then submitted with:

sbatch bowtie2.sbatch

The script can contain Slurm options just after the shebang and before the script commands, each introduced by #SBATCH.

Note: the #SBATCH syntax is important and does not contain any ! (unlike the shebang).

Advice: We recommend setting as many parameters as you can in the script, so that you keep track of your execution parameters for future submissions.
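
As an illustration, a more complete batch script could look like this; the partition, account and resource values are placeholders to adapt to your own association:

#!/bin/bash
#SBATCH --partition=ipop-up          # partition to use (adapt to your association)
#SBATCH --account=<project-name>     # project account
#SBATCH --cpus-per-task=4            # number of CPUs per task
#SBATCH --mem=40GB                   # memory for the job
#SBATCH -o slurm.%N.%j.out           # STDOUT file
#SBATCH -e slurm.%N.%j.err           # STDERR file

srun bowtie2 -x hg19 -1 sample_R1.fq.gz -2 sample_R2.fq.gz -S sample_hg19.sam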

Check running job information

List a user's current jobs:

squeue -u <username>

The main job state codes shown by squeue are :

  • PENDING : PD

  • RUNNING : R

  • COMPLETED : CD

If your job is not displayed, it has finished (either successfully or with an error).

List a user's running jobs:

squeue -u <username> -t RUNNING

List a user's pending jobs:

squeue -u <username> -t PENDING

View accounting information for all of a user's jobs for the current day :

sacct --format=JobID,JobName,User,Submit,ReqCPUS,ReqMem,Start,NodeList,State,CPUTime,MaxVMSize%15 -u <username>

View accounting information for all of a user's jobs for the last 2 days (worth turning into an alias, see the example below) :

sacct -a -S $(date --date='2 days ago' +%Y-%m-%dT%H:%M) --format=JobID,JobName,User%15,Partition,ReqCPUS,ReqMem,State,Start,End,CPUTime,MaxVMSize -u <username>
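
A minimal sketch of such an alias, to add to your ~/.bashrc (the alias name is arbitrary):

alias sacct2days="sacct -a -S \$(date --date='2 days ago' +%Y-%m-%dT%H:%M) --format=JobID,JobName,User%15,Partition,ReqCPUS,ReqMem,State,Start,End,CPUTime,MaxVMSize -u \$USER"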

List detailed job information:

scontrol show -dd jobid=<jobid>

Manage jobs

To cancel/stop a job:

scancel <jobid>

To cancel all jobs for a user:

scancel -u <username>

To cancel all pending jobs for a user:

scancel -t PENDING -u <username>

Execution parameters

These parameters are common to the srun, salloc and sbatch commands.

Parameters for log

#!/bin/bash
#SBATCH -o slurm.%N.%j.out  # STDOUT file with the Node name and the Job ID
#SBATCH -e slurm.%N.%j.err  # STDERR file with the Node name and the Job ID

Useful parameters

Option (srun / salloc / sbatch)    Description
-p, --partition                    Request a specific partition for the resource allocation (e.g. rpbs, ipop-up, cmpli).
--mem                              Minimum amount of RAM (e.g. 50GB for 50 GB of RAM), 2GB by default.
-c, --cpus-per-task                Number of CPUs required per task, 1 by default.
-w, --nodelist                     Request a specific list of hosts (e.g. cpu-node[130-132]).
-n, --ntasks                       Number of tasks to run in this job (e.g. MPI jobs), 1 by default.
--exclusive                        Get an entire node for your job. Note that if you don't specify -c or --mem, you will get a whole node but only use 1 CPU and 2GB of RAM.
--gres=gpu:1                       Request 1 GPU for this job.

Remarks :

  • You can use the variable $SLURM_MEM_PER_NODE in the command line to keep the software settings in sync with the memory allocated.
  • You can use the variable $SLURM_CPUS_PER_TASK in the command line to avoid a mismatch between the resources allocated and what the job actually uses (see the example below).
  • Many more parameters exist in Slurm; make sure to check the official documentation.
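
A minimal sketch of how these variables can be used inside a batch script; the tools and values are only illustrative, and $SLURM_MEM_PER_NODE is expressed in MB:

#!/bin/bash
#SBATCH --cpus-per-task=8
#SBATCH --mem=16GB

srun my_tool --threads $SLURM_CPUS_PER_TASK input.dat   # thread count follows the allocated CPUs (my_tool is hypothetical)
srun java -Xmx${SLURM_MEM_PER_NODE}m -jar my_tool.jar   # Java heap follows the allocated memory (value in MB)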

Use the NVIDIA A100 GPU

The NVIDIA A100 is a really powerful card. To use these cards at their full potential, we enabled Multi-Instance GPU (MIG) on them. MIG allows a Slurm user to request only a part of the GPU, called a slice. There are 7 slices available per GPU (7g). We created slices of 1g, 2g, 4g and 7g. You can access them by specifying them within your --gres parameter.

# Get a full A100
--gres=gpu:a100_7g.80gb:1

# Get 4 slices
--gres=gpu:a100_4g.40gb:1

# Get 2 slices
--gres=gpu:a100_2g.20gb:1

# Get 1 slice
--gres=gpu:a100_1g.20gb:1
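
For example, to check the slice from an interactive session (assuming nvidia-smi is available on the GPU node; the other options are illustrative):

srun --gres=gpu:a100_1g.20gb:1 --pty nvidia-smi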