Slurm user guide
SLURM is the queue manager used on the RPBS cluster. You must use SLURM to submit jobs to the cluster.
Slurm account specification
Slurm account information is based upon four parameters: "cluster", "user", "account" and "partition", which together form what is referred to as an association. Depending on your lab or scientific projects, you will have access to different partitions and accounts. Partitions refer to a group of nodes characterized by their entity, and accounts refer to the scientific projects you are involved in. The ipop-up partition is accessible to the whole iPOP-UP community.
You can check your user's associations with:
sacctmgr show user $USER withassoc
It is good practice to always specify the partition and the account that will be used when launching your jobs. You can also set your default project account with:
sacctmgr update user $USER set defaultaccount=<project-name>
You don't need to specify the cluster, as there is presently only one cluster to submit jobs to (production).
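For example, a minimal submission that sets both the partition and the account explicitly; this is only a sketch, and the partition and project names are placeholders to adapt to your own associations:
# Interactive job step on the ipop-up partition, charged to your project account
srun --partition=ipop-up --account=<project-name> hostname
# Equivalent header lines inside a batch script
#SBATCH --partition=ipop-up
#SBATCH --account=<project-name>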
SLURM partitions and nodes
The RPBS cluster is organized into several SLURM partitions. Each partition gathers a set of compute nodes that have similar usage.
To view all partitions available on the cluster, run:
sinfo
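If you only want to see one partition, sinfo accepts a -p option; the partition name below is just an example:
# Show only the ipop-up partition
sinfo -p ipop-up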
The most common states are:
- IDLE: The node is not allocated to any jobs and is available for use.
- MIXED: The node has some of its resources ALLOCATED while others are IDLE.
- DRAINED: The node is unavailable for use per system administrator request.
For a complete list of all possible states, see this page.
To view the nodes individually, with their details and state, run:
sinfo -Nl
Submit jobs to the cluster
You can use multiple commands to submit a job to the cluster:
- srun: to run jobs interactively in real time
- salloc: to run jobs interactively when resources are available
- sbatch: to submit a batch job when resources are available
Submit a job using srun
To learn more about the srun command, see the official documentation.
Usage
The job will start immediately after you execute the srun command. The outputs are returned to the terminal. You have to wait until the job has terminated before starting a new job. This works with ANY command.
Example:
srun hostname
Example if an interaction is needed:
module load r
srun --mem 20GB --pty R
In this example:
- --pty: keeps the interaction possible by running the task in pseudo-terminal mode
- --mem 20GB: allocates 20GB of memory to your job instead of the 2GB by default
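Another common interactive pattern is to open a shell on a compute node; a sketch, where the resource values are examples:
# Interactive bash session with 4 CPUs and 8GB of RAM
srun --cpus-per-task 4 --mem 8GB --pty bash
# Leave the compute node when done
exit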
Submit a job using salloc
To learn more about the salloc command, see the official documentation.
Usage
The job starts when resources are available. The outputs are returned to the terminal. This works with ANY command.
Example:
salloc hostname
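A more typical salloc workflow, sketched here with example resource values, is to reserve resources first and then run job steps inside the allocation (depending on the cluster configuration, salloc opens a shell once the allocation is granted):
# Reserve 4 CPUs and 8GB of RAM
salloc --cpus-per-task 4 --mem 8GB
# Inside the allocation, launch job steps with srun
srun hostname
# Release the allocation when done
exit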
Submit a job using sbatch
To learn more about the sbatch command, see the official documentation.
Usage
The job starts when resources are available. The command only returns the job ID. The outputs are sent to file(s). This works ONLY with scripts. The batch script may be given to sbatch through a file name on the command line, or, if no file name is specified, sbatch will read in a script from standard input.
Batch script rules
The script can contain srun commands; each srun is a job step. The script must start with a shebang (#!) followed by the path of the interpreter:
#!/bin/bash
#!/usr/bin/env python
The execution parameters can be set:
- At runtime, on the sbatch command line:
sbatch --mem=40GB bowtie2.sbatch
- Or within the script bowtie2.sbatch itself:
#!/bin/bash
#
#SBATCH --mem 40GB
srun bowtie2 -x hg19 -1 sample_R1.fq.gz -2 sample_R2.fq.gz -S sample_hg19.sam
sbatch bowtie2.sbatch
The script can contain SLURM options placed just after the shebang but before the script commands, introduced by #SBATCH.
Note: the #SBATCH syntax is important and doesn't contain any ! (unlike the shebang).
Advice: we recommend setting as many parameters as you can in the script, to keep track of your execution parameters for future submissions.
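Putting these rules together, here is a sketch of a more complete batch script; the job name, partition, account and resource values are placeholders to adapt:
#!/bin/bash
#
#SBATCH --job-name=bowtie2_hg19
#SBATCH --partition=ipop-up
#SBATCH --account=<project-name>
#SBATCH --cpus-per-task=8
#SBATCH --mem=40GB
srun bowtie2 -x hg19 -1 sample_R1.fq.gz -2 sample_R2.fq.gz -S sample_hg19.sam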
Check running job information
List a user's current jobs:
squeue -u <username>
The different statuses are:
- PENDING: PD
- RUNNING: R
- COMPLETED: CD
If your job is no longer displayed, it has finished (either successfully or with an error).
List a user's running jobs:
squeue -u <username> -t RUNNING
List a user's pending jobs:
squeue -u <username> -t PENDING
View accounting information for all of a user's jobs for the current day:
sacct --format=JobID,JobName,User,Submit,ReqCPUS,ReqMem,Start,NodeList,State,CPUTime,MaxVMSize%15 -u <username>
View accounting information for all of a user's jobs for the last 2 days (it's worth an alias):
sacct -a -S $(date --date='2 days ago' +%Y-%m-%dT%H:%M) --format=JobID,JobName,User%15,Partition,ReqCPUS,ReqMem,State,Start,End,CPUTime,MaxVMSize -u <username>
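Since this command is worth an alias, here is a sketch you could add to your ~/.bashrc (the alias name is arbitrary):
# Accounting summary of your jobs over the last 2 days
alias sacct2d='sacct -a -S $(date --date="2 days ago" +%Y-%m-%dT%H:%M) --format=JobID,JobName,User%15,Partition,ReqCPUS,ReqMem,State,Start,End,CPUTime,MaxVMSize -u $USER'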
List detailed job information:
scontrol show -dd jobid=<jobid>
Manage jobs
To cancel/stop a job:
scancel <jobid>
To cancel all jobs for a user:
scancel -u <username>
To cancel all pending jobs for a user:
scancel -t PENDING -u <username>
Execution parameters
These parameters are common to the commands srun
and sbatch
.
Parameters for log
#!/bin/bash
#SBATCH -o slurm.%N.%j.out # STDOUT file with the Node name and the Job ID
#SBATCH -e slurm.%N.%j.err # STDERR file with the Node name and the Job ID
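Other filename patterns exist; for instance %x inserts the job name, as sketched below (the job name is an example):
#SBATCH --job-name=bowtie2_hg19
#SBATCH -o %x.%j.out # STDOUT file with the Job name and the Job ID
#SBATCH -e %x.%j.err # STDERR file with the Job name and the Job ID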
Useful parameters
| srun | salloc | sbatch | Description |
|---|---|---|---|
| -p | -p | --partition | Request a specific partition for the resource allocation, e.g. rpbs, ipop-up, cmpli. |
| --mem | --mem | --mem | Minimum amount of RAM (e.g. 50G for 50GB of RAM); 2GB by default. |
| -c | -c | --cpus-per-task | Number of CPUs required (per task); 1 by default. |
| -w | -w | --nodelist | Request a specific list of hosts (e.g. cpu-node[130-132]). |
| -n | -n | --ntasks | Number of tasks to run in this job (e.g. MPI jobs); 1 by default. |
| --exclusive | --exclusive | --exclusive | Get an entire node for your job. Note that if you don't specify -c or --mem, you will have an entire node but only use 1 CPU and 2GB of RAM. |
| --gres=gpu:1 | --gres=gpu:1 | --gres=gpu:1 | Request 1 GPU for this job. |
Remarks:
- You can use the variable $SLURM_MEM_PER_NODE in the command line to keep the software settings in sync with the allocated resources.
- You can use the variable $SLURM_CPUS_PER_TASK in the command line to avoid mismatches between the allocated resources and the job (see the sketch after this list).
- Many more parameters exist in SLURM; make sure to check the official documentation.
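As an illustration of these variables, a sketch of a job step that reuses the allocated CPU count (the bowtie2 command mirrors the earlier example and is only illustrative):
#!/bin/bash
#
#SBATCH --cpus-per-task=8
#SBATCH --mem=40GB
# Pass the allocated CPU count to the aligner so the software settings match the allocation
srun bowtie2 -p "$SLURM_CPUS_PER_TASK" -x hg19 -1 sample_R1.fq.gz -2 sample_R2.fq.gz -S sample_hg19.sam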
Use the NVIDIA A100 GPU
NVIDIA A100s are really powerful cards. In order to use them at their full potential, we enabled Multi-Instance GPU (MIG) on these cards.
MIG allows the SLURM user to request only a part of the GPU, called a slice. There are 7 slices available per GPU (7G). We created slices of 1G, 2G, 4G and 7G.
You can access them by specifying them within your gres parameter.
# Get a full A100
--gres=gpu:a100_7g.80gb:1
# Get 4 slices
--gres=gpu:a100_4g.40gb:1
# Get 2 slices
--gres=gpu:a100_2g.20gb:1
# Get 1 slice
--gres=gpu:a100_1g.20gb:1
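For example, a sketch of an interactive session on a single 1G slice (the CPU and memory values are examples; remember to also specify the partition and account that give you access to the GPU nodes):
# Interactive shell with one 1G slice of an A100
srun --gres=gpu:a100_1g.20gb:1 --cpus-per-task 4 --mem 16GB --pty bash
# Inside the session, check the GPU visible to your job
nvidia-smi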