Queueing System - S.L.U.R.M.

Our cluster runs the S.L.U.R.M. workload manager to manage batch jobs. It is preferable to use this system for long batch jobs, since interactive calculations are less reliable and require more manual work.

The queuing system gives you access to computers owned by LCM, LTHC, LTHI, LINX and the IC Faculty. Sharing the computational resources among as many groups as possible results in a more efficient use of the resources (including the electric power), and you can take advantage of many more machines for your urgent calculations and get results faster. On the other hand, since the machines you are using are not always owned by your group, try to be as fair as possible and respect the needs of other users.

We have configured the system with almost no restrictions on access and capabilities, because the queuing system can make a more efficient use of the cluster if it does not have to satisfy too many constraints. We currently enforce only a few constraints:

  1. Number of CPUs/cores: you must indicate the correct number of cores your job is going to use;
  2. Amount of RAM (in megabytes or gigabytes) your job needs;
  3. Execution time: if your job has not completed within the indicated time, it will be automatically terminated (an example of how to pass these limits to sbatch is shown just below).
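
These limits map directly onto sbatch (and srun) options. The following is only a rough sketch: the values and the script name myjob.slurm are placeholders to adapt to your own job.

$ sbatch --cpus-per-task=4 --mem=8G --time=2:00:00 myjob.slurm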

Here we provide only a quick and dirty guide to the most basic commands and tasks you will use in your day-to-day activities; you can find better and more complete guides on how to use the S.L.U.R.M. control commands on the internet, e.g. the official S.L.U.R.M. documentation.

Partitions (a.k.a. queues)

If you have used other types of cluster management, you will already know the term “queue” to identify the type of computers (nodes) or programs (jobs) you want to use. In S.L.U.R.M. notation, queues are called partitions. The two terms are used to indicate the same entity, even if they are not quite the same thing.
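
For example, to send a job to a specific partition you can pass the --partition option to sbatch (slurm-ws is one of the partitions listed further below; myjob.slurm is just a placeholder script name):

$ sbatch --partition=slurm-ws myjob.slurm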

Mini User Guide

The most used/needed commands are:

  1. squeue for checking the status of your queued and running jobs
  2. sbatch or srun for submitting your jobs
  3. scancel for deleting a running or waiting job
  4. sinfo to discover the availability of nodes and partitions
$ sinfo
PARTITION      AVAIL  TIMELIMIT  NODES  STATE NODELIST
slurm-cluster*    up   infinite      6   idle iscpc88,node[01-02,05,10-11]
slurm-ws          up    1:00:00      4  down* iscpc[85-87,90]
slurm-ws          up    1:00:00      2   idle iscpc[14-15]
$ squeue 
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               550 slurm-clu  sheepit    damir PD       0:00      1 (Resources)
               551 slurm-clu script.s rmarino1 PD       0:00      1 (Priority)
               549 slurm-clu  sheepit    damir  R      11:13      1 node05
               548 slurm-clu  sheepit    damir  R      11:25      1 iscpc88

Here you can see that the command provides the ID of the jobs, the PARTITION used to run the jobs (hence the nodes where these jobs will run), the NAME assigned to the jobs, the name of the USER that submitted the jobs, the STATUS of the job (ST column: R = running, PD = pending, i.e. waiting), the execution TIME and the nodes where the jobs are actually running (or, in parentheses, the reason why they are waiting in the queue).
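
squeue also accepts filters; as a quick sketch, you can list only your own jobs or inspect a single job by its ID (the user name and job ID below are taken from the listing above):

$ squeue -u damir
$ squeue -j 549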

Once a job is submitted (and accepted by the cluster), you'll receive the ID assigned to the job:

$ sbatch sheepit.slurm 
Submitted batch job 552
$ squeue 
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               552 slurm-clu  sheepit    damir PD       0:00      1 (Priority)
               550 slurm-clu  sheepit    damir PD       0:00      1 (Resources)
               551 slurm-clu script.s rmarino1 PD       0:00      1 (Priority)
               549 slurm-clu  sheepit    damir  R      35:54      1 node05
               548 slurm-clu  sheepit    damir  R      36:06      1 iscpc88
$ scancel 552
$ squeue 
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               550 slurm-clu  sheepit    damir PD       0:00      1 (Resources)
               551 slurm-clu script.s rmarino1 PD       0:00      1 (Priority)
               549 slurm-clu  sheepit    damir  R      36:05      1 node05
               548 slurm-clu  sheepit    damir  R      36:17      1 iscpc88
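
While sbatch submits a script that runs in the background, srun can also launch a single command directly and block until it finishes, which is handy for quick tests; a minimal sketch (all values are placeholders):

$ srun --partition=slurm-cluster --cpus-per-task=1 --mem=1G --time=0:10:00 hostname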

Scripts (used with sbatch)

It is convenient to write the job script in a file, not only because the script can then be reused, but also because it is possible to set sbatch options directly inside the script, as in the following example (which shows the content of the file sheepit.slurm):

$ cat sheepit.slurm 
#!/bin/bash

#SBATCH --job-name=sheepit
#SBATCH --nodes=1
#SBATCH --cpus-per-task=8
#SBATCH --time=4:00:00
#SBATCH --mem=16G
#SBATCH --mail-user=damir.laurenzi@epfl.ch
#SBATCH --mail-type=begin
#SBATCH --mail-type=end
#SBATCH --error=sheepit/sheepit.%J.err
#SBATCH --output=sheepit/sheepit.%J.out
#SBATCH --partition slurm-cluster
#SBATCH --gres=gpu:1
#SBATCH --constraint=opteron61


echo "$(hostname) $(date)"

cd ${HOME}/sheepit
srun sleep 60
echo "$(hostname) $(date)"


<note> At the beginning of the file you can see the line #!/bin/bash, which is not strictly necessary. It is common practice to write slurm scripts as bash scripts so that they can also be executed outside of the cluster; in that case the '#SBATCH' lines are simply interpreted as comments. </note> Inside a script, all lines that start with the '#' character are comments, but the lines that start with the '#SBATCH' string are directives for the queuing system. The example above instructs the queuing system to:

  1. name the job “sheepit”;
  2. allocate 1 node, with 8 cores for the task;
  3. limit the run time to 4 hours and the memory to 16 GB;
  4. send an e-mail to the given address when the job begins and when it ends;
  5. write the standard error and standard output to files in the sheepit/ directory, with the job ID embedded in the file names;
  6. run the job in the slurm-cluster partition;
  7. allocate 1 GPU (a resource);
  8. run only on nodes with the opteron61 property (a constraint).

At the moment we have defined these resources:

and these properties/constraints:

<note> Please pay attention that resources are not the same as properties, and the two must be requested with different parameters inside the scripts: resources are requested with the --gres option (e.g. --gres=gpu:1 in the example above), while properties/constraints are requested with the --constraint option (e.g. --constraint=opteron61). </note>

<note important> It is mandatory to specify at least the estimated run time of the job and the memory it needs, so that the scheduler can optimize the node/core/memory usage and the overall cluster throughput. If your job exceeds the limits you set, it will be automatically killed by the cluster manager.

Please keep in mind that longer jobs are less likely to be scheduled when the cluster load is high. Therefore, don't be lazy and do not always ask for an essentially infinite run time, or your job will remain stuck in the queue. </note>
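
To pick sensible limits, it helps to check how much time and memory similar finished jobs actually used; if job accounting is enabled on the cluster, sacct can report this (the job ID below is just an example):

$ sacct -j 548 --format=JobID,JobName,Elapsed,MaxRSS,State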

Here you can find some useful sbatch scripts that can be used as a starting point:

Script                                                       Execute with
Base example script (contains most of the useful options)   sbatch [sbatch options] base.slurm
Script example for running matlab computations              sbatch [sbatch options] matlab.slurm
Script example for running Mathematica computations         sbatch [sbatch options] mathematica.slurm
Script example for windows programs (executed under wine)   sbatch [sbatch options] wine.slurm
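
Note that options passed on the sbatch command line override the corresponding #SBATCH directives inside the script, so the same example script can be reused with different limits; for instance (values are placeholders):

$ sbatch --time=8:00:00 --mem=32G base.slurm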


The shell running the sbatch script has access to various useful environment variables set by S.L.U.R.M. (for example SLURM_JOB_ID, SLURM_CPUS_PER_TASK and SLURM_JOB_NODELIST).
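
For example, inside the script you could log where and how the job is running (a small sketch; SLURM_CPUS_PER_TASK is only set when --cpus-per-task is requested, as in the example above):

echo "Job ${SLURM_JOB_ID} running on ${SLURM_JOB_NODELIST} with ${SLURM_CPUS_PER_TASK} CPUs per task"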

See the sbatch man page for the complete list of these variables and more details.

Tips and Tricks

Delete all queued jobs

squeue -u $(whoami) -h -t PENDING | awk '{print $1}' | while read jobid ; do scancel ${jobid} ; done
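
The same result can be obtained more directly with scancel's own filters: -u selects your jobs and -t restricts the state (drop the -t option to cancel your running jobs as well):

scancel -u $(whoami) -t PENDING
scancel -u $(whoami)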