1slurm

Differences

This shows you the differences between two versions of the page.

1slurm [2020/04/02 09:25] admin
1slurm [2020/06/26 10:38] (current) admin
Line 1: Line 1:
  
-====== Queuing System - S.L.U.R.M. ======
+====== Queueing System - S.L.U.R.M. ======
 Our cluster runs the [[https://slurm.schedmd.com| S.L.U.R.M.]] workload manager to manage batch jobs. It is **preferable** to use this system for running long batch jobs, as interactive calculations are less reliable and require more manual work.
  
-The queuing system give you access to computers owned by LCM, LTHC, LTHI, LINX and IC Faculty; sharing the computational resources among as many groups as possible will result in a more efficient use of the resources (including the electric power)
+The queuing system gives you access to computers owned by LCM, LTHC, LTHI, LINX and the IC Faculty. Sharing the computational resources among as many groups as possible results in a more efficient use of the resources (including electric power), and you can take advantage of many more machines for your urgent calculations and get results faster.
-As user you can take advantage of many more machines for your urgent calculations and get results faster. On the other hand, since the machines your are using are not always owned by your group, try to be as fair as possible and respect the needs of other users.
+On the other hand, since the machines you are using are not always owned by your group, try to be as fair as possible and respect the needs of other users.
  
-We have configured the system almost without access restriction because the queuing system can make a more efficient use of the cluster if it does not have to satisfy too many constraints. we are currently using only two constraints:
+We have configured the system with almost no restrictions on access and capabilities, because the queuing system can make more efficient use of the cluster if it does not have to satisfy too many constraints. We currently use only the following constraints:
   - number of CPUs/cores: you must indicate the correct number of cores you are going to use;
   - amount of RAM (in megabytes or gigabytes) your job needs;
   - execution time: if your job has not completed within the indicated time, it is automatically terminated.
  
-you can find better and more complete guides on how to use S.L.U.R.M. control commands on internet; e.g:
+Here we provide just a quick and dirty guide to the most basic commands/tasks that you are going to use in your day-to-day activities. You can find better and more complete guides on how to use the S.L.U.R.M. control commands on the Internet, e.g.:
   - [[https://slurm.schedmd.com/|Slurm Documentation]]
   - [[https://scitas-data.epfl.ch/confluence/display/DOC/FAQ#FAQ-BatchSystemQuestions|SCITAS Documentation]]
   - [[https://slurm.schedmd.com/quickstart.html|Quick Start]]
  
-here we provide just a fast and dirty guide for the most basic commands/tasks that you're going to use forthe day to day activities. 
  
 ==== partitions (a.k.a. queues) ====
-If you used other types of cluster management, you will already known the term "queue" to identify the type of nodes/jobs you an use to submit your jobs to the clusters. in S.L.U.R.M. notation, queues are called **partitions**. The two terms are used to indicate the same entity and usage.
+If you have used other cluster management systems, you will already know the term "queue", which identifies the type of computers (nodes) or programs (jobs) you want to use. In S.L.U.R.M. notation, ''queues'' are called **partitions**. The two terms are used here to indicate the same entity, even though they are not strictly identical.
  
 ===== Mini User Guide =====
  
-The most used commands are:
+The most used/needed commands are:
   - ''squeue'' for checking the status of the partitions or of your running jobs
   - ''sbatch'' or ''srun'' for submitting your jobs
Line 29: Line 28:
   - ''sinfo'' to discover the availability of nodes and partitions
  
-  * ''sinfo'' show the list of partitions and nodes and their availability: here you can see that the default partition of the cluster is called "slurm-cluster' (the * indicate the default), the time limit imposed on he partitions and the nodes that are associated with them.
+  * ''sinfo'' shows the list of partitions, nodes and their availability: here you can see that the default partition of the cluster is called "slurm-cluster" (the * indicates the default), the time limit imposed on the partitions, the nodes that are associated with them, and their activity status.
 <code>
 $ sinfo
Line 49: Line 48:
 Here you can see that the command provides the ID of the jobs, the PARTITION used to run the jobs (hence the nodes where these jobs will run), the NAME assigned to the jobs, the name of the USER that submitted the jobs, the state (ST) of the job (R = running, PD = pending, i.e. waiting), the execution TIME and the nodes where the jobs are actually running (or the reason why they are waiting in the queue).
  
-  * ''sbatch'' is used to submit and run jobs on the cluster. Jobs are nothing else than short scripts that contain some directive about the specific requests of the programs that need to be executed. The output of the program will be written by default in two files called xxx.out and xxx.err  respectively for standard output (any message that would be printed on the screen) and standard error (any error message that would be printed on the screen). ''xxx'' stands for the job id. You can change the output file names by setting the ''--output='' for standard output and ''--error='' for standard error.
+  * ''sbatch'' is used to submit and run jobs on the cluster. Jobs are nothing more than short scripts that contain directives describing the specific requirements of the programs that need to be executed. By default, the output of the program is written to two files called xxx.out and xxx.err, for standard output (any message that would be printed on the screen) and standard error (any error message that would be printed on the screen) respectively. ''xxx'' stands for the job ID. You can change the output file names by setting the directives ''--output='' for standard output and ''--error='' for standard error.
-Once a job is submitted (and accepted by the cluster, you'll receive the ID assigned to the job:
+Once a job is submitted (and accepted by the cluster), you'll receive the ID assigned to the job:
 <code>
 $ sbatch sheepit.slurm
 Submitted batch job 552
 </code>
-  * ''srun'' is used to launch immediately your program inside the cluster (interactive mode): it can be used interactively, but it **must** be used inside the sbatch/slurm scripts, so the cluster always knows what resources are allocated.
+  * ''srun'' is used to launch your program inside the cluster: it can be used to provide an interactive session on a node, or to launch parallel programs. When it is used inside sbatch/slurm scripts, the cluster always knows what resources are allocated.
  
   * ''scancel'' is used to remove your job from the queue or to kill your program when it's already running on the cluster:
Line 75: Line 74:
 </code>
  
-=== Scripts (used with **sbatch**) ===
+=== Scripts (used with sbatch) ===
  
-It is convenient to write the job script in a file not only because in this way the script can be reused, but also because it is also possible to set ''sbatch'' options directly inside the script as in the following example:
+It is convenient to write the job script in a file, not only because the script can then be reused, but also because ''sbatch'' options can be set directly inside the script, as in the following example (which shows the content of the file sheepit.slurm):
 <code>
 $ cat sheepit.slurm
Line 102: Line 101:
 srun sleep 60
 echo "$(hostname) $(date)"
 +
 +
  
 </code>
  
 +<note>
 +At the beginning of the file, you can see the line ''#!/bin/bash''. ''sbatch'' expects a batch script to begin with such an interpreter line, and it is common practice to identify Slurm scripts as bash scripts so that they can also be executed outside of the cluster. In that case the ''#SBATCH'' lines are interpreted as comments.
 +</note>
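As a minimal sketch of the point made in the note (not part of the original page): when a job script is run directly with bash, the ''#SBATCH'' lines are ignored as comments. The script below also checks the ''SLURM_JOB_ID'' environment variable, which Slurm sets for a running job, to tell the two situations apart. The directive values and the variable name ''where'' are only illustrative.

```shell
#!/bin/bash
#SBATCH --time=0:05:00
#SBATCH --mem=100M

# Outside the cluster the two #SBATCH lines above are plain comments.
# SLURM_JOB_ID is set by Slurm for a running job; it is empty otherwise.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    where="inside Slurm job ${SLURM_JOB_ID}"
else
    where="outside the cluster"
fi
echo "$(hostname): running ${where}"
```

Submitted with ''sbatch'', the same file should report the job ID instead.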
 Inside a script, all lines that start with the '#' character are comments, but lines that start with the '#SBATCH' string are directives for the queuing system.
 The example above instructs the queuing system to:
Line 111: Line 115:
   * ''#SBATCH --cpus-per-task=8'' informs the cluster that the program will need/use 8 cores to run
   * ''#SBATCH --time=4:00:00'' the job will run for at most 4 hours
-  * ''#SBATCH --mem=16G'' the job will require 16GB of RAM to be executed
+  * ''#SBATCH --mem=16G'' the job will require 16 GB of RAM to be executed. Different units can be specified using the suffixes [K|M|G|T]
   * ''#SBATCH --mail-*'' are parameters that indicate the email address that will receive messages from the cluster and when these messages are to be sent (when the job starts and when the job ends)
-  * ''#SBATCH --error'', ''#SBATCH --output'' these two directives indicate where to write the messages from the program
+  * ''#SBATCH --error'', ''#SBATCH --output'' these two directives indicate where to write the messages from the program(s) you execute
   * ''#SBATCH --partition'' indicates the partition that must be used to run the program
   * ''#SBATCH --gres=gpu:1'' this parameter informs the cluster that the program must be run only on nodes that provide the "gpu" resource and that **1** of these resources is needed.
-  * ''#SBATCH --constraint='' this directive indicate the the program must be run **only** on those nodes that provide this property
+  * ''#SBATCH --constraint='' this directive indicates that the program must be run **only** on those nodes that provide this property. Constraints can be combined using AND (&), OR (|) or combinations of both (as in --constraint="intel&gpu" or --constraint="intel|amd"). Refer to the [[https://slurm.schedmd.com/sbatch.html|sbatch manual]] for the full set of possibilities.
  
 At the moment we have defined these resources:
Line 126: Line 130:
     * ''mathematica'' nodes that can run Mathematica simulations
     * ''tensorflow'' nodes that can run tensorflow 2.x
-    * ''xeon26, xeon41, xeon56'' nodes that have different version of Intel Xeon CPUs
+    * ''xeon26'', ''xeon41'', ''xeon56'' nodes that have different versions of Intel Xeon CPUs
 +    * ''epyc7302'' nodes that provide AMD Epyc CPUs
     * ''gpu'' nodes that provide GPU capabilities
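
To illustrate how these properties can be requested, here is a sketch of a script header. It assumes that ''xeon41'' and ''xeon56'' are exposed as node features usable with ''--constraint'' and that ''gpu'' is available as a generic resource, as the lists above suggest. The output file names and the ''threads'' variable are illustrative only. In file names, ''%j'' is replaced by the job ID, and ''SLURM_CPUS_PER_TASK'' is set by Slurm when ''--cpus-per-task'' is given.

```shell
#!/bin/bash
#SBATCH --cpus-per-task=8
#SBATCH --time=4:00:00
#SBATCH --mem=16G
#SBATCH --gres=gpu:1
#SBATCH --constraint="xeon41|xeon56"
#SBATCH --output=example-%j.out
#SBATCH --error=example-%j.err

# Use the number of cores granted by Slurm; fall back to 1 outside a job.
threads="${SLURM_CPUS_PER_TASK:-1}"
echo "$(hostname): using ${threads} thread(s)"
```

Reading the core count from the environment keeps the program's thread count in step with the ''--cpus-per-task'' request, so only the directive needs to change.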
  
Line 136: Line 141:
  
 <note important>
-It is **mandatory** to specify at least the estimated run time of the job and the memory needed by so that the scheduler can optimize the machines usage and the overall cluster throughput. If your job will pass the limits you fixed, it will be automatically killed by the cluster manager.
+It is **mandatory** to specify at least the estimated run time of the job and the memory needed, so that the scheduler can optimize the usage of nodes/cores/memory and the overall cluster throughput. If your job exceeds the limits you set, it will be automatically killed by the cluster manager.
  
 Please keep in mind that longer jobs are less likely to start when the cluster load is high. Therefore, don't be lazy and do not always ask for //infinite// run time, because your job will remain stuck in the queue.
Line 159: Line 164:
 <code>
  
-scancel -u $(whoami) -h -t RUNNING | awk '{print $1};' | while read a ; do scancel ${a} ; done
+squeue -u $(whoami) -h -t RUNNING | awk '{print $1};' | while read a ; do scancel ${a} ; done
 </code>
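
The same result can usually be obtained without the loop, since ''scancel'' can itself filter by user and job state through its ''--user'' and ''--state'' options. The sketch below wraps this in a small function. The function name ''cancel_my_running_jobs'' and the ''SCANCEL'' override (handy for a dry run with ''echo'') are hypothetical helpers, not part of Slurm.

```shell
#!/bin/bash
# Hypothetical helper: cancel all RUNNING jobs of a user (default: yourself).
cancel_my_running_jobs() {
    local user="${1:-$(whoami)}"
    # scancel filters by user and state itself, so no squeue/awk loop is needed.
    # SCANCEL can be set to "echo" to print the arguments instead of cancelling.
    ${SCANCEL:-scancel} --user="$user" --state=RUNNING
}
```

Running ''SCANCEL=echo cancel_my_running_jobs'' only prints the arguments that would be passed to ''scancel'', which is a cheap way to check the call before using it for real.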
  
  
1slurm.1585819533.txt.gz · Last modified: 2020/04/02 09:25 by admin