====== slurm-dummies ======
# SBATCH...... : this is a comment\\
==== The mandatory parameters ====
For every job you must specify:
  - Your email address: the official EPFL address or another valid (worldwide) address.
  - How much time your job needs to run (if the job runs over this limit the cluster manager will kill it); the minimum is 1 minute.
  - How much memory (RAM) your job will use. Remember that if your job uses more memory than the limit you set here, the cluster manager will kill the job; the minimum is 512 MByte and the maximum is currently 250 GByte.
  - How many nodes (computers) you're going to use with your script.
  - How many cores/CPUs must be reserved for your job. If you don't include this parameter, only one core/CPU will be assigned to your job and you cannot run more than a single-threaded job.
  - **The name of the queue/partition you want to use** (see the next section and the sketch below).
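As a quick reference, a job-script header covering all of the above might look like the following minimal sketch (the email address, time, memory, and partition values are placeholders, pick your own):
<code>
#!/bin/bash
# a valid email address
#SBATCH --mail-user=dummy.epfl@epfl.ch
# maximum run time (here 1 hour)
#SBATCH --time=01:00:00
# maximum memory (here 1 GByte)
#SBATCH --mem=1G
# number of nodes (computers)
#SBATCH --nodes=1
# number of cores/CPUs
#SBATCH --cpus-per-task=1
# queue/partition to be used
#SBATCH --partition=slurm-cluster
</code>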
==== partitions (a.k.a. queues) ====
If you have used other types of cluster management systems, you will already know the term ''queue''; in SLURM a queue is called a //partition//.
The cluster is currently divided into three partitions:
  - slurm-cluster: this includes the dedicated compute nodes of the cluster.
  - slurm-gpu: this includes computers that have a GPU (NVIDIA, mostly) that can be used for HPC.
  - slurm-ws: this includes all the workstations sitting under your desks; programs that run for a very short time (1 hour tops) can take advantage of the workstation CPUs not used by the users.
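You can list the partitions that are actually configured (and their time limits) with the command ''sinfo'' on the login node. If you want to use the GPU partition, you select it like any other partition; on most SLURM setups you also have to request the GPU itself with ''--gres'' (the GPU count here is just an example):
<code>
# run on the GPU partition
#SBATCH --partition=slurm-gpu
# request one GPU (the gres syntax may vary with the local setup)
#SBATCH --gres=gpu:1
</code>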
The beginning of your script will be:
<code>
#!/bin/bash
#SBATCH --mail-user=dummy.epfl@epfl.ch
#SBATCH --time=01:00:00
#SBATCH --mem=1G
</code>
If your job runs a simulation that is multi-threaded, you need to tell the cluster how many cores it will use:
<code>
#Number of cores needed by the application (8 in this example)
#SBATCH --cpus-per-task=8
#and the number of nodes (physical computers) your program is supposed to use (you need at least 1)
#SBATCH --nodes=1
</code>
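For OpenMP-style multi-threaded programs, a common pattern (generic, not specific to this cluster) is to tie the number of threads to the reserved cores via the ''SLURM_CPUS_PER_TASK'' variable that SLURM sets inside the job:
<code>
# use exactly the cores reserved with --cpus-per-task
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
# my_threaded_program is a placeholder for your own executable
srun ./my_threaded_program
</code>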
After this //prolog//, you can add directives for instructing the system about the messages you
want to receive:
<code>
#SBATCH --mail-type=begin
#SBATCH --mail-type=end
#SBATCH --mail-type=fail
</code>
You can also tell SLURM where you want to put the output and error messages.\\
By default the cluster will put the output and error messages in 2 separate files (both named after the job), but you can choose the file names yourself:
<code>
# file for the standard output
#SBATCH --output=myjob.out
# file for the error messages
#SBATCH --error=myjob.err
</code>
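If you run many jobs with the same script, fixed file names will be overwritten at every run. SLURM supports replacement patterns in these file names, for example ''%x'' (job name) and ''%j'' (job id), so every job gets its own pair of files:
<code>
# one output/error file per job, e.g. myjob.1234.out
#SBATCH --output=%x.%j.out
#SBATCH --error=%x.%j.err
</code>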
Another mandatory parameter is the queue (called partition in SLURM terminology) you want to use: to start, always use the queue ''slurm-cluster'':
<code>
# queue to be used
#SBATCH --partition=slurm-cluster
</code>
It's better to use the command srun to launch the executable command (just prefix srun to your normal command line), so SLURM can better manage the scheduling of the jobs.
The use of ''srun'' turns your command into a //job step// that SLURM can track and account for.
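For example (''my_program'' is a placeholder for your own executable):
<code>
# instead of launching the program directly:
./my_program input.dat

# prefix it with srun in the job script:
srun ./my_program input.dat
</code>
Putting everything together, a complete job script looks like this: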
<code>
#!/bin/bash
#SBATCH --mail-user=dummy.epfl@epfl.ch
#SBATCH --time=04:00:00
#SBATCH --mem=1024M
#SBATCH --cpus-per-task=8
#SBATCH --mail-type=begin
#SBATCH --mail-type=end
#SBATCH --mail-type=fail
#SBATCH --partition=slurm-cluster

# launch the program (my_program is a placeholder)
srun ./my_program
</code>
Now you just need to tell the cluster system that you want to run this job. How do you do that? Pretty simple: you use the command ''sbatch'':
<code>
# my_script.sh is the file that contains the job script above
sbatch my_script.sh
</code>
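After submission, ''sbatch'' prints the id of the new job. You can then check and, if needed, cancel the job with the standard SLURM commands (the job id below is just an example):
<code>
# list your own jobs and their state
squeue -u $USER
# cancel a job using its id
scancel 1234
</code>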