Batch Queuing System

for Dummies

The first thing you need to know: a PBS script behaves as if you were typing its commands in a shell. So if, after logging in, you would need to change directory before launching the job by hand, you need to do the same inside the script.

Every instruction line for the queue manager starts with #PBS, so:
#PBS …… : this is a directive for the cluster and a comment for the shell
##PBS ….. : this is a comment
# PBS…… : this is a comment for the shell, NOT a directive for the cluster
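If you are not sure which lines of a script count as directives, a quick check from the shell can help. The following is only a sketch: list_pbs_directives is a hypothetical helper (not a PBS command), and it only looks at the #PBS prefix, which is a simplification of what the queue manager really does.

```shell
# Hypothetical helper (not part of PBS): print the lines that start with "#PBS".
# It only checks the prefix, which is a simplification of what qsub really does.
list_pbs_directives() {
    grep '^#PBS' "$1"
}

# Small demo file with one real directive and two plain comments.
demo=$(mktemp)
cat > "$demo" <<'EOF'
#PBS -N real_directive
##PBS -N just_a_comment
# PBS -N also_just_a_comment
EOF

directives=$(list_pbs_directives "$demo")
echo "$directives"
rm -f "$demo"
```

Only the first line of the demo file is printed, because the other two do not start with exactly #PBS.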

The mandatory directives that you must always include in the scripts are:

Your email address. It can be your official EPFL address or another one, but it must be a valid address reachable from anywhere. It must always be present, whether or not you ask the system to send email messages.
How long your job may run (if the job exceeds this limit, the cluster manager kills it). The minimum is 1 minute and there is no maximum.
How much memory (RAM) your job will use. If your job uses more memory than the limit you set here, the cluster manager kills it. The minimum is 512 MB; currently (as of Dec 2010) the maximum is ~64 GB.
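Before submitting, you may want to check that the three mandatory directives are really there. Here is a minimal sketch, assuming the directives are written as in this page (-M, -l cput=... and -l mem=...). check_mandatory is a hypothetical helper and someone@example.org is a placeholder address.

```shell
# Hypothetical helper: report which of the mandatory directives are missing.
check_mandatory() {
    missing=""
    grep -q '^#PBS  *-M  *[^ ]' "$1" || missing="$missing -M"
    grep -q '^#PBS .*[ ,]cput=' "$1" || missing="$missing cput"
    grep -q '^#PBS .*[ ,]mem=' "$1" || missing="$missing mem"
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "all mandatory directives present"
}

# Demo script that forgets the memory limit (the address is a placeholder).
demo=$(mktemp)
cat > "$demo" <<'EOF'
#PBS -M someone@example.org
#PBS -l cput=04:00:00
echo "no memory limit requested"
EOF

result=$(check_mandatory "$demo" || true)
echo "$result"
rm -f "$demo"
```

The check is deliberately simple: it only looks for the text of each directive and does not validate the values.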

The beginning of your script will be:

# your email address
#PBS -M <my email address that everyone can use to send email messages to me>
# how much time may this process run (hours:minutes:seconds)? 4 hours in this example
#PBS -l cput=04:00:00
# how much memory does it need? 1 GB in this example
#PBS -l mem=1024mb
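The time limit is written as hours:minutes:seconds. If you think in seconds, a small conversion can help you double-check the value you put in the directive. This is just a sketch: hms_to_seconds is a hypothetical helper, not something provided by PBS.

```shell
# Hypothetical helper: convert hours:minutes:seconds into seconds.
hms_to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

limit=$(hms_to_seconds "04:00:00")
echo "cput limit: $limit seconds"
```

For the 4 hour example above this gives 14400 seconds.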

After this prolog you can add directives for instructing the system about the messages you want to receive:

# this line instructs PBS to send a mail when the job (b) starts and (e) finishes
#PBS -m be
# you can substitute the previous directive with the following.
# this line means: do not send email messages
#PBS -m n
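To read a -m value at a glance, you can decode its letters. The sketch below uses a hypothetical helper, describe_mail_options. Besides the b, e and n letters used on this page it also decodes a (abort), which is a standard qsub option but is not covered here, so check it against your cluster's documentation.

```shell
# Hypothetical helper: turn the letters given to "#PBS -m" into words.
describe_mail_options() {
    opts=$1
    desc=""
    case $opts in *a*) desc="$desc abort" ;; esac
    case $opts in *b*) desc="$desc begin" ;; esac
    case $opts in *e*) desc="$desc end" ;; esac
    case $opts in n) desc=" none" ;; esac
    echo "mail on:$desc"
}

summary=$(describe_mail_options be)
echo "$summary"
```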

You can also tell PBS where to put the output and error messages. By default the cluster writes them to two separate files (<name of the job>.e<jobID> for errors and <name of the job>.o<jobID> for output), but you may prefer to have all these messages in a single file (<name of the job>.o<jobID>):

#Output and Error streams are redirected to a single stream (output file)
#PBS -j oe
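If you know shell redirection, the effect of joining the two streams is similar to 2>&1. The following sketch is only an analogy run in a plain shell (it does not involve PBS at all), and demo_job is a made-up stand-in for a real job.

```shell
# Made-up stand-in for a job that writes to both streams.
demo_job() {
    echo "normal output"
    echo "error message" >&2
}

# Send both streams to the same file, as a joined output file would contain.
merged=$(mktemp)
demo_job > "$merged" 2>&1
merged_lines=$(wc -l < "$merged" | tr -d ' ')
echo "lines in the single file: $merged_lines"
rm -f "$merged"
```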

You will also want to give your job a name, so that you can recognize it in the list of running jobs (shown by the command qstat -an1).

#Name of the job
#PBS -N exit_coupled
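Once the job has a name, you can predict the names of its output files. A small sketch, assuming the job identifier looks like number.server and that the files use only the numeric part (please verify this on your cluster). The identifier 12345.master is invented for the example.

```shell
job_name="exit_coupled"
job_id="12345.master"   # invented identifier, only for illustration

# Keep only the part before the first dot.
seq=${job_id%%.*}

out_file="$job_name.o$seq"
err_file="$job_name.e$seq"
echo "output: $out_file"
echo "errors: $err_file"
```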

Now you can start the bash shell script commands:

# go to the directory where your job is
cd $HOME/.....

# print the name of the machine this job is running on and when the
# process starts and finishes (useful information during tests and initial debugging)
echo "executed on $HOSTNAME"
echo "execution started at:  $(date)"

./name of the program and parameters you want to launch

echo "execution finished at: $(date)"
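During tests it can also be handy to record the exit status of the program and how long it ran. The sketch below shows one possible way. run_program is a hypothetical stand-in for your real program, and date +%s (seconds since the epoch) is assumed to be available, as it is on common Linux systems.

```shell
# Hypothetical stand-in for the program you want to launch.
run_program() {
    sleep 1
}

start=$(date +%s)
echo "execution started at:  $(date)"

run_program
status=$?

end=$(date +%s)
echo "execution finished at: $(date)"

elapsed=$((end - start))
echo "exit status: $status, elapsed: ${elapsed}s"
```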

Another thing to remember is that the output files (the …o<jobID> and ….e<jobID>) created by the PBS system are placed in the directory from which you submitted the job, not in the directory from which the script launches the program (in other words, the queue manager ignores all the “cd …” commands inside the script).
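If the directory you want to work in is the one you submitted the job from, PBS makes its path available inside the job in the PBS_O_WORKDIR environment variable. Here is a minimal sketch. The fallback to the current directory is an addition for this example, so that the lines also run outside a job.

```shell
# PBS_O_WORKDIR is set by PBS inside a running job; outside a job it is unset,
# so fall back to the current directory to keep this sketch runnable anywhere.
workdir="${PBS_O_WORKDIR:-$PWD}"

cd "$workdir" || exit 1
echo "working in $(pwd)"
```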

Putting all the lines above together gives a script that looks like this:

#PBS -M <my email address that everyone can use to send emails>
#PBS -l cput=04:00:00
#PBS -l mem=1024mb
# you want to receive an email message when your job is started and when it's finished
#PBS -m be
# all the messages (output and errors) must go in a single file
#PBS -j oe
# the name you want to assign to this job
#PBS -N exit_coupled_test


cd $HOME/.....
echo "executed on $HOSTNAME"
echo "execution started at:  $(date)"

./name of the program and parameters you want to launch

echo "execution finished at: $(date)"
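A comment that loses its leading # becomes a command, and the job can fail as soon as it starts. One possible safeguard is to let the shell check the syntax without running anything (sh -n). The sketch below writes two small demo scripts to temporary files. The email address is a placeholder and the program line is left out on purpose.

```shell
# A well-formed demo script (placeholder address, no real program line).
good=$(mktemp)
cat > "$good" <<'EOF'
#PBS -M someone@example.org
#PBS -l cput=04:00:00
#PBS -l mem=1024mb
#PBS -m be
#PBS -j oe
#PBS -N exit_coupled_test

echo "executed on $HOSTNAME"
echo "execution started at:  $(date)"
echo "execution finished at: $(date)"
EOF

# The same idea with a comment line that lost its leading "#".
bad=$(mktemp)
cat > "$bad" <<'EOF'
#PBS -m be
finished (or blocked)
echo "never reached cleanly"
EOF

# "sh -n" parses the script but does not execute it.
if sh -n "$good" 2>/dev/null; then good_syntax="ok"; else good_syntax="broken"; fi
if sh -n "$bad" 2>/dev/null; then bad_syntax="ok"; else bad_syntax="broken"; fi

echo "good script: $good_syntax"
echo "bad script:  $bad_syntax"
rm -f "$good" "$bad"
```

This only catches syntax errors. It says nothing about whether the directives themselves make sense to the queue manager.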

Now you just need to tell the cluster system that you want to run this job. How do you do that? Pretty simple: use the command qsub (short for queue submit) followed by the name of the script you just created. If you saved the previous example script as test1.pbs in the current directory, launch this command from the shell:

$ qsub test1.pbs

<note> If you like, you can use the absolute path to indicate the script to launch, but remember that the output files will be written inside the directory from where you executed the qsub program. </note>
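qsub prints the identifier of the new job, so you may want to keep it in a variable for later use. The lines below are a sketch: they assume test1.pbs exists in the current directory, and they simply skip the submission on a machine where qsub is not installed.

```shell
if command -v qsub >/dev/null 2>&1; then
    # On the cluster: submit and remember the job identifier printed by qsub.
    if job_id=$(qsub test1.pbs); then
        echo "submitted as $job_id"
        submit_status="submitted"
    else
        echo "qsub refused the job"
        submit_status="failed"
    fi
else
    # Anywhere else: nothing to submit to.
    echo "qsub not found: run this on the cluster front-end"
    submit_status="skipped"
fi
```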

After all this work, you just need to relax and wait until you receive the email messages from the queuing manager. At this point you return to the directory where the output files are saved and check the results.
If you browse the documentation we have on the Batch Queuing System, you'll find examples of how to use Matlab or Mathematica and some explanations of the directives and commands available for the queuing system.