sge

Differences

This shows you the differences between two versions of the page.

sge [2009/08/12 11:11] cangianisge [2011/01/26 09:48] damir
Line 124: Line 124:
     * ''nodes'' for giving a list of nodes (hostnames or //properties//) to consider.     * ''nodes'' for giving a list of nodes (hostnames or //properties//) to consider.
 The properties available on the various nodes can be listed with the ''pbsnodes -a'' command.\\ The properties available on the various nodes can be listed with the ''pbsnodes -a'' command.\\
-For the moment we have defined only the properties:+For the moment we have defined these properties:
     * ''bit64'' on 64 bit machines.     * ''bit64'' on 64 bit machines.
     * ''bit32'' on 32 bit machines (mainly needed because of matlab).     * ''bit32'' on 32 bit machines (mainly needed because of matlab).
     * ''matlab'' for nodes that can launch matlab simulations.     * ''matlab'' for nodes that can launch matlab simulations.
     * ''mathematica'' for nodes that can launch Mathematica simulations; follow [[sge:mathematica_batch|How to generate Mathematica scripts]], if you need a hint.     * ''mathematica'' for nodes that can launch Mathematica simulations; follow [[sge:mathematica_batch|How to generate Mathematica scripts]], if you need a hint.
 +    * ''magma'' for nodes with the [[http://magma.maths.usyd.edu.au/|MAGMA]] Computational Algebra System.
 +    * ''cuda'' for nodes with CUDA hardware.
     * ''f10'' for nodes with Linux Fedora 10 installed.     * ''f10'' for nodes with Linux Fedora 10 installed.
 +    * ''f12'' for nodes with Linux Fedora 12 installed.
 Example: **qsub -l nodes=1:bit64** (the string ''1:'' is mandatory and means: //I need at least one node with property bit64//). To specify more than one property, separate the properties with a colon ":". A job that requires a 64 bit CPU and matlab should be submitted with **qsub -l nodes=1:bit64:matlab <name of the pbs script>**. Example: **qsub -l nodes=1:bit64** (the string ''1:'' is mandatory and means: //I need at least one node with property bit64//). To specify more than one property, separate the properties with a colon ":". A job that requires a 64 bit CPU and matlab should be submitted with **qsub -l nodes=1:bit64:matlab <name of the pbs script>**.
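
As a small sketch of the colon rule, the shell function below joins a node count and a list of properties into the value expected after ''-l nodes=''. The name ''nodes_spec'' and the script name ''job.pbs'' are placeholders chosen for illustration, and the command is only printed, not submitted.

```shell
#!/bin/sh
# Build the value of "-l nodes=" from a node count followed by properties.
# nodes_spec is a hypothetical helper, not part of PBS.
nodes_spec() {
  spec=$1
  shift
  for prop in "$@"; do
    spec="$spec:$prop"
  done
  echo "$spec"
}

# Print (do not run) the command for a job that needs one 64 bit node with matlab.
echo "qsub -l nodes=$(nodes_spec 1 bit64 matlab) job.pbs"
```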
  
Line 146: Line 149:
 Here you can find some useful pbs scripts that can be used as a starting point. Here you can find some useful pbs scripts that can be used as a starting point.
 ^Script  ^  Execute with  ^ ^Script  ^  Execute with  ^
-|{{base.pbs|Script example that send mail messages when the program start/end running}}|qsub [qsub options] base.pbs| +|{{base2.pbs|Base example script}} contains most of the useful options|qsub [qsub options] base.pbs|
-|{{nomail.pbs|base Script example}}|qsub [qsub options] nomail.pbs|+
 |{{matlab.pbs|Script example for running matlab computations}}|qsub -l nodes=1:matlab [qsub options] matlab.pbs| |{{matlab.pbs|Script example for running matlab computations}}|qsub -l nodes=1:matlab [qsub options] matlab.pbs|
 |{{mathematica.pbs|Script example for running Mathematica computations}}|qsub [qsub options] mathematica.pbs| |{{mathematica.pbs|Script example for running Mathematica computations}}|qsub [qsub options] mathematica.pbs|
Line 154: Line 156:
 \\ \\
  
 +The shell running the pbs script has access to several environment variables that may be useful:
 +  * ''PBS_O_WORKDIR'' : the directory where the qsub command was issued
 +  * ''PBS_QUEUE    '' : the actual queue where the job is running
 +  * ''PBS_JOBID    '' : the internal job identification name
 +  * ''PBS_JOBNAME  '' : the name of the job. Unless specified with the -N option, this is usually the name of the pbs script or STDIN
 +  * ''HOSTNAME     '' : the name of the machine where the job is running
 + 
 See the man page for more details.  See the man page for more details. 
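
A possible way to use these variables at the top of a job script is sketched below. The '':-'' fallback values and the helper ''log_name'' are assumptions added so that the snippet can also be tried outside of PBS; ''1234.server'' style job ids are only an example of the format.

```shell
#!/bin/sh
# Sketch of a job script preamble using the PBS variables.
# The ":-" fallbacks are assumptions so the snippet also runs outside PBS.
workdir=${PBS_O_WORKDIR:-$PWD}
jobid=${PBS_JOBID:-0.localhost}
jobname=${PBS_JOBNAME:-STDIN}
queue=${PBS_QUEUE:-none}
host=${HOSTNAME:-$(uname -n)}

# Move to the directory where qsub was issued, so relative paths work.
cd "$workdir" || exit 1

# log_name is a hypothetical helper: "<job name>.<numeric job id>.log"
# $1 = job id (e.g. 1234.server), $2 = job name
log_name() {
  echo "$2.${1%%.*}.log"
}

echo "job $jobid ($jobname), queue $queue, host $host"
echo "suggested log file: $(log_name "$jobid" "$jobname")"
```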
 +
 +==== Making your script cross platform ====
 +Presently, we have both 32 and 64 bit compute nodes. In principle, 64 bit nodes can run 32 bit code out of the box. In practice, there might be problems due to missing or incompatible libraries.
 +An easy way to take advantage of both the full set of machines and the optimized 64 bit code on 64 bit machines is the following (suggested by Masoud):
 +
 +  - compile two versions of your code (32 and 64 bit);
 +  - name the 32 and 64 bit executables ''WHATEVER.i686'' and ''WHATEVER.x86_64'' respectively (replace ''WHATEVER'' with any name you like);
 +  - in your pbs script use ''./WHATEVER.`arch`'' to select the right executable and run it.
 +
 +If your workstation is a 32 bit machine, you can compile the 64 bit version of your code on ''iscsrv13''.
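
The selection step can also be written as a small function that fails loudly on an unexpected architecture. This is only a sketch: ''select_binary'' is a hypothetical name, ''uname -m'' is used in place of ''arch'' (the two print the same string on Linux), and mapping every ''i?86'' value to the ''.i686'' executable is an assumption.

```shell
#!/bin/sh
# Print the path of the executable matching a machine type.
# $1 = base name of the executable, $2 = machine type (default: this machine)
select_binary() {
  machine=${2:-$(uname -m)}
  case $machine in
    x86_64) echo "./$1.x86_64" ;;
    i?86)   echo "./$1.i686" ;;
    *)      echo "unsupported architecture: $machine" >&2
            return 1 ;;
  esac
}

# Show which file would be run on a 64 bit node and on a 32 bit node.
select_binary WHATEVER x86_64
select_binary WHATEVER i686
```

In a pbs script one could then write ''exe=$(select_binary WHATEVER) || exit 1'' followed by ''"$exe"'', so that the job stops with a clear message instead of trying to run a file that does not exist.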
 +
  
 ==== qdel ==== ==== qdel ====
Line 175: Line 195:
  
 There is a bug in pbs that sometimes appears when the server tries to stop a running job but the node where the job is running does not respond (e.g. it crashed). When this happens, the server starts sending you a lot of identical mail messages telling you that it had to kill your job because it exceeded the time limit. If you start to receive the same message over and over about the same JOB ID, please contact a sys admin. Thanks. There is a bug in pbs that sometimes appears when the server tries to stop a running job but the node where the job is running does not respond (e.g. it crashed). When this happens, the server starts sending you a lot of identical mail messages telling you that it had to kill your job because it exceeded the time limit. If you start to receive the same message over and over about the same JOB ID, please contact a sys admin. Thanks.
 +===== Tips and Tricks =====
 +=== Delete all queued jobs ===
 +<code>
 +qstat -u $(whoami) -n1 | grep "  --" | awk '{print $1;}' | while read a ; do qdel ${a%%.*} ; done 
 +</code>
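
Before deleting anything, it can be useful to see which jobs would be affected. The sketch below is a dry-run variant, not the one-liner itself: it selects jobs by their state column and only prints the ''qdel'' commands. It assumes that the job state is field 10 of the ''qstat -n1'' job lines (the layout implied by the time fields 9 and 11 used elsewhere on this page); ''queued_ids'' and the two sample lines are invented for illustration.

```shell
#!/bin/sh
# Read "qstat -n1" style job lines on stdin and print the numeric id of
# every job whose state (assumed to be field 10) is Q.
# queued_ids is a hypothetical helper.
queued_ids() {
  awk '$10 == "Q" { sub(/\..*/, "", $1); print $1 }'
}

# Invented sample lines in the assumed column layout; real output differs per site.
sample='1234.server alice batch job1 4321 1 1 -- 01:00 R 00:10 node1/0
1235.server alice batch job2 -- 1 1 -- 01:00 Q -- --'

# Dry run: print the commands instead of executing them.
printf '%s\n' "$sample" | queued_ids | while read -r id; do
  echo "qdel $id"
done
```

On the cluster, the sample text would be replaced by the output of ''qstat -u $(whoami) -n1'', and the ''echo'' removed once the printed list looks right.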
 +
 +=== A script that runs as long as possible ===
 +Here is a short script that can be useful in those cases where you have the same 
 +calculation to run many times (e.g. for collecting statistics). 
 +
 +Since the machines are different and take different times to run the program, one usually
 +allocates the time needed by the slowest machine, even if on the fastest machine the actual
 +running time would be 1/10 of the requested one. As you know, the queueing system does
 +not like being given wrong information.
 +
 +The following script keeps running your program as long as there is time left. It uses
 +the average time of the iterations run so far to decide whether another one can be run.
 + 
 +<code>
 +qstat=/usr/bin/qstat
 +jobid=${PBS_JOBID%%.*}   # numeric part of the job id
 +
 +# Check how much time is left and set the "moretime" variable accordingly.
 +checktime() {
 +  if [ -x "$qstat" ] ; then
 +    # The last line of "qstat -n1" describes this job: field 9 is the
 +    # requested time and field 11 the elapsed time, both as HH:MM.
 +    times=$("$qstat" -n1 "$jobid" | tail -n 1)
 +    tend=$(echo "$times" | awk '{print $9}' | awk -F : '{print $1*3600+$2*60;}')
 +    tnow=$(echo "$times" | awk '{print $11}' | awk -F : '{print $1*3600+$2*60;}')
 +    trem=$((tend - tnow))
 +    # average time of one iteration so far
 +    tmin=$((tnow / niter))
 +    if [ "$trem" -ge "$tmin" ] ; then
 +      moretime="yes"
 +    else
 +      moretime="no"
 +    fi
 +  else
 +    # qstat is not available: we cannot tell, so assume there is time left
 +    moretime="yes"
 +  fi
 +}
 +
 +# Execute a task as many times as possible.
 +niter=0
 +moretime="yes"
 +while [ "$moretime" = "yes" ] ; do
 +  # run your program here
 +  ./my_program.x
 +  niter=$((niter + 1))
 +  checktime
 +done
 +</code>
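
The decision rule can be tried without a running job by feeding it time values by hand. The sketch below isolates it into two functions; the names ''hhmm_to_seconds'' and ''can_run_again'' and the example times are made up for illustration, and times are assumed to be in the HH:MM form.

```shell
#!/bin/sh
# Convert a HH:MM string to seconds.
hhmm_to_seconds() {
  echo "$1" | awk -F : '{print $1*3600 + $2*60}'
}

# Succeed if the remaining time covers the average duration of one iteration.
# $1 = requested time (HH:MM), $2 = elapsed time (HH:MM), $3 = iterations done
can_run_again() {
  tend=$(hhmm_to_seconds "$1")
  tnow=$(hhmm_to_seconds "$2")
  trem=$((tend - tnow))
  tavg=$((tnow / $3))
  [ "$trem" -ge "$tavg" ]
}

# 1 hour requested, 20 minutes used by 2 iterations: 40 minutes remain and
# one iteration takes about 10 minutes, so another run fits.
if can_run_again 01:00 00:20 2; then
  echo "another iteration fits"
fi
```

Because the elapsed time is only known to the minute, the estimate is coarse for short iterations; requesting a little more time than strictly needed leaves a margin for the last run.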
sge.txt · Last modified: 2015/11/16 11:18 by 127.0.0.1