GEOS SGE Cluster Frequently Asked Questions

What is a job?
A job is a single command that can be run on the command line, for example 'date' or 'date | tr : . > datefile'.

What is a queue?
A queue is a list of jobs waiting to be run. Users add jobs to the queue by submitting them from a submission host.

How do I submit a job?
The best way to submit a job is as a script, for example 'qsub jobscript.sh', where jobscript.sh looks as follows:

    # This is an example job script that can be used to submit a job to the GEOS SGE cluster
    # Arguments to pass to qsub whenever this script is used:
    # Request bash as the shell for the job:
    #$ -S /bin/bash
    # Set the maximum run time to 1 hour, 0 minutes, 0 seconds:
    #$ -l h_rt=01:00:00
    # Write the stdout/stderr files to the directory the job is submitted from:
    #$ -cwd

    # Set up a temporary directory, local to the machine, to store working data
    # (create it if it does not already exist)
    export TMPDIR=/disk/scratch/local/$USER
    mkdir -p "$TMPDIR"

    # Set a variable, print it into a file and copy that file to the backed-up home directory
    export VAR=hello
    echo "$VAR" > "$TMPDIR/hello"
    cp -a "$TMPDIR/hello" ~/hello

Notice that arguments can be passed to qsub from within the script. These lines must start with '#$': the '#' marks them as comments to bash, and the '$' marks them as arguments to be passed to (and parsed by) qsub. The most important arguments are '-cwd', which runs the job in the directory it was submitted from so that the stdout and stderr files SGE produces are written there, and '-l h_rt=HH:MM:SS', which sets the maximum time the job may run. After this period the job is killed automatically, so a job that goes wrong (for example, by getting stuck in an infinite loop) cannot block a CPU core until IT support intervene.

Which machines should I submit my jobs on?
There are a number of submission hosts, including fleet. You may submit jobs to the SGE cluster queue only from a submit host.
Additional submit hosts may be added as required.

How many jobs can I submit?
A queue can hold up to two hundred thousand jobs, but if you are submitting more than a handful you should submit them as array jobs.

What is an array job?
An array job is a special type of job that the queueing system can handle. Sometimes you want to run a number of mostly identical jobs that differ only in their input parameters or data sets. Rather than submitting each one as an independent job, you can submit an array job. An array job has only one job ID, which makes it easier to handle and keep track of, and it places a significantly lower load on the system than submitting the same jobs individually.

Rather than submitting a whole group of jobs one at a time (qsub job.sh data.1, then qsub job.sh data.2, and so on), you can use an array job to submit, stop and delete all of the jobs with a single command. To submit an array job, use the '-t x-y' argument, where x and y are integers, together with a higher-level control script, for example:

    qsub -t 1-100 job.array.sh data

where job.array.sh looks like this ('data' is passed to job.array.sh as its first argument):

    # This is an example job array script that can be used to submit an array job to the GEOS SGE cluster
    # Arguments to pass to qsub whenever this script is used:
    # Request bash as the shell for the job:
    #$ -S /bin/bash
    # Set the maximum run time to 1 hour, 0 minutes, 0 seconds:
    #$ -l h_rt=01:00:00
    # Write the stdout/stderr files to the directory the job is submitted from:
    #$ -cwd

    # The job to be run once per task: $1 is 'data' and $SGE_TASK_ID is the task number
    # (job.sh is assumed to be in the directory the array job was submitted from)
    bash job.sh "$1.$SGE_TASK_ID"

This will schedule 100 tasks, each identical except that its input is data.N, with N counting up from 1 to 100. $1 holds the string passed on the command line (here 'data'), and $SGE_TASK_ID holds the task number. job.sh could be exactly the same as the earlier jobscript.sh example, but be aware that only the qsub arguments in job.array.sh take effect: job.sh is run as an ordinary command inside each task, so any '#$' lines in it are treated as plain comments.

Are there any other useful arguments for qsub?
Some of the more useful ones are:
A complete list can be found on the qsub man page ('man qsub' while logged in on cyclone).
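The array-job answer above splits the work between a control script and job.sh. As a minimal sketch, assuming you would rather keep everything in one file, the same pattern can be written as a single task script. The file name task.sh, the functions input_for_task and process_task, and the line-count "work" are all hypothetical illustrations; only the '#$' options, the '-t' submission and the SGE_TASK_ID variable come from this FAQ. Defaulting the task number to 1 when SGE_TASK_ID is unset is an added assumption that lets the functions be tried on a small test input outside SGE.

```shell
#!/bin/bash
# task.sh: hypothetical one-file array-job script (a sketch, not the GEOS original).
# Submit from a submit host with, for example:
#   qsub -t 1-100 task.sh data
#$ -S /bin/bash
#$ -l h_rt=01:00:00
#$ -cwd

# Print the input file name for one task: <prefix>.<task number>.
# SGE sets SGE_TASK_ID in each task of an array job; when it is unset
# (e.g. a quick test outside SGE) the task number defaults to 1.
# An explicit second argument overrides both.
input_for_task() {
    local prefix="$1"
    local task="${2:-${SGE_TASK_ID:-1}}"
    printf '%s.%s\n' "$prefix" "$task"
}

# Do the work for the current task. The "work" is a placeholder:
# count the lines of the input and write the count to <input>.out.
process_task() {
    local input
    input="$(input_for_task "$1")"
    if [ ! -r "$input" ]; then
        echo "task ${SGE_TASK_ID:-1}: cannot read $input" >&2
        return 1
    fi
    wc -l < "$input" > "$input.out"
}

# Run only when a prefix is given on the command line (as qsub passes
# 'data' above); with no arguments the script just defines the functions.
if [ "$#" -ge 1 ]; then
    process_task "$1"
fi
```

Submitted as 'qsub -t 1-100 task.sh data', task 7 would read data.7 and write data.7.out. A missing input makes only that task fail, with a message in its stderr file, instead of silently producing nothing.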
© School of GeoSciences
Last modified: 27 Feb, 2009