This page is about using multiple CPUs to run SPM single-subject analyses in parallel.
Our specific application uses a DT-12 multi-CPU machine from Orion Multisystems. We call this machine Beno.
The beno has 12 CPUs. In fact, it contains 12 small computers, or nodes, with one CPU per computer. The easiest way to distribute your SPM jobs across these nodes is to run one subject on each.
The beno is attached via a gigabit link to a server, called Hoover (it is rather loud). Hoover contains a terabyte RAID array we use for primary storage. [Diagram: the beno and the hoover; the lines indicate NFS mounts.]
The most convenient way to run parallel SPM analyses on the beno would be to put all our data on the hoover and get each of the 12 nodes to read and write data on the hoover disks. The problem is that having 12 processes trying to do high-volume reading and writing to the server would slow the whole thing down considerably.
One partial way round this is to do the following:
- store each subject's data as a compressed archive on the hoover
- at the start of each job, copy the subject's archive from the hoover to the local node disk and unpack it there
- run the analysis on the local disk, then copy the results back to the hoover
This means that there is some bottleneck as the archives are copied over, but there is just one set of reads and writes for each job, which greatly reduces the load on the network. This page describes how to go about making and testing such a batch script.
The nodes are managed by the Sun Grid Engine (SGE); we have version 6.0u1.
| Do what? | Example command | Comment |
|---|---|---|
| Submit jobs to the queue | qsub my_job.sh | Calls an SGE script 'my_job.sh' |
| Submit job, several tasks | qsub -t 1-23 my_job.sh | Calls my_job.sh 23 times, with SGE_TASK_ID set to the task number (1-23) |
| See status of jobs | qstat | Shows jobs currently in the queue |
| Delete job | qdel 246 | Deletes job number 246 |
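For array jobs, the key detail is that the SGE sets the SGE_TASK_ID environment variable for each task. As a minimal sketch (assuming your job ends up running Python; the subject list here is just an illustration), a task can use it to pick its own piece of work - the full example script later on this page does exactly this:

```python
# Minimal sketch: map the SGE task number onto a subject to process.
# The subject list is illustrative only.
import os

subjects = ['03FR', '04AL', '05AM']
task_no = int(os.environ.get('SGE_TASK_ID', '1'))  # SGE sets this to 1, 2, 3, ...
subject = subjects[task_no - 1]                    # task numbers are 1-based
print("Task %d will process subject %s" % (task_no, subject))
```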
In fact the SGE on beno only uses 11 of the 12 nodes for processing. The SGE uses the first node to distribute the jobs to the other 11 nodes. The first node is also called the head node.
On beno, the hoover disks are mounted as follows:

- hoover home directory system: /import/hoover
- hoover imagers space: /import/hoover/home/imagers
When you first log into the beno, you are running on the head node, called n0. If you want to log into the other nodes, you can just ssh to them, with (e.g.) ssh n3. If you are logged in as user matlab (with su - matlab), ssh to the other nodes works automatically, without you needing to retype your password.
SGE script files are just the same as normal bash or csh script files, but can have pseudo-comments embedded in them. These start with #$ and then have the same format as the qsub command line options. See the qsub man page for a list of those options.
You can write SGE scripts in other shells, even Python, but it's safer not to - I've had rather silly problems calling system commands from an SGE Python batch script, for example. So, the easiest thing is to write a shell script which calls - say - a python script. Here's the one I use:
```sh
#!/bin/sh
# SGE batch script wraps a python script call
# SGE options in pseudo-comments below
# request Bourne shell as shell for job
#$ -S /bin/sh
# join stdout and stderr
#$ -j y
# redefine output file
#$ -o imagers/sge/output_files
if [ $# -ne 0 ]
then
    python $*
fi
```
This script is in the /home/imagers/sge/scripts directory. So, to call the Python script that I actually want to run with the SGE, I run:
```sh
cd /home/imagers/sge/scripts
qsub run_python.sh /home/imagers/sge/scripts/my_python_script.py
```
Note the full path to the Python script, even though I am in the same directory.
You will often need to check what happened to your SGE jobs when they fail. For this you will need the output and errors from your jobs. By default the SGE stores these as separate output and error files, named after your job and job number, in your home directory. The run_python.sh script above redirects the script output and error to the same file, and stores it in the /home/imagers/sge/output_files directory.
Here are just some terms to make the next discussion easier:
- server_root: the directory where your imaging data is stored. This is where you are going to copy the data from before doing your processing.
- parameter_root: the directory where parameter files are stored for your analysis, such as batch scripts, normalization parameters, reference files etc. These are files you can't be bothered to copy down to your local disk, because they are small enough not to generate much network traffic when you access them on a network disk.
- fdata_root: the directory on the local disk of your node that you are going to use to store the functional data copied across from the server_root.
The overall scheme for each subject's job is then (see the sketch below):

- copy the data from the server_root to the fdata_root
- run the analysis on the local disk
- copy the results back to the server_root
- delete the data from the fdata_root, to save space for everyone else
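As a rough sketch (the paths and subject name are hypothetical, the analysis step is left as a comment, and the real, complete script appears further down this page), the job for one subject looks something like this:

```python
# Rough sketch of one subject's job; paths and subject name are illustrative.
import os, shutil

server_root = '/import/hoover/imagers/choice/archives'  # archived data on the server
fdata_root = '/var/tmp/choice'                          # scratch space on the local node disk
subject = '05AM'

# copy the subject's archive from the server and unpack it on the local disk
if not os.path.isdir(fdata_root):
    os.makedirs(fdata_root)
os.chdir(fdata_root)
shutil.copy("%s/%s.bz2" % (server_root, subject), '.')
os.system("tar jxvf %s.bz2" % subject)

# ... run the analysis on the local disk (e.g. a matlab batch job) ...

# copy the results back to the server_root, then delete the local data
# to save space for everyone else
os.system("tar zcvf %s_results.tar.gz %s" % (subject, subject))
shutil.move("%s_results.tar.gz" % subject, server_root)
shutil.rmtree(subject)
```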
Start on your own machine. Make a matlab file like this:
```matlab
% Test batch file for later transfer to the beno
% Location of the directory tree containing parameters
parameter_root = '/import/hoover/imagers/choice';
% Location of directory tree containing functional data
fdata_root = '/home/imagers/choice';
% String with subject's directory name
subject_sdir = '05AM';
% Subdirectory in parameter_root containing batch files
batch_sdir = 'groove_1';
% Subdirectory in subject's directory in which to put analysis files
ana_sdir = 'spm2_groove1';
%
addpath(fullfile(parameter_root, batch_sdir));
choice_one_subject;
```
Now you need to write the choice_one_subject matlab file, which can take the information in the subno, subject_sdir, parameter_root and fdata_root variables, and run the processing on a single subject.
The GroovyBatch system is one way to get this running - in fact it was designed to do this.
Done that? Good, now almost all the hard work is over.
Next, try running your batch script from within a python script that does the rest of the work: copying and unpacking the data from the server_root to the fdata_root, running the analysis, and copying the results back to the server_root.
Here is an example (included in the GroovyBatch archive). Note that the script generates your matlab batch script above on the fly.
The script assumes that you have packed each subject's data into a bzipped archive, named after the subject, and put this in an archive directory, here /import/hoover/imagers/choice/archives.
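This page doesn't say how those archives were made; as a rough sketch (the data directory and subject names here are just assumptions), you could pack them with something like:

```python
# Hypothetical sketch: pack each subject's directory into a bzipped tar
# archive named <subject>.bz2, which is what the job script expects.
import os

data_dir = '/import/hoover/imagers/choice'               # where the subject directories live (assumption)
archive_dir = '/import/hoover/imagers/choice/archives'   # where the job script looks for archives
subjects = ['03FR', '04AL', '05AM']                      # illustrative subject list

os.chdir(data_dir)
for subject in subjects:
    os.system("tar jcvf %s/%s.bz2 %s" % (archive_dir, subject, subject))
```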
Don't forget to change the server_root, fdata_root and parameter_root variables at the top of the script.
```python
#!/usr/bin/env python
#
# Script to fetch archives, unpack to temporary directory, set
# up matlab batch run, and copy results back to main file space
import os, sys, shutil

# Directory where archived (bz2) data is stored for each subject
server_root = '/import/hoover/imagers/choice/archives'

# Directory containing tree of subject info - typically (say)
# anatomical images, files giving stimulus times, batch files.
parameter_root = '/import/hoover/imagers/choice'

# Temporary directory to unpack functional data into, and do processing from
fdata_root = '/var/tmp/choice'

# Name of matlab batch file to run processing for single subject
one_sub_mfile = 'choice_one_subject'

# Name of the analysis subdirectory for SPM; this will become a
# subdirectory in each subject's fdata_root subdirectory
ana_sdir = 'spm2_humm_ana'

# Name of batch file directory in parameter_root
batch_sdir = 'groove_1'

# We need the list of all subjects to work out which subject to use here.
subjects = """
03FR
04AL
05AM
06AW
07CS
09NM
10PL
11KS
12KK
13AM
14AD
15BD
16NK
18RL
""".split()

# Documenting subject exclusion
# 17HI - massive artefacts - subject excluded

# Path that the bz2 archives unpack the data to. This is useful if the
# archive you packed was from a different directory tree from the one
# you want to use on the local disk. If the archive just unpacks to
# give the subject directory as you want it, set this variable to ''
# (empty)
tar_unpack_path = ''

# Task ID from environment variable
task_id = os.environ.get('SGE_TASK_ID')
if not task_id:
    task_no = 1
else:
    task_no = int(task_id)
subject = subjects[task_no - 1]

# make the fdata_root directory if it doesn't exist
# then go there.
subj_dir = "%s/%s" % (fdata_root, subject)
try:
    os.makedirs(subj_dir)
except OSError:
    pass
os.chdir(fdata_root)

# Download and unpack the bz2'ed data
tar_file = "%s.bz2" % (subject,)  # tar file for this subject
shutil.copy("%s/%s" % (server_root, tar_file), '.')
os.system("tar jxvf %s" % tar_file)
os.unlink(tar_file)

# If the archive unpacked somewhere other than to the subject's
# directory, we have some more work to do
if tar_unpack_path:
    # Move tar output subject directory to subject directory here
    shutil.move("%s/%s" % (tar_unpack_path, subject), subject)
    # If the tar unpacked to a directory tree, we should delete this tree
    if not os.path.isabs(tar_unpack_path):
        # get root directory of tar_unpack_path
        rt = tar_unpack_path
        while rt:
            (rt, e) = os.path.split(rt)
        if os.path.isdir(e):
            shutil.rmtree(e)

# cd to the analysis directory, ready to run the matlab scripts
ana_dir_full = "%s/%s" % (subj_dir, ana_sdir)
try:
    os.mkdir(ana_dir_full)
except OSError:
    pass
os.chdir(ana_dir_full)

# make matlab startup batch file, and run batch job
start_file = "startup.m"
f = open(start_file, 'wt')
f.write("""%% Python generated matlab startup file
parameter_root = '%s';
fdata_root = '%s';
subno = %d
subject_sdir = '%s';
ana_sdir = '%s';
batch_sdir = '%s';
addpath(fullfile(parameter_root, batch_sdir));
%s;
exit
""" % (parameter_root, fdata_root, task_no, subject,
       ana_sdir, batch_sdir, one_sub_mfile))
f.close()
os.system('matlab -nojvm')
os.remove(start_file)

# Pack model directory into tar archive, and copy back to server
os.chdir(fdata_root)
ana_tar = "%s_%s.tar.gz" % (subject, ana_sdir)
os.system("tar zcvf %s %s/%s" % (ana_tar, subject, ana_sdir))
shutil.move(ana_tar, server_root)

# Delete analysis directory to save space. DON'T FORGET THIS otherwise
# you will fill up the temporary space on the cluster machines
shutil.rmtree(subject, ignore_errors=True)
```
Run this script with python my_script.py
SWE-06/20/05, MB-06/23/05: Note on testing the above script with a single subject.
You need to take care when testing with a single subject. Let's say that the subject you want to test the script on is '03FR'. If the list of subjects contains only one entry, like this:
subjects = ('03FR')
then, in python, the variable subjects will become the string "03FR". This is because the parentheses are ambiguous; they could mean parentheses-for-expressions, as in (3+4)*2, or they could mean "this is a tuple" (a tuple being a term in python for a list that cannot be changed once initialised). By default python assumes they are parentheses for expressions, and you get a string rather than a tuple.
When you ask for subjects[0] you will therefore get '0' (the first character of the string), which is not what you want. In this situation, you need to indicate to python that this is not a string, but a tuple with only one value. You do this with an extra comma:
subjects = ('03FR',)
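You can see the difference quickly at an interactive Python prompt:

```python
>>> ('03FR')[0]    # parentheses-for-expressions: still just a string
'0'
>>> ('03FR',)[0]   # the trailing comma makes a one-element tuple
'03FR'
```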
Copy your archived data to a data directory in the /home filesystem on the hoover.
Copy your parameter tree to a directory on the hoover, including your matlab batch files.
Make sure your script is visible from beno - for example, it could be in the hoover /home system (mounted on beno as /import/hoover) - or you could copy your python script to the beno.
Change the paths in the Python script to match the beno:

- fdata_root should be in /var/tmp (which is on the nodes' local disk)
- parameter_root and server_root should be somewhere in /import/hoover/
SWE-06/20/05: Make sure that matlab can write to server_root. I would recommend setting server_root to somewhere in /import/hoover/imagers.
Test your script on the head node of the beno:
```sh
ssh beno
su - matlab
python my_script.py
```
Check you get no obvious errors.
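If you want to test a subject other than the first one, remember that the script falls back to task number 1 when SGE_TASK_ID is not set. One way to fake a particular task number (a sketch, not something the SGE needs) is:

```python
# Sketch: run my_script.py as if it were SGE array task 3, so that it
# processes the third subject in its list. Run this from the directory
# containing my_script.py.
import os
os.environ['SGE_TASK_ID'] = '3'   # the child process inherits this environment
os.system('python my_script.py')
```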
First make sure the Python script is executable:
chmod u+x /path/to/your/my_script.py
Then:
```sh
su - matlab
cd /home/imagers/sge/scripts
qsub -t 1-18 run_python.sh /path/to/your/my_script.py
```
Of course, despite everything, it will crash. Check the output and error logs. If you used the run_python.sh script above, the error and output logs are combined and written to /home/imagers/sge/output_files.