
Parallel SPM batch scripting on the Beno

The overview

This page is about using multiple CPUs to run SPM single subject analyses in parallel.

Our system

Our specific application uses a DT-12 multi-CPU machine from Orion Multisystems. We call this machine Beno.

The beno has 12 CPUs. In fact, it contains 12 small computers, or nodes, with one CPU per computer. The easiest way to distribute your SPM jobs across these nodes is to run one subject on each.

The beno is attached via a gigabit link to a server called Hoover (it is rather loud). Hoover contains a terabyte RAID array that we use for primary storage; the beno mounts the Hoover disks over NFS.

Simple parallel SPM

The most convenient way to run parallel SPM analyses on the beno would be to put all our data on the hoover and get each of the 12 nodes to read and write data directly on the hoover disks. The problem is that having 12 processes doing high-volume reading and writing to the server would slow the whole thing down considerably.

One partial way round this is to do the following:

  1. Pack each subject's data into an archive on the server
  2. Have each job copy its subject's archive onto the local disk of the node it runs on
  3. Unpack and process the data locally
  4. Copy the results back to the server and delete the local copy

This means that there is some bottleneck as the archives are copied over, but there is just one set of reads and writes for each job, which greatly reduces the use of the network. This page describes how to go about making and testing such a batch script.
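Here is a condensed sketch of that pattern for one job, using the example paths that appear in the full batch script later on this page; the subject name and the result archive name are made up purely for illustration, and the SPM step is left as a placeholder:

#!/usr/bin/env python
# Condensed sketch: copy one subject's archive to local disk, process, copy back

import os, shutil

subject = '03FR'                                           # one subject per job
server_root = '/import/hoover/imagers/choice/archives'     # archives on the server
fdata_root = '/var/tmp/choice'                             # scratch space on the node's local disk

if not os.path.isdir(fdata_root):
    os.makedirs(fdata_root)
os.chdir(fdata_root)

# One big read: copy the subject's archive over the network and unpack it locally
shutil.copy("%s/%s.bz2" % (server_root, subject), '.')
os.system("tar jxvf %s.bz2" % subject)

# ... run the SPM analysis on the local copy here ...

# One big write: pack the results and copy them back to the server
os.system("tar zcvf %s_results.tar.gz %s" % (subject, subject))
shutil.move("%s_results.tar.gz" % subject, server_root)

# Free up the local disk for the next job
shutil.rmtree(subject, ignore_errors=True)

The full script later on this page does the same thing, but works out the subject from the SGE task number and generates the matlab batch run as well.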

The Sun Grid Engine

Sun Grid Engine home page.

We have version 6.0u1.

SGE Howtos

SGE utility man pages

Using the Sun Grid Engine

Do what?                       Example command           Comment
Submit a job to the queue      qsub my_job.sh            Calls the SGE script 'my_job.sh'
Submit a job, several tasks    qsub -t 1-23 my_job.sh    Calls my_job.sh 23 times, with SGE_TASK_ID set to the task number (1-23)
See the status of jobs         qstat                     Shows jobs currently in the queue
Delete a job                   qdel 246                  Deletes job number 246
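
For a job submitted with -t, the SGE runs the script once per task, each time with the environment variable SGE_TASK_ID set to the task number; your script reads that variable to decide which piece of work to do. A minimal Python sketch of the idea (the full batch script later on this page uses the same mechanism):

import os

subjects = ['03FR', '04AL', '05AM']        # one entry per task

# The SGE sets SGE_TASK_ID for array jobs; fall back to task 1 when testing by hand
task_id = os.environ.get('SGE_TASK_ID')
if task_id is None or not task_id.isdigit():
    task_no = 1
else:
    task_no = int(task_id)

subject = subjects[task_no - 1]            # task 1 -> first subject, and so on
print("Task %d will process subject %s" % (task_no, subject))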

The SGE on Beno

In fact the SGE on the beno only uses 11 of the 12 nodes for processing; it uses the first node, also called the head node, to distribute the jobs to the other 11.

Where stuff is

On beno:

  /home/imagers/sge/scripts - SGE wrapper scripts, such as the run_python.sh script described below
  /home/imagers/sge/output_files - combined output and error logs from SGE jobs
  /import/hoover - the hoover /home filesystem, mounted over NFS
  /var/tmp - local scratch space on each node; a good place for temporary functional data

Getting to the nodes on the Beno

When you first log into the beno, you are running on the head node, called n0. If you want to log into the other nodes, you can just ssh to them, with (e.g.) ssh n3. If you log in as user matlab with su - matlab, this works without you needing to retype your password.

SGE script files

SGE script files are just the same as normal bash or csh script files, but can have pseudo-comments embedded in them. These start with #$ and then have the same format as the qsub command line options. See the qsub man page for a list of those options.

You can write SGE scripts in other shells, even Python, but it's safer not to - I've had rather silly problems calling system commands from an SGE Python batch script, for example. So the easiest approach is to write a shell script which calls, say, a Python script. Here's the one I use:

#!/bin/sh
# SGE batch script wraps a python script call

# SGE options in pseudo-comments below
# request Bourne shell as shell for job
#$ -S /bin/sh
# join stdout and stderr
#$ -j y
# redefine output file
#$ -o /home/imagers/sge/output_files

if [ $# -ne 0 ]
then
    python "$@"
fi

This script is in the /home/imagers/sge/scripts directory. So, to call the Python script that I actually want to run with the SGE, I run:

cd /home/imagers/sge/scripts
qsub run_python.sh /home/imagers/sge/scripts/my_python_script.py

Note the full path to the Python script, even though I am in the same directory.

Output and error logs for SGE scripts

You will often need to check what happened to your SGE jobs when they totally, like, failed. For this you will need the output and errors from your jobs. By default the SGE stores these as separate output and error files, named after your job and job number (typically something like my_job.sh.o1234 and my_job.sh.e1234), in your home directory. The run_python.sh script above redirects the script output and errors to the same file, and stores it in the /home/imagers/sge/output_files directory.

Some terminology

Here are just some terms to make the next discussion easier:

  server_root - the directory on the server (hoover) where each subject's archived data is stored
  parameter_root - a directory tree on the server containing subject info that stays on the server: anatomical images, files giving stimulus times, batch files and so on
  fdata_root - a temporary directory on the node's local disk into which the functional data is unpacked, and from which the processing runs

What your Beno SGE script will do

  1. Work out which subject's data to work with
  2. Copy that subject's data from the server_root to the fdata_root
  3. Possibly untar the data if you copied a tar archive
  4. Change into some empty directory
  5. Start up a new matlab session to run batch processing on this subject's data
  6. Copy the results that you want back to the server_root
  7. Delete the files from the fdata_root to save space for everyone else

Steps in getting a subject by subject batch script to work

Get a test matlab batch job running

Start on your own machine. Make a matlab file like this:

% Test batch file for later transfer to the beno

% Location of the directory tree containing parameters
parameter_root = '/import/hoover/imagers/choice';

% Location of directory tree containing functional data
fdata_root = '/home/imagers/choice';

% String with subject's directory name
subject_sdir = '05AM';

% Subdirectory in parameter_root containing batch files
batch_sdir = 'groove_1';

% Subdirectory in subject's directory in which to put analysis files
ana_sdir = 'spm2_groove1';

addpath(fullfile(parameter_root, batch_sdir));

choice_one_subject;

Now you need to write the choice_one_subject matlab file, which takes the information in the subject_sdir, parameter_root and fdata_root variables (plus subno, which the cluster script below sets), and runs the processing on a single subject.

The GroovyBatch system is one way to get this running - in fact it was designed to do this.

Done that? Good, now almost all the hard work is over.

Get the copy-files / run-batch-script sequence running

Next, try running your batch script from within a python script that does the rest of the work:

  1. Works out which subject to work on from a Unix environment variable (SGE_TASK_ID)
  2. Copies the data from server_root to fdata_root
  3. Maybe untars it
  4. Starts the matlab job
  5. Copies back the results

Here is an example (included in the GroovyBatch archive). Note that the script generates the matlab startup file (the equivalent of your test batch file above) on the fly.

The script assumes that you have packed each subject's data into a bzip2-compressed tar archive named after the subject (e.g. 03FR.bz2), and put this in an archive directory, here /import/hoover/imagers/choice/archives.
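
The page does not cover making the archives; as an illustration, here is one way you could create them, assuming each subject's data sits in a directory named after the subject and that you run this from the directory containing those subject directories (the subject list and paths are just the examples used elsewhere on this page):

import os

subjects = ['03FR', '04AL', '05AM']                        # and so on
archive_dir = '/import/hoover/imagers/choice/archives'     # where the batch script expects to find archives

# Pack each subject directory into a bzip2-compressed tar archive named <subject>.bz2
for subject in subjects:
    os.system("tar jcvf %s/%s.bz2 %s" % (archive_dir, subject, subject))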

Don't forget to change the server_root and fdata_root and parameter_root variables at the top of the batch file.

#!/usr/bin/env python
#
# Script to fetch archives, unpack to temporary directory, set
# up matlab batch run, and copy results back to main file space

import os, sys, shutil

# Directory where archived (bz2) data is stored for each subject
server_root = '/import/hoover/imagers/choice/archives'

# Directory containing tree of subject info - typically (say)
# anatomical images, files giving stimulus times, batch files.
parameter_root = '/import/hoover/imagers/choice'

# Temporary directory to unpack functional data into, and do processing from
fdata_root = '/var/tmp/choice'

# Name of matlab batch file to run processing for single subject
one_sub_mfile = 'choice_one_subject'

# Name of the analysis subdirectory for SPM; this will become a
# subdirectory in each subject's fdata_root subdirectory
ana_sdir = 'spm2_humm_ana'

# Name of batch file directory in parameter_root
batch_sdir = 'groove_1'

# We need the list of all subjects to work out which subject to use here.
subjects = """
03FR
04AL
05AM
06AW
07CS
09NM
10PL
11KS
12KK
13AM
14AD
15BD
16NK
18RL
""".split()

# Documenting subject exclusion
# 17HI   - massive artefacts - subject excluded


# Path that the bz2 archives unpack the data to. This is useful if the
# archive you packed was from a different directory tree from the one
# you want to use on the local disk.  If the archive just unpacks to
# give the subject directory as you want it, set this variable to ''
# (empty)
tar_unpack_path = ''

# Task ID from environment variable
task_id = os.environ.get('SGE_TASK_ID')
if not task_id:
    task_no = 1
else:
    task_no = int(task_id)
subject = subjects[task_no-1]

# make the fdata_root directory if it doesn't exist
# then go there.
subj_dir = "%s/%s" % (fdata_root, subject)
try:
    os.makedirs(subj_dir)
except OSError:
    pass
os.chdir(fdata_root)

# Download and unpack the bz2'ed data
tar_file =  "%s.bz2" % (subject,)  # tar file for this subject
shutil.copy("%s/%s" % (server_root, tar_file), '.')
os.system("tar jxvf %s" % tar_file)
os.unlink(tar_file)

# If the archive unpacked somewhere other than to the subject's
# directory, we have some more work to do
if tar_unpack_path:
    # Move tar output subject directory to subject directory here
    shutil.move("%s/%s" % (tar_unpack_path, subject), subject)

    # If the tar unpacked to a directory tree, we should delete this tree
    if not os.path.isabs(tar_unpack_path):
        # get root directory of tar_unpack_path
        rt = tar_unpack_path
        while rt:
            (rt, e) = os.path.split(rt)

        if os.path.isdir(e):
            shutil.rmtree(e)

# cd to the analysis directory, ready to run the matlab scripts
ana_dir_full = "%s/%s" % (subj_dir, ana_sdir)
try:
    os.mkdir(ana_dir_full)
except OSError:
    pass
os.chdir(ana_dir_full)

# make matlab startup batch file, and run batch job
start_file = "startup.m"
f = open(start_file, 'wt')
f.write("""%% Python generated matlab startup file
parameter_root = '%s';
fdata_root = '%s';
subno = %d
subject_sdir = '%s';
ana_sdir = '%s';
batch_sdir = '%s'; 

addpath(fullfile(parameter_root, batch_sdir));
%s; 

exit
""" % (parameter_root, fdata_root, task_no, subject,
       ana_sdir, batch_sdir, one_sub_mfile))
f.close()
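# This relies on matlab picking up the startup.m we have just written in the
# current directory; the generated file runs the single-subject batch and then exits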
os.system('matlab -nojvm')
os.remove(start_file)

# Pack model directory into tar archive, and copy back to server
os.chdir(fdata_root)
ana_tar = "%s_%s.tar.gz" % (subject, ana_sdir) 
os.system("tar zcvf %s %s/%s" % (ana_tar, subject, ana_sdir))
shutil.move(ana_tar, server_root)

# Delete the subject's directory from fdata_root to save space. DON'T FORGET
# THIS otherwise you will fill up the temporary space on the cluster machines
shutil.rmtree(subject, ignore_errors=True)

Run this script with python my_script.py
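
When testing by hand, you can mimic a particular SGE task by setting the variable yourself first, for example (Bourne shell syntax) SGE_TASK_ID=5 python my_script.py, which makes the script pick the fifth subject in the list.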

SWE-06/20/05, MB-06/23/05: Note on testing the above script with a single subject.

You need to take care with a single subject. Let's say that the subject you want to test the script on is '03FR'. If the list of subjects contains only one entry, like this:

subjects = ('03FR')

then, in python, the variable subjects will become the string "03FR". This is because the parentheses are ambiguous; they could mean parentheses-for-expressions, as in (3+4)*2, or they could mean "this is a tuple" (a tuple being the python term for a list that cannot be changed once initialised). By default python assumes they are parentheses for expressions, and you get a string rather than a tuple. When you ask for subjects[0] you will therefore get '0' - just the first character of the string - which is not what you want. In this situation, you need to indicate to python that this is not a string, but a tuple with only one value. You do this with an extra comma:

subjects = ('03FR',)
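
An alternative that avoids the problem altogether is to use a list rather than a tuple; subjects = ['03FR'] behaves the same way in the script above and needs no trailing comma.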

Test on the Beno

Copy your archived data to a data directory in the /home filesystem on the hoover.

Copy your parameter tree to a directory on the hoover, including your matlab batch files.

Make sure your script is visible from beno - for example it could be in the hoover /home system (mounted on beno as /import/hoover) - or you could copy your python script to the beno.

Change the paths in the Python script to match the beno: server_root and parameter_root should point at the hoover disks as mounted on the beno (somewhere under /import/hoover), and fdata_root at a scratch directory on the node's local disk, such as a directory under /var/tmp.

SWE-06/20/05: Make sure that matlab can write to server_root. I would recommend setting server_root to somewhere in /import/hoover/imagers.

Test your script on the head node of the beno:

ssh beno
su - matlab
python my_script.py

Check you get no obvious errors.

Try the SGE on the Beno

First make sure the Python script is executable:

chmod u+x /path/to/your/my_script.py

Then:

su - matlab
cd /home/imagers/sge/scripts
qsub -t 1-18 run_python.sh /path/to/your/my_script.py

Of course, despite everything, it will crash. Check the output and error logs. If you used the run_python.sh script above, the error and output logs are combined and written to /home/imagers/sge/output_files.

