GOLD and PBS Pro
1. About
1.1 CCDC
Profile
The CCDC was founded in 1965 to record the results of small-molecule crystal structure analyses. The Cambridge Structural Database (CSD) was one of the first numerical databases created anywhere in the world.
The CCDC also develops CSD access software, knowledge bases of structural information, and applications software that uses crystal structure information to solve problems in structural chemistry and the life sciences.
CCDC products are widely used in industry and academia, particularly for basic research in structural chemistry, rational molecular design and pharmaceutical materials development. CCDC products are firmly based on scientific quality and relevance. We collaborate and publish widely, and several of our products arose from these collaborations.
Originating in the Chemistry Department of the University of Cambridge, the CCDC is now a fully independent non-profit company with charitable status situated on the University's Chemistry Campus.
Contact Information
Web: http://www.ccdc.cam.ac.uk
Email: support@ccdc.cam.ac.uk
Telephone: +44 1223 336022
1.2 Altair Engineering
Profile
Altair Engineering, Inc. strengthens client innovation and decision-making through technology that optimizes the analysis, management and visualization of business and engineering information. Privately held with more than 900 employees, Altair has offices throughout North America, Europe and Asia/Pacific. With a 20-year-plus track record for product design, advanced engineering software and grid computing technologies, Altair consistently delivers a competitive advantage to customers in a broad range of industries.
Contact Information
Web: http://www.pbspro.com (Troy, Michigan USA)
Email: pbssupport@altair.com
Telephone: +1 248 614 2425
PBS Professional manuals and binaries for specific operating systems are available for download at http://www.pbspro.com/UserArea. If you do not have a valid PBS Professional license, please contact Altair Engineering's Grid Works Group (sales@pbspro.com).
2. Installing Software
2.1 PBS Professional
The installation of PBS Professional can vary in complexity depending upon many factors. PBS Professional should be installed according to instructions provided in the Quick Start Guide.
In general what is needed is a Linux/Unix cluster system with a single head node. Cluster members must have the PBS MOM (execution node) files installed with their configurations set to point to the same PBS Professional Server. It is advisable (but not necessary) to configure a shared file system available to the server and each execution node. It is also advised that some form of passwordless login (ssh host based authentication for example) be employed on the cluster.
A separate job submission system may also be set up if desired. This is option 3 during the installation of PBS Professional.
Questions regarding the installation of PBS Professional should be directed to the support group at Altair Engineering.
Altair Engineering, Inc.
http://www.pbspro.com
Troy, Michigan USA
pbssupport@altair.com
+1 248 614 2425
2.2 GOLD
GOLD must be installed on each execution node (MOM) system in the cluster on which you intend to run GOLD jobs, or may be installed to a shared directory which is accessible to all of the execution nodes. Please see the GOLD installation documentation for details on installing GOLD: http://www.ccdc.cam.ac.uk/support/documentation/#gold.
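Because a job can land on any execution node, it can be useful to confirm that GOLD is visible from the node before the docking run starts. The following is a minimal sketch, not part of the GOLD or PBS Professional distributions: the function name check_gold_install is hypothetical, and the only assumption about the installation is that gold_auto lives under $GOLD_DIR/bin, as in the examples in this document.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if gold_auto is present and executable
# under the GOLD installation directory passed as the first argument.
check_gold_install() {
    gold_dir=$1
    if [ -z "$gold_dir" ]; then
        echo "GOLD_DIR is not set" >&2
        return 1
    fi
    if [ ! -x "$gold_dir/bin/gold_auto" ]; then
        echo "gold_auto not found or not executable in $gold_dir/bin" >&2
        return 1
    fi
    echo "gold_auto found in $gold_dir/bin"
}
```

Placed near the top of a job script, a line such as check_gold_install "$GOLD_DIR" || exit 1 would stop the job early on a node where neither a local nor a shared installation is reachable.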
3. Examples
GOLD jobs can be launched under PBS Professional in a variety of ways. The following is a listing of examples that might be useful.
- Launching a single GOLD job on a single MOM (see Section 3.1)
- Launching multiple GOLD jobs (see Section 3.2)
3.1 Launching a Single GOLD job on a Single MOM
Provided you have a working gold.conf file, GOLD jobs are launched with a script called gold_auto. This script can be submitted to PBS Professional in a variety of ways:
- via the command line (see Section 3.1.1)
- via a PBS Professional batch job script (see Section 3.1.2)
3.1.1 Command line
qsub -j oe -N jobname $GOLD_DIR/bin/gold_auto gold.conf
This submits a GOLD job that uses the configuration file gold.conf to a PBS Professional server, with the job name jobname and the output and error files merged into one file.
3.1.2 PBS Professional Batch Job Scripts
PBS Professional can also use job scripts. These look very much like shell scripts and are passed as input to the qsub command. See the PBS Professional 7.1 User's Guide for more information.
1) A single job
#!/bin/sh
#PBS -N single_run
#PBS -l walltime=00:20:00
#PBS -l group=1
#PBS -j oe
date
hostname
$GOLD_DIR/bin/gold_auto $HOME/screentest/gold.conf
date
This job would be launched using the command line qsub filename.
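When many such job files are needed, the script can be generated rather than written by hand. The following is a minimal sketch under stated assumptions: write_gold_job is a hypothetical helper, and it carries over only the job name, walltime and -j oe directives from the example above; any other resource requests would have to be added to suit the site.

```shell
#!/bin/sh
# Hypothetical helper: print a PBS job script for one GOLD configuration file.
# Usage: write_gold_job JOBNAME WALLTIME CONF_FILE
write_gold_job() {
    # \$GOLD_DIR is escaped so that it is expanded when the job runs,
    # not when the job script is generated.
    cat <<EOF
#!/bin/sh
#PBS -N $1
#PBS -l walltime=$2
#PBS -j oe
date
hostname
\$GOLD_DIR/bin/gold_auto $3
date
EOF
}

# Print a job script equivalent to the single_run example.
write_gold_job single_run 00:20:00 '$HOME/screentest/gold.conf'
```

Redirecting the output to a file gives a job script that can be submitted with qsub filename in the usual way.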
3.2 Launching Multiple GOLD Jobs
Launching multiple GOLD jobs can be accomplished in a variety of ways in combination with PBS. Both serial submission of individual jobs as well as parallel execution using the PVM system are possible. Below are a few examples of how PBS and GOLD can work together in these ways:
- PBS Professional Batch Job Scripts (see Section 3.2.1)
- PBS Submission Script (Job Arrays) (see Section 3.2.2)
- Shell Scripts (see Section 3.2.3)
3.2.1 PBS Professional Batch Job Scripts
The following script runs a GOLD job using PVM on 2 nodes. Please note that PVM must be correctly configured and running prior to launching any jobs from the command line or PBS scripts.
#!/bin/sh
#PBS -N parallel_run
#PBS -l walltime=1:00:00
#PBS -j oe
#PBS -V
date
indir=$HOME/screentest
#start Gold job
$GOLD_DIR/bin/parallel_gold_auto 2 $indir/gold.conf $indir/gold.hosts $indir
date
3.2.2 PBS Submission Script (Job Arrays)
Job arrays are a PBS construct for submitting multiple jobs from a single script, using a numerical index variable to distinguish them. A job can be written so that several similar jobs, or jobs whose gold.conf file names differ only by a number, are launched from one PBS script. The approach could also easily be adapted to run a series of non-PVM jobs. The tut6test script below submits 2 jobs (gold.conf.2 and gold.conf.4), each running as a 4 host PVM job. Please see chapter 9 of the PBS Professional User's Guide for more detail on job array use.
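To check which configuration files a given array range selects before submitting anything, the index sequence can be reproduced in plain shell. This is only an illustrative sketch: array_conf_files is a hypothetical helper that mimics the START-END:STEP enumeration of a range such as 2-4:2, and it does not call PBS.

```shell
#!/bin/sh
# Hypothetical helper: list the configuration file names selected by a
# job array range START-END:STEP (for example, -J 2-4:2).
# Usage: array_conf_files START END STEP
array_conf_files() {
    i=$1
    while [ "$i" -le "$2" ]; do
        echo "gold.conf.$i"
        i=$((i + $3))
    done
}

# The range 2-4:2 selects indices 2 and 4.
array_conf_files 2 4 2
```

For the range 2-4:2 this prints gold.conf.2 and gold.conf.4, one per line, matching the two jobs described above. The PBS script itself follows.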
#!/bin/sh
#PBS -N tut6test
#PBS -l walltime=1:00:00
#PBS -j oe
#PBS -J 2-4:2
date
tutdir=$GOLD_DIR/examples/test/pvm_test
echo "add " `cat $PBS_NODEFILE` | pvm
echo "conf" | pvm
$GOLD_DIR/bin/parallel_gold_auto 4 $tutdir/gold.conf.$PBS_ARRAY_INDEX $tutdir/gold.hosts $tutdir
date
3.2.3 Shell scripts
Shell scripts can be written to build more tailored solutions around qsub. Below is an example of a script that divides a single GOLD job into multiple parts and submits them across multiple hosts. The script calls qsub itself, so no separate submission step is required.
#!/bin/sh
# Split a multi mol2 GOLD docking job into
# batches for parallel processing using multiple
# serial GOLD processes on a PBS cluster
GOLD_DIR=/GOLD/gold_v3.0.1
export GOLD_DIR
n_hosts=3 # number of execution nodes to use
dir="$HOME/screentest"
file="$dir/multi.mol2"
## PBS
qsub_args="-j eo"
## GOLD
n_docks=5 # number of dockings per ligand
s="start_at_ligand"
f="finish_at_ligand"
n_mols=`grep MOLECULE $file | wc -l`
split=`expr $n_mols / $n_hosts`
echo "$n_hosts hosts, $n_mols ligands: $split each"
cd $dir
h=1 # host counter
x1=1 # start ligand
x2=$split # end ligand
while [ $h -le $n_hosts ]; do
    [ $h -eq $n_hosts ] && x2=$n_mols # last host gets whatever's left
    sed -e "s;^\(protein_datafile\).*;\1 = $dir/protein.mol2;" \
        -e "s;^\(ligand_data_file\).*;\1 $file $n_docks $s $x1 $f $x2;" \
        -e "s;^\(directory\).*;\1 = $dir/output_${x1}_${x2};" \
        gold.conf > conf_${x1}_${x2}
    echo "$GOLD_DIR/bin/gold_auto $dir/conf_${x1}_${x2}" | qsub $qsub_args -N "run_${x1}_${x2}"
    h=`expr $h + 1`
    x1=`expr $x2 + 1`
    x2=`expr $split \* $h`
    sleep 3
done
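The way the script divides ligands between hosts can be checked without a cluster by running the range arithmetic on its own. The sketch below is a dry run under stated assumptions: batch_ranges is a hypothetical helper that repeats the x1/x2 calculation from the script above and prints the ranges instead of writing configuration files or calling qsub.

```shell
#!/bin/sh
# Hypothetical helper: print "start end" ligand ranges, one line per host,
# using the same arithmetic as the splitting script.
# Usage: batch_ranges N_MOLS N_HOSTS
batch_ranges() {
    n_mols=$1
    n_hosts=$2
    split=$((n_mols / n_hosts))   # ligands per host, rounded down
    h=1                           # host counter
    x1=1                          # start ligand
    x2=$split                     # end ligand
    while [ "$h" -le "$n_hosts" ]; do
        # the last host takes whatever is left
        [ "$h" -eq "$n_hosts" ] && x2=$n_mols
        echo "$x1 $x2"
        h=$((h + 1))
        x1=$((x2 + 1))
        x2=$((split * h))
    done
}

# 10 ligands over 3 hosts
batch_ranges 10 3
```

For 10 ligands on 3 hosts this prints the ranges 1 3, 4 6 and 7 10, so the last host receives the remainder. With fewer ligands than hosts, split evaluates to 0 and the earlier hosts receive empty ranges, so in that situation n_hosts would probably need to be reduced.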