

GOLD and PBS Pro

1. About

1.1 CCDC

1.2 Altair Engineering

1.1 CCDC

Profile

The Cambridge Crystallographic Data Centre (CCDC) was founded in 1965 to record the results of small-molecule crystal structure analyses. The Cambridge Structural Database (CSD) was one of the first numerical databases created anywhere in the world.

The CCDC also develops CSD access software, knowledge bases of structural information, and applications software that uses crystal structure information to solve problems in structural chemistry and the life sciences.

CCDC products are widely used in industry and academia, particularly for basic research in structural chemistry, rational molecular design and pharmaceutical materials development. CCDC products are firmly based on scientific quality and relevance. We collaborate and publish widely, and several of our products arose from these collaborations.

Originating in the Chemistry Department of the University of Cambridge, the CCDC is now a fully independent non-profit company with charitable status situated on the University's Chemistry Campus.

Contact Information

Web: http://www.ccdc.cam.ac.uk
Email: support@ccdc.cam.ac.uk
Telephone: +44 1223 336022

1.2 Altair Engineering

Profile

Altair Engineering, Inc. strengthens client innovation and decision-making through technology that optimizes the analysis, management and visualization of business and engineering information. Privately held with more than 900 employees, Altair has offices throughout North America, Europe and Asia/Pacific. With a 20-year-plus track record for product design, advanced engineering software and grid computing technologies, Altair consistently delivers a competitive advantage to customers in a broad range of industries.

Contact Information

Web: http://www.pbspro.com (Troy, Michigan USA)
Email: pbssupport@altair.com
Telephone: +1 248 614 2425

PBS Professional manuals and binaries for specific operating systems are available for download at http://www.pbspro.com/UserArea. If you do not have a valid license for PBS Professional, please contact Altair Engineering's Grid Works Group (sales@pbspro.com).

2. Installing Software

2.1 PBS Professional

2.2 GOLD

2.1 PBS Professional

The installation of PBS Professional can vary in complexity depending upon many factors. PBS Professional should be installed according to instructions provided in the Quick Start Guide.

In general, you need a Linux/Unix cluster with a single head node. Each cluster member must have the PBS MOM (execution node) files installed and configured to point to the same PBS Professional Server. A shared file system available to the server and every execution node is advisable but not required. Some form of passwordless login (for example, SSH host-based authentication) is also advised on the cluster.
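As a quick way to confirm that passwordless login works before submitting jobs, a check along the following lines may help. This is only a sketch: the function name check_passwordless, the SSH_CMD override variable and the host names node1 and node2 are illustrative assumptions, not part of PBS Professional or GOLD.

```shell
#!/bin/sh
# Report which hosts accept non-interactive (passwordless) SSH logins.
# SSH_CMD is a hypothetical override so the loop can be dry-run
# without a real cluster. BatchMode=yes makes ssh fail instead of
# prompting for a password.
SSH_CMD=${SSH_CMD:-"ssh -o BatchMode=yes -o ConnectTimeout=5"}

check_passwordless() {
    # Arguments: one or more host names (hypothetical examples below)
    failed=0
    for host in "$@"; do
        if $SSH_CMD "$host" true </dev/null >/dev/null 2>&1; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
            failed=1
        fi
    done
    return $failed
}

# Example (not run here): check_passwordless node1 node2
```

A non-zero return value indicates that at least one host asked for a password or could not be reached.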

A separate job submission system may also be set up if desired. This is option 3 during the installation of PBS Professional.

Questions regarding the installation of PBS Professional should be directed to the support group at Altair Engineering.

Altair Engineering, Inc.
http://www.pbspro.com
Troy, Michigan USA
pbssupport@altair.com
+1 248 614 2425

2.2 GOLD

GOLD must be installed on each execution node (MOM) system in the cluster on which you intend to run GOLD jobs, or may be installed to a shared directory which is accessible to all of the execution nodes. Please see the GOLD installation documentation for details on installing GOLD: http://www.ccdc.cam.ac.uk/support/documentation/#gold.
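Whichever layout is chosen, each execution node needs to see an executable gold_auto script under the GOLD installation directory. A minimal sketch of such a check is shown here; the function name check_gold_dir is hypothetical, and the only path assumed is $GOLD_DIR/bin/gold_auto, which is the one used in the examples in this document.

```shell
#!/bin/sh
# Verify that a GOLD installation directory contains an executable
# gold_auto script. The directory is taken from the first argument,
# falling back to the GOLD_DIR environment variable.
check_gold_dir() {
    dir=${1:-$GOLD_DIR}
    if [ -z "$dir" ]; then
        echo "GOLD_DIR is not set" >&2
        return 1
    fi
    if [ -x "$dir/bin/gold_auto" ]; then
        echo "found $dir/bin/gold_auto"
    else
        echo "missing or not executable: $dir/bin/gold_auto" >&2
        return 1
    fi
}

# Example (not run here): check_gold_dir /path/to/gold
```

Running the check on every execution node, or at the top of a job script, can catch a missing installation or an unmounted shared directory before GOLD itself is started.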

3. Examples

GOLD jobs can be launched under PBS Professional in a variety of ways. The following is a listing of examples that might be useful.

3.1 Launching a Single GOLD job on a Single MOM

Provided you have a working gold.conf file, GOLD jobs are launched with a script called gold_auto. This script can be submitted to PBS Professional in a variety of ways:

3.1.1 Command line

qsub -j oe -N jobname $GOLD_DIR/bin/gold_auto gold.conf

This submits a GOLD job to a PBS Professional server using the configuration file gold.conf. The -N option sets the job name to jobname, and -j oe merges the output and error files into one file.

3.1.2 PBS Professional Batch Job Scripts

PBS Professional also accepts job scripts. These look very much like shell scripts, with additional #PBS directive lines, and are passed as input to the qsub command. See the PBS Professional 7.1 User's Guide for more information.

1) A single job

#!/bin/sh
#PBS -N single_run
#PBS -l walltime=00:20:00
#PBS -l group=1
#PBS -j oe

date
hostname
$GOLD_DIR/bin/gold_auto $HOME/screentest/gold.conf
date

This job would be launched with the command qsub filename, where filename is the name of the job script.
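When many single jobs differ only in job name and configuration file, the job script can be generated rather than written by hand. The following is a sketch only: the function name make_gold_job is hypothetical, the walltime is copied from the example above, and it assumes GOLD_DIR is defined in the environment in which the job runs.

```shell
#!/bin/sh
# Write a PBS job script that runs gold_auto on one configuration file.
# Usage: make_gold_job JOBNAME CONF_FILE > job.pbs
make_gold_job() {
    name=$1
    conf=$2
    # \$GOLD_DIR is escaped so that it is expanded when the job runs,
    # not when the script is generated.
    cat <<EOF
#!/bin/sh
#PBS -N $name
#PBS -l walltime=00:20:00
#PBS -j oe

date
hostname
\$GOLD_DIR/bin/gold_auto $conf
date
EOF
}

# Example (not run here):
#   make_gold_job single_run $HOME/screentest/gold.conf > job.pbs
#   qsub job.pbs
```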

3.2 Launching Multiple GOLD Jobs

Launching multiple GOLD jobs can be accomplished in a variety of ways in combination with PBS. Both serial submission of individual jobs as well as parallel execution using the PVM system are possible. Below are a few examples of how PBS and GOLD can work together in these ways:

3.2.1 PBS Professional Batch Job Scripts

A GOLD job using PVM on 2 nodes. Please note that PVM must be correctly configured and running prior to launching any jobs from the command line or PBS scripts.

#!/bin/sh
#PBS -N parallel_run
#PBS -l walltime=1:00:00
#PBS -j oe
#PBS -V

date
indir=$HOME/screentest

#start Gold job
$GOLD_DIR/bin/parallel_gold_auto 2 $indir/gold.conf $indir/gold.hosts $indir

date

3.2.2 PBS Submission Script (Job Arrays)

Job Arrays are a PBS programming construct that allows a numerical variable to be used for multiple job submission. Job submission can be coded so that multiple similar jobs, or jobs with multiple similarly named gold.conf files, are launched from a single PBS script. This could also easily be adapted to run a series of non-PVM jobs. The script below submits 2 jobs (gold.conf.2 and gold.conf.4), each as a 4-host PVM job. Please see chapter 9 of the PBS Professional User's Guide for more detail on Job Array use.

#!/bin/sh
#PBS -N tut6test
#PBS -l walltime=1:00:00
#PBS -j oe
#PBS -J 2-4:2

date
tutdir=$GOLD_DIR/examples/test/pvm_test

echo "add " `cat $PBS_NODEFILE` | pvm
echo "conf" | pvm
 
$GOLD_DIR/bin/parallel_gold_auto 4 $tutdir/gold.conf.$PBS_ARRAY_INDEX $tutdir/gold.hosts $tutdir
date 
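To see which configuration files a given array range will pick up, the index expansion can be reproduced in plain shell. This sketch assumes the range has the form start-end:step, as in the -J 2-4:2 directive above; the function name array_indices is hypothetical and is not a PBS command.

```shell
#!/bin/sh
# List the sub-job indices for a job array range START-END:STEP.
# Arguments: START END STEP
array_indices() {
    i=$1
    while [ "$i" -le "$2" ]; do
        echo "$i"
        i=`expr "$i" + "$3"`
    done
}

# Configuration files selected by the range 2-4:2
for idx in `array_indices 2 4 2`; do
    echo "gold.conf.$idx"
done
# prints:
#   gold.conf.2
#   gold.conf.4
```

Each index corresponds to the value that PBS_ARRAY_INDEX takes in one sub-job.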

3.2.3 Shell scripts

Shell scripts can be written to build more elaborate submissions around qsub. Below is an example of a script that divides a single GOLD job into multiple parts for submission across multiple hosts. The script calls qsub itself, so no separate submission step is required.

#!/bin/sh

# Split a multi mol2 GOLD docking job into 
# batches for parallel processing using multiple 
# serial GOLD processes on a PBS cluster

GOLD_DIR=/GOLD/gold_v3.0.1
export GOLD_DIR

n_hosts=3 # number of execution nodes to use
dir="$HOME/screentest"
file="$dir/multi.mol2"

## PBS
qsub_args="-j eo"

## GOLD
n_docks=5 # number of dockings per ligand
s="start_at_ligand"
f="finish_at_ligand"

n_mols=`grep MOLECULE $file | wc -l`
split=`expr $n_mols / $n_hosts`
echo "$n_hosts hosts, $n_mols ligands: $split each"

cd $dir
h=1 # host counter
x1=1 # start ligand
x2=$split # end ligand

while [ $h -le $n_hosts ]; do

 [ $h -eq $n_hosts ] && x2=$n_mols # last host gets whatever's left

 sed  -e "s;^\(protein_datafile\).*;\1 = $dir/protein.mol2;" \
  -e "s;^\(ligand_data_file\).*;\1 $file $n_docks $s $x1 $f $x2;" \
  -e "s;^\(directory\).*;\1 = $dir/output_${x1}_${x2};" \
  gold.conf > conf_${x1}_${x2}

 echo "$GOLD_DIR/bin/gold_auto $dir/conf_${x1}_${x2}" | qsub $qsub_args -N "run_${x1}_${x2}"

 h=`expr $h + 1`
 x1=`expr $x2 + 1`
 x2=`expr $split \* $h`

 sleep 3

done 
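The way the script divides ligands between hosts can be checked in isolation. The sketch below repeats the same arithmetic without calling sed or qsub; the function name ligand_ranges and the figures of 10 ligands and 3 hosts are illustrative assumptions.

```shell
#!/bin/sh
# Print the ligand range assigned to each host, using the same
# arithmetic as the submission script.
# Arguments: N_MOLS N_HOSTS
ligand_ranges() {
    n_mols=$1
    n_hosts=$2
    split=`expr $n_mols / $n_hosts`
    h=1        # host counter
    x1=1       # start ligand
    x2=$split  # end ligand
    while [ $h -le $n_hosts ]; do
        [ $h -eq $n_hosts ] && x2=$n_mols  # last host takes the remainder
        echo "host $h: ligands $x1-$x2"
        h=`expr $h + 1`
        x1=`expr $x2 + 1`
        x2=`expr $split \* $h`
    done
}

ligand_ranges 10 3
# prints:
#   host 1: ligands 1-3
#   host 2: ligands 4-6
#   host 3: ligands 7-10
```

Because the division is rounded down, the last host receives any ligands left over. With fewer ligands than hosts the batch size becomes 0 and the earlier ranges are empty, so n_hosts should not exceed the number of ligands.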
 

Cambridge Crystallographic Data Centre
Web: http://www.ccdc.cam.ac.uk
Support Email: support@ccdc.cam.ac.uk
Support Phone: +44 1223 336022