GOLD Validation: Results on Initial Test Set of 100 Protein-Ligand Complexes
You can download the coordinates of the individual test systems and view the GOLD solutions by
clicking on the PDB codes in the table below. To view the solutions, you will need to download the Chime plug-in from
the US MDL Web site or the
UK site. The GOLD prediction is shown together
with the crystallographically observed position and conformation, both encoded in MDL MOL file format.
Coordinates of the test systems are available for downloading, so other groups are free to test their
docking software on the same complexes. All file formats are SYBYL MOL2. Thanks are due to the Brookhaven
PDB Group for allowing distribution of these data.
The Data Set
100 PDB protein-ligand complexes.
- "interesting" ligands.
- "drug-like" ligands (but not many available in the PDB!).
- chosen at the Cambridge Crystallographic Data Centre, independently of GOLD's author.
Initially, GOLD failed on a number of the test complexes because, for example,
the ligand had insufficient hydrogen-bonding atoms or the protein included
a metal ion. These problems were solved, and GOLD eventually produced an answer for
99 of the test complexes, failing only on 1ACL, where the ligand has no
H-bond donors or acceptors at all.
Results of GOLD's Predictions
GOLD achieved a 71% rate of successful predictions: 71 of the 100 complexes were classified as good or close.
In summarising the results, the GOLD prediction is defined as the best of the 20 GA dockings
according to the GOLD fitness score and not the docking that is closest to the experimental result.
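The selection rule above can be sketched in a few lines. This is a minimal illustration, not GOLD's own code: the `Docking` record, its field names, and the sample values are hypothetical, and it assumes that a higher fitness score means a better docking.

```python
from dataclasses import dataclass


@dataclass
class Docking:
    run: int        # index of the GA run (hypothetical field)
    fitness: float  # fitness score; higher is assumed to be better
    rmsd: float     # RMSD to the crystal pose, used only for evaluation


def select_prediction(dockings):
    """Return the docking with the best fitness score (the reported prediction)."""
    return max(dockings, key=lambda d: d.fitness)


def closest_to_experiment(dockings):
    """Return the docking nearest the crystal pose (NOT used as the prediction)."""
    return min(dockings, key=lambda d: d.rmsd)


# Illustrative values only; these are not GOLD results.
dockings = [
    Docking(run=1, fitness=52.3, rmsd=1.8),
    Docking(run=2, fitness=61.0, rmsd=2.4),
    Docking(run=3, fitness=58.7, rmsd=0.6),
]

prediction = select_prediction(dockings)
closest = closest_to_experiment(dockings)
print(f"prediction: run {prediction.run}, closest to experiment: run {closest.run}")
```

In this made-up example the two choices differ, which is why the distinction matters: choosing by RMSD would use knowledge of the experimental answer and overstate the success rate.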
Position of top ranked molecule (for those complexes where the GA produced an answer).
Each GOLD prediction was assigned to one of 4 subjective categories: good,
close, errors or wrong.
To find information on a test system, view the GOLD prediction, or download
the test system, click on the PDB code in the table.
Results by RMSD
The table below shows the relationship between the subjective classification
used above and a more objective measure: the RMSD between the GOLD prediction and the
crystallographically determined coordinates.
First set of validation tests: Summary of RMSD results
| RMSD (Å) | #Total | #Good | #Close | #Errors | #Wrong |
|---|---|---|---|---|---|
| <=0.5 | 8 | 8 | 0 | 0 | 0 |
| >0.5, <=1.0 | 27 | 24 | 3 | 0 | 0 |
| >1.0, <=1.5 | 20 | 7 | 13 | 0 | 0 |
| >1.5, <=2.0 | 11 | 2 | 9 | 0 | 0 |
| >2.0, <=2.5 | 2 | 0 | 2 | 0 | 0 |
| >2.5, <=3.0 | 3 | 0 | 2 | 1 | 0 |
| >3.0 | 28 | 0 | 1 | 8 | 19 |
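For groups testing their own docking software on the same complexes, the RMSD measure and the binning used in the table can be sketched as follows. This is a minimal sketch under stated assumptions: both poses list the same atoms in the same order, both are in the protein's coordinate frame (so no superposition is applied), and no correction is made for symmetry-equivalent atoms. The source does not say which atoms were included, and the function names and sample coordinates are illustrative.

```python
import math


def rmsd(predicted, observed):
    """RMSD between two equally ordered lists of (x, y, z) coordinates."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (px - ox) ** 2 + (py - oy) ** 2 + (pz - oz) ** 2
        for (px, py, pz), (ox, oy, oz) in zip(predicted, observed)
    )
    return math.sqrt(squared / len(predicted))


# Upper bounds of the RMSD bins used in the summary table.
BIN_EDGES = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]


def rmsd_bin(value):
    """Return the label of the table row that an RMSD value falls into."""
    lower = None
    for upper in BIN_EDGES:
        if value <= upper:
            return f"<={upper}" if lower is None else f">{lower}, <={upper}"
        lower = upper
    return f">{lower}"


# Illustrative coordinates only: every atom is displaced by 1.0 along z.
observed = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
predicted = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (1.5, 1.5, 1.0)]

value = rmsd(predicted, observed)
print(value, rmsd_bin(value))
```

Skipping the superposition step is deliberate in this sketch: a docking pose is judged in the binding site, so aligning the two ligand copies first would hide positional errors.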
Click here to view the full table of RMSD results
Results by ligand composition
This table shows how the subjective result varies with the number of
ligand atoms, the percentage of polar atoms in the ligand, and the number
of ligand rotatable bonds and free corners.
First set of validation tests: Ligand characterisation
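| Subjective result | Heavy atoms (max) | Heavy atoms (avg) | Heavy atoms (min) | % of heavy atoms which can form hydrogen bonds (max) | % of heavy atoms which can form hydrogen bonds (avg) | % of heavy atoms which can form hydrogen bonds (min) | Torsions (max) | Torsions (avg) | Torsions (min) |
|---|---|---|---|---|---|---|---|---|---|
| Good and close | 52 | 20.4 | 6 | 66.7 | 31.9 | 8.7 | 28 | 7.9 | 0 |
| Errors and wrong | 55 | 24.3 | 9 | 53.9 | 25.1 | 4.8 | 40 | 11.4 | 0 |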
Results by resolution of protein structure
This table shows that GOLD is more likely to fail if the protein
structure is of poor resolution. This is an interesting result, since it suggests that
some of the discrepancies between the GOLD predictions and the observed ligand
positions are due to experimental errors in the PDB.
First set of validation tests: Resolution of PDB complexes
| Resolution (Å) | #Total | #Good + #Close | #Errors + #Wrong |
|---|---|---|---|
| >1.0, <=1.5 | 2 | 2 | 0 |
| >1.5, <=2.0 | 44 | 34 | 10 |
| >2.0, <=2.5 | 32 | 24 | 8 |
| >2.5, <=3.0 | 20 | 11 | 9 |
| >3.0 | 1 | 0 | 1 |
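The trend is easier to see as a success rate per resolution bin. The short script below simply recomputes those rates from the counts in the table; the variable names are illustrative.

```python
# (resolution bin, total, good + close, errors + wrong), copied from the table.
resolution_rows = [
    (">1.0, <=1.5", 2, 2, 0),
    (">1.5, <=2.0", 44, 34, 10),
    (">2.0, <=2.5", 32, 24, 8),
    (">2.5, <=3.0", 20, 11, 9),
    (">3.0", 1, 0, 1),
]

rates = {}
for label, total, succeeded, failed in resolution_rows:
    # Sanity check: the two outcome columns must add up to the row total.
    assert succeeded + failed == total
    rates[label] = succeeded / total

answered = sum(total for _, total, _, _ in resolution_rows)

for label, rate in rates.items():
    print(f"{label:>12}: {rate:.0%} good or close")
print(f"complexes with an answer: {answered}")
```

The two well-populated bins up to 2.5 are close to each other (about 77% and 75%), while the 2.5 to 3.0 bin drops to 55%. The first and last bins hold only 2 and 1 complexes respectively, so their rates carry little weight.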
Problems with the GOLD Algorithm
The above tests highlighted a number of problems, most of which have now been
solved:
- Some cases where the algorithm would not give any result at all.
- Insufficient weight given to large areas of hydrophobic contact.
- Insufficient probability of charged ligand groups being solvent accessible.
One ongoing problem is speed: GOLD is not the fastest algorithm available.
However, there is, as always, a trade-off between speed and reliability.
A number of faster GA parameter settings have been developed for those who wish
to sacrifice some reliability for increased speed.