CCDC




The following validation sets are available for download:


Astex Non-native Set

The Astex Non-native Set consists of the following:
Sixty-five directories, one for each native protein selected from the Astex Diverse Set. Each native directory contains one subdirectory per non-native structure. For example, Astex Diverse Set entry 1gm8 has four non-native structures: 1fxh, 1fxv, 1gkf and 1gm7. The top-level directory is called 1gm8 and has four subdirectories called 1fxh (non-apo), 1fxv (non-apo), 1gkf (apo) and 1gm7 (non-apo). Each subdirectory contains some or all of the following:

  • protein.mol2 file

  • ligand001.pdb file
    The co-crystallised active-site ligand extracted from the non-native protein; present only for non-apo structures. Note that the ligand files have not been prepared for docking, i.e. bond types and protonation states are likely to be incorrect.

  • ligand_other.pdb file
    Any other ligands from the protein file that are not in the active site.

  • other.pdb file
    Contains disordered atom coordinates and any co-factors that are not ligands or water.

  • water.pdb file
    Contains the extracted water atoms.
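
The directory layout described above can be walked programmatically. The following is a minimal sketch, assuming the archive has been unpacked so that a root directory directly contains the native directories (1gm8 and so on); the function name classify_non_native is hypothetical. It relies only on the rule stated above that ligand001.pdb is present only for non-apo structures.

```python
from pathlib import Path


def classify_non_native(root):
    """Map each native entry to its non-native structures and apo/non-apo status.

    Assumption: `root` directly contains one directory per native protein,
    each holding one subdirectory per non-native structure.
    """
    result = {}
    for native_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        structures = {}
        for sub in sorted(p for p in native_dir.iterdir() if p.is_dir()):
            # ligand001.pdb is only supplied for non-apo structures.
            has_ligand = (sub / "ligand001.pdb").is_file()
            structures[sub.name] = "non-apo" if has_ligand else "apo"
        result[native_dir.name] = structures
    return result
```

For the 1gm8 example, the entry for 1gkf would be reported as apo and the other three subdirectories as non-apo, provided the files are laid out as described.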

To download the Astex Non-native Set simply click on the link below:

astex_non_native_set.tar.gz (~185 MB)

Note: to obtain the native protein and ligand files for the Non-native Set you will also need to download the Astex Diverse Set below.
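
Once an archive has been downloaded, it can be unpacked with any tar tool. As a minimal sketch using only the Python standard library (the function name extract_set and the destination path are hypothetical, and the internal layout of the archive is not assumed):

```python
import tarfile
from pathlib import Path


def extract_set(archive, dest):
    """Unpack a downloaded .tar.gz archive into `dest`; return the top-level names."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    # Use the safer "data" extraction filter where this Python version has it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest, **kwargs)
    return sorted(p.name for p in dest.iterdir())
```

For example, extract_set("astex_non_native_set.tar.gz", "non_native") would unpack the Non-native Set into a directory called non_native.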

Astex Diverse Set

The Astex Diverse Set consists of the following files:

  • protein.mol2 file

  • ligand.mol file
    This is both the input and the reference file.

  • protein_opt_h_gs.mol2 file
    A SYBYL MOL2 file of the protein in which the flexible hydrogen atoms on Ser/Thr/Tyr/Lys residues have been optimised with the GoldScore function.

  • protein_opt_h_cs.mol2 file
    A SYBYL MOL2 file of the protein in which the flexible hydrogen atoms on Ser/Thr/Tyr/Lys residues have been optimised with the ChemScore function.
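
Because the set ships one hydrogen-optimised protein file per scoring function, a script may want to pick the file that matches the scoring function in use. The following is a minimal sketch, assuming each entry sits in its own directory; the function name, the lookup keys "goldscore" and "chemscore", and the fallback to protein.mol2 are illustrative choices, not part of the set.

```python
from pathlib import Path

# File names as listed for the Astex Diverse Set; the dictionary keys are
# hypothetical labels chosen for this sketch.
OPTIMISED_PROTEIN = {
    "goldscore": "protein_opt_h_gs.mol2",
    "chemscore": "protein_opt_h_cs.mol2",
}


def protein_file_for(entry_dir, scoring_function=None):
    """Return the protein file path matching the scoring function.

    Falls back to the plain protein.mol2 when no scoring function is given
    or when the name is not one of the two optimised variants.
    """
    key = (scoring_function or "").lower()
    return Path(entry_dir) / OPTIMISED_PROTEIN.get(key, "protein.mol2")
```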

To download the Astex Diverse Set simply click on the link below:

astex_diverse_set.tar.gz (~56 MB)

CCDC/Astex Validation Set

The CCDC/Astex set consists of the following files:

  • protein.mol2 file
  • ligand_reference.mol2 file
    This contains the ligand pose as found in the PDB entry. Entries with multiple binding modes, such as 1abe, are stored as follows: ligand_reference1.mol2, ligand_reference2.mol2, with the accompanying protein files protein1.mol2 and protein2.mol2.
  • ligand_reference_min.mol2 file
    This file contains a 'normalised' version of the ligand_reference; a short minimisation run was performed to clean up bond lengths and bond angles. It is the input file used for the docking experiments.
  • gold.conf file
    The GOLD configuration file can be used with the GOLD docking program. It also contains the centre and radius of the binding site. For covalently-bound ligands, a flag is set in this file and atom numbers of the link are stored.
  • water.mol2 file
    This file is available for those PDB entries that include a water set; it is currently only available for entries that were not included in the previous GOLD validation set.
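
Entries with multiple binding modes need their numbered ligand and protein files matched up. The following is a minimal sketch of that pairing, assuming each entry sits in its own directory and follows the naming described above; the function name binding_modes is hypothetical.

```python
import re
from pathlib import Path


def binding_modes(entry_dir):
    """Return (protein, ligand_reference) path pairs for one entry.

    Single-mode entries give one pair; entries such as 1abe give one pair
    per numbered ligand_referenceN.mol2 / proteinN.mol2 combination.
    """
    entry = Path(entry_dir)
    single = entry / "ligand_reference.mol2"
    if single.is_file():
        return [(entry / "protein.mol2", single)]

    numbered = []
    for lig in entry.glob("ligand_reference*.mol2"):
        # The pattern skips ligand_reference_min.mol2, which has no number.
        match = re.fullmatch(r"ligand_reference(\d+)\.mol2", lig.name)
        if match:
            numbered.append((int(match.group(1)), lig))
    # Sort numerically so that mode 10 follows mode 9 rather than mode 1.
    return [(entry / f"protein{n}.mol2", lig) for n, lig in sorted(numbered)]
```

The function only builds the expected protein paths; it does not check that each proteinN.mol2 actually exists.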

To download the CCDC/Astex validation set simply click on the link below:

ccdc_astex_set.tar.gz (~32 MB)

Original GOLD Validation Set

GOLD was originally validated in two phases: initially on a set of 100 complexes, and later on an additional 34 complexes as a check against over-training. A file containing coordinates for all 134 test complexes is also available to download:

original_set.tar.gz (~9 MB)





Copyright © 2004-2012 The Cambridge Crystallographic Data Centre
12 Union Road, Cambridge, CB2 1EZ, UK, +44 1223 336408
Registered in England No.2155347 Registered Charity No.800579