CCDC




SuperStar can be used to generate property maps for ligand binding sites in proteins. For a given binding site and a probe group, SuperStar gathers the necessary interaction data from the IsoStar knowledge base and calculates a three-dimensional map. This map highlights regions in the cavity where the probe has a high probability of occurring. The validation described here assesses, for a set of protein-ligand structures, whether the experimentally observed positions of ligand groups match those predicted by SuperStar.

SuperStar was re-validated using both the full set and the filtered sets of the CCDC/Astex Validation Test Set. Results are shown in Table 1. The validation procedure was the same as that reported previously: it assesses whether SuperStar is able to predict the chemical type of ligand groups using fields for alcohol oxygen, carbonyl oxygen, NH3 nitrogen and methyl carbon.

Success rates (i.e. percentage of correctly predicted ligand groups) are given for the following types of source data, originating from either the Cambridge Structural Database (CSD) or the Protein Data Bank (PDB):

raw: this is the raw crystallographic data as stored in IsoStar
parameterised: fitted representations of the raw crystallographic data
hybrid: a combination of the above, with fitted data being used if reliable, and raw data in all other cases.

Results are shown for two ranges of solvent-accessibility. The 0.0-1.0 range includes all predicted ligand groups regardless of accessibility; the 0.00-0.02 range covers buried ligand groups only.
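As an illustration of how a success rate for a given solvent-accessibility range could be computed, here is a minimal sketch. The `LigandGroup` record, its field names and the `success_rate` function are hypothetical and are not part of SuperStar; the sketch assumes each ligand group already carries an observed type, a predicted type and a fractional accessibility value.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LigandGroup:
    """Hypothetical record for one ligand group in the test set."""
    observed_type: str     # chemical type seen in the crystal structure
    predicted_type: str    # chemical type predicted from the maps
    accessibility: float   # fractional solvent accessibility, 0.0 to 1.0


def success_rate(groups: List[LigandGroup],
                 acc_min: float = 0.0,
                 acc_max: float = 1.0) -> Optional[float]:
    """Percentage of correctly predicted groups within an accessibility range."""
    selected = [g for g in groups if acc_min <= g.accessibility <= acc_max]
    if not selected:
        return None  # no groups fall in this range
    correct = sum(g.observed_type == g.predicted_type for g in selected)
    return 100.0 * correct / len(selected)


# Made-up example data, for illustration only.
example = [
    LigandGroup("carbonyl oxygen", "carbonyl oxygen", 0.00),
    LigandGroup("methyl carbon", "alcohol oxygen", 0.01),
    LigandGroup("alcohol oxygen", "alcohol oxygen", 0.50),
    LigandGroup("NH3 nitrogen", "NH3 nitrogen", 0.90),
]
print(success_rate(example))              # all groups: 75.0
print(success_rate(example, 0.0, 0.02))   # buried groups only: 50.0
```

The 0.0-1.0 call counts every group, while the 0.00-0.02 call restricts the count to buried groups, mirroring the two ranges described above.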

Table 1. SuperStar validation results for full and filtered lists. Errors were determined using a bootstrapping procedure and are given as standard errors (s.e.) in the same units as the success rates.

Prediction rates for the new validation set are very similar to those obtained for the smaller original GOLD validation set, and CSD and PDB data give similar results. The difference between results for the full set and the filtered subset (clean list, all) is negligible, indicating that in this case, exclusion of dubious complexes does not have a large effect. A possible reason for this is that the errors in these structures (e.g. clashes) are very localised and affect only a small part of the ligand.

A trend can be observed in the success rates for the filtered subsets of limited resolution (clean lists, R<2.0Å, R<2.5Å). As expected, SuperStar predictions tend to be more reliable for well-resolved protein-ligand complexes.

Bootstrapping analysis was performed to estimate the error in the prediction rates. As can be seen, the standard error varies considerably with the size of the test sets used (size of full set: 305 entries; clean list: 224; clean, R<2.5Å: 180; clean, R<2.0Å: 92). The smaller the set, the larger the confidence interval for the success rates derived from it.
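The effect of set size on the standard error can be sketched with a simple bootstrap. This is not the procedure used for Table 1, whose details are not given here; it assumes that resampling is done over individual prediction outcomes (1 = correct, 0 = incorrect), and the function name and example data are hypothetical.

```python
import random
import statistics
from typing import Sequence


def bootstrap_standard_error(outcomes: Sequence[int],
                             n_resamples: int = 1000,
                             seed: int = 0) -> float:
    """Bootstrap estimate of the standard error of a success rate (in %).

    outcomes: one entry per prediction, 1 if correct and 0 if not.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    n = len(outcomes)
    rates = []
    for _ in range(n_resamples):
        # Resample with replacement, keeping the original sample size.
        resample = rng.choices(outcomes, k=n)
        rates.append(100.0 * sum(resample) / n)
    # The spread of the resampled rates estimates the standard error.
    return statistics.stdev(rates)


# Made-up outcome lists with the same 70% success rate but different sizes.
small_set = [1] * 70 + [0] * 30
large_set = [1] * 210 + [0] * 90
print(bootstrap_standard_error(small_set))
print(bootstrap_standard_error(large_set))
```

With the same underlying success rate, the smaller list gives the larger standard error, which is the behaviour described above for the filtered subsets.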

SuperStar Results for Different Types of Protein

SuperStar success rates for different classes of proteins are shown in Table 2. Results are largely comparable to those observed above. Given the small size of these subsets, the results must be analysed carefully to establish whether they actually differ from those seen for the whole set of 305 complexes.

Statistical analysis (chi-squared) revealed that only a few subsets display results that differ significantly from those expected from the performance of the full set: for buried ligand groups, SuperStar performs better than expected in metalloproteases and worse than expected in aspartic proteases.

None of the other protein groups yields results that differ significantly from what is expected based on the full set (buried ligand groups), showing that SuperStar performs satisfactorily across the whole range of protein types.
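A comparison of this kind can be sketched as a one-degree-of-freedom chi-squared test of a subset's correct/incorrect counts against the counts expected from the full-set success rate. The exact test design behind Table 2 is not specified here, so this goodness-of-fit form is an assumption, and the function name and the numbers in the example are hypothetical.

```python
import math
from typing import Tuple


def chi_squared_vs_full_set(subset_correct: int,
                            subset_total: int,
                            full_set_rate: float) -> Tuple[float, float]:
    """Chi-squared statistic and p-value (1 degree of freedom).

    full_set_rate is the full-set success rate as a fraction (0 < rate < 1).
    """
    if not 0.0 < full_set_rate < 1.0:
        raise ValueError("full_set_rate must lie strictly between 0 and 1")
    expected_correct = subset_total * full_set_rate
    expected_wrong = subset_total - expected_correct
    observed_wrong = subset_total - subset_correct
    chi2 = ((subset_correct - expected_correct) ** 2 / expected_correct
            + (observed_wrong - expected_wrong) ** 2 / expected_wrong)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Made-up numbers: 45 of 50 groups correct in a subset, full-set rate of 70%.
chi2, p = chi_squared_vs_full_set(45, 50, 0.70)
print(f"chi-squared = {chi2:.2f}, p = {p:.4f}")
```

A p-value below the chosen significance level (0.05 is a common choice) would indicate that the subset's success rate differs from the one expected from the full set; the sign of the difference between observed and expected correct counts tells whether the subset performs better or worse.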

Table 2. SuperStar success rates for protein subsets. Chi-squared values were calculated for the null hypothesis that both subset and full set come from the same distribution.





Copyright © 2004-2012 The Cambridge Crystallographic Data Centre
12 Union Road, Cambridge, CB2 1EZ, UK, +44 1223 336408
Registered in England No.2155347 Registered Charity No.800579