SuperStar can be used to generate property maps for ligand binding sites in
proteins. For a given binding site and a probe group, SuperStar gathers
the necessary interaction data from the IsoStar knowledge base and calculates
a three-dimensional map. This map highlights regions in the cavity where the
probe has a high probability of occurring. Validation then consists of checking,
for a set of protein-ligand structures, whether the experimentally observed
positions of ligand groups coincide with those predicted by SuperStar.
SuperStar was re-validated using both the full set and the filtered subsets of
the CCDC/Astex Validation Test Set. Results are shown in Table 1. The
validation procedure was the same as that reported previously: it assesses
whether SuperStar can predict the chemical type of ligand groups using fields
for alcohol oxygen, carbonyl oxygen, NH3 nitrogen and methyl carbon.
Success rates (i.e. the percentage of correctly predicted ligand groups) are
given for the following types of source data, originating from either the
Cambridge Structural Database (CSD) or the Protein Data Bank (PDB):
raw: the raw crystallographic data as stored in IsoStar
parameterised: fitted representations of the raw crystallographic data
hybrid: a combination of the above, with fitted data used where reliable and raw data in all other cases.
Results are shown for two ranges of solvent accessibility. The 0.0-1.0 range
includes all predicted ligand groups regardless of accessibility; the
0.00-0.02 range covers buried ligand groups only.
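The exact scoring rule of the earlier validation is not restated here. As a minimal sketch, assuming that each ligand group is assigned the chemical type of the probe whose SuperStar map has the highest propensity at the group's position, a success rate for one accessibility range could be computed as follows. The `LigandGroup` record, the type labels and the highest-propensity rule are hypothetical illustrations, not SuperStar's own implementation.

```python
from dataclasses import dataclass


@dataclass
class LigandGroup:
    """Hypothetical record for one ligand group in the test set."""
    observed_type: str               # e.g. "alcohol_O", "carbonyl_O", "NH3_N", "methyl_C"
    propensities: dict[str, float]   # map value of each probe field at the group's position
    accessibility: float             # fractional solvent accessibility, 0.0 (buried) to 1.0


def predicted_type(group: LigandGroup) -> str:
    # Assumption: the probe with the highest propensity at this position is the prediction.
    return max(group.propensities, key=group.propensities.get)


def success_rate(groups: list[LigandGroup], lo: float = 0.0, hi: float = 1.0) -> float:
    """Percentage of groups with lo <= accessibility <= hi whose type is predicted correctly."""
    selected = [g for g in groups if lo <= g.accessibility <= hi]
    if not selected:
        raise ValueError("no ligand groups in this accessibility range")
    hits = sum(predicted_type(g) == g.observed_type for g in selected)
    return 100.0 * hits / len(selected)
```

For example, with made-up values, a correctly predicted buried carbonyl oxygen (accessibility 0.01) and a mispredicted exposed methyl carbon (accessibility 0.5) give a success rate of 50% over the 0.0-1.0 range but 100% over the 0.00-0.02 range, which is why the two ranges are reported separately.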

Table 1. SuperStar validation results for the full and filtered lists. Errors
were determined using a bootstrapping procedure and are given as standard
errors in the same units as the success rates.
Prediction rates for the new validation set are very similar to those
obtained for the smaller, original GOLD validation set, and CSD and PDB data
give similar results. The difference between the results for the full set and
the filtered subset (clean list, all) is negligible, indicating that in this
case exclusion of dubious complexes does not have a large effect. A possible
reason is that the errors in these structures (e.g. clashes) are very
localised and affect only a small part of the ligand.
A trend can be observed in the success rates for the resolution-limited
filtered subsets (clean lists with R<2.5Å and R<2.0Å). As expected, SuperStar
predictions tend to be more reliable for well-resolved protein-ligand
complexes.
Bootstrapping analysis was performed to estimate the error in the prediction
rates. As can be seen, the standard error varies considerably with the size of
the test set used (full set: 305 entries; clean list: 224; clean, R<2.5Å: 180;
clean, R<2.0Å: 92): the smaller the set, the wider the confidence interval for
the success rates derived from it.
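The dependence of the standard error on set size can be illustrated with a simple non-parametric bootstrap. The sketch below assumes that whole entries (protein-ligand complexes) are resampled with replacement and that each entry contributes its number of correctly predicted and total ligand groups; the resampling unit, the number of resamples (`n_resamples`) and the function name `bootstrap_se` are assumptions, since the procedure behind Table 1 is not described in detail.

```python
import random
import statistics


def bootstrap_se(per_entry: list[tuple[int, int]], n_resamples: int = 2000, seed: int = 0) -> float:
    """Bootstrap standard error, in percentage points, of an overall success rate.

    per_entry holds one (n_correct, n_groups) pair per protein-ligand entry;
    entries (not individual ligand groups) are resampled with replacement.
    """
    if not per_entry or any(n <= 0 for _, n in per_entry):
        raise ValueError("every entry must contribute at least one ligand group")
    rng = random.Random(seed)
    rates = []
    for _ in range(n_resamples):
        sample = rng.choices(per_entry, k=len(per_entry))  # one bootstrap replicate
        correct = sum(c for c, _ in sample)
        total = sum(n for _, n in sample)
        rates.append(100.0 * correct / total)
    # The spread of the replicate success rates estimates the standard error.
    return statistics.stdev(rates)
```

With the same hypothetical mix of per-entry results, a 92-entry set gives a standard error roughly √(304/92) ≈ 1.8 times that of a 304-entry set, in line with the usual 1/√n scaling of a standard error.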
SuperStar Results for Different Types of Protein
SuperStar success rates for different classes of protein are shown in
Table 2. Results are largely comparable to those observed for the full set
above. Given the small size of these subsets, the results need careful
statistical analysis to establish whether they actually differ from those for
the whole set of 305 complexes.
Statistical analysis (chi-squared) revealed that only a few subsets display
results that differ significantly from those expected on the basis of the full
set: for buried ligand groups, SuperStar performs better than expected in
metalloproteases and worse than expected in aspartic proteases.
For buried ligand groups in all other protein classes, the results do not
differ significantly from those expected on the basis of the full set, showing
that SuperStar performs satisfactorily across the whole range of protein
types.
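One way to carry out the comparison described in the Table 2 caption is a one-degree-of-freedom goodness-of-fit test: the numbers of correct and incorrect predictions in a subset are compared with the numbers expected if the subset had the full-set success rate. This is an assumed formulation (it treats the full-set rate as fixed, and the function names are hypothetical); the text does not state whether a goodness-of-fit or a contingency-table test was used.

```python
# Critical value of chi-squared for 1 degree of freedom at the 5% significance level.
CHI2_CRIT_1DF_5PCT = 3.841


def chi_squared_vs_full(subset_correct: int, subset_total: int, full_rate: float) -> float:
    """Goodness-of-fit chi-squared of a subset's correct/incorrect counts against
    the counts expected if the subset had the full-set success rate (a fraction 0-1)."""
    if not 0.0 < full_rate < 1.0 or subset_total <= 0:
        raise ValueError("full_rate must lie strictly between 0 and 1 and subset_total must be positive")
    expected_correct = subset_total * full_rate
    expected_wrong = subset_total - expected_correct
    observed_wrong = subset_total - subset_correct
    return ((subset_correct - expected_correct) ** 2 / expected_correct
            + (observed_wrong - expected_wrong) ** 2 / expected_wrong)


def differs_significantly(subset_correct: int, subset_total: int, full_rate: float) -> bool:
    # Reject the null hypothesis (same distribution) when chi-squared exceeds the critical value.
    return chi_squared_vs_full(subset_correct, subset_total, full_rate) > CHI2_CRIT_1DF_5PCT
```

For example, a hypothetical subset with 30 of 40 buried groups predicted correctly, against a full-set rate of 50%, gives χ² = 10.0 and would be flagged as significant at the 5% level, whereas 22 of 40 gives χ² = 0.4 and would not.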

Table 2. SuperStar success rates for protein subsets. Chi-squared values were
calculated for the null hypothesis that the subset and the full set come from
the same distribution.