CSD Molecular Complementarity Tool Domain of Applicability
November 29, 2022
The Molecular Complementarity component is used to assess the likelihood of two molecules forming a co-crystal. We have recently validated this approach with a new dataset, and have found limitations to the model’s applicability. Here we explain the validation process, and advise when this model should be used.
What Does the Molecular Complementarity Analyser Do?
The Molecular Complementarity component assesses the likelihood of two molecules forming a co-crystal. This means that the list of possible co-formers for a given target can be reduced before running more effective but more computationally demanding approaches, such as motif search or multi-component hydrogen bond propensity calculations.
How Was the Molecular Complementarity Component Developed?
The original research and validation on the Molecular Complementarity component in CSD-Materials were performed on small, neutral molecules, using a dataset containing only positive observations (i.e. experimentally observed co-crystals) and no negative observations (i.e. experimentally observed failures to co-crystallize). Five key molecular descriptors were identified. For co-crystallization to be likely, the difference between the two co-crystal components' values for each descriptor should fall below a determined threshold value. These descriptors are: the fraction of nitrogen and oxygen atoms, the dipole moment, and three simple shape descriptors based on a molecular bounding box (the length of the short axis, the short/long axis ratio, and the medium/long axis ratio).
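The pass/fail logic described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the CSD-Materials implementation: the `Descriptors` class, the function names, and especially the threshold values are hypothetical placeholders. The published thresholds are given in the 2009 paper.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Descriptors:
    """The five molecular descriptors named in the text (units are assumed)."""
    fraction_n_o: float       # fraction of nitrogen and oxygen atoms
    dipole_moment: float      # dipole moment
    short_axis: float         # length of the bounding-box short axis
    short_long_ratio: float   # short/long axis ratio
    medium_long_ratio: float  # medium/long axis ratio


# Hypothetical placeholder thresholds, NOT the published values.
PLACEHOLDER_THRESHOLDS = Descriptors(
    fraction_n_o=0.3,
    dipole_moment=6.0,
    short_axis=3.0,
    short_long_ratio=0.3,
    medium_long_ratio=0.3,
)


def descriptor_differences(a: Descriptors, b: Descriptors) -> dict[str, float]:
    """Absolute difference between two molecules for each descriptor."""
    return {
        f.name: abs(getattr(a, f.name) - getattr(b, f.name))
        for f in fields(Descriptors)
    }


def is_complementary(
    a: Descriptors,
    b: Descriptors,
    thresholds: Descriptors = PLACEHOLDER_THRESHOLDS,
) -> bool:
    """True only if every descriptor difference falls below its threshold."""
    diffs = descriptor_differences(a, b)
    return all(diffs[name] < getattr(thresholds, name) for name in diffs)
```

A pair passes only when all five differences are below their thresholds, so a single large mismatch (for example, in the short-axis length) is enough to reject a co-former.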
Full details are available in Laszlo Fabian’s original (2009) co-crystal research into molecular complementarity here: https://doi.org/10.1021/cg800861m
The CCDC later incorporated this Molecular Complementarity approach into a workflow of knowledge-based approaches to co-crystal design. This recommended workflow primarily utilizes the molecular complementarity component within CSD-Materials as an early stage in the co-former screening process to remove those co-formers which are highly unlikely to form co-crystals.
Learn more in this 2014 publication on a workflow of knowledge-based approaches to co-crystal design using CSD software and data: https://doi.org/10.1039/C4CE00316K
How Was the Latest Validation Performed?
For the latest validation exercise, CCDC researchers identified a dataset of approximately 2,500 co-crystal experiment observations from the literature. From this full list of observations, the team created a dataset of 45 APIs (Active Pharmaceutical Ingredients), each of which had more than 15 co-formers in the screen, totalling about 1,500 observations overall. This list included both positive and negative observations. The 45 APIs were then used to validate several different co-crystal screening methods, including molecular complementarity.
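The selection step can be illustrated with a short Python sketch. The record layout (`Observation`) and the function name are assumptions made for illustration; the source does not describe how the literature data were stored or processed.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Observation(NamedTuple):
    """One literature observation (hypothetical record layout)."""
    api: str
    coformer: str
    cocrystal_formed: bool  # True = positive, False = negative observation


def select_well_screened_apis(
    observations: Iterable[Observation], min_coformers: int = 15
) -> list[Observation]:
    """Keep observations for APIs screened against more than `min_coformers`
    distinct co-formers."""
    observations = list(observations)  # allow a second pass over the data
    coformers_by_api: dict[str, set[str]] = defaultdict(set)
    for obs in observations:
        coformers_by_api[obs.api].add(obs.coformer)
    keep = {
        api for api, coformers in coformers_by_api.items()
        if len(coformers) > min_coformers
    }
    return [obs for obs in observations if obs.api in keep]
```

Both positive and negative outcomes are retained for each selected API, since the validation needs both to measure precision.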
As seen in the table below, for 39 of the APIs at least one of the co-crystal screening methods predicted experimentally observed forms. Precision was below 0.5 for most screens. Accuracy was variable: some screens were highly accurate and others were not, with low accuracy particularly for APIs with a molecular weight greater than 300 Da.
| Screens | % of screens with Accuracy > 0.5 (at least half the observed co-crystals are predicted) | % of screens with Precision > 0.5 (at least half the predicted co-crystals are correct) | % of screens with F1 > 0.5 |
|---|---|---|---|
| 39 screens that predicted experimental forms | 64% | 38% | 38% |
| 8 screens with MW > 300 Da | 0% | 50% | 12.5% |
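The three scores in the table can be computed per screen from counts of correct and incorrect predictions. The sketch below assumes the parenthetical definitions given in the table headers; under that reading, the "Accuracy" column is the fraction of observed co-crystals that were predicted, which is more commonly called recall. The function name and dictionary keys are hypothetical.

```python
def screen_metrics(tp: int, fp: int, fn: int) -> dict[str, float]:
    """Scores for one API screen.

    tp: co-crystals predicted and observed
    fp: co-crystals predicted but not observed
    fn: co-crystals observed but not predicted
    """
    predicted = tp + fp
    observed = tp + fn
    # Fraction of predicted co-crystals that are correct.
    precision = tp / predicted if predicted else 0.0
    # Fraction of observed co-crystals that are predicted
    # (the table's "Accuracy", read from its parenthetical definition).
    fraction_observed_predicted = tp / observed if observed else 0.0
    denominator = precision + fraction_observed_predicted
    # F1 is the harmonic mean of the two scores above.
    f1 = (
        2 * precision * fraction_observed_predicted / denominator
        if denominator else 0.0
    )
    return {
        "precision": precision,
        "fraction_observed_predicted": fraction_observed_predicted,
        "f1": f1,
    }
```

With invented counts of 6 true positives, 9 false positives, and 4 false negatives, this gives a precision of 0.4, a fraction of observed co-crystals predicted of 0.6, and an F1 of 0.48, so such a screen would count toward the first column of the table but not the other two.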
The team investigated how well the five molecular descriptors of the Molecular Complementarity statistical model discriminated between True Positives and True Negatives in the updated validation dataset. They found that the descriptor values overlapped heavily between the True Positives and the True Negatives, indicating that the descriptors are not reliable for discriminating between the two groups.
What Conditions Can the Molecular Complementarity Component Be Used In?
Recent validation work by researchers at the CCDC, using a dataset containing both positive and negative experimentally validated co-crystallisation outcomes, has shown that the Molecular Complementarity approach has variable accuracy for small molecules and poor precision overall.
The Molecular Complementarity component could still be used for molecules that are similar to those in the original validation dataset, that is, neutral molecules with a molecular weight between 60 and 245 Da. It can also be used simply to calculate the five molecular descriptors.
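As a rough guide, the applicability criteria above could be checked before running a screen. The sketch below is illustrative only: the function name is hypothetical, the molecular weight is assumed to be in Da with inclusive bounds, and "neutral" is assumed to mean a net formal charge of zero.

```python
# Molecular weight range of the original validation dataset
# (assumed to be in Da, bounds treated as inclusive).
MIN_MW = 60.0
MAX_MW = 245.0


def in_original_domain(molecular_weight: float, formal_charge: int) -> bool:
    """True if a molecule resembles the original validation dataset:
    neutral (assumed: net formal charge of zero) and within the MW range."""
    return formal_charge == 0 and MIN_MW <= molecular_weight <= MAX_MW


# Usage: both the target and the co-former should pass the check.
target_ok = in_original_domain(molecular_weight=151.2, formal_charge=0)
coformer_ok = in_original_domain(molecular_weight=320.0, formal_charge=0)
pair_in_domain = target_ok and coformer_ok
```

Here the pair falls outside the domain because the second molecule exceeds 245 Da, so a complementarity result for it should be treated with caution.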
If you have any questions about use of the molecular complementarity component, or any other features of CCDC software, please contact our technical support team on firstname.lastname@example.org.