Using Crystallographic Structures and Data-Driven Solutions To Advance Drug Design
This blog post is based on the October 11th Chemistry World webinar, in which Dr Jason Cole, Senior Research Fellow at the Cambridge Crystallographic Data Centre (CCDC), introduced the CCDC and the Cambridge Structural Database (CSD) and discussed how structural data can be used to empower molecular discovery.
Who Are the CCDC? And What Is the CSD?
The CCDC was established in 1965 by Dr Olga Kennard, a pioneering crystallographer who had “a passionate belief that the collective use of data will lead to the discovery of new knowledge, which transcends the results of individual experiments”.
The CCDC is a not-for-profit registered charity whose objective is the advancement of chemistry and crystallography for the public benefit through the provision of high-quality information services and software.
In the same year, Dr Kennard also established the CSD, a database of validated and curated small-molecule organic and metal–organic crystal structures that now contains over 1.26 million structures.
With more than 55,000 depositions and around 75,000 new structures added each year, the CSD brings together crystallographic data contributed by more than 6,000 crystallographers in 80 countries worldwide.
Why Are the CSD and the CCDC Tools Important?
The CSD is a fundamental source of data for generating knowledge and novel insights, which can be applied to developing new drugs, agrochemicals and functional materials. Industry focuses on producing new products and academia on fundamental research, and the CSD serves both communities.
The CCDC therefore offers several product portfolios. CSD-Community is a collection of free products that includes tools to deposit crystallographic data and to access and visualize crystal structures. CSD-Core includes the CSD together with software for powerful searching, advanced 2D/3D visualization and analysis of interactions in crystal structures. CSD-Discovery and CSD-Materials provide applications for designing new molecules and for assessing solid-form stability and propensity, respectively. Finally, CSD-Particle and CSD-Theory are the most recently developed tools and are mostly used towards the end of the drug discovery pipeline: CSD-Particle helps anticipate particle properties and behaviour, while CSD-Theory provides insights from predicted structure landscapes.
How Can Small Molecule Crystal Structures Help in the Drug Discovery Pipeline?
The drug discovery pipeline starts with target selection, which involves binding site characterization and pocket searching, and then proceeds to hit identification using assays or structure-based virtual screening. Hit-to-lead and lead optimization follow, in which compound geometry is optimized and the impact of structural changes is checked with docking pose prediction, before the project moves into the drug development phase.
But what do crystal structures tell us, and where do they fit in the drug discovery pipeline? Crystal structures provide valuable information about the conformations that molecules adopt, and the crystal packing shows which intermolecular interactions they form. To show where crystal structures fit in the pipeline, the following sections summarize the relevant CCDC software alongside case studies in which the tools were applied.
How To Perform Crystal Structure Searches
With CCDC products, users can perform both online and desktop crystal structure searches.
Online searching can be done through the WebCSD portal, which supports classic searches by identifier or compound name as well as structure searches and 3D parameter searches.
Desktop searching can be done with ConQuest, which supports more advanced searches than WebCSD, including context and criteria matching, and with CSD-CrossMiner, which allows users to perform pharmacophore-based searches of the CSD, the PDB and any in-house database simultaneously.
A typical pharmacophore search starts from a protein-ligand complex, in which the ligand and protein pharmacophore points are defined together with the constraints between those points. The matching hits from the CSD and PDB are then overlaid, allowing the user to explore each hit and interpret how it behaves.
One example of CSD-CrossMiner in use was recently published in J. Med. Chem. [1]. The scientists performed a pharmacophore search of both the CSD and the PDB to find evidence on whether they could move from a para-substituted lead compound to meta-substituted analogues. The CSD hits suggested that a meta-linked linker system would work, which opened up the investigation of a new compound series for the treatment of breast cancer. Follow the link to read more about this case study.
Another example was reported in J. Med. Chem. in 2022 [2], where a pharmacophore search was used to investigate zinc binders. Using CSD-CrossMiner, the scientists searched the PDB for binding pockets similar to the one in their original model and examined how those pockets interacted with other systems. The results showed that rotating the sulphonamide group into a different conformation would substantially improve the model, making it more predictive of the system's interactions with zinc ions. Follow the link to read more about this case study.
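To make the idea of pharmacophore points and constraints concrete, here is a minimal Python sketch of distance-constrained pharmacophore matching. It is not CSD-CrossMiner's algorithm or API: the feature types, the hypothetical `QUERY_POINTS` and `QUERY_CONSTRAINTS` query and the distance windows are invented for illustration, and each candidate structure is assumed to have already been reduced to a list of typed feature coordinates.

```python
"""Minimal distance-constrained pharmacophore matcher (hypothetical sketch).

Not the CSD-CrossMiner algorithm or API: point names, feature types and
distance windows below are invented for illustration.
"""
from itertools import permutations
from math import dist

# Hypothetical query: named pharmacophore points and the feature type each must match.
QUERY_POINTS = {"don": "donor", "acc": "acceptor", "aro": "aromatic"}

# Hypothetical constraints: (point_a, point_b) -> (min, max) distance in angstroms.
QUERY_CONSTRAINTS = {
    ("don", "acc"): (2.5, 3.5),
    ("acc", "aro"): (4.0, 5.5),
    ("don", "aro"): (5.0, 7.0),
}


def match_pharmacophore(features, points=QUERY_POINTS, constraints=QUERY_CONSTRAINTS):
    """Return every assignment of candidate features to query points that satisfies all constraints.

    features: list of (feature_type, (x, y, z)) tuples for one candidate structure.
    Each assignment maps a query point name to an index into `features`.
    """
    names = list(points)
    hits = []
    # Try every way of mapping distinct features onto the query points.
    for combo in permutations(range(len(features)), len(names)):
        assignment = dict(zip(names, combo))
        # Skip mappings where a feature's type does not match its query point.
        if any(features[i][0] != points[n] for n, i in assignment.items()):
            continue
        # Keep the mapping only if every pairwise distance lies inside its window.
        if all(
            lo <= dist(features[assignment[a]][1], features[assignment[b]][1]) <= hi
            for (a, b), (lo, hi) in constraints.items()
        ):
            hits.append(assignment)
    return hits
```

Brute-force enumeration of assignments is fine for a handful of features; a tool that searches millions of structures would need indexing and pre-filtering instead.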
How Crystal Structures Help Drug Hunters
It is important to remember that the conformation of a molecule in a crystal structure is not necessarily its bioactive conformation; it is simply the conformation in which the molecule crystallized. For this reason, the CCDC began to consider what information could be gained by mining the entire collection of 1.26 million crystal structures in the CSD for trends. CCDC tools that draw on this collection include IsoStar, SuperStar and Mogul.
IsoStar is a knowledge-based library of intermolecular interactions, which provides thousands of interactive 3D scatterplots that show the probability of occurrence and spatial characteristics of interactions between pairs of chemical functional groups.
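The raw material behind such a scatterplot can be sketched in a few lines of Python: for a central atom, keep the neighbouring atoms that lie within the sum of the two van der Waals radii (plus an optional tolerance) and express their positions relative to the central atom. This is a simplified, hypothetical illustration rather than IsoStar's procedure: it uses Bondi radii for a few elements, only translates coordinates (IsoStar overlays the whole central functional group), and the input coordinates are invented.

```python
from math import dist

# Bondi van der Waals radii in angstroms (small subset, for illustration).
VDW = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52}


def contacts_for_scatterplot(central, neighbours, tolerance=0.0):
    """Collect neighbour atoms in contact with a central atom.

    central: (element, (x, y, z)); neighbours: list of (element, (x, y, z)).
    A contact is counted when distance <= sum of vdW radii + tolerance.
    Returns contact positions translated so the central atom sits at the origin,
    i.e. the points that would be plotted around the central group.
    """
    c_el, c_xyz = central
    points = []
    for el, xyz in neighbours:
        if dist(c_xyz, xyz) <= VDW[c_el] + VDW[el] + tolerance:
            # Translate into the central atom's frame (no rotation/overlay here).
            points.append(tuple(round(p - q, 3) for p, q in zip(xyz, c_xyz)))
    return points
```

Pooling these translated points over many structures gives a crude scatter of where contact atoms sit around the central atom; the density of points in a region reflects how often that geometry occurs.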
SuperStar allows users to probe protein-ligand structures visually and understand their binding sites through knowledge-based pharmacophore generation and prediction of intermolecular interactions. In one interesting example, reported in Eur. J. Med. Chem. [3], SuperStar was used to validate docking poses generated by GOLD for a new class of aldose reductase inhibitors that could help in the treatment of diabetes. Follow the link to read more about this case study.
Turning from interactions to conformations, Mogul mines millions of chemically classified bond lengths, valence angles, torsion angles and ring conformations in the CSD and provides precise information on favourable molecular geometries. In the case reported in Figure 1, the group wanted to know whether the ligand conformation in a particular bound system (PDB code 7qdl) was reliable [4]. The carboxamide fragment was slightly twisted rather than planar, but a Mogul search of the CSD showed that the observed torsion angle lay within the CSD torsion angle distribution and was therefore acceptable. Follow the link to access the article.
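The kind of check described above can be illustrated with a simplified Python sketch: given a torsion observed in a bound ligand and a list of torsions for the same fragment taken from the CSD, it reports what fraction of the CSD observations lie close to the query, using a circular difference so that -178° and 179° count as neighbours. The ±10° window, the 5 % threshold and the sample values are assumptions for illustration; this is not Mogul's exact scoring, and it does not query the CSD.

```python
def angular_difference(a, b):
    """Smallest absolute difference between two torsion angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def fraction_within(query, observed, window=10.0):
    """Fraction of observed torsions lying within +/- window degrees of the query."""
    if not observed:
        raise ValueError("no observations to compare against")
    near = sum(1 for t in observed if angular_difference(query, t) <= window)
    return near / len(observed)


def is_unusual(query, observed, window=10.0, threshold=0.05):
    """Flag a torsion as unusual when fewer than `threshold` of observations fall nearby.

    The window and threshold are illustrative assumptions, not Mogul's settings.
    """
    return fraction_within(query, observed, window) < threshold
```

A slightly twisted torsion that still falls in a well-populated region of the distribution would not be flagged, which mirrors the conclusion drawn for the carboxamide fragment.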
Applying Knowledge: How Information Is Used in Software
The CCDC portfolio includes software that directly accesses and uses the wealth of information and knowledge in the CSD. One example is the CSD-Conformer Generator, a tool that allows users to quickly generate and visualize plausible 3D conformations of a molecule.
The CSD-Conformer Generator can now generate a starting 3D structure from a SMILES string using the CSD Python API, and it uses probability density functions based on CSD distributions to drive optimization and sampling.
An interesting example of this tool in use was reported in 2019 in PLoS Pathogens [5]. The work involved a conformational analysis that began by generating conformers of each investigated molecule with the CSD-Conformer Generator. Each conformer was then optimized with DFT, and the conformer energies were compared. This analysis showed that adding a third chlorine atom at the aromatic ortho position of the phenyl ring of pydiflumetofen influenced the molecule's conformational preferences. In particular, it allowed the ring to rotate in a way that reduced the compound's steric hindrance in altC-SQR, a succinate-ubiquinone oxidoreductase (SQR) enzyme, leading to a lower resistance factor for pydiflumetofen. Follow the link to access the article.
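As a rough illustration of how a probability density derived from observed distributions can drive sampling, the Python sketch below turns a list of observed torsion angles into a normalised histogram and draws new torsion values from it. The bin width, pseudocount and input data are assumptions; the sketch does not use the CSD Python API and is not the CSD-Conformer Generator's implementation, which also has to handle rings, bond geometry and optimization.

```python
import random


def torsion_histogram(observed, bin_width=10, pseudocount=1.0):
    """Turn observed torsion angles (degrees) into a normalised histogram over -180..180.

    A small pseudocount keeps every bin reachable, a crude form of smoothing.
    """
    if 360 % bin_width:
        raise ValueError("bin_width must divide 360")
    n_bins = 360 // bin_width
    counts = [pseudocount] * n_bins
    for t in observed:
        # Map the angle onto [0, 360) measured from -180, then find its bin.
        idx = int(((t + 180.0) % 360.0) // bin_width)
        counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts]


def sample_torsions(pdf, n, bin_width=10, seed=0):
    """Draw n torsions: pick bin i with probability pdf[i], then a uniform angle inside it.

    bin_width must match the value used to build `pdf`.
    """
    rng = random.Random(seed)
    bins = rng.choices(range(len(pdf)), weights=pdf, k=n)
    return [-180.0 + b * bin_width + rng.uniform(0, bin_width) for b in bins]
```

Sampling from a knowledge-based density concentrates trial conformations in torsion regions that are common in experimental structures, while the pseudocount still allows occasional exploration of rarer regions.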
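Comparing conformer energies is often done by expressing them relative to the lowest-energy conformer and, where useful, converting them into Boltzmann populations, p_i = exp(-ΔE_i/RT) / Σ_j exp(-ΔE_j/RT). The Python sketch below does this for energies in kJ/mol. It is not necessarily the analysis used in [5]: the conformer names and energies in the test are hypothetical, and simple Boltzmann weighting of single-point energies ignores entropic and solvent contributions.

```python
from math import exp

R_KJ = 8.314462618e-3  # gas constant, kJ mol^-1 K^-1


def relative_energies(energies):
    """Energies relative to the lowest-energy conformer (same units as the input)."""
    e_min = min(energies.values())
    return {name: e - e_min for name, e in energies.items()}


def boltzmann_populations(energies_kj, temperature=298.15):
    """Fractional Boltzmann populations from conformer energies given in kJ/mol."""
    rel = relative_energies(energies_kj)
    # Work with relative energies so the exponentials stay well scaled.
    weights = {name: exp(-e / (R_KJ * temperature)) for name, e in rel.items()}
    z = sum(weights.values())  # partition function over the conformers considered
    return {name: w / z for name, w in weights.items()}
```

At room temperature a conformer about 5.7 kJ/mol above the minimum carries roughly one tenth of the Boltzmann weight of the minimum, so modest energy differences can noticeably shift conformational preferences.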
The CCDC is not the only organization that makes use of this knowledge: the wwPDB uses Mogul directly in its validation reports. The validation report for a PDB entry, for example, includes information about the likelihood of particular bond lengths, bond angles and torsions, which indicates how well refined that structure is (Figure 2).
Finally, the CCDC is well known worldwide as the provider of the protein-ligand docking program GOLD, another program that benefits from access to the wealth of information contained in the CSD. An extensive overview of GOLD, alongside four case studies that use this versatile tool, can be found at this link.
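The validation idea can be illustrated with Z-scores: each ligand bond length or angle is compared with the mean and standard deviation of the corresponding CSD distribution returned by Mogul, Z = (observed − mean)/σ, and wwPDB reports summarise ligand geometry as an RMSZ value together with a count of outliers with |Z| > 2. The Python sketch below is a simplified, hypothetical reimplementation of that summary; in practice the reference means and standard deviations would come from Mogul, and the numbers in the test are invented.

```python
from math import sqrt


def z_score(observed, mean, sd):
    """Z = (observed - mean) / sd for a single bond length or angle."""
    return (observed - mean) / sd


def geometry_summary(measurements, cutoff=2.0):
    """Summarise (observed, mean, sd) triples as (RMSZ, number of |Z| > cutoff outliers)."""
    if not measurements:
        raise ValueError("no measurements to summarise")
    zs = [z_score(o, m, s) for o, m, s in measurements]
    # Root-mean-square Z: close to 1 for geometry consistent with the reference data.
    rmsz = sqrt(sum(z * z for z in zs) / len(zs))
    outliers = sum(1 for z in zs if abs(z) > cutoff)
    return rmsz, outliers
```

An RMSZ well above 1, or several |Z| > 2 outliers, suggests that the ligand geometry departs from what small-molecule crystal structures would lead one to expect.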
What About Drug Development?
The CCDC has also developed tools for the drug development stage, aimed at the particle science community. CSD-Particle can rapidly analyse the mechanical and chemical properties of crystalline particles using a suite of visual and statistical tools that help solve the formulation issues encountered in drug development.
CSD-Particle can visualize slip planes in crystal structures and identify the most likely ones. It can also analyse surface properties, providing information about surface topology, hydrogen-bond donors and acceptors, and aromatic regions. Together, this information gives users a detailed understanding of the nature of crystal surfaces. Because crystal surfaces can strongly influence how a particular crystal behaves in a tablet, CSD-Particle provides knowledge that can help in choosing the best solid form.
Next Steps
To discuss further and/or request a demo with one of our scientists, please contact us via this form or .
References
[1] J. Med. Chem., 2023, 66(4), 2918–2945.
[2] J. Med. Chem., 2022, 65(24), 16234–16251.
[3] Eur. J. Med. Chem., 2017, 125, 965–974.
[4] J. Med. Chem., 2022, 65(3), 2262–2287.
[5] PLoS Pathog., 2019, 15(12), e1007780.