Following a welcome from Carmen Nitsche, General Manager at CCDC’s US arm, Dr Pete Wood, Senior Product Manager at CCDC, explained how the data from one million crystal structures in the CSD is being leveraged to accelerate science through AI and machine learning. The CSD is a true data lake – a vast collection of experimental data gathered in a systematic way and curated for over 50 years. The aggregation of experimental datasets provides a foundation for resources that enable structural knowledge to be applied to scientific challenges across sectors and domains.
Pete gave examples of where the CSD has been used in AI and machine learning models, including planning and designing chemical syntheses, building predictive models and generating new ideas. Whilst machine learning brings some challenges (for example, the lack of negative data), the scientific community is just scratching the surface in this space and there’s lots more to come.
A number of practitioners described how they’ve used the CSD in their professional lives in differing scenarios. Dr Neysa Nevins of GSK gave a fascinating overview of her history with the CSD, from checking torsion angles in 1995 as a postdoc at Emory University through to her last 17 years at the pharma giant. In addition to her case studies, Neysa shared some great ideas for future developments that would help pharmaceutical scientists in areas such as solubility prediction, conformer prediction, tautomer prediction and interaction hotspots.
From the academic community, Jen Werner of Georgetown University provided a very powerful demonstration of the utility of the CSD in pharmaceutical sciences. About a third of pharmaceutical compounds can form hydrates, and transformation between the hydrate and anhydrate forms can occur during manufacture and storage. Using the CSD knowledge base, Jen showed how it is possible to identify classes of compounds that have a greater or lesser propensity to undergo this transformation.
An update on the CCDC product roadmap detailed the developments being made across the key product streams, including discovery, materials and particle science, as well as foundational developments such as the new database architecture. The new architecture is particularly exciting: it will support not only the growth of the CSD with additional entries, but also its extension to new descriptors (metadata) and physical properties, and it will allow small molecules, protein structures, protein-ligand binding sites and calculated structures to be consolidated and expanded in a single environment.
The final sessions of the day saw delegates participate in workshops to review and prioritise future developments in CCDC tools covering discovery, materials and integration with other software tools and workflows. The outputs of these sessions will now be fed into the product management process to support the design of new products and improvements to existing features.
As one delegate commented, “Travel budgets are tight but the opportunity to meet with practitioners from across industry and academia and catch up on the latest developments in the field means the user group meeting more than justifies the investment”.
We are looking forward to next year's user group meetings, with plans to host in several locations across the US. Keep an eye out for details coming soon!
If you would like a copy of the slides from this year's UGM, please contact us at firstname.lastname@example.org.
You might also be interested in our upcoming CCDC update webinar taking place on November 21st. In this month's webinar we will be taking a sneak peek at the upcoming 2020.0 release, so don't miss out! Find out more and register here.