Big data leads the way for structural chemistry

Posted on June 6, 2019

The Cambridge Structural Database reaches one million structures, leading the way in structural data to inform drug discovery and materials development


Cambridge UK, 6 June 2019.  CCDC (The Cambridge Crystallographic Data Centre), world-leading experts in structural chemistry data, software and knowledge for materials and life science research and application, today announced a huge milestone for structural chemistry with the addition of the millionth structure into the Cambridge Structural Database (CSD).

The CSD is the world's repository of highly curated, experimentally determined organic and metal-organic crystal structures.  It is used by scientists in over 70 countries to understand how molecules behave and interact in three dimensions in the solid form, and ultimately how this affects physical properties.

As interest in ‘Big Data’ continues to grow at a time when machine learning and automation are changing the way the pharmaceutical, agrochemical and many other industries work, reaching such a significant milestone is a huge achievement for the CCDC and for the wider scientific community that contributes to and relies on this resource.

Large volumes of data such as this enable scientists to draw fuller answers from a more complete and diverse body of information, increasing confidence in the insights drawn from the data.  Furthermore, CCDC’s focus on ensuring the integrity of the data within the CSD through stringent quality assurance and control steps gives scientists further confidence that they are obtaining the highest quality information to inform their research.

This rich data resource, alongside advanced search, 3-D data mining, analysis and visualisation software from CCDC, enables scientists from both industry and academia to further their research and predict new outcomes. In addition, knowledge derived from the CSD underpins computational chemistry and molecular modelling, and is relied on by industry for the development and manufacturing of new drugs and within academia to teach chemistry.

Dr Jürgen Harter, CEO of CCDC commented, ‘This is truly an important milestone not only for CCDC but also for the wider scientific community.  In addition to the value that lies in large sets of data like this to help scientists inform their research and decision making, we also pride ourselves on the high quality of the data, a result of the hard work of our expert in-house database team. Maintaining a policy of strict data interrogation ensures the value of the plentiful insights that can be drawn from the CSD, avoiding misinformation that can lead to wasted time, resources and ultimately cost.’

CCDC have announced that the 1,000,000th structure is an N-heterocycle produced by a chalcogen-bonding catalyst that activates multiple reaction steps sequentially. In the paper, the authors describe a class of extraordinary chalcogen-bonding catalysts which enable the assembly of discrete small molecules, leading to the construction of N-heterocycles in a highly efficient manner. The structure was determined by Yao Wang and co-authors from Shandong University in China and published in the Journal of the American Chemical Society (JACS).

Image of the millionth structure

CSD Refcode XOPCAJ (DOI 10.5517/ccdc.csd.cc20vdhs), the millionth structure added to the CSD

‘We’d like to congratulate Yao Wang and all of his co-authors for publishing the millionth structure, and we are so grateful to the 350,000-plus scientists from around the world who have contributed their data, enabling us to reach this milestone and placing the CSD as the go-to resource for structural information within the scientific community,’ said Suzanna Ward, Head of the CSD.

Dr Wang commented ‘We are delighted to hear that our structure (1-(7,9-diacetyl-11-methyl-6H-azepino[1,2-a]indol-6-yl)propan-2-one; CSD Refcode XOPCAJ) is the millionth structure to enter the CSD! We have used the CSD for over ten years because it is an excellent platform to report new crystal structures and an outstanding database to find inspirable chemical structures.  It is a valuable resource to us and to many other scientists around the world so we are very proud to be associated with this milestone for the community.’

Peter Stang, Editor-in-Chief, JACS, said “We are delighted to hear that the millionth structure in the CSD was published in JACS. We know our readers value the CSD as a trusted repository of structural data and some of our authors have demonstrated how this rich resource can accelerate scientific research.  Our continued collaboration with the CCDC helps make this wealth of data more accessible to the community as well as helping us ensure the integrity of data published in our journals and we are proud to be associated with such a significant milestone in structural chemistry.”

When asked what’s next for the CSD, Dr Harter commented that although the use of the CSD in the pharmaceutical and agrochemical industries is already well established, it is now fast becoming a fundamental resource for research into new materials such as batteries, paints, pigments and dyes, and in particular for the development of gas storage frameworks and tailored catalysts. As environmental contamination and sustainability become increasingly important, there is considerable potential on a global scale.

CCDC have noted a consistent rise in deposits from research taking place in China over recent years.

“It is an exciting time for life science and materials development research with markets such as China leading the way in scientific discovery.  We are excited to see what insights we obtain from this market going forward,” Dr Harter commented.

CCDC also plan to draw further on insights and trends from the data to inform the direction of future research across different industries.

For more information visit: /csd-1-million


About CCDC

CCDC are world-leading experts in structural chemistry data, software and knowledge for materials and life science research and application.

They are dedicated to the advancement of chemistry and crystallography for the public benefit.  They specialise in the collation, preservation and application of scientific structural data for use in pharmaceutical discovery, materials development and research and education.

CCDC compile and distribute the Cambridge Structural Database (CSD), a certified trusted database of fully curated and enhanced organic and metal-organic structures, used by researchers across the globe.

Their cutting-edge software empowers scientists to extract invaluable insights from the vast dataset, informing and accelerating their research & development.