Reviewing the impact of one million crystal structures

In a new review, coinciding with the millionth structure being added to the Cambridge Structural Database (CSD), researchers from the Cambridge Crystallographic Data Centre (CCDC) have highlighted the impact this database has had on physics, chemistry and biology since its inception. The crystallographic community's success in sharing data has been fundamental to advances in the structural sciences. The future of the database also looks strong, as the demand for big data, machine learning and the design of new materials with tailored properties continues to increase.

The CSD is one of the oldest scientific databases in the world, and fundamental to its beginnings was the vision of its originators, Olga Kennard and J. D. Bernal, that the "collective use of data would lead to the discovery of new knowledge which transcends the results of individual experiment". This kind of worldwide collaboration to share and curate experimental data, beginning in 1965 at the height of the Cold War, when computers were in short supply, was a bold endeavour that required much determination and passion to get off the ground.

  • Research applications of the CSD really do transcend the individual experiment and, in fact, the field of crystallography entirely. The use of the CSD has been critical in understanding the size of atoms (van der Waals radii), the geometry and shapes of chemicals (standard bond lengths and angles) and the nature of fundamental interactions such as hydrogen bonds, which drive the structure and behaviour of proteins.
  • The data in the CSD is often used by research scientists worldwide through the suite of knowledge-based CSD software: applications that convert the raw data into applied knowledge and provide insights that can help scientists innovate and discover new drugs or materials faster. Use of the CSD to design new drugs and agrochemicals has been commonplace since the 1980s, and in the last decade this has expanded to include the analysis and design of crystalline materials by industrial companies.

Each structure in the CSD is meaningful, but the real value is in the collective knowledge and informatics derived from the database as a whole.

  • Even apparently inconsequential details of research methodology can matter and lead to incorrect results and misinformed conclusions. The expert Scientific Editors at the CCDC use automated and manual processes to ensure that only accurate data is available in the CSD.

  • The growing trend in online data collation and storage, combined with the use of artificial intelligence and machine learning to draw knowledge and insights from the data, presents exciting opportunities for the future of the CSD. However, collaboration with new and existing scientific databases is required to ensure the CSD is accessible to new areas of research and client applications.


“Reaching the one million mark is a great achievement for CCDC and the whole community and so it seemed a perfect time for us to delve into the research and applications using the data within CSD,” said Peter Wood, Senior Product Manager and Research Scientist at CCDC, and corresponding author of the paper published in Chemical Reviews. “This review has allowed us to reflect on how the CSD has contributed to research across many fields of science over the past 54 years, and what is required from us at CCDC to ensure that the scientific community is able to keep building on this foundation of knowledge into the future.”


Robin Taylor, CCDC Emeritus Research Fellow and co-author of the paper, commented: “This review means a lot to me - the CSD had only 30,000 structures when I first started working with it and it's incredible to me that it's now reached a million. It's such a powerful research tool, and Pete and I wanted to exemplify this. I rarely get emotional about science but when the millionth structure was added on June 6th I just looked at the number on the CCDC website for a few minutes. Of the many thoughts I had, the dominant one was this: what a remarkable example of scientific cooperation. We should all be proud.”


Jürgen Harter, CCDC CEO, commented: “This review clearly emphasises the importance of the CSD and big data within many research applications. It is very interesting to see the different types of research using the CSD. With the continued growth of the database, alongside collaboration with new and complementary databases, we are excited to see what value the CSD can offer researchers across emerging applications in the future.”


“The addition of the one millionth structure to the Cambridge Structural Database (CSD) represents an important landmark for structural science,” said Paul Raithby, former Trustee of the CCDC. “This review is highly informative and entertaining, outlining the development of the CSD from its early days to the present. It confirms, if confirmation was needed, that the CSD has become an invaluable tool in the armoury of structural scientists from right across the physical and life sciences. The review highlights the range of scientific developments that the CSD has underpinned and shows how it will be of increasing importance in the future as machine learning and artificial intelligence techniques grow. The future for the CSD is indeed bright and we look forward to the exciting new science represented by the next million structures to be added to the CSD.”


The full review is published in Chemical Reviews, a journal of the American Chemical Society (ACS): https://doi.org/10.1021/acs.chemrev.9b00155