Celebrating the 1.25 Millionth Structure
August 18, 2023
In 2019, the Cambridge Structural Database (CSD) reached 1 million structures, leading the way in structural data to inform drug discovery and materials development. Since then, the wealth of information available through the CSD has continued to grow, and this week the Cambridge Crystallographic Data Centre (CCDC) announces that the CSD has passed the milestone of 1.25 million expert-curated experimental crystal structures!
This achievement was reached with the electron diffraction structure of ibuprofen, published in Acta Cryst. A by L. Palatinus et al. (CSD Entry: JEKNOC16). The work examines several models for correcting the estimated errors of reflection intensities in electron diffraction data and shows how one of them can notably improve the accuracy of atomic positions and covalent bond lengths, as well as the R factors.
Commenting on the 1.25 millionth structure, Dr Lukáš Palatinus had this to say:
“It is an unexpected privilege and pleasure to have helped to achieve such an important milestone for CSD and for the whole crystallographic community.
I always illustrate the importance of crystallography to my students by reminding them that the number of published structures grows with the pace of about one structure every 10 minutes, day and night, 24/7. 1.25 million published structures is an amazing number, and I want to congratulate the whole team behind the CSD as well as to all fellow crystallographers who made this possible.
I think it is also symptomatic that the 1.25 millionth CSD structure is a structure determined by 3D electron diffraction. I believe that the number of structures determined by 3D ED in the CSD will quickly grow in the near future.”
The number of structures in the CSD measured using electron diffraction techniques is increasing rapidly. Today the CSD contains over 300 electron diffraction structures. During the data curation process, entries identified as having been measured using electron diffraction techniques are flagged and labelled accordingly. These structures are then easily accessible through the electron diffraction subset.
The addition of the 1.25 millionth structure to the CSD is a great achievement for the community. As the world’s largest database of small-molecule organic and metal-organic crystal structure data, the CSD is used in over 70 countries by scientists and crystallographers working in both academia and pharmaceutical companies. Each structure deposited by scientists from around the globe is curated by our expert scientific editors before it enters the database. The CSD is a fundamental platform for accessing structural data, for understanding how molecules behave and interact in three dimensions in the solid state, and ultimately for investigating how their structure influences their physical properties.
The CSD is a rich data resource that continues to grow while maintaining high quality. Big data is becoming widely used in many industries to help accelerate R&D processes. In this regard, the CSD is a trusted resource that many rely upon.
With the rapid spread of artificial intelligence (AI), it is important for big data to be easily accessible and machine-readable. Work on the CSD continues to enrich the data with metadata such as names, diagrams, and properties, making it Findable, Accessible, Interoperable, and Reusable, in line with the FAIR data principles.