Celebrating the 1.1 millionth structure
February 1, 2021
Last week, the Cambridge Structural Database (CSD) passed a total of 1,100,000 unique crystal structures, marking the first milestone on the way to 2,000,000 structures! It wasn’t too long ago that we were celebrating reaching 1 million structures (just 20 months ago!) and, despite a year of global challenges and difficulties, we are excited to start 2021 with this new achievement. A massive thank you to everyone who has contributed, not only to this feat, but to the continual growth of the CSD.
You might ask why another 100,000 structures matter when we already had a million. Making the entire collection of published organic and metal-organic structures available helps advance science in multiple ways. Firstly, the continuous daily updates enable scientists worldwide to comprehensively search published structures. This means you can easily find out if a compound’s solid-state structure has been determined and discover additional details such as how the molecules pack together, what the melting point of the crystal is and how it was recrystallised.
Another reason to value an additional 100k structures is that a more complete and diverse body of data lets scientists generate more thorough answers and have greater confidence in the insights drawn from it. An example of this can be seen in our recent release of new IsoStar functional group interactions. New groups include group I ions, selenium and new heterocycles. When IsoStar was first released, the data coverage of these groups was not sufficient to build up a knowledge base for them, but as the CSD has grown, the availability of data for these groups has grown too. This is just one example of how our knowledge can grow with the expanding database.
As the number of entries in the CSD has increased, so too has the diversity of the structures and the discoveries behind them. The latest 100k includes many groundbreaking structures that stretch our understanding of chemistry. Here are just a few examples:
- A structure with the shortest Be-N on record (CSD refcode OLAGEQ)
- The first series of planar tetracoordinated silicon compounds with an anti-van’t Hoff/Le Bel ptSi centre (CSD refcode UJETEL)
- A rare and stable Fe(VI) complex synthesised using reactions between a dicarbene iron complex and an organic azide (CSD refcode ZACXAG)
- A trinuclear [Pd2Ru]+ complex that revealed a ruthenium atom simultaneously contributing to two different aromatic systems (CSD refcode KAHRAQ)
- A di-cobalt macrocycle structure which helps us to understand the pathway of C–C bond breaking in acetonitrile with the formation of unusual methyl and cyanide bridged complexes (CSD refcode TUSYAK)
It took over 50 years to reach 1 million structures and we are rapidly leaving this number behind: over 50,000 new structures have been added annually since 2014, giving steady and consistent growth of the CSD. In the past year, we recorded over 54,000 new entries. This is a remarkable figure when you stop and consider the lab closures and disruption of 2020, and it is a testament to the efforts of researchers and crystallographers globally to adapt to the challenges of the pandemic. Behind the scenes at the CCDC (with remote working of course), the database team have continued to process and curate new structures throughout this period. As part of the science editorial team, we are responsible for checking every incoming structure and ensuring we represent the crystallography and chemistry accurately and clearly. Between us, the curation of 50,000+ structures a year can be challenging. However, we are supported by cutting-edge automation from deposition through to curation. This speeds up the editorial process for smaller, simple organic molecules and gives us more time to deal with more complicated (often disordered) structures.
A graph to show the growth of the CSD since 1973. Dark blue area represents yearly increase.
Without further ado, I present the 1.1 millionth CSD structure:
The 1.1 millionth CSD Structure CSD Refcode ELOFUJ (10.5517/ccdc.csd.cc2736q9).
For this milestone, the honours have gone to Bill Clegg and Ross Harrington, from Newcastle University, for their contribution of a phosphadiborinane. This adds to the relatively small number of borinane structures, of which there are currently 201 in the CSD. When you specifically consider the class of phosphaborinanes (with one, two or three boron atoms), there are just 44 CSD entries. You may also notice that this structure is not associated with a publication but is instead a CSD Communication entry. This is the first time we’ve had the opportunity to recognise a CSD Communication entry as a milestone structure, and it highlights the demand for and growth of this medium for crystal structure dissemination.
A graph showing the increase in CSD Communication structures.
In response to achieving the 1.1 millionth structure, Bill had this to say:
“I'm very pleased that one of our results from Newcastle has been recognised as the 1.1 millionth structure added to the CSD. I've had connections with CCDC for nearly 50 years, doing my PhD research in Cambridge on the same corridor as the CCDC offices in the early 1970s and making early use of graphics software developed by Sam Motherwell. The CSD has made important contributions to my research and teaching ever since, as I described in a CCDC-sponsored event at ECM32 in Vienna and subsequently in the first 2020 issue of the IUCr Newsletter.
The story of this (PBC)2 ring structure is an all too familiar one. Synthesised in 2007, it was not what the chemists expected and wanted, so they had little interest in it and moved on to other products, making a journal publication unlikely. It is, however, a good quality structure and belongs to only a small family of structurally characterised compounds with such a central ring, so its availability in the public domain is valuable. The same is true of many other structures, and the CSD Communication facility is ideal for providing a publication route, one I have been using increasingly over the last few years to deal with a backlog of unpublished structures.”
You can read more about Bill’s views on using the CSD for research and teaching, in the article he mentioned, here.
In recent years, the CCDC has been encouraging researchers to deposit crystal structures that are not intended for publication as CSD Communications, to expand the database and share knowledge which could otherwise remain unknown. As Bill has highlighted, whether it’s a structure that didn’t quite fit with the “story” of a publication or a “one-hit wonder” which was never followed up on, we are keen to curate and share these structures with the community, as they are just as important and valuable as structures in publications. After all, a greater availability of crystallographic data can never be a bad thing, can it? After achieving 1.1 million structures, I think it’s safe to say the general consensus is a resounding no!
Join us for our Educators User Group Meeting on the 16th and 17th of March to learn and reflect on Digital, shared global learning. Find out more about this meeting and save your place here.