750,000 structures and counting…

Back in 2012, when the number of structures in the Cambridge Structural Database (CSD) reached 600,000, Seth Wiggin, a Senior Scientific Editor at the CCDC, suggested in his blog that we should put the champagne on ice for the 750,000th structure in 2015. Well, I am pleased to say that, thanks to the hard-working crystallographic community, the CSD has reached this milestone earlier than predicted, with this week’s update taking the CSD to 752,200 entries. You can read about entry number 750,000 here.
In May this year, CSD editors processed a startling 1,003 structures in a single day, illustrating the tremendous output of the crystallographic community. To put this in perspective, that is more structures than were published in the whole of 1968. Adapting to such productivity has been a considerable challenge, and these processing rates are only possible because of a new computational infrastructure, built by our in-house software developers and introduced over the past year. This system allows the rapid assignment of CCDC numbers for our depositors and ensures that our scientific editors can apply their expertise effectively without devoting their evenings and weekends to keeping up with all your structures!
The value of the CSD system is directly linked to the contributions of dedicated scientists who publish and deposit crystallographic data every day. A diverse range of scientific fields is indebted to them for sharing their data, so take a bow, especially the particularly prolific crystallographers depicted in this infographic.


A wordle of the top 100 authors in the CSD
Alongside the growing number of structures, the average number of authors per structure has also risen over the years, from fewer than two when the CSD began in 1965 to more than five today. Not only are you increasingly sharing your data through the CSD, you are also collaborating and sharing your data more with co-authors.
Another notable trend is that the number of atoms per structure in the CSD is also rising. There are currently over 60 million atoms in the CSD, and because this figure is growing faster than the number of structures, it will be very interesting to see which milestone is reached first: the CSD containing a million entries or the CSD containing a billion atoms! At present, the prize for the most atom types in a single structure goes to LIMSUW, with no fewer than eleven different atom types.
Refcode LIMSUW: the structure with 11 different atom types and the formula C36H70Ag2Cl4Co2F6N2O26P6Ru2S2
The collection of structures also allows us to look at trends in chemistry and to see how crystallographic techniques have changed over the years. For example, the number of structures containing atoms modelled over multiple sites is also on the rise. If this growth continues at the current rate, 50% of deposited structures will be disordered by 2055, and every deposited structure will be disordered by 2155!
A graph showing the rise in the number of published structures containing atoms modelled over multiple sites


We are often asked to speculate on when the CSD will reach one million entries. Well, if the number of published structures continues to grow at the current rate, we will be celebrating this remarkable achievement in 2017. However, we all know that every crystallographer has structures that, for one reason or another, don’t get published. They may be awaiting a few tweaks, or their creator may not yet have got around to rescuing them from a dusty hard drive somewhere. So, if this sounds familiar and you have any unpublished crystal structures in your archive, why not help us reach a million structures earlier? Adding them to the CSD as a Private Communication will further increase the impact of crystallography on the scientific community. With this being the International Year of Crystallography and with next year marking 50 years of the CSD, I think you would agree that next year would be a very fitting time to celebrate sharing a million structures!