To Infinity and Beyond? The CSD reaches over 600,000 structures

When I first joined the CCDC almost five years ago, the CSD had just passed the 420,000 mark. The half-millionth structural determination was included in late November 2009, and the 600,000 milestone was reached just last month, a little over two years later. The growth of the CSD since 1972 is recorded on the website and shows a continual, ever-increasing rate of growth. This rate of increase has actually been used by researchers to show the development of specific areas of study: I recently attended a talk where it was shown that the number of metal-organic frameworks (MOFs) deposited in the CSD was doubling at approximately twice the rate of the CSD as a whole.

As part of the database group at the CCDC, I must admit to having slightly mixed feelings about this ever-growing rate of deposits to the CSD. Of course it’s good news, making the CSD an increasingly valuable tool for researchers. On the other hand, it does mean more work for me, and larger numbers of increasingly complex structures do raise questions about how the working practices of the database group here will have to adapt. In a future blog I intend to give a brief description of the processes that turn your structures into CSD entries, and hopefully you'll appreciate it’s not as easy as simply collecting together a group of CIFs. If in another five years the number of structures deposited each year doubles, does that mean twice as many employees, or will I simply have to work twice as hard? Even if you subscribe to the Stakhanovite philosophy, it’s pretty clear there is a potential problem here. So my question is: just how big will the CSD get? Will the number of crystal structures determined increase indefinitely? In these times of financial downturn the disclaimer ‘past performance is no indication of future results’ perhaps raises a few ironic smiles, but it applies just as aptly to the CSD as to the stock market.

I’m reminded of a talk by Dr Jonathan Goodman of the Unilever Centre for Molecular Informatics, next door to us here in Cambridge. He described a study published in Nature (and widely reported, including by the BBC) that looked at the world record times of men and women in the 100m. It stated that, at their current rates of improvement, women should overtake men (that is, hold a faster 100m record time) within 150 years. Of course, following this trend for a few hundred more years would presumably have sprinters breaking the sound barrier, illustrating the danger of extrapolating too far!

So when will the rate of structures deposited at the CCDC begin to level off? Personally, I can’t see it happening any time soon, for several reasons. The main one is where the data deposited with the CCDC come from. Although the CCDC welcomes Private Communications (i.e. researchers depositing data that have not been published elsewhere), the vast majority of depositions come from published articles, and of course only a tiny fraction of structures determined by X-ray or neutron diffraction ever make it to a published paper. I’m sure most chemists (and I include myself in this) have a whole bunch of X-ray data sitting somewhere, metaphorically gathering dust. The rate-determining step in depositions to the CSD is publishing, not data collection. There are already enough CIFs sitting on hard drives around the world to keep us busy for the foreseeable future. On top of this are factors such as the increase in speed and availability of diffraction equipment and, of course, the increase in published papers. Five years ago almost 800 structures came from the RSC’s Chem Comm journal, when there were 48 issues a year. In 2012 Chem Comm began publishing 100 issues a year.

So how do I see the CSD developing in the next few years? I don’t have a crystal ball, but I might just keep some champagne on ice for the 750,000th structure in 2015!
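
For anyone who wants to check that guess, here is a rough back-of-the-envelope sketch in Python. It uses only the two milestones mentioned above (500,000 in late November 2009 and 600,000 a little over two years later) and assumes, purely for illustration, that "a little over two years" means about 2.2 years and that the rate then stays constant. The function name and that 2.2-year figure are my own assumptions, not CCDC numbers.

```python
def years_to_reach(target, current, rate_per_year):
    """Years needed to grow from `current` to `target` at a constant yearly rate."""
    return (target - current) / rate_per_year


# Milestones quoted in the post.
half_million = 500_000
latest_milestone = 600_000

# ASSUMPTION: "a little over two years" between the milestones is taken as 2.2 years.
interval_years = 2.2

# Average deposition rate between the two milestones (structures per year).
rate = (latest_milestone - half_million) / interval_years

# Straight-line extrapolation to the 750,000th structure.
years_needed = years_to_reach(750_000, latest_milestone, rate)

print(f"Approximate rate: {rate:,.0f} structures per year")
print(f"Years from 600,000 to 750,000 at that rate: {years_needed:.1f}")
```

Under those assumptions the answer comes out at a little over three years after the 600,000 mark. Since the real rate has been rising rather than holding steady, a straight line like this is probably on the cautious side, and, as the sprinting example shows, it should not be trusted very far into the future.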