The recent August update to the Cambridge Structural Database (CSD) brought the total number of entries in the database to over 950,000, meaning the next big milestone will be 1 million. This is a huge achievement of the crystallographic community, and in the months leading up to this milestone we’ll be demonstrating the value that can be gained from this crystal data and looking to what can be accomplished in the future.
One of the benefits of my role at the CCDC is the chance to look at some of the latest scientific research taking place, as I review structures before they are added to the Cambridge Structural Database (CSD). Occasionally I come across a structure that looks quite unusual at first glance, so much so that it’s hard to resist taking a closer look.
01 July 2015, Cambridge, UK and Piscataway, NJ, USA
The Cambridge Crystallographic Data Centre (CCDC) celebrates 50 years of the Cambridge Structural Database (CSD) with a Scientific Symposium and the launch of new software in the CSD-System suite. CSD-System provides full web-based access (WebCSD) to the world’s most comprehensive collection of crystal structure data, plus 3D searching, new advanced visualisations, intermolecular interaction analysis, geometry analysis, and support for tailored application building through the new CSD Python API.
I recently found a blog post from regular Chemistry World contributor Derek Lowe, highlighting an Early View Angewandte Chemie communication (doi: 10.1002/anie.201406886) in which the authors determined the crystal structures of two new polymorphs of the amino acid L-Phenylalanine. The paper also helps to clarify the relationship between several other Phenylalanine structures published over the last 20 years. Although Derek was surprised that determining the structure of a seemingly simple molecule had proved such a challenge for small-molecule crystallography, this type of challenge is not unusual. A notable example is the case of the two polymorphs of D-Ribose which evaded full determination for over 50 years (see ZZZFEE in the CSD from 1956!) until Jack Dunitz and co-workers published an article triumphantly exclaiming “The Crystal Structure of D-Ribose—At Last!” in 2010 (doi: 10.1002/anie.201001266).
The challenges involved in obtaining good quality single crystals to determine a structure should not be underestimated. Prior to the findings of this latest paper, the Cambridge Structural Database (CSD) contained five determinations of the structure of L-Phenylalanine (QQQAUJ-QQQAUJ04), from four different groups of researchers, all proposing different polymorphic forms based on the crystal structure data that they obtained.
I wrote a blog post early in 2012, noting that the Cambridge Structural Database (CSD) had grown to include over 600,000 entries. CSD users who visit our online portal to the CSD (known as WebCSD) will see that the latest update (released on 25th March) pushes the size of the CSD to over 700,000 entries.
Crystallography is unique amongst scientific disciplines in that so many data are published and available for others to investigate and utilise. Databases containing records and data from crystallographic experiments (including the CSD, ICSD and PDB) were founded early when the number of experiments being carried out was small. This undoubtedly helped the culture of the community (most crystallographers take it for granted that such databases exist and provide useful data) and has provided comprehensive collections of crystal data. If such databases did not exist and had to be started now, the task would appear overwhelming; the task of keeping the CSD up to date with the number of crystal structures currently published is by no means trivial!
However, every crystallographer has structures that for one reason or another don’t get published, and many scientists have structures in a PhD thesis that they never quite got round to writing up. After all the time, effort (and money) it takes to synthesise a compound, collect crystal data and complete refinement, many structures end up languishing on a hard drive or in a PhD thesis on a shelf. You may not consider your crystal structure(s) to be significant to your research, but aspects of the structure determination experiment or the structure itself (e.g. bonds, angles, torsions, ring geometries), whether conventional or novel, may be significant to another researcher. Adding the structure to the CSD will help others identify the compound (using our free CellCheckCSD software), and add to the body of knowledge about molecular geometry and crystal packing.
In the last week or so we passed another milestone at the CCDC in the building of the Cambridge Structural Database (CSD), by issuing the reference number CCDC 900000. This type of reference number is probably familiar to many people from scientific papers describing X-ray data, and corresponds to a set of X-ray experimental data. This number is issued when data is first sent to us, and stays with the dataset even if it undergoes revisions before or during the publishing process. It’s also worth mentioning that structures in the database are not normally referred to by the CCDC number at all, but rather by the six-letter CCDC refcode. We’ll talk about refcodes in more detail in an upcoming blog!
This week heralds a major step in the process of producing the new Cambridge Structural Database System (CSD System) software ready for release at the end of the year, with the initial beta release going out to external testers. The annual CSD System release is probably one of the main tasks at the CCDC, involving almost every member of staff in one way or another. The organisation of the new CSD System software release actually began a couple of months ago in July, but it’s around this time that new features really start to take shape.
This year, in response to feedback from users we meet at events, or who contact our CCDC support staff, a lot of effort has been put into improving visualisation and image generation in Mercury (see below for examples of what to expect - we’ll go into more detail about the software enhancements in another blog as the release gets closer). There’s plenty more to do though, with development continuing for at least another month or so – hopefully including lots of feedback from our beta testers! Of course we don’t just rely on external beta testers; lots of testing also goes on in-house. This includes testing both the technical and scientific aspects of the new release. Technical testing includes jobs such as making sure the new software installers work on all the Windows, Linux and Mac machines we support. Scientific testing mainly involves CCDC researchers using the new software releases in much the same way our users do, and trying to make sure any new enhancements are intuitive and easy to use.
When I first joined the CCDC almost five years ago the CSD had just passed the 420,000 mark. The half millionth structural determination was included in late November 2009, and the 600,000 milestone was reached just last month: a little over two years later. The growth of the CSD since 1972 is recorded on the website, and shows a continual and ever-increasing rate of growth. This rate of increase has actually been used by researchers to show the development of specific areas of study; I recently attended a talk where it was shown that the number of metal-organic frameworks (MOFs) deposited in the CSD was doubling at approximately twice the rate of the CSD as a whole.
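To put rough numbers on what "doubling at twice the rate" means, here is a minimal Python sketch that estimates a doubling time from the two milestones quoted above. It rests on assumptions that are not in the original post: that growth between the two milestones was steadily exponential, and that "a little over two years" is about 2.1 years. The function name `doubling_time` is purely illustrative.

```python
import math


def doubling_time(n_start, n_end, years):
    """Doubling time in years, assuming steady exponential growth
    from n_start to n_end entries over the given number of years."""
    growth_rate = math.log(n_end / n_start) / years  # per-year exponential rate
    return math.log(2) / growth_rate


# Milestones from the post: 500,000 entries (late November 2009) and
# 600,000 entries "a little over two years later".
# ASSUMPTION: the interval is taken as 2.1 years for illustration.
csd_doubling = doubling_time(500_000, 600_000, 2.1)

# "Doubling at twice the rate" is read here as half the doubling time.
mof_doubling = csd_doubling / 2

print(f"CSD doubling time (estimate): {csd_doubling:.1f} years")
print(f"MOF doubling time (estimate): {mof_doubling:.1f} years")
```

Under these assumptions the CSD as a whole would double roughly every eight years, and a subset growing at twice that rate would double in about four. The real growth curve is not perfectly exponential, so these figures are only indicative.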