• Data improvements in the 2020.0 CSD Release

    The CSD - The world’s essential database of crystal structures

    After celebrating the huge milestone for structural chemistry with the addition of the millionth structure into the CSD in June 2019, the 2020.0 CSD Release now contains 1,034,174 entries and 1,016,168 unique structures. That means an increase of more than 60,000 entries, and we are well on our way to the next million!

    Continue reading…
  • Insights into drug-like compounds from crystal data

    As the size of the Cambridge Structural Database (CSD) has just passed one million structures, it seems an appropriate time to look at some of the applications of this ever-growing resource. Whilst the CSD is certainly useful as a central record of past data collections, perhaps the more significant benefits are the insights that can be gained from looking at this mass of data as a whole. In this blog, I’ll show some examples of what can be discovered from statistics generated from the CSD when looking at drug-like compounds. A paper written by CCDC colleagues with researchers from Pfizer and AstraZeneca (Journal of Pharmaceutical Sciences, Volume 108, Issue 5, 2019, Pages 1655-1662, https://doi.org/10.1016/j.xphs.2018.12.011) gives an in-depth statistical analysis of drug compounds in the CSD.

    Continue reading…
  • CSD Data Curation – the challenge of a million structures

    In a recent blog - CSD Data Curation - The Human Touch - we’ve described the work that goes on when new structures are added to the Cambridge Structural Database (CSD). However, it’s important to realise that this isn’t the end of the story - as we get close to adding the one millionth structure to the CSD, it seems like an appropriate time to describe some of the processes we undertake at the CCDC to ensure that the data we make available to scientists continues to empower and inform their research long after its initial deposition.

    Continue reading…
  • Countdown to 1 million

    The recent August update to the Cambridge Structural Database (CSD) brought the total number of entries in the database to over 950,000, meaning the next big milestone will be 1 million. This is a huge achievement of the crystallographic community, and in the months leading up to this milestone we’ll be demonstrating the value that can be gained from this crystal data and looking to what can be accomplished in the future.

    Continue reading…
  • Not so Weird and Wonderful?

    One of the benefits of my role at the CCDC is the chance to look at some of the latest scientific research taking place, as I review structures before they are added to the Cambridge Structural Database (CSD). Occasionally I come across a structure that looks quite unusual at first glance, so much so that it’s hard to resist taking a closer look.

    Continue reading…
  • Scientific Symposium and New Software for Crystallographers and Scientists Worldwide to Celebrate 50 Years of the Cambridge Structural Database

    01 July 2015, Cambridge, UK and Piscataway, NJ, USA

    The Cambridge Crystallographic Data Centre (CCDC) celebrates 50 years of the Cambridge Structural Database (CSD) with a Scientific Symposium and the launch of new software in the CSD-System suite. CSD-System provides full web-based access (WebCSD) to the world’s most comprehensive collection of crystal structure data, plus 3D searching, new advanced visualisations, intermolecular interaction analysis, geometry analysis, and support for tailored application building through the new CSD Python API.

    Continue reading…
  • Problematic Polymorphs

    I recently found a blog post from regular Chemistry World contributor Derek Lowe, highlighting an Early View Angewandte Chemie communication (doi: 10.1002/anie.201406886) in which the authors determined the crystal structures of two new polymorphs of the amino acid L-Phenylalanine. The paper also helps to clarify the relationship between several other Phenylalanine structures published over the last 20 years. Although Derek was surprised that determining the structure of a seemingly simple molecule had proved such a challenge for small-molecule crystallography, this type of challenge is not unusual. A notable example is the case of the two polymorphs of D-Ribose which evaded full determination for over 50 years (see ZZZFEE in the CSD from 1956!) until Jack Dunitz and co-workers published an article triumphantly exclaiming “The Crystal Structure of D-Ribose—At Last!” in 2010 (doi: 10.1002/anie.201001266).

    The challenges involved in obtaining good quality single crystals to determine a structure should not be underestimated. Prior to the findings of this latest paper, the Cambridge Structural Database (CSD) contained five determinations of the structure of L-Phenylalanine (QQQAUJ-QQQAUJ04), from four different groups of researchers, all proposing different polymorphic forms based on the crystal structure data that they obtained.

    Continue reading…
  • 700,000 high quality crystal structures now at CSD users’ disposal!

    I wrote a blog early in 2012, noting the fact that the Cambridge Structural Database (CSD) had grown to include over 600,000 entries. CSD users who visit our online portal to the CSD (known as WebCSD) will see that the latest update (released on 25th March) pushes the size of the CSD to over 700,000 entries.

    What users may not be aware of is that the CSD has actually held over 700,000 entries since December 2013, but in a way that’s a bit harder to spot. WebCSD is frequently updated to include the additional crystal data as it reaches us here in Cambridge. The last update of 2013 brought the total number of structures in the CSD to over 687,000. Not a particularly round number you may think. However, the update also included the most recently published data available to the CCDC - over 19,000 structures - which we make available alongside the CSD as CSD X-Press. That brought the total number of structures available to CSD users to over 700,000!

    Continue reading…
  • Make Private Communications your New Year’s resolution!

    Crystallography is unique amongst scientific disciplines in that so many data are published and available for others to investigate and utilise. Databases containing records and data from crystallographic experiments (including the CSD, ICSD and PDB) were founded early when the number of experiments being carried out was small. This undoubtedly helped the culture of the community (most crystallographers take it for granted that such databases exist and provide useful data) and has provided comprehensive collections of crystal data. If such databases did not exist and had to be started now, the task would appear overwhelming; the task of keeping the CSD up to date with the number of crystal structures currently published is by no means trivial!

    However, every crystallographer has structures that for one reason or another don’t get published, and many scientists have structures in a PhD thesis that they never quite got round to writing up. After all the time, effort (and money) it takes to synthesise a compound, collect crystal data and complete refinement, many structures end up languishing on a hard drive or in a PhD thesis on a shelf. You may not consider your crystal structure(s) to be significant to your research, but aspects of the structure determination experiment or the structure itself (e.g. bonds, angles, torsions, ring geometries), whether conventional or novel, may be significant to another researcher. Adding the structure to the CSD will help others identify the compound (using our free CellCheckCSD software), and add to the body of knowledge about molecular geometry and crystal packing.

    Continue reading…
  • Who wants to be a millionaire? The CCDC issues number CCDC 900000.

    In the last week or so we passed another milestone at the CCDC in the building of the Cambridge Structural Database (CSD), by issuing the reference number CCDC 900000. This type of reference number is probably familiar to many people from scientific papers describing X-ray data, and corresponds to a set of X-ray experimental data. This number is issued when data is first sent to us, and stays with the dataset even if it undergoes revisions before or during the publishing process. It’s also worth mentioning that structures in the database are not normally referred to by the CCDC number at all, but rather using the six-letter CCDC refcode. We’ll talk about refcodes in more detail in an upcoming blog!

    Continue reading…