Unpacking the 800,000th crystal structure

As I write, the CSD contains 801,590 entries and, as you can see from our recent announcement, the 800,000th entry to be added is a di-copper paddle wheel structure containing a uracil derivative, published by colleagues in Spain.

The addition of this milestone structure inspired me to look at how the database is evolving and how representative the 800,000th structure is of its current content.


The 800,000th entry in the CSD, refcode TUWMOP, published in DOI: 10.1021/acs.cgd.5b01110


Let’s start by looking at how the balance of organic and metal organic structures has changed over the years. If we delve into the depths of the CSD, we can see that the first organic structure with 3D coordinates was determined back in 1935 (PHTHCY01, METALD), followed just one year later by the first metal organic structure with 3D coordinates (NIPHTC). In the early days the CSD was clearly dominated by organic structures, but today things look a little different, with metal organic structures now accounting for 57% of the entries.


Graph of balance of metal organic to organic entries in the CSD


The first copper structure was reported back in 1933 (refcode ZZZPMY), and the first Cu structure with 3D coordinates followed in 1952 (refcode CUAPRO).


Refcode CUAPRO: The first Cu structure in the CSD determined with 3D coordinates
 

Copper is popular in the world of X-ray structures, with over 7% of CSD entries containing Cu atom(s), and this rises to over 12% if we restrict ourselves to just looking at metal organic structures. And copper isn’t just popular with crystallographers determining structures; it is also a regular search term for scientists searching for experimentally determined 3D structures.
 

A wordle of the most common compound names searched for through WebCSD


If we next look for carboxylate paddle wheel structures and restrict our search to carboxylates with four paddles, we find 4,553 examples, with the publication in Crystal Growth and Design accounting for six of these structures. The first of these paddle wheels was determined way back in 1953 (refcode CUAQAC), and today paddle wheels make up just 1% of all metal organic structures. Once again copper is clearly the metal of choice, being involved in a third of published paddle wheels.
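As a rough cross-check, the figures quoted in this post can be combined to see whether the "just 1%" figure hangs together. This is only a back-of-the-envelope sketch: it assumes the entry count, the 57% metal organic share and the paddle wheel count all refer to the same snapshot of the CSD, and the variable names are purely illustrative.

```python
# Figures quoted in this post (assumed to describe the same CSD snapshot).
total_entries = 801_590        # CSD entries at the time of writing
metal_organic_share = 0.57     # fraction of entries that are metal organic
paddle_wheels = 4_553          # four-paddle carboxylate paddle wheel hits

# Estimated number of metal organic entries.
metal_organic_entries = total_entries * metal_organic_share

# Paddle wheels as a fraction of the metal organic entries.
paddle_wheel_share = paddle_wheels / metal_organic_entries

# "A third of published paddle wheels" involve copper.
copper_paddle_wheels = paddle_wheels / 3

print(f"Estimated metal organic entries: {metal_organic_entries:,.0f}")
print(f"Paddle wheel share of metal organic entries: {paddle_wheel_share:.1%}")
print(f"Approximate copper paddle wheels: {copper_paddle_wheels:,.0f}")
```

On these numbers the paddle wheel share comes out very close to the 1% quoted above, and a third of the paddle wheels corresponds to roughly 1,500 copper structures.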


A pie chart showing the most common metals found in paddle wheel structures in the CSD
 

The CSD also enables us to examine properties of the crystal structures themselves. For example, there are over 159,000 structures with associated melting point data. In the publication describing the 800,000th entry, the authors investigated the magnetic properties of eight compounds. The CSD has well over 7,000 structures described in associated publications as having magnetic properties, and we can even delve further to see how many are antiferromagnetic.

Techniques in crystallography have also changed over the years. For example, one of the structures published alongside the 800,000th entry has SQUEEZED solvent molecules. This structure, refcode TUWMIJ, demonstrates just how commonplace it is these days to use the SQUEEZE or MASK techniques, predominantly for highly disordered solvent molecules. Over 7% of all CSD entries published have used SQUEEZE or MASK to account for disordered electron density in the structure.


A graph showing the rise of the use of SQUEEZE/MASK in the CSD


So let’s raise a glass to Hassanein et al. for such an interesting structure no. 800,000 and say “thank you” to everyone who is contributing to the ongoing exponential growth of the CSD, including rapidly increasing numbers of Private Communications. Now that every CSD entry gets a data Digital Object Identifier and is listed in resources such as the Thomson Reuters Data Citation Index, we can make these otherwise unpublished structures, alongside those from journals like Crystal Growth and Design, available to the world.