Glossary of Cambridge Structural Database (CSD) Terms
October 24, 2022
What is a CSD refcode? What is a CSD deposition number? What’s the difference between a CIF and a GCD file? Here we present a quick glossary of key terms to help your work with the Cambridge Structural Database, or CSD.
This collection of over 1.2 million small-molecule organic and metal-organic crystal structures, curated for use in cheminformatics and computational chemistry work, is the result of the global scientific community’s contributions since 1965. The data are used by scientists around the world, in commercial and academic research.
Specialist terms are used to reference individual structures, talk about data at different stages, or define properties of structures in the CSD. This glossary defines many of the technical terms used when working with the CSD. If you think of a term that we haven’t covered, contact us here and we can update the list!
Top tip: many technical terms and definitions are also included in the ConQuest User Guide.
CSD Technical Terms Definitions
CCDC: The Cambridge Crystallographic Data Centre. A non-profit organization responsible for curating and disseminating the Cambridge Structural Database (the world’s largest repository of small-molecule crystal structures).
CCDC number: The deposition number assigned to a dataset when it is deposited at the CCDC. See Deposition number for more details.
checkCIF: A service run by the International Union of Crystallography (IUCr) that checks the consistency and integrity of CIFs. It is available to run during deposition and from the IUCr website. A list of the tests carried out, and further information about what they mean, can be found here.
CIF: Crystallographic Information File – a standard format for capturing and reliably exchanging the results of a crystal structure determination.
CIF is also sometimes expanded as Crystallographic Information Framework, reflecting that CIF has dictionaries and rules that allow many aspects of an experiment to be captured meaningfully (i.e. semantically) for reuse by researchers and machines.
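To make the idea of machine-readable data items concrete, here is a minimal Python sketch that reads simple "tag value" lines from CIF-style text. The tag names are taken from the core CIF dictionary, but the helper names and all values are invented for illustration. The sketch deliberately ignores loops and multi-line text fields, so a dedicated CIF library is the better choice for real files.

```python
import re

# Invented example content; the values are NOT taken from any real CSD entry.
EXAMPLE_CIF = """\
data_example
_chemical_formula_sum 'C8 H9 N O2'
_chemical_formula_weight 151.16
_cell_volume 776.3(4)
_cell_formula_units_Z 4
"""


def read_simple_items(cif_text):
    """Collect single-line '_tag value' pairs; loops and text blocks are skipped."""
    items = {}
    for line in cif_text.splitlines():
        parts = line.strip().split(None, 1)
        # Keep only lines that have both a tag (starting with '_') and a value.
        if len(parts) == 2 and parts[0].startswith("_"):
            items[parts[0]] = parts[1].strip().strip("'\"")
    return items


def to_number(value):
    """Convert a CIF number such as '776.3(4)' to a float, dropping the uncertainty."""
    return float(re.sub(r"\(\d+\)$", "", value))


items = read_simple_items(EXAMPLE_CIF)
print(items["_chemical_formula_sum"])
print(to_number(items["_cell_volume"]))
```

The bracketed digits in a value such as 776.3(4) are the standard uncertainty on the last digit, which is why the sketch strips them before converting to a number.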
Compounds may have alternative or so-called trivial names – referred to as Synonyms.
CSD: The Cambridge Structural Database. A collection of over one million curated experimental organic and metal-organic crystal structures that have been determined by researchers worldwide.
CSD Communication: A structure/dataset published directly through the Cambridge Structural Database (CSD) without an accompanying scientific article.
Density: This is defined in the ConQuest (CQ) User Guide as the density of the crystal, calculated from the reported chemical formula and unit cell data, using the relationship:
Density = (1.66 × formula weight × Z) / unit cell volume
where Z is the number of molecules in the unit cell.
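As a rough illustration of the relationship above, the following Python sketch computes the calculated density. It assumes the formula weight is in g/mol and the unit cell volume is in Å³, in which case the factor 1.66 (a rounded value of 10²⁴ divided by Avogadro's number) gives a density in g/cm³. The function name and the input values are hypothetical and are not taken from ConQuest or from any CSD entry.

```python
# Rounded conversion factor: 1e24 / Avogadro's number is approximately 1.6605.
DENSITY_FACTOR = 1.66


def calculated_density(formula_weight, z, cell_volume):
    """Return the density in g/cm^3.

    formula_weight: formula weight in g/mol (assumed unit)
    z: number of molecules (formula units) in the unit cell
    cell_volume: unit cell volume in cubic angstroms (assumed unit)
    """
    if cell_volume <= 0:
        raise ValueError("unit cell volume must be positive")
    return DENSITY_FACTOR * formula_weight * z / cell_volume


# Illustrative inputs only; not measured data.
print(round(calculated_density(formula_weight=151.16, z=4, cell_volume=776.3), 3))
```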
Deposition number: Uniquely identifies a specific dataset/structure deposited by a researcher with the CCDC.
Used to connect datasets with articles.
Remains the same if the dataset is updated up to the point of publication.
DOI: A Digital Object Identifier is a unique string of numbers, letters, or symbols used to identify objects online. The CCDC uses DOIs to provide links to the data, e.g. DOI: 10.5517/ccspp8d, or to the associated publication, e.g. DOI: 10.1107/S0021889809008450. More information on DOIs can be found here.
DOIs can be resolved through the Digital Object Identifier System, and when a DOI is minted, relevant metadata is shared through the DOI provider.
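DOIs are commonly resolved by appending them to the https://doi.org/ resolver address. The sketch below, with a hypothetical helper name, turns the two DOIs quoted above into resolvable links. It only builds the URL and does not contact the resolver.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi):
    """Build a resolver URL from a DOI, tolerating a leading 'DOI:' label."""
    cleaned = doi.strip()
    if cleaned.lower().startswith("doi:"):
        cleaned = cleaned[4:].strip()
    # Every DOI starts with the directory indicator '10.'.
    if not cleaned.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode unusual characters but keep the prefix/suffix slash.
    return DOI_RESOLVER + quote(cleaned, safe="/")


print(doi_to_url("DOI: 10.5517/ccspp8d"))
print(doi_to_url("10.1107/S0021889809008450"))
```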
Specific determination of a particular crystal structure.
ORCID: An Open Researcher and Contributor ID is a unique, persistent identifier for individuals to use as they engage in research, scholarship, and innovation activities. More information can be found from ORCID.
Refcode: Database identifier for entries in the CSD.
Each CSD entry is assigned a unique identifier consisting of six letters, sometimes followed by two digits (see Refcode family).
Provides a way to quickly find an entry within the CSD. For example, the structure of acetaminophen or paracetamol has the refcode COTZAN.
Early Refcodes aimed to reflect the Compound Name associated with the structure, but new substances are now assigned a randomly generated refcode. See our blog, a potted history of the CSD refcode, for how refcodes have evolved.
Researchers are encouraged to quote CSD Refcodes when referencing entries in the CSD.
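The format described above (six letters, optionally followed by two digits) can be checked with a short Python sketch. The pattern and the function name are illustrative assumptions based only on this glossary's description; this is not the CCDC's own validation logic. COTZAN01 appears purely to show the shape of the format.

```python
import re

# Six letters, optionally followed by exactly two digits (per the description above).
REFCODE_PATTERN = re.compile(r"([A-Z]{6})(\d{2})?")


def parse_refcode(refcode):
    """Split a refcode into (six-letter part, two-digit suffix or None)."""
    match = REFCODE_PATTERN.fullmatch(refcode.strip().upper())
    if match is None:
        raise ValueError(f"not a well-formed refcode: {refcode!r}")
    return match.group(1), match.group(2)


print(parse_refcode("COTZAN"))
print(parse_refcode("cotzan01"))
```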
Refcode family: The group of CSD Entries that share the same 6 letters in a Refcode. These group together different determinations of the same chemical substance e.g.
Families do not group together:
Families can provide a convenient way to identify different polymorphs of the same structure.
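Because a family is defined by the shared six letters, entries can be grouped with a few lines of Python. This is a minimal sketch that assumes the refcodes are already well-formed. The function name is hypothetical, and the refcodes below are placeholders chosen to show the pattern; they have not been checked against the CSD.

```python
from collections import defaultdict


def group_by_family(refcodes):
    """Map each six-letter family to the refcodes that belong to it."""
    families = defaultdict(list)
    for refcode in refcodes:
        # The family is the first six characters, whatever suffix follows.
        families[refcode[:6].upper()].append(refcode)
    return dict(families)


# Placeholder refcodes for illustration only.
print(group_by_family(["ABCDEF", "ABCDEF01", "ABCDEF02", "GHIJKL"]))
```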
Editorial information and comments that relate to the structure/entry of interest. They can include cross references to identical structures and editorial comments resulting from CCDC validation, i.e. additional information, errors, and discrepancies that may be of interest to CSD users.
Rather than searching the whole of the CSD, it is possible to search a subset of the database.
CSD subsets are targeted collections of structures that are a convenient starting point for research into a particular field.
Refers to the structure of molecules and materials, primarily at an atomic level. In the CSD, structures are specific determinations of the 3D chemical nature of a crystal. Different determinations, refinements or substances are classed as different structures. The same determination of a structure with an identical dataset published multiple times will be classed as a single structure that has been republished.
Teaching Subset: A free collection of over 750 experimental crystal structures carefully selected to enhance chemical learning.