Celebrating 10 years of Structure Factors at the CCDC
This month marks 10 years since the CCDC began formally accepting experimental structure factor data alongside crystallographic information files (CIFs) for deposition of structures to the CSD. The change came about following the lead and advice of the International Union of Crystallography (IUCr), whose journals have long required authors to provide structure factor data for published crystal structures. Structure factors, as well as other experimental information, are important as they can be used to describe the distribution of electron density in the structure and enable users to further validate a structural model. This also helps make your data more FAIR (Findable, Accessible, Interoperable and Reusable).
In this blog post, I will describe how the availability of structure factors and other experimental information (such as reflection intensity data and raw diffraction images) has changed over the last 10 years. The CCDC accepts both structure factor and reflection intensity data for entries and strongly encourages authors to provide them during the deposition process. Structure factor and reflection intensity data can either be stored within the CIF or be uploaded as separate files (.fcf and .hkl respectively). If this information is not provided, depositors are asked to give a reason why it is not available, and the response is stored in the CIF. For more information about structure factors, please visit our new information page. Where structure factors are available for a structure, they can be downloaded, alongside other information, from WebCSD and Access Structures. Please see this FAQ for more information.
Availability of reflection data for entries in the CSD
Graph to show the rise of CSD entries with available reflection data by publication year since 2011
Over 83% of the CSD entries submitted over the past 4 years have accompanying reflection data (either structure factor or reflection intensity information). Since the CCDC formally started accepting this information in 2011, the percentage of structures added to the CSD with available structure factor information has generally increased, although in recent years it has begun to fall slightly. In contrast, the number of structures with reflection intensity data has risen rapidly. This increase is most likely due to changes in refinement software, as many programs now include reflection intensity information in the CIF by default, as well as to new data sharing practices required by funders and journals.
In 2020, over 90% of structures added to the database had accompanying reflection information (either structure factors or reflection intensity data). Of the remaining 10% of structures, the largest proportion were refined using older crystallographic programs, where this information isn't added to the CIF by default. Some of these structures could be older data that is only now being published, for which the reflection information has since been lost. Another category with no reflection data is structures that were added to the CSD manually because no CIF was available: the CCDC undertakes projects to type up older research data to prevent it from being lost to the community, and occasionally creates entries from published crystallographic data where no CIF is available.
Most reflection intensity data is obtained directly from the CIF rather than being deposited as a separate .hkl file, while structure factors are mostly deposited as separate files. This could suggest that including reflection information within the CIF makes it easier to share. Some crystallography programs do include structure factor information within the CIF, but this is not as widespread as for reflection intensity information. However, when refinement instructions (.res) are present, which many crystallographic programs also include in the CIF by default, structure factors can be recreated by re-refining the structure using the reflection intensities. This means structure factor information could be regenerated from the information available even when structure factors are not provided during deposition. Further investigation is taking place to understand more fully why reflection information is not always available, in order to help more depositors provide this data.
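For depositors who want to check what a CIF already contains before uploading, the idea can be sketched in a few lines of Python. This is a minimal illustration rather than a CCDC tool: it assumes the CIF uses the SHELXL-style data names `_shelx_hkl_file` and `_shelx_res_file` for embedded reflection intensities and refinement instructions, and `_refln_index_h` for a structure factor loop. Other programs may use different names, and the function names here are hypothetical. It is a naive text scan, not a full CIF parser.

```python
import re

# Data names assumed for illustration; check them against the CIF
# that your own refinement software writes.
INTENSITY_TAGS = ("_shelx_hkl_file",)
INSTRUCTION_TAGS = ("_shelx_res_file",)
STRUCTURE_FACTOR_TAGS = ("_refln_index_h",)


def find_tags(cif_text):
    """Return the lower-cased CIF data names that begin a line."""
    return {m.group(1).lower() for m in re.finditer(r"(?m)^\s*(_\S+)", cif_text)}


def summarise_reflection_data(cif_text):
    """Report which kinds of reflection information appear to be embedded."""
    tags = find_tags(cif_text)
    return {
        "reflection_intensities": any(t in tags for t in INTENSITY_TAGS),
        "refinement_instructions": any(t in tags for t in INSTRUCTION_TAGS),
        "structure_factors": any(t in tags for t in STRUCTURE_FACTOR_TAGS),
    }


# A tiny made-up CIF fragment with embedded .res and .hkl text blocks.
example = """data_example
_cell_length_a 10.0
_shelx_res_file
;
TITL example
;
_shelx_hkl_file
;
   1   0   0  100.00    2.00
;
"""

print(summarise_reflection_data(example))
```

A CIF that reports reflection intensities and refinement instructions but no structure factors is the case described above, where structure factors could still be regenerated by re-refinement.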
Raw diffraction data
As well as reflection information, the CCDC has also made changes to enable raw diffraction data (diffraction frames collected during a crystallography experiment) to be shared. Although the CCDC does not currently store raw diffraction data itself, if the data has been deposited in another repository, the DOI link to the dataset can be associated with the entry during deposition. This link or ‘Raw Data DOI’ can be seen in the entry on WebCSD and Access Structures (for example CSD Entry BISGAO).
Graph showing radiation used in structures with Raw Data DOIs
Currently, 60 structures in the CSD have an associated Raw Data DOI. Interestingly, half of these entries are from non-X-ray collections: neutron and electron diffraction studies. Electron diffraction structures, although a very small percentage of data in the CSD (as seen in the graph below), are more likely to have Raw Data DOIs available: approximately 20% of the electron diffraction structures in the CSD have one. As electron diffraction is an emerging technique, this could be due to community practices of making this data accessible to enable further development. As the availability of raw diffraction data for all small molecule data begins to be discussed and encouraged or required by funders and journals, we hope this number will continue to grow.
Graph showing the percentage of CSD structures collected with different radiation types
For more information about associating raw data to deposited structures, please see our FAQ here.
Future developments
Looking to the future, the CCDC is continuing to look at ways to help depositors provide reflection information. We are producing new resources to guide depositors (see our new structure factor information page here) and assessing whether changes are needed to our deposition service to make it easier to share such information. We will also be holding one of our CCDC Virtual Workshops in November 2021 on the topic of Depositing Data to the CSD using our web deposition service (sign up to our newsletter so you don't miss it).
Check out our #CSDTopTipTuesday series on Twitter, Facebook, Instagram, and LinkedIn next month for more information about supporting experimental data in the CSD.
If you have any thoughts on our current progress, or if you think any other experimental metadata should also be collected and shared, please get in contact with us.
*Data for 2020 is up to October 2020 (Nov 2020 release)