The Benefits of Data Sharing

In discussions on open science and reproducible research, the crystallography community is often mentioned as being at the forefront. Thanks to pioneers in crystallography who recognised the value of collecting and aggregating the structural data underlying research, a strong tradition of data sharing has been able to thrive. This enduring tradition is underlined by the adoption of standards for data archival, such as the Crystallographic Information File (CIF), and of data repositories (such as the CSD, the ICSD, and the PDB) by new and established members of the community alike.

Given the importance of continuing this tradition of data sharing for chemistry and crystallography, we outline below some of the benefits of sharing your data in the CSD, for you as a publishing researcher, for the chemistry and crystallography community, and beyond.

Long-term Preservation

As a trusted data repository provider, the CCDC has preserved data since 1965, and the CSD even contains data published around 100 years ago. For example, CSD refcodes DMANTL06 and ZZZJJG were both published in 1923.

For more information on how the CCDC archives and manages data, read the dedicated paragraph on the Scientific Data Preservation webpage. We are accredited with the CoreTrustSeal, and our Data Preservation policy transparently outlines how the data is stored.

As an additional benefit, DOIs (Digital Object Identifiers) for diffraction images can be associated with your data (‘Raw Data DOI’, see how to include it here). Other files that we collect include hkl and res files, whose contents can also be included in the CIF, and the structure factor file (fcf). This way, if someone queries your structure, you know exactly where to find the supporting data. Otherwise, the data could become difficult to locate in the future, e.g., if you move institutions or the computer storing the data breaks. These circumstances do happen, and when the original data cannot be retrieved, scientists have to recrystallise and redetermine the structure.

Citations and Credit

At the CCDC, your datasets are assigned identifiers and accession codes, making your structures more findable and easier to cite.

After deposition, each structure receives a unique Deposition Number (sometimes known as the CCDC Number), which is communicated back to the depositor to be cited in their own manuscript. Upon publication, the structure is assigned a unique database identifier, known as a CSD refcode, and a CCDC Data DOI, which other authors can use to cite your structure. Because the DOI is registered through DataCite, the metadata for the structure is openly accessible and searchable via DataCite.

These services are also available for CSD Communications, a platform that gives researchers the opportunity to publish datasets independently of a journal article. All CSD Communications receive a CCDC Data DOI, which is sent directly to the depositor as soon as the data is made publicly available, allowing for immediate access and citation. In 2019, an ISSN was also acquired to help publishers, institutions and researchers track and record citations to CSD Communications.

Moreover, since 2016, users have been able to provide an ORCID iD when depositing data, which is shown alongside the data record once published.
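As a rough illustration of what DataCite registration makes possible, the sketch below builds the public DataCite REST API address and the doi.org resolver address for a DOI. The function names and the example DOI are hypothetical placeholders introduced here; the lookup function assumes the usual JSON layout of a DataCite response and is defined but not called, since it needs network access.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

DATACITE_API = "https://api.datacite.org/dois/"  # public DataCite REST API
DOI_RESOLVER = "https://doi.org/"                # standard DOI resolver


def datacite_record_url(doi):
    """Build the DataCite API address that serves a DOI's metadata."""
    return DATACITE_API + quote(doi.strip(), safe="/")


def doi_landing_url(doi):
    """Build the resolver address that redirects to the landing page."""
    return DOI_RESOLVER + quote(doi.strip(), safe="/")


def fetch_datacite_attributes(doi, timeout=10):
    """Fetch the metadata attributes for a DOI (requires network access).

    Assumes the response is JSON with the metadata under data -> attributes.
    """
    with urlopen(datacite_record_url(doi), timeout=timeout) as response:
        payload = json.load(response)
    return payload["data"]["attributes"]


# Placeholder only, not a real CCDC Data DOI.
EXAMPLE_DOI = "10.5517/ccdc.csd.example"

print(datacite_record_url(EXAMPLE_DOI))
print(doi_landing_url(EXAMPLE_DOI))
```

Replacing the placeholder with the CCDC Data DOI of a published structure and calling fetch_datacite_attributes would return the registered metadata as a Python dictionary.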

Meet Funding and Publication Requirements

Funders and publishers increasingly request that data is stored and accessible in a trusted repository in line with global standards for data preservation. For example, the RSC author guidelines request that “X-Ray crystallographic data and macromolecular structure and sequence data should be deposited in an appropriate repository”. The guidelines for EU H2020-funded research request that data adheres to the FAIR data principles. UK government funding agencies likewise require data to be available in repositories (see UK research funding policies).

The CCDC is also a proponent of the FAIR principles. We work to enhance the Findability, Accessibility, Interoperability and Reusability (FAIR) of crystallographic data.

Make Sharing Data with Collaborators Easier

Once you have deposited your data, it remains confidential until publication. If, however, you need to share your structures with your collaborators during this time, you can do so in MyStructures via the “Share Structures” button. More information here.

Make Your Data Reusable for Others in the Community, for Today and for the Future

Our scientific validation process ensures that data is stored in a standard format so it can be easily reused by other researchers.

Sharing your structures in the CSD makes them findable and accessible, and it enables other scientists to discover and reuse your structural data for their research, going beyond the original uses and context of the deposited data. The fight against COVID-19 is an example of this.

In the CSD, we have identified over 120 structures that have been investigated for use against COVID-19. The majority of these structures were determined prior to the pandemic, but the views and downloads for the relevant structures in the CSD increased dramatically after 2019. For example, Remdesivir (CSD refcode ZARNAK) was published in 2017, but over 99% of its views and downloads on our Access Structures service came after 2019, when it started to be investigated for use against COVID-19. This demonstrates that you never know how your structure might be used in the future, and that it may help advance science globally in ways you didn’t expect.

Sharing not only your final structure in the CSD but also the associated files helps to future-proof your data. As technology improves, it may become possible to assess structural data with new techniques and against new standards, and new quality metrics may need to be calculated. If the complete data is stored, then when new fields are required, the necessary data will already be there to be added to the CSD.

An example is the inclusion of atomic displacement parameters (ADPs). In 2018, ADPs were made available in the database, using the values that were already included in the deposited CIFs. Structures that include ADP information can now be visualised in Mercury using the “Ellipsoid” style, which represents the atomic displacement as a thermal ellipsoid, and have been grouped in a dedicated CSD subset, the ‘ADPs available’ subset. Read more about ADPs in the CSD in the FAQ and in the dedicated paragraph in this blog.

Whatever your main reason for sharing your data in the CSD, all these benefits are intertwined. When you are depositing your structural data, please make sure it is as complete as possible: ensure the CIF fields are completed, and include the structure factor file and, when available, the Raw Data DOI. At the CCDC, we will curate, store, and maintain your data for the benefit of you as a depositor, and for our present and future scientific community.
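As a small aid to checking completeness before deposition, the sketch below scans CIF text for a handful of data names and reports those that are absent or left as a placeholder. It is a minimal illustration only: the function name and the list of fields are assumptions made here, not the CCDC's own requirements, and the parser handles only simple single-line items, not loops or multi-line text fields.

```python
# Data names chosen for illustration only. They are standard core CIF
# items, but this is NOT the CCDC's list of required fields.
EXAMPLE_FIELDS = (
    "_cell_length_a",
    "_chemical_formula_sum",
    "_refine_ls_R_factor_gt",
)

# In CIF, "?" marks an unknown value and "." an inapplicable one.
PLACEHOLDERS = {"?", "."}


def incomplete_fields(cif_text, fields=EXAMPLE_FIELDS):
    """Return the fields that are absent or hold only a placeholder.

    Simplification: only "name value" pairs on one line are read. A name
    with no value on its line is treated as present, because its value
    may follow on later lines.
    """
    found = {}
    for line in cif_text.splitlines():
        parts = line.strip().split(None, 1)
        if parts and parts[0].startswith("_"):
            value = parts[1].strip() if len(parts) > 1 else ""
            # CIF data names are case-insensitive, so compare in lowercase.
            found[parts[0].lower()] = value
    return [
        name for name in fields
        if found.get(name.lower(), "?") in PLACEHOLDERS
    ]


sample = """data_example
_cell_length_a 10.123(4)
_chemical_formula_sum 'C6 H6'
_refine_ls_R_factor_gt ?
"""

print(incomplete_fields(sample))  # prints ['_refine_ls_R_factor_gt']
```

A dedicated CIF parser or validation tool would be the better choice for real depositions; this sketch only shows the kind of check that "as complete as possible" implies.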