Scientific Data Preservation
One key benefit of depositing data into centralized databases such as the Cambridge Structural Database (CSD) is preservation of the data. The CCDC began collecting experimentally derived structures into the CSD in 1965, making all crystallographic data, from the earliest collections in the 1920s to the present day, easily accessible and searchable.
Here we explain our data processing and preservation processes.
Data Deposition in the CSD
Data can be deposited via the CCDC deposition service.
The data deposition page provides information on the file types and data formats which the Centre can process and preserve. It also offers more detailed support and guidelines for depositing data.
The CCDC web-based deposition service has been designed to make data processing more efficient from deposition through to publication by giving depositors the chance to fix errors and add scientific or publication metadata at the point of deposition. The service has eight steps for the depositor to complete before the final submission of files: Login, Upload, Check Syntax, Validation, Add Publication, Enhance Data, Review, Submit.
When this service is used, various checks are run on the deposited data, including:
- Structure factor check
- IUCr checkCIF
- Unit cell check
In cases where depositors are unable to use the online deposition service, alternative methods for depositing data are available. Primarily, data can be sent to the CCDC by email or, for data which exceed the limits of our systems, via third-party file sharing systems.
Following the submission of data, further automatic validation processes take place to allow the data to be converted into a format which can be processed by CCDC systems. A duplicate check is run on the deposited data to establish whether they have already been added to the CCDC archive. Deposits which fail the validation process, or which are found to be duplicates and have not been identified as revised datasets by the depositor, are placed in a queue to be processed manually by CCDC staff. Datasets which pass all checks are assigned a unique Deposition Number, which is communicated by email to the depositor immediately after processing.
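The routing described above can be illustrated with a minimal Python sketch. It is not the CCDC's implementation: the Deposit and Archive classes, the passes_validation flag, the use of a SHA-256 hash of the file contents as a duplicate check, and the sequential integer Deposition Numbers are all assumptions made for illustration only.

```python
import hashlib
import itertools
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    ACCEPTED = "accepted"
    MANUAL_REVIEW = "manual_review"


@dataclass
class Deposit:
    content: bytes
    passes_validation: bool    # hypothetical stand-in for the automatic validation
    is_revision: bool = False  # depositor has flagged this as a revised dataset


class Archive:
    """Hypothetical archive that routes deposits as the text describes."""

    def __init__(self):
        self._fingerprints = set()
        self._numbers = itertools.count(1)  # assumed numbering scheme
        self.manual_queue = []

    def process(self, deposit):
        # Assumption: byte-identical content counts as a duplicate.
        fingerprint = hashlib.sha256(deposit.content).hexdigest()
        duplicate = fingerprint in self._fingerprints
        unflagged_duplicate = duplicate and not deposit.is_revision
        if not deposit.passes_validation or unflagged_duplicate:
            # Failed validation or unflagged duplicate: queue for staff.
            self.manual_queue.append(deposit)
            return Outcome.MANUAL_REVIEW, None
        # All checks passed: record the deposit and assign a number.
        self._fingerprints.add(fingerprint)
        return Outcome.ACCEPTED, next(self._numbers)
```

In this sketch, a second upload of the same file is sent to the manual queue unless the depositor marks it as a revision, in which case it receives a new number like any other accepted deposit.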
Data Archival and Management in the CSD
The CCDC seeks to ensure the indefinite retention of all information resources stored at the Centre. To ensure safe archival and prevent loss of data, the CCDC takes daily backup copies of deposited data files.
Bibliographic details deposited with datasets are preserved but may be modified by CCDC staff to reflect CCDC formatting rules and published citations, or when requested by the data producer.
The CCDC promotes the involvement of its designated user community in the management of their data through its web-based My Structures service. The service allows depositors to view, edit and publish data, and to share data with colleagues prior to publication. Through this service, depositors can also extend the embargo date for their data beyond one year to ensure that their data remains unpublished.
All data files deposited at the CCDC are archived, along with their descriptive data. In most circumstances the CCDC will not delete data from the data archive but will instead deactivate the documentation. Data stored in this deactivated state cannot be further processed or accessed externally.
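Deactivating rather than deleting resembles the common "soft delete" pattern. The following Python sketch shows the idea under stated assumptions: the DataArchive class, its method names and the active flag are hypothetical and do not describe the CCDC's actual systems.

```python
from dataclasses import dataclass


@dataclass
class ArchivedRecord:
    identifier: str
    data: bytes
    active: bool = True  # hypothetical flag marking the record as accessible


class DataArchive:
    """Hypothetical archive in which records are deactivated, never removed."""

    def __init__(self):
        self._records = {}

    def store(self, identifier, data):
        self._records[identifier] = ArchivedRecord(identifier, data)

    def deactivate(self, identifier):
        # The record stays in the archive; only its accessibility changes.
        self._records[identifier].active = False

    def fetch_external(self, identifier):
        record = self._records.get(identifier)
        if record is None or not record.active:
            raise LookupError(f"{identifier} is not available")
        return record.data

    def __len__(self):
        return len(self._records)
```

After a call to deactivate, the record still counts towards the size of the archive, but fetch_external refuses to return it.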
The CCDC takes its role as a data steward very seriously and takes every precaution to ensure that data remains private until we are made aware by depositors or published literature that data can be made public.
Upon publication of data, organic and metal-organic experimental structures will be curated into the Cambridge Structural Database and inorganic experimental structures will be curated into the Inorganic Crystal Structure Database.
Following publication, the deposited data can be found via the CCDC's Access Structures service by searching on the persistent identifiers or publication details associated with the structure. Users can then view, download and interact with the data directly in the web browser free of charge. Access to and use of data available through this service are governed by the legal and regulatory framework set out in our Access Structures Terms and Conditions.
To facilitate the publication and preparation of data stored in the repository, data can also be made available pre-publication to publishers, referees and depositors. When data is requested pre-publication, checks are performed on requestors to confirm their role and identity. Subsequently, they are invited to use the CCDC Referee Service which allows data to be viewed and downloaded before being made public. Similarly, the My Structures service and its related functionalities serve to assist data publication workflows by allowing depositors to view, share and claim data they have deposited or authored pre-publication.