Datasets, Data Citation and DOIs

As announced in our DOI press release, we are delighted to now be assigning Digital Object Identifiers (DOIs) to datasets of crystal structures deposited with the CCDC. You may be familiar with the use of DOIs alongside article citations. A DOI provides a persistent link that is guaranteed to take you to a page of metadata associated with an article and, almost invariably, a link to the article itself. Assigning an identifier such as a DOI reflects a desire for published content to be easily discoverable for the long term. The CCDC believes that just as articles are worthy of citation, so too are the datasets that represent the primary output of the crystallographic community. We are not alone in this view, as evidenced by the Joint Declaration of Data Citation Principles (http://www.force11.org/datacitation), to which the CCDC fully subscribes. These principles recognise the importance of data as a citable product of research for which credit and attribution are due. They also highlight attributes such as unique identification, access, persistence, versioning and interoperability, all of which assigning DOIs can help facilitate.

The DOIs we are assigning to datasets are registered through DataCite (http://www.datacite.org/) and specifically via their UK member organisation based at the British Library (http://www.bl.uk/datasets). We chose DataCite because of their wider aims around acceptance of, access to and archiving of research data. We anticipate that this is just the beginning of our association with the DataCite organisation and look forward to working with them to further common aims in the future. At this point we would also like to acknowledge the excellent support provided by the DataCite UK team at the British Library.

At the time of writing, we have assigned DOIs to almost 500,000 of the datasets deposited with the CCDC, essentially any structure to which a CCDC Number has been assigned. One of the first datasets to receive a DOI was CCDC 936802, a pharmaceutical co-crystal published this year. This dataset can now be identified as 10.5517/cc10ftfp and linked to via http://dx.doi.org/10.5517/cc10ftfp. The link takes you to the CCDC landing page for this structure, from where the deposited data can be freely downloaded. Metadata associated with this entry can be found in the DataCite Metadata Store at http://data.datacite.org/10.5517/CC10FTFP. This record shows how the dataset can now be cited:

Sowa, Michał; Ślepokura, Katarzyna; Matczak-Jon, Ewa; (2014): CCDC 936802: Experimental Crystal Structure Determination; Cambridge Crystallographic Data Centre. http://dx.doi.org/10.5517/CC10FTFP
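
As a rough illustration of how the identifier, the link and the citation above relate to one another, here is a minimal Python sketch. It is not CCDC or DataCite code: the Dataset class and the doi_url and cite helpers are hypothetical names invented for this example, and the citation pattern is simply copied from the example above. It relies on the fact that DOI names are case-insensitive, which is why the lower-case and upper-case forms quoted in this article refer to the same dataset.

```python
from dataclasses import dataclass
from typing import List

# Resolver prefix as used in the links in this article.
RESOLVER = "http://dx.doi.org/"


@dataclass
class Dataset:
    """Hypothetical container for the metadata needed to cite a dataset."""
    creators: List[str]
    year: int
    title: str
    publisher: str
    doi: str


def doi_url(doi: str) -> str:
    """Build the resolver link for a DOI.

    DOI names are case-insensitive, so the DOI is upper-cased here
    purely to give one consistent spelling.
    """
    return RESOLVER + doi.strip().upper()


def cite(ds: Dataset) -> str:
    """Format a citation following the pattern of the example above."""
    authors = "; ".join(ds.creators)
    return f"{authors}; ({ds.year}): {ds.title}; {ds.publisher}. {doi_url(ds.doi)}"


example = Dataset(
    creators=["Sowa, Michał", "Ślepokura, Katarzyna", "Matczak-Jon, Ewa"],
    year=2014,
    title="CCDC 936802: Experimental Crystal Structure Determination",
    publisher="Cambridge Crystallographic Data Centre",
    doi="10.5517/cc10ftfp",
)

print(cite(example))
```

Under these assumptions, the printed line should match the citation shown above for CCDC 936802.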
 
Figure: CCDC 936802, one of the first datasets to receive a DOI.
 
DataCite's services offer a platform that facilitates interoperability with other services. Thomson Reuters has developed workflows that take a feed from DataCite to populate their Data Citation Index (http://wokinfo.com/products_tools/multidisciplinary/dci/), and we will be working with them to take advantage of these workflows and ensure that your data gets exposure in this resource. The metadata available through DataCite will also enable you to add your research data to your ORCID profile through ODIN, the ORCID and DataCite Interoperability Network (http://odin-project.eu/). Additionally, we will be discussing with publishers how DOIs can extend the linking options currently provided to data associated with scientific articles.

By assigning DOIs to deposited datasets, we are adding to the array of free services we currently provide to the whole scientific community to aid the discoverability of your research data. This reinforces the CCDC’s position as the definitive source of crystal structure data for small organic and metal-organic compounds. Beyond this, we have laid the foundation for providing even more services that will aid the exposure and subsequent reuse of datasets deposited with us. Some of these possibilities have been mentioned in this article, but there will undoubtedly be more. We welcome your thoughts on how we can build on this for the benefit of you and the wider community. Please do drop me a line at bruno@ccdc.cam.ac.uk.