Supporting the Crystal Structure Prediction (CSP) community
In 2019 we started exploring how the CCDC’s experience in data management and standards could best serve the data needs of the Crystal Structure Prediction (CSP) community. Around 18 months on, we wanted to share the outputs so far, how you can get involved, and what you can expect to see from us in the future.
Establishing a CCDC CSP Consortium
Back in November 2019 we hosted an external CSP discussion meeting, which helped set our direction for 2020 in perhaps more ways than we anticipated. In the meeting we trialled a hybrid physical/virtual approach, with discussion sessions designed to gather input from both in-person and virtual attendees, and this experience certainly helped us adapt the way we engage with users this year in light of the pandemic. Alongside testing new ways to engage our community, the meeting highlighted the need for appropriate standards and guidelines to enable the effective and reliable communication and reuse of the results of Crystal Structure Prediction experiments.
It was recognised that, for this to be effective, we need the engagement of representatives from a range of stakeholder groups, and the discussion groups felt that the CCDC was ideally placed to support these efforts.
This led us to establish the CSP Consortium, which brings together representatives from industry, academia, solution providers and data organisations to share progress, hear perspectives on the current state of CSP and discuss challenges and opportunities.
Chart showing what type of organisations attendees were representing at the CCDC CSP Consortium meeting in November 2019.
Launching the 7th CSP Blind Test
During the meeting we were able to review how the CCDC has been working to support the CSP community in 2020 in a variety of ways. In October 2020 we launched our 7th CSP Blind Test – the leading challenge in crystal structure prediction. The CSP Blind Test brings together scientists in the field from industry and academia to test their methods against real examples in a controlled environment, and make connections in the CSP community. This year we are doing things slightly differently and the test poses new challenges, asks deeper questions, and examines method developments closely.
You can find out more about the CSP Blind Test in Jason Cole’s recent blog.
Establishing new data standards
It was also recognised that the CSP Blind Test provides us with an excellent opportunity to help establish and test out new standards for CSP data. Through the CSP Consortium we have been working towards proposing a set of new data standards that we can launch in 2021, ready for the 7th CSP Blind Test submissions. Over the year we have engaged with key thought leaders in academia, industry and software, and have investigated the status of current CSP data in the literature, in existing databases and in the data held at the CCDC.
We have had some in-depth discussions with our CSP Industry Members (more about this later) and key academics to better understand data requirements and to support industry members in creating their own proprietary databases.
This work has led us to draft a set of new CCDC-defined CIF fields. These fields fall into categories designed to extend the existing standards to accommodate the data items that are key to the reuse of CSP results, including:
- Simulation temperature type (dynamic or static) and value
- Conformer generation, optimisation and clustering methods
- Crystal structure generation methods and space groups used
- Energy values and energy optimisation models used
- Version of software used for each step
For each category we have been following CIF conventions set out by the International Union of Crystallography (IUCr) and the Committee for the Maintenance of the CIF Standard (COMCIFS) to define the data category, the name and type of each data item, its definition, its permitted values or ranges, its hierarchy and relationships, and its importance. Now that we have our first draft, we are working to refine the proposal, gather feedback, create a basic checking procedure, and publish simple submission guidelines and resources for Blind Test entrants early in 2021. Our intention is to use the Blind Test to road-test these new standards so that we can improve our proposal and work towards a more formally recognised dictionary with COMCIFS, alongside the creation of more advanced validation tools.
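To give a feel for what a basic checking procedure might involve, here is a minimal Python sketch. It is an illustration only and not the CCDC's actual checker or dictionary: the data names (such as `_csp_example.temperature_type`), the `ItemDefinition` class and the `check_record` function are all hypothetical, invented for this example. It assumes each data item is described by a name, a type, whether it is required, and optionally a set of permitted values or a minimum.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ItemDefinition:
    """One hypothetical data item: name, type, and simple constraints."""
    name: str
    value_type: type
    required: bool = False
    permitted: Optional[Sequence[str]] = None  # enumerated values, if any
    minimum: Optional[float] = None            # lower bound, if any


# Hypothetical item names, made up for illustration; not the draft CCDC fields.
DRAFT_ITEMS = [
    ItemDefinition("_csp_example.temperature_type", str,
                   required=True, permitted=("static", "dynamic")),
    ItemDefinition("_csp_example.temperature_value", float, minimum=0.0),
    ItemDefinition("_csp_example.software_version", str, required=True),
]


def check_record(record, definitions=DRAFT_ITEMS):
    """Return a list of human-readable problems found in one record.

    `record` maps data names to raw string values, as they might be
    read from a submitted file. An empty list means no problems found.
    """
    problems = []
    for item in definitions:
        if item.name not in record:
            if item.required:
                problems.append(f"{item.name}: missing required item")
            continue
        raw = record[item.name]
        try:
            value = item.value_type(raw)  # e.g. float("298.15")
        except (TypeError, ValueError):
            problems.append(
                f"{item.name}: cannot read {raw!r} as {item.value_type.__name__}")
            continue
        if item.permitted is not None and value not in item.permitted:
            problems.append(f"{item.name}: {value!r} is not a permitted value")
        if item.minimum is not None and value < item.minimum:
            problems.append(f"{item.name}: {value} is below the minimum {item.minimum}")
    return problems
```

Calling `check_record` on a dictionary of data names and raw values returns an empty list when every required item is present and every value satisfies its constraints. Keeping the definitions as data rather than code means the same check could, in principle, be driven from a formal dictionary once one exists.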
Establishing Industry Partnerships
With CSP now routinely adopted by industry to inform drug product formulation, and with significant time and cost needed to calculate landscapes, there has been an increased drive to make the most of this data. To this end, in 2020 we established industry partnerships with major pharmaceutical companies to help accelerate solutions. These partnerships support the overall objective of the CSP Consortium: to develop solutions for the storage, visualisation and management of CSP knowledge, underpinned by community standards and driven by industry needs.
Image showing how our Industry Partnerships fit into the wider consortium
These partnerships have helped us to identify industry-wide requirements and, guided by their inputs, we developed and launched CSD-Theory. This can be used on real data to help address industry challenges in managing, visualising, storing and accessing CSP data. It will hopefully form the basis of future platforms to share published CSP data too.
A collaborative effort
As you can see from our efforts so far, we are committed to supporting the CSP community, but it is essential that you, our community, guide the direction we take. We know that engaging representatives from a range of stakeholder groups is key to the success of this venture. We also need to balance this work against the resources we have available and our other commitments.
This means we need your support and your input! If you would like to know more, or think you can contribute your time, expertise or resources to help us realise the CSP Consortium aims, contact us here.
If you’re interested in a demo of CSD-Theory, just email