Online Platforms for Crystal Structure Exploration
This blog is based on the recent virtual workshop “Explore Crystal Structures Online Using WebCSD” and introduces the two online platforms for searching the Cambridge Structural Database (CSD): WebCSD and Access Structures. Watch the full workshop here.
The Cambridge Structural Database
The Cambridge Structural Database (CSD) is the largest database of experimental organic and metal-organic crystal structures, currently containing over 1.25 million structures (Figure 1). It includes data published in scientific articles, patents, institutional repositories and theses, as well as structures published directly through the database (CSD Communications).
The CSD represents a community effort, with datasets determined by researchers worldwide. Every single CSD entry is enriched and annotated by experts at the CCDC to aid the discoverability of data and knowledge from the resource.
Alongside the multitude of structures, the CSD also contains a wealth of structural information such as over 94 million atomic coordinates, 28 million bond lengths, 40 million valence angles, 14 million torsion angles and 2 million rings, representing a rich and extensive source of data.
How Might You Use the CSD?
Contextual insights from the CSD are highly valuable in crystallography and in wider scientific research. The database is invaluable for background research: it can reveal whether a substance exists in the solid form and whether any similar substances have been determined. Checking the unit cell early in a data collection can also help you find out whether the desired compound has been crystallized, before committing to a full data collection.
Newly determined datasets can be compared with what is already in the CSD. The collection of experimental structures in the database can give valuable information about interactions and geometric features in the solid form.
As well as supporting research, the CSD helps students learn about chemistry in 3D using real experimental data, and it can teach them how to search effectively for scientific information and evaluate results critically.
How To Perform Structure Searching Online
You can search structures in the CSD through a web browser via Access Structures and WebCSD, from the desktop software ConQuest, and programmatically using the CSD Python API. Access Structures is free to use, while the other ways of searching are included with any CSD licence.
This blog will focus on the web-based platforms (Access Structures and WebCSD), which present the advantage of accessing real-time updated data.
Access Structures
Access Structures is the online portal that gives access to the CSD and the ICSD. Viewing the deposited datasets of a structure and downloading their associated files is free. No local software installation is required, as Access Structures can be accessed through any standard internet browser from the link provided.
Figure 2 shows what the search results for the compound paracetamol look like. From the list of entries in the middle, one specific entry can be selected to display more information about that structure, such as the 3D viewer and chemical diagram. The deposited CIF file for the selected structure can be downloaded, along with any other available information such as structure factor data and the checkCIF report.
WebCSD
Like Access Structures, WebCSD is an easy and intuitive web-based platform that does not require any installation, presents up-to-the-minute data, and can be accessed from the same link. The main advantage of the licensed WebCSD software is that it allows users to perform advanced searches of the CSD and to access data that have been fully curated.
To access WebCSD, you need to create an account on the CCDC website and then connect your account to your licence from “Activate WebCSD” using your licence customer number and activation key. We can also add IP address(es) to a specific customer account: if this is your preferred option, please contact us via this form.
Once you have logged in, a variety of search options become available. Figure 3 highlights in blue the more advanced search options that can be accessed with WebCSD, such as structure search, unit cell search and formula search.
As Structure Search is the most complex of these, a summary of its main characteristics is given here and shown in Figure 4.
Structure Search allows the user to draw the compound of interest. Several functionalities are available. Editing features (top centre, highlighted in dark green in the figure) include clearing the canvas, undo and redo, and selecting, erasing and rotating the structure. Quick access to the elements, the periodic table, and different bond types and templates (left and top right, highlighted in yellow in the figure) allows users to draw a variety of molecules. Finally, more advanced options (top left and bottom right, highlighted in grey in the figure), such as adding 3D parameters, can be explored.
The user can then start the search by clicking “Search”, and a results page similar to the one in Figure 5 will load. A list of entries is shown on the left-hand side and details of the selected structure on the right-hand side. These include a 3D viewer and a 2D chemical diagram of the entry, alongside bibliographic, chemical, crystal and experimental details. The CCDC DOI is also reported, and the publication that contains the structure can be easily accessed from the link.
Don’t Miss These Features!
- Explore Advanced Structure Searching
Advanced options are available when searching structures through WebCSD. These include editing or adding extra properties on specific atoms, such as the charge or number of connections, and selecting multiple atoms from the periodic table, including entire groups or periods (Figure 6, left).
3D parameters can also be added (Figure 6, right). For an example where this is explored in detail, follow the link to the self-guided workshop “Searching the CSD online with WebCSD”.
- Define the Search Type
When performing a structure search, different search types can be defined (Figure 7), each leading to different results.
A substructure search returns hits where the drawn query is part of any molecule (similar to ConQuest). To return structures that contain the exact molecule as drawn, tick “Auto Generate”, select the option “Exact” in the “Auto Generate Settings”, and choose “Substructure” as the “Match condition” (Figure 7). The similarity option instead calculates a molecular fingerprint for the drawn molecule and compares it with pre-calculated fingerprints for structures in the CSD using the Tanimoto coefficient. It returns structures that are similar to the drawn one, together with a similarity score (where 1 is an identical molecule).
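To make the similarity score more concrete, here is a minimal Python sketch of the Tanimoto coefficient itself. It is only an illustration under stated assumptions: fingerprints are represented as sets of “on” bit positions, and the names `tanimoto`, `query` and `candidates`, along with the placeholder refcodes and bit values, are hypothetical. It does not reproduce the fingerprints or the implementation that WebCSD uses.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient for two fingerprints given as sets of 'on' bits.

    Defined as |A & B| / |A | B|: shared bits divided by all bits set in either.
    """
    if not fp_a and not fp_b:
        # Convention chosen for this sketch: two empty fingerprints are identical.
        return 1.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


# Hypothetical fingerprints: made-up bit positions, not real CSD data.
query = {1, 4, 7, 9}
candidates = {
    "HYPOTH01": {1, 4, 7, 9},  # same bits as the query
    "HYPOTH02": {1, 4, 8},     # shares two bits with the query
    "HYPOTH03": {2, 3},        # shares no bits with the query
}

# Rank candidates from most to least similar to the query.
ranked = sorted(
    ((tanimoto(query, fp), name) for name, fp in candidates.items()),
    reverse=True,
)
for score, name in ranked:
    print(f"{name}: {score:.2f}")
```

A score of 1 means the two fingerprints have exactly the same bits set, which is why an identical molecule scores 1. A score of 0 means they share no bits at all.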
- Upload a Drawn Structure
Structure Search allows the user to upload a .mol template of a structure previously drawn and saved from a drawing program such as ChemDraw. This is useful when the compound of interest has a particularly complex structure, and can be done by clicking on the arrow highlighted in grey in Figure 8.
Featured Questions and Answers Asked at the Workshop
What is the difference between the file formats .gcd and .cif when downloading a structure from WebCSD?
The .cif file is the deposited dataset for an individual entry, while the .gcd file contains the list of refcodes obtained from the search, which can be loaded into Mercury to view the relevant structures in the CSD. Parameter data, by contrast, is provided in .tsv format, so parameters such as torsion angles can be downloaded and loaded into any spreadsheet program such as Excel.
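If you want to handle a downloaded refcode list outside Mercury, a short script can read it. The sketch below assumes the .gcd file is plain text with one refcode per line; check this against your own download. The function names and the placeholder refcodes are hypothetical and used only for illustration.

```python
from collections import Counter


def read_refcodes(text):
    """Return the refcodes in the text of a .gcd file, assuming one per line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def count_families(refcodes):
    """Count hits per refcode family, taking the first six characters as the family."""
    return Counter(code[:6] for code in refcodes)


# Placeholder content standing in for a downloaded .gcd file.
example_text = "ABCDEF\nABCDEF01\n\nGHIJKL\nABCDEF02\n"

refcodes = read_refcodes(example_text)
families = count_families(refcodes)
print(refcodes)
print(families.most_common())
```

For a real file, the text could be read with `open("hits.gcd", encoding="utf-8").read()`, where `hits.gcd` is a hypothetical file name.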
Can we search the distribution of bond lengths of a particular type of bond, such as for the N=N bond? How can we tabulate them?
If you draw the N=N substructure in Structure Search and then add a 3D parameter for this distance, you can easily find the bond lengths you are interested in. When the search has completed, WebCSD will offer the option “Download Parameter Data”, which saves the values as a .tsv file. You will see this button at the bottom of the results list, below the “Download” button, on the left-hand side of the screen.
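Once the parameter data is downloaded, it can be tabulated in a spreadsheet or summarized with a short script. The Python sketch below shows one possible way to do the latter. The column names `Refcode` and `DIST1`, the placeholder refcodes and the numbers in the sample are all made up for illustration, so check the header of your own .tsv file for the actual column names.

```python
import csv
import io
import statistics


def summarize_column(tsv_text, column):
    """Summarize one numeric column of tab-separated text (count, mean, min, max)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    values = []
    for row in reader:
        cell = (row.get(column) or "").strip()
        if cell:  # skip rows where the value is missing
            values.append(float(cell))
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
    }


# Made-up sample standing in for a downloaded file; not real CSD data.
sample = (
    "Refcode\tDIST1\n"
    "HYPOTH01\t1.25\n"
    "HYPOTH02\t1.24\n"
    "HYPOTH03\t1.26\n"
)

summary = summarize_column(sample, "DIST1")
print(summary)
```

For a real download, the text could be read with `open("parameters.tsv", encoding="utf-8", newline="").read()`, where `parameters.tsv` is a hypothetical file name.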
My licence is activated, but I can’t draw structures in the Structure Search window. What can I do?
You might need to clear your cache or restart your browser. If that doesn’t help, please contact us and include a screenshot of what happens when you try to access Structure Search.
Next Steps
To discuss further or request a demo with one of our scientists, please contact us via this form.