CSD in Action: Data Mining to Find Hydrate–Anhydrate Structure Pairs
February 16, 2021
Here we highlight a paper that used SMILES strings to mine the Cambridge Structural Database for hydrate-anhydrate pairs. This post is part of our series highlighting examples of CCDC tools in action by scientists around the world.
Organic molecules can crystallise in hydrated or anhydrous forms, and their properties can differ significantly between the two. The authors here used the Cambridge Structural Database (CSD) to find test systems for their hypotheses and to identify trends across a large range of materials. Using the CSD Python API and SMILES strings, they were able to mine the database for systems that fit their requirements. The work shows how the wealth of experimentally validated data in the CSD can be used to test existing hypotheses and to suggest new ones.
The transformation of molecules between hydrate and anhydrate forms due to environmental changes can have a big impact on their properties. This is evident across many physiological and pharmaceutical cases.
To understand how these changes impact stability and mechanical properties, the researchers needed to understand the thermal stability of the materials, and the mechanism and products of dehydration.
After several studies on specific cases, the authors turned to the CSD to find more systems to test their hypotheses against, and to identify trends across a highly varied sample set.
The authors wrote scripts for the CSD Python API that match SMILES strings and return organic compounds with 3D coordinates and no errors. The returned structures were split into hydrate and anhydrate forms. Further screening, including packing similarity and SMILES string comparison, was then used to match up hydrate-anhydrate pairs.
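To give a feel for the SMILES comparison step, here is a minimal sketch in plain Python. It is not the authors' script and does not call the CSD Python API. It assumes each entry has already been exported as an identifier and a SMILES string written in a consistent form, and that water appears as a separate "O" component. The function names and the TOY identifiers are made up for illustration, and the water count is simply the number of water components in the string, which need not match the true stoichiometry.

```python
from collections import defaultdict

WATER = "O"  # SMILES for a water molecule


def split_water(smiles):
    """Return (water-free SMILES key, number of water components)."""
    parts = smiles.split(".")
    n_water = sum(1 for p in parts if p == WATER)
    # Sort the remaining components so their order does not affect matching.
    key = ".".join(sorted(p for p in parts if p != WATER))
    return key, n_water


def pair_hydrates(entries):
    """Match hydrates to anhydrates that share the same water-free SMILES.

    entries: iterable of (identifier, smiles) tuples.
    Returns a sorted list of (anhydrate_id, hydrate_id, n_water).
    """
    anhydrates = defaultdict(list)
    hydrates = defaultdict(list)
    for identifier, smiles in entries:
        key, n_water = split_water(smiles)
        if not key:
            continue  # entry contains only water, nothing to pair
        if n_water == 0:
            anhydrates[key].append(identifier)
        else:
            hydrates[key].append((identifier, n_water))

    pairs = []
    for key, hydrate_list in hydrates.items():
        for anhydrate_id in anhydrates.get(key, []):
            for hydrate_id, n_water in hydrate_list:
                pairs.append((anhydrate_id, hydrate_id, n_water))
    return sorted(pairs)


# Hypothetical toy entries, not real CSD refcodes.
toy_entries = [
    ("TOY001", "CC(=O)O"),      # anhydrate
    ("TOY002", "CC(=O)O.O"),    # one water component
    ("TOY003", "O.CC(=O)O.O"),  # two water components
    ("TOY004", "c1ccccc1"),     # no hydrate partner
]

for pair in pair_hydrates(toy_entries):
    print(pair)
```

Exact string matching like this only works if every SMILES string comes from the same source and follows the same conventions. A real workflow would also need the packing similarity screening described above, which this sketch leaves out.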
With the data sets defined, the authors looked for trends. They found a preference for low numbers of water molecules, a preference for odd numbers of water molecules, and a bias for hydrates to crystallise in a lattice of lower symmetry than the corresponding anhydrous form.