Here we highlight a paper that used SMILES strings to mine the Cambridge Structural Database (CSD) for hydrate-anhydrate pairs. This post is part of our series showcasing CCDC tools being used by scientists around the world.
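To show why SMILES lends itself to this kind of search, here is a minimal sketch of one possible pairing strategy. It is not the method used in the paper. It assumes each entry is given as a refcode mapped to a multi-component SMILES string in which water appears as its own `.`-separated component (`O` or `[OH2]`), and that the SMILES are canonical. The refcodes and SMILES below are hypothetical. Entries whose non-water components match are grouped together, and every hydrate in a group is paired with every anhydrate.

```python
from collections import defaultdict

# Component SMILES treated as water. This is an assumption: depending on how the
# SMILES were generated, water may be written differently (e.g. with explicit Hs).
WATER_SMILES = {"O", "[OH2]"}


def split_components(smiles: str) -> list[str]:
    """Split a multi-component SMILES string on its '.' separators."""
    return [part for part in smiles.split(".") if part]


def water_count(smiles: str) -> int:
    """Count the components that are water molecules."""
    return sum(1 for part in split_components(smiles) if part in WATER_SMILES)


def anhydrous_key(smiles: str) -> tuple[str, ...]:
    """Sorted non-water components; entries with equal keys share the same host."""
    return tuple(sorted(p for p in split_components(smiles) if p not in WATER_SMILES))


def find_hydrate_anhydrate_pairs(entries: dict[str, str]) -> list[tuple[str, str]]:
    """Pair every hydrate with every anhydrate that has the same non-water components.

    Assumes the SMILES are canonical (the same molecule is always written the same
    way); otherwise canonicalise them with a cheminformatics toolkit first.
    """
    groups = defaultdict(lambda: {"hydrates": [], "anhydrates": []})
    for refcode, smiles in entries.items():
        key = anhydrous_key(smiles)
        if not key:
            continue  # only water (or an empty string): no host molecule to match
        kind = "hydrates" if water_count(smiles) > 0 else "anhydrates"
        groups[key][kind].append(refcode)

    pairs = [
        (hydrate, anhydrate)
        for group in groups.values()
        for hydrate in group["hydrates"]
        for anhydrate in group["anhydrates"]
    ]
    return sorted(pairs)


# Hypothetical identifiers and SMILES, for illustration only.
EXAMPLE_ENTRIES = {
    "HYD001": "OC(=O)c1ccccc1.O",  # host plus one water
    "ANH001": "OC(=O)c1ccccc1",    # same host, no water
    "HYD002": "CC(=O)O.O.O",       # hydrate with no anhydrous partner
    "ANH002": "CCO",               # anhydrate with no hydrate partner
}

if __name__ == "__main__":
    print(find_hydrate_anhydrate_pairs(EXAMPLE_ENTRIES))  # [('HYD001', 'ANH001')]
```

Grouping on the sorted tuple of non-water components makes the match independent of the order in which components are written. A real search would also need to handle salts, other solvents and non-canonical SMILES, which this sketch ignores.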
One of the major developments in the 2020.1 CSD Release is the addition of the CSD Pipeline Pilot component collection, which will allow you to build custom tools for analysing CSD structural data without writing code.
As well as making research faster and more efficient, this should lower the barrier to entry so that more people can create their own custom analyses.
Machine learning is a fast-growing area of research in structural science, and it is particularly effective in crystallography thanks to the wealth of highly accurate structural data available. A key ingredient, though, is an effective molecular descriptor: a way of encoding complex chemical information about molecules and structures as machine-interpretable vectors of numbers that can be fed into machine learning algorithms.
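As a deliberately simple illustration of what "a vector of numbers" means here, the sketch below turns a SMILES string into a fixed-length vector of element counts. This is a toy composition descriptor chosen for clarity, not one used in any particular study. The element vocabulary and the simplified atom-matching regular expression are assumptions; implicit hydrogens and the hydrogen counts inside bracket atoms (such as the `H3` in `[CH3]`) are ignored.

```python
import re

# Fixed vocabulary: one vector position per element, plus a final "other" slot.
# The choice of elements is an illustrative assumption, not a standard.
VOCABULARY = ("C", "N", "O", "S", "P", "F", "Cl", "Br", "I")

# Matches a bracket atom's element symbol (after an optional isotope number), or an
# organic-subset atom written outside brackets. "Cl" and "Br" come before the single
# letters so that they are not read as "C" and "B".
ATOM_PATTERN = re.compile(r"\[\d*([A-Z][a-z]?|[a-z]{1,2})|(Cl|Br|[BCNOPSFI]|[bcnops])")


def element_counts(smiles: str) -> dict[str, int]:
    """Count atoms by element symbol (implicit and bracket H counts are ignored)."""
    counts: dict[str, int] = {}
    for match in ATOM_PATTERN.finditer(smiles):
        # Normalise aromatic lowercase symbols: "c" -> "C", "se" -> "Se".
        symbol = (match.group(1) or match.group(2)).capitalize()
        counts[symbol] = counts.get(symbol, 0) + 1
    return counts


def composition_vector(smiles: str) -> list[int]:
    """Map a SMILES string to a fixed-length vector of element counts."""
    counts = element_counts(smiles)
    vector = [counts.get(element, 0) for element in VOCABULARY]
    # Final slot collects every element outside the vocabulary.
    vector.append(sum(n for element, n in counts.items() if element not in VOCABULARY))
    return vector


if __name__ == "__main__":
    print(composition_vector("OC(=O)c1ccccc1"))  # [7, 0, 2, 0, 0, 0, 0, 0, 0, 0]
```

Every molecule maps to a vector of the same length, which is what most learning algorithms expect. The weakness is equally clear: isomers such as ethanol and dimethyl ether get identical vectors, which is why practical descriptors also encode connectivity and three-dimensional structure.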