Using Experimental Crystal Structures From the CSD for Gibbs Lattice Energy Calculations
January 4, 2024
This blog highlights the work done by Dr Detlef W. M. Hofmann and co-workers, who applied machine learning methods to the experimental crystal structures of the Cambridge Structural Database (CSD) to accurately derive the intermolecular Gibbs energy for the investigated structures. The full work can be accessed here.
Calculating the Gibbs free energy of intermolecular interactions, also known as the lattice energy, is one of the biggest challenges in computational chemistry and crystallography. The stability of polymorphs depends strongly on temperature, and the entropic term TΔS of the Gibbs free energy (G = H − TS) is decisive in identifying the stable form at a given temperature.
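To make the role of the entropic term concrete, the sketch below compares two hypothetical polymorphs using G = H − TS. The enthalpy and entropy values are invented for illustration, and the class and function names are hypothetical, not part of FlexCryst or any CSD tool.

```python
from dataclasses import dataclass


@dataclass
class Polymorph:
    """Hypothetical polymorph with enthalpy H (kJ/mol) and entropy S (kJ/(mol K))."""
    name: str
    enthalpy: float
    entropy: float

    def gibbs(self, temperature: float) -> float:
        # G = H - T*S at the given temperature (K)
        return self.enthalpy - temperature * self.entropy


def most_stable(polymorphs: list[Polymorph], temperature: float) -> Polymorph:
    # The thermodynamically stable form has the lowest Gibbs energy.
    return min(polymorphs, key=lambda p: p.gibbs(temperature))


def transition_temperature(a: Polymorph, b: Polymorph) -> float:
    # G_a(T) = G_b(T)  =>  T = (H_b - H_a) / (S_b - S_a)
    return (b.enthalpy - a.enthalpy) / (b.entropy - a.entropy)


# Illustrative values only: form A has the lower enthalpy, form B the higher entropy.
form_a = Polymorph("A", enthalpy=-100.0, entropy=0.050)
form_b = Polymorph("B", enthalpy=-98.0, entropy=0.060)

print(most_stable([form_a, form_b], 100.0).name)  # A
print(most_stable([form_a, form_b], 300.0).name)  # B
print(round(transition_temperature(form_a, form_b), 1))  # 200.0
```

In this toy case the enthalpically favoured form A is stable below about 200 K, while the higher entropy of form B makes it the stable form above that temperature.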
There are two approaches to calculating intermolecular energies: first-principles methods and force fields. First-principles methods have limitations, such as not always accounting accurately for temperature effects, whereas force fields provide a computationally affordable and accurate model. However, force fields still require substantial computational resources, so the workflow needs further optimization.
Machine learning (ML) can now be used in force field development. However, obtaining the required parameters has proved challenging when a system contains more than a few elements. A good way to obtain these parameters is to draw on large repositories of experimental crystal structures, such as the CSD.
In this work, the researchers applied a widely used ML method, support vector machines (SVMs), to the experimental structures in the CSD to derive a force field covering all available atom types (a general force field).
The Data Set
When the research was carried out, the CSD contained over 1 000 000 experimental crystal structures. The data were first preprocessed, and numerous filters were applied to remove structures that were not of interest, such as those measured at non-standard pressure or temperature, polymeric structures and disordered structures. The final data set comprised 259 041 records.
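The filtering step can be pictured as a chain of predicates applied to each structure record. The sketch below assumes a simple hypothetical record type; the field names and the temperature and pressure windows are illustrative assumptions, not the criteria used in the paper, and this is not the CSD Python API.

```python
from dataclasses import dataclass


@dataclass
class StructureRecord:
    """Hypothetical, simplified view of one crystal structure entry."""
    refcode: str
    temperature_k: float   # measurement temperature in K
    pressure_kpa: float    # measurement pressure in kPa
    is_polymeric: bool
    is_disordered: bool


# Assumed windows for "standard" conditions (illustrative only, not from the paper).
ROOM_TEMPERATURE_RANGE_K = (283.0, 303.0)
AMBIENT_PRESSURE_MAX_KPA = 101.325 * 1.5


def keep(record: StructureRecord) -> bool:
    """True if the record passes every (hypothetical) filter."""
    lo, hi = ROOM_TEMPERATURE_RANGE_K
    return (
        lo <= record.temperature_k <= hi
        and record.pressure_kpa <= AMBIENT_PRESSURE_MAX_KPA
        and not record.is_polymeric
        and not record.is_disordered
    )


def filter_records(records: list[StructureRecord]) -> list[StructureRecord]:
    return [r for r in records if keep(r)]
```

Keeping each criterion as a separate condition makes it easy to report how many records each filter removes.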
The Bond Calculation
The first step in deriving a force field is the bond calculation, because bonds determine whether an interaction is intermolecular or intramolecular. The scientists introduced a critical bond distance for every pair of atoms, defined as the distance below which the two atoms are considered bonded. Initial critical distances were derived from the atomic radii as a geometric or arithmetic mean (following the approach by Bondi) and then refined using force field and chemical knowledge.
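A minimal sketch of this idea: combine per-element radii into a critical distance, mark atom pairs closer than that distance as bonded, and group bonded atoms into molecules so that any pair in different molecules counts as intermolecular. The radii below are illustrative covalent-style values, not Bondi's radii or the refined parameters of the paper; the scale factor is an assumption, and periodic images in the crystal are ignored for simplicity.

```python
import math
from itertools import combinations

# Illustrative atomic radii in Å (hypothetical values, NOT the parameters of the paper).
RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}
SCALE = 1.2  # assumed tolerance factor applied to the combined distance


def critical_distance(a: str, b: str, rule: str = "arithmetic") -> float:
    """Distance below which atoms of elements a and b are treated as bonded."""
    ra, rb = RADII[a], RADII[b]
    if rule == "arithmetic":
        d = ra + rb                     # arithmetic mean of 2*ra and 2*rb
    elif rule == "geometric":
        d = 2.0 * math.sqrt(ra * rb)    # geometric mean of 2*ra and 2*rb
    else:
        raise ValueError(f"unknown rule: {rule}")
    return SCALE * d


def find_bonds(elements, coords, rule="arithmetic"):
    """Return index pairs (i, j) closer than their critical bond distance."""
    bonds = []
    for i, j in combinations(range(len(elements)), 2):
        if math.dist(coords[i], coords[j]) < critical_distance(elements[i], elements[j], rule):
            bonds.append((i, j))
    return bonds


def molecule_labels(n_atoms, bonds):
    """Label atoms by connected component (union-find); one label per molecule."""
    parent = list(range(n_atoms))

    def root(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in bonds:
        parent[root(i)] = root(j)
    return [root(i) for i in range(n_atoms)]


def is_intermolecular(i, j, labels):
    # Two atoms interact intermolecularly if they belong to different molecules.
    return labels[i] != labels[j]
```

For two well-separated water molecules, this finds the four O-H bonds, assigns two molecule labels, and classifies any H···H contact between the molecules as intermolecular.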
Assignment of Atom Types
The step after the bond calculation is the assignment of atom types. Correct atom typing is crucial for accurate results and can take several atomic properties into account, including element, hybridization, aromaticity and chemical environment. The scientists defined atom types for all elements present in the CSD. The final list of atom types was obtained through optimization and was shaped by the training of the parameters.
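As a rough illustration of rule-based typing, the sketch below assigns a coarse type from the element and its bonded neighbours, using the coordination number as a crude proxy for hybridization. The type names and rules are hypothetical and much simpler than the optimized scheme described in the paper; aromaticity, for example, is not handled.

```python
def assign_atom_type(element: str, neighbors: list[str]) -> str:
    """Assign a coarse, hypothetical atom type from element and bonded neighbours."""
    n = len(neighbors)
    if element == "H":
        # Distinguish polar hydrogens (possible H-bond donors) from C-H hydrogens.
        return "H_polar" if neighbors and neighbors[0] in ("N", "O") else "H"
    if element == "C":
        return {4: "C_sp3", 3: "C_sp2", 2: "C_sp"}.get(n, "C")
    if element == "N":
        return {3: "N_sp3", 2: "N_sp2", 1: "N_sp"}.get(n, "N")
    if element == "O":
        return {2: "O_sp3", 1: "O_sp2"}.get(n, "O")
    return element  # fall back to the bare element symbol


def assign_types(elements, bonds):
    """Type every atom given element symbols and (i, j) bond pairs."""
    neighbors = [[] for _ in elements]
    for i, j in bonds:
        neighbors[i].append(elements[j])
        neighbors[j].append(elements[i])
    return [assign_atom_type(e, nb) for e, nb in zip(elements, neighbors)]
```

For methanol (C bonded to O and three H, O bonded to one H) this yields C_sp3, O_sp3, three plain H types and one H_polar; the bond list would come from the previous step.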
Structures and Gibbs Energy
To describe the Gibbs energy of crystal structures, the force field must satisfy several thermodynamic equations. In the machine learning step, the force field parameters are optimized until these equations are fulfilled as well as possible, that is, until the error function reaches its minimum. An SVM algorithm was used for this step.
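One way to picture such an optimization: assume the lattice energy is a sum of pair-potential values tabulated over atom-type pairs and distance bins, so that each structure's energy is a dot product w · x between a parameter vector w and a vector x of contact counts. Under that assumption (ours, for illustration; the paper's exact formulation is not reproduced here), the requirement G_calc < 0 for every experimental structure becomes a set of linear inequalities, and a linear SVM-style objective (hinge loss plus an L2 penalty) can be minimized by subgradient descent. The function names and hyperparameters are hypothetical.

```python
def energy(w, x):
    # Lattice energy modelled as a linear function of contact counts: G = w . x
    return sum(wi * xi for wi, xi in zip(w, x))


def train_svm_style(features, margin=1.0, lam=0.01, lr=0.01, epochs=2000):
    """Find weights w with energy(w, x) <= -margin for every structure x.

    Minimises  lam/2 * |w|^2 + sum_s max(0, margin + w . x_s)
    by full-batch subgradient descent (a linear SVM-style objective).
    """
    n = len(features[0])
    w = [0.0] * n
    for _ in range(epochs):
        grad = [lam * wi for wi in w]          # gradient of the L2 penalty
        for x in features:
            if margin + energy(w, x) > 0:      # condition violated or inside margin
                grad = [g + xi for g, xi in zip(grad, x)]
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


# Toy contact-count vectors for three hypothetical structures (three distance bins).
features = [[2.0, 1.0, 0.0], [0.0, 3.0, 1.0], [1.0, 1.0, 1.0]]
w = train_svm_style(features)
print([round(energy(w, x), 2) for x in features])
```

The margin keeps the trained energies clearly negative rather than just below zero, and the L2 penalty keeps the parameters small; a realistic force field would add further conditions (for example, that experimental structures are energy minima) as extra terms.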
Validation and Application
The potentials obtained from training were then compared with expectations. The scientists checked whether they reflected effects such as hybridization, oxidation state and formal charge, and found good agreement between the potential curves and established chemical knowledge.
An additional validation of the general force field followed. Because the free lattice energy of any existing crystal structure must be negative, the team calculated the energy of all 259 041 selected experimental crystal structures from the CSD. In 99.86% of cases the energy was negative, fulfilling the condition G_calc < 0. Investigating the 350 outlier structures that did not fulfil this condition helped to improve the force field.
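The check itself is simple bookkeeping. A sketch, assuming the calculated energies are available as a list of numbers (the function name is hypothetical); the last lines confirm that 350 outliers among 259 041 structures corresponds to the reported 99.86%.

```python
def negative_energy_check(energies):
    """Return the fraction of structures with G_calc < 0 and the indices of outliers."""
    outliers = [i for i, g in enumerate(energies) if g >= 0.0]
    fraction_ok = 1.0 - len(outliers) / len(energies)
    return fraction_ok, outliers


# Consistency check of the reported numbers: 350 outliers out of 259 041 structures.
n_total, n_outliers = 259_041, 350
print(f"{100 * (1 - n_outliers / n_total):.2f}%")  # 99.86%
```

Returning the outlier indices, not just the fraction, is what makes the follow-up inspection of problematic structures possible.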
A final validation exploited the fact that every experimental structure should correspond to a local minimum of the Gibbs energy (dG/dx = 0) and should therefore not change significantly during minimization. The first 500 crystal structures in the CSD (in alphabetical order) were extracted and minimized. Figure 1 shows the calculated Gibbs energies of the experimental structures plotted against those of the minimized crystal structures.
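The idea behind this test can be illustrated with a toy example: minimize a structure's energy and measure how far its coordinates move. The sketch below uses a one-dimensional Lennard-Jones dimer as a stand-in energy function and plain steepest descent with finite-difference gradients; both are assumptions chosen for simplicity and are unrelated to the FlexCryst force field or its minimizer.

```python
import math


def numerical_gradient(f, x, h=1e-6):
    """Central finite-difference gradient of f at x."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2 * h))
    return grad


def minimize(f, x0, step=0.005, gtol=1e-6, max_iter=10_000):
    """Plain steepest descent; stops when the gradient norm drops below gtol."""
    x = list(x0)
    for _ in range(max_iter):
        g = numerical_gradient(f, x)
        if math.sqrt(sum(gi * gi for gi in g)) < gtol:
            break
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


def rms_displacement(x0, x1):
    """Root-mean-square change of the coordinates."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x0, x1)) / len(x0))


def lj_dimer(x, epsilon=1.0, sigma=1.0):
    """Toy energy: Lennard-Jones pair energy of two atoms on a line at x[0], x[1]."""
    r = abs(x[1] - x[0])
    return 4 * epsilon * ((sigma / r) ** 12 - (sigma / r) ** 6)


# A "structure" already at its minimum barely moves; a strained one relaxes.
r_min = 2 ** (1 / 6)
relaxed = minimize(lj_dimer, [0.0, r_min])
strained = minimize(lj_dimer, [0.0, 1.3])
print(rms_displacement([0.0, r_min], relaxed), rms_displacement([0.0, 1.3], strained))
```

A force field consistent with experiment should behave like the first case for real crystal structures: small displacements and nearly unchanged energies after minimization.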
Because it calculates intermolecular and lattice Gibbs energies quickly and accurately, the general force field can be applied to crystal structure prediction and solubility prediction.
In summary, intermolecular interactions were calculated using a general force field derived by machine learning from experimental crystal structures. The CSD played a fundamental role in this work and will continue to serve as a source of experimental crystal structure data for further force field development.
 Hofmann, D. W. M. & Kuleshova, L. N. (2023). A general force field by machine learning on experimental crystal structures. Calculations of intermolecular Gibbs energy with FlexCryst. Acta Cryst. A79, 132-144.