Adding Hydrogen Atoms in Sensible Places
For structural scientists working with both small-molecule and macromolecular structures, hydrogen addition now places hydrogen atoms in more sensible locations when protonating a structure.
Addition Sort Order
Heavy atoms are now pre-sorted and hydrogens are added in this order:
- Firstly, atoms that have unambiguous hydrogen positions (e.g. aromatic C-H groups, methylenes, amide groups, etc.) are processed.
- Next, atoms with partially defined hydrogen positions are processed (for example phenol OH groups and terminal -NH3 and -CH3 groups).
- Finally, atoms with no intramolecular information on hydrogen positions (i.e. waters) are processed.

Within each of the classes above, atoms are further sorted so that heavy atoms forming more close contacts are treated first. The sort is re-applied after each atom is processed to account for any newly added hydrogens.
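The ordering described above can be sketched in a few lines of Python. This is only an illustration and not the CSD implementation: the `HeavyAtom` class, the class ranks and the rule that each processed atom adds one close contact to its pending neighbours are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical class ranks: lower values are processed earlier.
UNAMBIGUOUS, PARTIAL, WATER = 0, 1, 2


@dataclass(frozen=True)
class HeavyAtom:
    label: str
    h_class: int        # UNAMBIGUOUS, PARTIAL or WATER
    neighbours: tuple   # labels of atoms in close contact (assumed precomputed)


def processing_order(atoms):
    """Return atom labels in processing order, re-sorting after every step."""
    contacts = {a.label: len(a.neighbours) for a in atoms}
    pending = {a.label: a for a in atoms}
    order = []
    while pending:
        # Class first, then more close contacts first; label breaks ties.
        nxt = min(
            pending.values(),
            key=lambda a: (a.h_class, -contacts[a.label], a.label),
        )
        order.append(nxt.label)
        del pending[nxt.label]
        # Assumption: hydrogens added to `nxt` create one extra close
        # contact for each neighbour that is still waiting.
        for n in nxt.neighbours:
            if n in pending:
                contacts[n] += 1
    return order


atoms = [
    HeavyAtom("O1", PARTIAL, ("X1", "X2")),
    HeavyAtom("O2", PARTIAL, ("C1", "X3")),
    HeavyAtom("C1", UNAMBIGUOUS, ("O2",)),
]
print(processing_order(atoms))
```

In this toy input O1 and O2 start with the same number of contacts, but processing C1 first gives O2 an extra contact, so O2 moves ahead of O1. Picking the minimum on every pass is the simplest way to show the re-sort; a priority queue would do the same job for larger structures.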
Multiple Trial Positions
When hydrogen atoms are added, multiple positions based on chemical logic are now trialled and the best one is selected.
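As a rough illustration of what a set of trial positions might look like, the sketch below generates candidate hydrogen sites for a hydroxyl group by rotating the O-H bond around the C-O axis. The function name, the number of trials, the 0.98 Å bond length and the 109.5° angle are all illustrative assumptions; the actual rules used to generate trial positions are not described here.

```python
import math


def _unit(v):
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def trial_hydroxyl_positions(c, o, n_trials=12, bond=0.98, angle_deg=109.5):
    """Candidate H sites on a cone around the C-O bond (hypothetical helper).

    c, o: Cartesian coordinates (Å) of the carbon and oxygen atoms.
    Every site keeps the O-H distance and the C-O-H angle fixed and
    differs only in the torsion about the C-O bond.
    """
    axis = _unit(tuple(ci - oi for ci, oi in zip(c, o)))  # from O towards C
    # Any vector that is not parallel to the axis will do here.
    helper = (1.0, 0.0, 0.0) if abs(axis[0]) < 0.9 else (0.0, 1.0, 0.0)
    v = _unit(_cross(axis, helper))
    w = _cross(axis, v)  # axis, v, w form an orthonormal frame
    theta = math.radians(angle_deg)
    positions = []
    for k in range(n_trials):
        phi = 2.0 * math.pi * k / n_trials
        positions.append(tuple(
            oi + bond * (
                math.cos(theta) * a
                + math.sin(theta) * (math.cos(phi) * p + math.sin(phi) * q)
            )
            for oi, a, p, q in zip(o, axis, v, w)
        ))
    return positions


sites = trial_hydroxyl_positions((0.0, 0.0, 0.0), (1.43, 0.0, 0.0))
print(len(sites))
```

Each candidate can then be scored against its surroundings, which is where the choice of "best" position comes in.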
Picking the Best Position
Accounting for Bad Clashes
Only positions that do not clash with other atoms, including existing hydrogens, are now used. This process is dynamic: previously added hydrogens can cause clashes with new hydrogens, changing the position chosen for the new hydrogens.
Accounting for Hydrogen Bonds in the Local Environment
When selecting a position, each candidate is now checked against the local environment and, where possible, the hydrogen is placed so that it forms an H-bond to a nearby acceptor. For small-molecule crystal structures, crystallographic symmetry is used to generate this local environment.
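The two criteria in this section, rejecting clashes and preferring H-bonds, could be combined along the lines of the sketch below. The function name and both distance cut-offs are hypothetical values chosen for the example, and a real implementation would also consider H-bond angles and symmetry-generated neighbours.

```python
import math

CLASH_CUTOFF = 1.5   # Å; hypothetical minimum H...atom distance
HBOND_CUTOFF = 2.5   # Å; hypothetical maximum H...acceptor distance


def pick_position(trials, existing_atoms, acceptors):
    """Pick one trial H position, or None if every trial clashes.

    trials, existing_atoms and acceptors are lists of (x, y, z) tuples in Å.
    """
    allowed = [
        h for h in trials
        if all(math.dist(h, a) >= CLASH_CUTOFF for a in existing_atoms)
    ]
    if not allowed:
        return None

    def hbond_distance(h):
        return min((math.dist(h, acc) for acc in acceptors), default=math.inf)

    best = min(allowed, key=hbond_distance)
    # Prefer an H-bonded site; otherwise fall back to the first clash-free one.
    return best if hbond_distance(best) <= HBOND_CUTOFF else allowed[0]


acceptors = [(3.0, 0.0, 0.0)]
existing = [(0.0, 0.0, 2.0), (3.0, 0.0, 0.0)]

first = pick_position(
    [(0.0, 0.0, 1.0), (0.0, 0.0, -1.0), (1.0, 0.0, 0.0)], existing, acceptors
)
existing.append(first)  # the new hydrogen now takes part in clash checks

second = pick_position(
    [(1.0, 1.0, 0.0), (-2.0, 0.0, 0.0)], existing, acceptors
)
print(first, second)
```

Appending each accepted hydrogen to `existing` is what makes the process dynamic: in this toy case the second hydrogen would have taken the H-bonded site at (1.0, 1.0, 0.0) had the first hydrogen not been placed 1 Å away from it.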
Back-tracking
Sometimes not all of the hydrogen sites in water molecules can be assigned because of bad clashes. A back-tracking algorithm now swaps hydrogen atoms around to see whether a more complete network can be obtained. That is, it fixes a bad water molecule by taking a hydrogen atom from an adjacent water molecule, which switches around the H-bond between them, and then checks whether the now incomplete adjacent water molecule can be assigned different hydrogen sites that do not clash.
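The general idea of back-tracking can be illustrated with a generic depth-first search. Note that this is a textbook back-tracking search over a simplified model, not the swap-based procedure described above: each water is given a hypothetical list of candidate orientations, each orientation is just the set of neighbours its hydrogens point at, and a clash means two waters pointing hydrogens at each other.

```python
def assign_waters(options):
    """Assign one orientation per water so that no two waters clash.

    options maps a water label to a list of candidate orientations; each
    orientation is a frozenset of the neighbour labels its hydrogens
    point at. Returns {water: orientation}, or None if nothing fits.
    """
    waters = list(options)
    chosen = {}

    def clashes(water, candidate):
        # Clash: `water` points an H at a neighbour that already points
        # an H back at `water` along the same contact.
        return any(n in chosen and water in chosen[n] for n in candidate)

    def solve(i):
        if i == len(waters):
            return True
        water = waters[i]
        for candidate in options[water]:
            if not clashes(water, candidate):
                chosen[water] = candidate
                if solve(i + 1):
                    return True
                del chosen[water]  # undo the choice and try the next one
        return False

    return dict(chosen) if solve(0) else None


options = {
    "W1": [frozenset({"W2"}), frozenset({"X"})],
    "W2": [frozenset({"W1"})],
}
print(assign_waters(options))
```

Here W1 first tries to donate to W2, which leaves W2 with no clash-free orientation. The search undoes that choice, points W1 at X instead, and W2 can then donate to W1.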
Are all Hydrogen Atoms now added in Sensible Places?
If there are still hydrogen atoms without sites, addition is re-run on the affected atoms without the clash check or the use of the local environment, for consistency with current behaviour.
Although not perfect, the new approach recapitulates around 88% of observed flexible hydrogen positions in a subset of high-quality neutron structures from the Cambridge Structural Database (CSD). 85% of hydroxyl hydrogens (including water hydrogens) are predicted. This compares to just 13% for the 2023.1 release of the CSD.
An Example from the CSD
Some structures in the CSD lack hydrogen positions; it’s interesting to look at how the new code works with such structures. CSD Entry COMXIL is a steroidal compound.
Adding the hydrogen atoms using the new code leads to a highly plausible hydrogen-bonded cycle involving the two COH groups in the structure and an ester C=O, along with reasonably placed CH hydrogen atoms.