I am attempting to count the number of hydrogen bonds around molecules in several hundred structures from the CSD.  I would like to count these on a "per molecule" basis -- i.e. ignoring crystal symmetry equivalence, simple count the number of hydrogen bonds made by each chemically-independent species in a crystal or co-crystal.

Using the Crystal.hbonds function seems to behave fairly sensibly, returning several hydrogen bonds for a structure like e.g. CATKIT, although it is not counting in the fashion that I would like (i.e. it at least partly accounts for crystal symmetry and tries to return only completely unique H-bonds).  Given that I want to consider the immediate environment around each molecule regardless of the crystal symmetry, this seems like it would be the perfect use case for the molecular_shell function in the Crystal() class.

However, when I build the molecular shell for CATKIT and use the Molecule.hbonds function on the resulting object (which I understand is itself a collection of Molecule() objects for each "component"), I get no hydrogen bonds (i.e. the function returns an empty tuple).  I have attempted to create both molecular and packing shells and use Molecule.hbonds on the resulting objects to no avail (both methods return empty tuples).

Similarly, even calling Molecule.hbonds on the original Crystal.Molecule object (i.e. the "base" molecule of this entry in the CSD) returns an empty tuple, despite there clearly being at least one intermolecular hydrogen bond present in the asymmetric unit.  This seems counter-intuitive to me -- surely even if a Molecule object is itself composed of several further Molecules, the Molecule.hbonds() function should return all contacts that match the hydrogen bond criteria in the Molecule class instance it is called on?

Any help you could provide with getting the Molecule.hbonds function to correctly count hydrogen bonds would be greatly appreciated.  Alternatively, if you could suggest another means of counting all hydrogen bonds around distinct, independent molecules using the API, that would be greatly appreciated.  (To give you an idea of what I'm looking for, in CATKIT, the bi-napthalene-diol species I would count as participating in 3 hydrogen bonds, while the other benzamidazole species participates in 2.)

Thanks in advance.

Hi Chris.

The HBond in CATKIT is not found because the default path length range for detecting hydrogen bonds is set to (4, 999), so excluding contacts between separate components of the molecule.  You can include such contacts by setting the path_length_range to (-1, 999), i.e:

from ccdc import io

csd = io.MoleculeReader('csd')

catkit = csd.molecule('CATKIT')

print catkit.hbonds(path_length_range=(-1, 999))

(HBond(Atom(N2)-Atom(H2)-Atom(O2)),)

The value -1 is used to cope with both options to the 'require_hydrogens' parameter of the hbonds() method.  I appreciate that this is not clear from the documentation, and this will be rectified in a forthcoming release.

I think the default behaviour is somewhat counterintuitive; I shall discuss with colleagues whether the default should be made more permissive.

Hope this is helpful.

Richard

 

Richard,

Yes, that is very helpful, thank you!  I can confirm that setting the lower bound to the path_length_range to -1 now returns hydrogen bonds for the molecular shells, as well as the one intermolecular hydrogen bond present in the asymmetric unit when it is called on the raw crystal.molecule instance for CATKIT.

For my own understanding (and any future readers); it appears (from looking at the API source code) the reason that Crystal().hbonds behaves differently to Molecule().hbonds is that the intermolecular flag that exists for the Crystal() version defaults to including intermolecular contacts in its analysis, while the Molecule() version has no such option and therefore doesn't -- is this the only reason for the difference?  It is understandable, but perhaps counter-intuitive given that a "Molecule" can be multiple molecules.  (Perhaps the Molecule() version should consider whether there are multiple components and count accordingly?)

I appreciate your prompt and very helpful reply.  I will let you know if I encounter any other issues, but thanks again for clarifying this for me.

You must be signed in to post in this forum.