Hi,

I would like to use the python csd api for hydrogen bond geometry analysis. In ConQuest is an option for searching hydrogen normalised entries. I can not find this option in the python substructure search. I only found an option to normalise hydrogens of molecules but that makes everything rather complicated. Did I miss here something for performing this analysis on normalised entries?

Thanks in advance for any suggestion,

David

Dear David,

I'm afraid the API does not currently support this feature.  If your search has constraints or measurements which depend on hydrogen positions, these will be taken from the structure without applying hydrogen position normalisation, so may well not find hits of interest.  If, on the other hand, your search is independent of the hydrogen bond positions, then the results of the search may be normalised after the search, when measurements may be taken, or constraints applied.

For example, you could perform a search with rather looser distance constraints to a hydrogen, then for each hit, normalise the molecule's hydrogens, and test with a more accurate measurement.

Here is a worked example, searching for COOH dimers:

from ccdc import search
searcher = search.SubstructureSearch()
query = search.SubstructureQuery()
sub1 = searcher.add_substructure(search.SMARTSSubstructure('C(=O)-O-H'))
sub2 = searcher.add_substructure(search.SMARTSSubstructure('C(=O)-O-H'))

Normally we would search for the hydrogen bonds using a vdw corrected distance of (-5, 0) but if we do not trust the positions of the hydrogens we can make the range larger:

searcher.add_distance_constraint('Dist1', sub1, 1, sub2, 3, (-5, 5))
searcher.add_distance_constraint('Dist1', sub2, 1, sub1, 3, (-5, 5))

hits = searcher.search(max_hit_structures=10, max_hits_per_structure=1)

This gives 10 hits.

[h.identifier for h in hits]

[u'AADAMC', u'ABADIS', u'ABADIS02', u'ABAFER', u'ABEKUO', u'ABENEB', u'ABEQUU', u'ABEXUD', u'ABIDAR', u'ABIDAR01']

Now we can filter using the correct distances:

from ccdc.descriptors import MolecularDescriptors

def correct_distance(h):
    atoms = h.match_atoms()
    o1, h2 = atoms[1], atoms[7]
    o2, h1 = atoms[5], atoms[3]
    return (
        MolecularDescriptors.atom_distance(o1, h2) - o1.vdw_radius - h2.vdw_radius < 0 and
        MolecularDescriptors.atom_distance(o2, h1) - o2.vdw_radius - h1.vdw_radius < 0
    )

hits = [h for h in hits if correct_distance(h)]

And now we have 6 hits:

[h.identifier for h in hits]

[u'AADAMC', u'ABADIS', u'ABEKUO', u'ABENEB', u'ABEQUU', u'ABEXUD']

I hope this is helpful.

Best wishes

Richard

 

 

 

 

 

Dear David,

I'm afraid in my last post I omitted the essential step of normalising hydrogen positions before testing the hits.  This should be done by:

for h in hits:
    h.molecule.normalise_hydrogens()

before testing the distance.

Sorry about this.

Best wishes

Richard

 

Dear Richard,

thanks for your reply. I will implement your suggestion into my workflow.

Best wishes,

David

For anyone who has the same problem, here is another possible implementation into a substructure search:

 

import numpy as np
import ccdc.search

# normalised bond lengths between hydrogen and C, N or O (same as in ConQuest)
h_dict = {'C': 1.089, 'N': 1.015, 'O': 0.993}


def norm(v):
    """ calculates the norm of a vector """
    return np.sqrt(np.dot(v, v))


def angle(a, b, c):
    """ angle between 3 points with b in the center """
    ba = a - b
    bc = c - b
    x = np.dot(ba, bc) / (np.linalg.norm(ba) * np.linalg.norm(bc))
    return np.degrees(np.arccos(x))


def h_correction(d, h, type):
    """ Correcting length of bond between hydrogen and C, O or N"""
    ha = h - a
    dis = norm(ha)
    ha_corrected = (ha / dis) * h_dict[type]
    h_corrected = a + ha_corrected
    return h_corrected


# preparing search with aliphatic hydroxyl as donor and oxygen or nitrogen as acceptor
search = ccdc.search.SubstructureSearch()
don_sub = ccdc.search.SMARTSSubstructure('[CX4][OX2][H]')
acc_sub = ccdc.search.SMARTSSubstructure('[O,N]')
don_sub_id = search.add_substructure(don_sub)
acc_sub_id = search.add_substructure(acc_sub)

# constraining distance between donor and acceptor atom to 5 A
search.add_distance_constraint('DIST1', don_sub_id, 1, acc_sub_id, 0, (0.0, 5.0), 'Intermolecular')

# constraining angle involving donor, hydrogen and acceptor atom between 100 and 180
search.add_angle_constraint('ANG1', don_sub_id, 1, don_sub_id, 2, acc_sub_id, 0, (100.0, 180.0))

# using only high quality data
search.settings.has_3d_coordinates = True
search.settings.only_organic = True
search.settings.no_errors = True
search.settings.not_polymeric = True
search.settings.no_disorder = 'Non-hydrogen'
search.settings.max_r_factor = 5.0

# starting search
hits = search.search()

# empty lists for data
dists = []
angles = []
errors = []

# analysing hits
for hit in hits:
    # some entries miss the hydrogen position, so we need to try
    try:
        atoms = hit.match_atoms()
        # getting coordinates (atom index according to queries --> [CX4][OX2][H] - [O,N] --> [0][1][2] - [3])
        donor = np.array(atoms[1].coordinates[:])
        hydrogen = np.array(atoms[2].coordinates[:])
        acceptor = np.array(atoms[3].coordinates[:])
        # correcting hydrogen position for angle calculation
        hydrogen = h_correction(donor, hydrogen, atoms[1].atomic_symbol)
        ang = angle(acceptor, hydrogen, donor)
        # keeping hbonds with angle of at least 120
        if ang >= 120:
            dists.append(hit.constraints['DIST1'])
            angles.append(ang)
    # if something doesnt work in the try statement save the identifier
    except:
        errors.append(hit.identifier)

 

I hope that helps,

David

You must be signed in to post in this forum.