• RE: Filtering CSD for molecules with no disorder

    Hi Puck,

    there are a couple of issues with this script:

    Firstly, the no_disorder setting actually has three distinct values corresponding to False (don't check disorder at all), True (check for significant disorder, i.e of heavy atoms), or 'All' (check for any disorder).  Since you have specified True, structures like AARBOX will pass as the disorder is only in the position of the hydrogens.

    Secondly, you are checking the crystal for disorder, rather than the entry.  This is a less specific test than that of the entry, as the editorial decisions made for the entry are lost when constructing the crystal, and this is why ABABUB is found.

    There is a third problem for which I must apologise: there is a bug in the processing of the test for entries which, coincidentally I fixed today.  This will appear in the next release of the API.  This concerns entries with disorder but where the editors have not suppressed any atoms.

    A better check would be:

    for entry in io.EntryReader('csd'):
        if settings.test(entry) and not entry.has_disorder and len(entry.molecule.components) == 1:
            refcodes.append(entry.identifier)

    Please let me know how you get on with this.

    Best wishes

    Richard

  • RE: unable to access coordinates

    Dear Gabor,

    I'm sorry to say you have come across a bug in the interaction between our licensing and our implementation.  The orth method has inadvertently taken its definition from a library with restrictions.  This will certainly be fixed in a forthcoming release, probably version 1.5.2 in about three months' time.

    In the meantime I have provided a 'monkey patch' which will change the use of 'orth' to one in a library without these restrictions.  This should be applied in any script you use to examine the coordinates of a molecule's atoms.

    Please let me know if anything is unclear about this, and my apologies for allowing this bug to slip into the release.

    Best wishes
    Richard Sykes

    --

    # Monkey-patch around the licence restriction

    from ccdc.utilities import _private_importer
    with _private_importer():
        import ChemistryLib
        import PackingSimilarityLib
    PackingSimilarityLib.Point = ChemistryLib.Point

     

  • RE: Problem with atomic coordinates

    Hi Anna,

    this did come as a bit of a surprise to me, but after looking at it a bit I think I am clear as to how it happened.

    The symmetry operator returned by the  substructure search returns a composite of the unit cell translations and the underlying symmetry operator.  In this instance it represents the symmetry operator 1/3+y,2/3-x+y,2/3+z with a translation of (0, 0, 1).  When putting the translation component in to the operator we end up with 1 + 2/3 +z, which is rounded to 1.66667.  When applying the symetry operator to the crystal to create the symmetric_molecule, a check is made that the operator is one of the symmetry operators of the crystal, and 0.66667 is not exactly equal to 1/3.  

    The solution here is very simple: you can force the symmetry operator to be applied:

    >>> crystal = hits[0].crystal
    >>> symmop = hits[0].match_symmetry_operators()[-1]
    >>> symmetric_molecule = crystal.symmetric_molecule(symmop, force=True)

    I shall consider whether there may be a better representation of symmetry operators, with the translation kept separate from the operator for some future release.

    Thank you for raising this interesting case.

    Best wishes
    Richard

  • RE: Problem with atomic coordinates

    Hi Anna,

    this occurs because the search is performed over the all molecules of the crystal as can be seen by inspecting the symmetry operators of the match:

    >>> print hits[0].match_symmetry_operators()[-1]
    2-x,2-y,1-z

    The atoms provided by match_atoms() are of the single, central molecule of the crystal. I do think that this is not ideal, and I would like to provide some method for capturing the actually matched atoms in some future release.

    In the meantime you will can get the matched atoms in a rather roundabout way:

    >>> sym_copy = hits[0].crystal.symmetric_molecule(hits[0].match_symmetry_operators()[-1])
    >>> sym_Y = sym_copy.atom(Y.label)
    >>> print ccdc.descriptors.MolecularDescriptors.atom_distance (X, sym_Y)
    3.10294644555

    This will work as long as the atom labels of the molecule are distinct. If not the match will have to be done on the index of the atoms:

    >>> sym_Y = [a for a in sym_copy.atoms if a.index == Y.index][0]

    Best wishes
    Richard

     

     

  • RE: conquest will not boot on Mac

    Dear Thomas,

    I think you may have put ConQuest as part of your startup system, so it gets started when you log in.  If this is the case, a colleague suggests starting your Mac in safe mode, then removing ConQuest from the startup folder.  You can then run ConQuest when you want to, rather than automatically when you log in.

    Please let me know if this works.

    Best wishes
    Richard

     

  • RE: empty result for what seem a reasonable query

    Hi Pascal,

    I'm glad you've got it all sorted out.

    You'll be pleased to hear that the forthcoming release of the CSD (due later this month) will contain ADPs for many structures - around a quarter of them.  These will be accessible through the API as properties of the atoms of a molecule.

    Best wishes

    Richard

  • RE: empty result for what seem a reasonable query

    Hi Pascal,

    I worked out the SMARTS pattern by hand, since I've had a reasonable amount of experience with Daylight SMARTS.  The best reference for it is:

    http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html

    There is a sketcher which I believe will display SMARTS, though I've not used it:

    https://pubchem.ncbi.nlm.nih.gov/edit2/index.html

    It appears to support setting variable atom and bond types.

    I'm afraid the CSD Python API does not support pickling of classes and instances.  Underneath the python is a set of bindings to native C++ objects and these cannot currently be serialised.  If I wish to save the results of a CSD search I will normally save the identifiers in a GCD file:

    with open('search_hits.gcd', 'w') as writer:
        writer.write('\n'.join(h.identifier for h in hits))
        writer.write('\n')

    Then if I wish to get the matching atoms, I will search the GCD file:

    hits = searcher.search('search_hits.gcd')

    Alternatively, you could use a CSV file, to record the identifier and atom numbers of each hit.

    Best wishes

    Richard

     

  • RE: empty result for what seem a reasonable query

    Hi Pascal,

    I think this is because the filters which we use to screen out structures which cannot match the query are not doing a particularly good job here, since the query structure contains nothing but carbon and hydrogen. This means a lot of graph searching needs to be done.  I can speed up the search quite a lot - approximately seven-fold - by recasting the query using fewer atoms and some extra constraints:

    s = SubstructureSearch()
    cps_substructure = QuerySubstructure()
    c1 = cps_substructure.add_atom('C')
    c2 = cps_substructure.add_atom('C')
    c3 = cps_substructure.add_atom('C')
    c4 = cps_substructure.add_atom('C')
    c5 = cps_substructure.add_atom('C')
    b1 = cps_substructure.add_bond(QueryBond(), c1, c2)
    b2 = cps_substructure.add_bond(QueryBond(), c2, c3)
    b3 = cps_substructure.add_bond(QueryBond(), c3, c4)
    b4 = cps_substructure.add_bond(QueryBond(), c4, c5)
    b5 = cps_substructure.add_bond(QueryBond(), c5, c1)
    b1.cyclic = b2.cyclic = b3.cyclic = b4.cyclic = b5.cyclic = True

    c11 = cps_substructure.add_atom('C')
    c12 = cps_substructure.add_atom('C')
    c13 = cps_substructure.add_atom('C')
    c14 = cps_substructure.add_atom('C')
    c15 = cps_substructure.add_atom('C')
    c11.num_hydrogens = c12.num_hydrogens = c13.num_hydrogens = c14.num_hydrogens = c15.num_hydrogens = 3

    b11 = cps_substructure.add_bond('Single', c1, c11)
    b12 = cps_substructure.add_bond('Single', c2, c12)
    b13 = cps_substructure.add_bond('Single', c3, c13)
    b14 = cps_substructure.add_bond('Single', c4, c14)
    b15 = cps_substructure.add_bond('Single', c5, c15)

    Slightly faster still is to use a SMARTS query:

    smarts = SMARTSSubstructure('[#6]1(-[CH3])~[#6](-[CH3])~[#6](-[CH3])~[#6](-[CH3])~[#6](-[CH3])1')

    Best wishes

    Richard

  • RE: normalise hydrogen position

    Dear David,

    I'm afraid in my last post I omitted the essential step of normalising hydrogen positions before testing the hits.  This should be done by:

    for h in hits:
        h.molecule.normalise_hydrogens()

    before testing the distance.

    Sorry about this.

    Best wishes

    Richard

     

  • RE: normalise hydrogen position

    Dear David,

    I'm afraid the API does not currently support this feature.  If your search has constraints or measurements which depend on hydrogen positions, these will be taken from the structure without applying hydrogen position normalisation, so may well not find hits of interest.  If, on the other hand, your search is independent of the hydrogen bond positions, then the results of the search may be normalised after the search, when measurements may be taken, or constraints applied.

    For example, you could perform a search with rather looser distance constraints to a hydrogen, then for each hit, normalise the molecule's hydrogens, and test with a more accurate measurement.

    Here is a worked example, searching for COOH dimers:

    from ccdc import search
    searcher = search.SubstructureSearch()
    query = search.SubstructureQuery()
    sub1 = searcher.add_substructure(search.SMARTSSubstructure('C(=O)-O-H'))
    sub2 = searcher.add_substructure(search.SMARTSSubstructure('C(=O)-O-H'))

    Normally we would search for the hydrogen bonds using a vdw corrected distance of (-5, 0) but if we do not trust the positions of the hydrogens we can make the range larger:

    searcher.add_distance_constraint('Dist1', sub1, 1, sub2, 3, (-5, 5))
    searcher.add_distance_constraint('Dist1', sub2, 1, sub1, 3, (-5, 5))

    hits = searcher.search(max_hit_structures=10, max_hits_per_structure=1)

    This gives 10 hits.

    [h.identifier for h in hits]

    [u'AADAMC', u'ABADIS', u'ABADIS02', u'ABAFER', u'ABEKUO', u'ABENEB', u'ABEQUU', u'ABEXUD', u'ABIDAR', u'ABIDAR01']

    Now we can filter using the correct distances:

    from ccdc.descriptors import MolecularDescriptors

    def correct_distance(h):
        atoms = h.match_atoms()
        o1, h2 = atoms[1], atoms[7]
        o2, h1 = atoms[5], atoms[3]
        return (
            MolecularDescriptors.atom_distance(o1, h2) - o1.vdw_radius - h2.vdw_radius < 0 and
            MolecularDescriptors.atom_distance(o2, h1) - o2.vdw_radius - h1.vdw_radius < 0
        )

    hits = [h for h in hits if correct_distance(h)]

    And now we have 6 hits:

    [h.identifier for h in hits]

    [u'AADAMC', u'ABADIS', u'ABEKUO', u'ABENEB', u'ABEQUU', u'ABEXUD']

    I hope this is helpful.

    Best wishes

    Richard