• RE: conquest will not boot on Mac

    Dear Thomas,

    I think you may have put ConQuest as part of your startup system, so it gets started when you log in.  If this is the case, a colleague suggests starting your Mac in safe mode, then removing ConQuest from the startup folder.  You can then run ConQuest when you want to, rather than automatically when you log in.

    Please let me know if this works.

    Best wishes
    Richard

     

  • RE: empty result for what seem a reasonable query

    Hi Pascal,

    I'm glad you've got it all sorted out.

    You'll be pleased to hear that the forthcoming release of the CSD (due later this month) will contain ADPs for many structures - around a quarter of them.  These will be accessible through the API as properties of the atoms of a molecule.

    Best wishes

    Richard

  • RE: empty result for what seem a reasonable query

    Hi Pascal,

    I worked out the SMARTS pattern by hand, since I've had a reasonable amount of experience with Daylight SMARTS.  The best reference for it is:

    http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html

    There is a sketcher which I believe will display SMARTS, though I've not used it:

    https://pubchem.ncbi.nlm.nih.gov/edit2/index.html

    It appears to support setting variable atom and bond types.

    I'm afraid the CSD Python API does not support pickling of classes and instances.  Underneath the python is a set of bindings to native C++ objects and these cannot currently be serialised.  If I wish to save the results of a CSD search I will normally save the identifiers in a GCD file:

    with open('search_hits.gcd', 'w') as writer:
        writer.write('\n'.join(h.identifier for h in hits))
        writer.write('\n')

    Then if I wish to get the matching atoms, I will search the GCD file:

    hits = searcher.search('search_hits.gcd')

    Alternatively, you could use a CSV file, to record the identifier and atom numbers of each hit.

    Best wishes

    Richard

     

  • RE: empty result for what seem a reasonable query

    Hi Pascal,

    I think this is because the filters which we use to screen out structures which cannot match the query are not doing a particularly good job here, since the query structure contains nothing but carbon and hydrogen. This means a lot of graph searching needs to be done.  I can speed up the search quite a lot - approximately seven-fold - by recasting the query using fewer atoms and some extra constraints:

    s = SubstructureSearch()
    cps_substructure = QuerySubstructure()
    c1 = cps_substructure.add_atom('C')
    c2 = cps_substructure.add_atom('C')
    c3 = cps_substructure.add_atom('C')
    c4 = cps_substructure.add_atom('C')
    c5 = cps_substructure.add_atom('C')
    b1 = cps_substructure.add_bond(QueryBond(), c1, c2)
    b2 = cps_substructure.add_bond(QueryBond(), c2, c3)
    b3 = cps_substructure.add_bond(QueryBond(), c3, c4)
    b4 = cps_substructure.add_bond(QueryBond(), c4, c5)
    b5 = cps_substructure.add_bond(QueryBond(), c5, c1)
    b1.cyclic = b2.cyclic = b3.cyclic = b4.cyclic = b5.cyclic = True

    c11 = cps_substructure.add_atom('C')
    c12 = cps_substructure.add_atom('C')
    c13 = cps_substructure.add_atom('C')
    c14 = cps_substructure.add_atom('C')
    c15 = cps_substructure.add_atom('C')
    c11.num_hydrogens = c12.num_hydrogens = c13.num_hydrogens = c14.num_hydrogens = c15.num_hydrogens = 3

    b11 = cps_substructure.add_bond('Single', c1, c11)
    b12 = cps_substructure.add_bond('Single', c2, c12)
    b13 = cps_substructure.add_bond('Single', c3, c13)
    b14 = cps_substructure.add_bond('Single', c4, c14)
    b15 = cps_substructure.add_bond('Single', c5, c15)

    Slightly faster still is to use a SMARTS query:

    smarts = SMARTSSubstructure('[#6]1(-[CH3])~[#6](-[CH3])~[#6](-[CH3])~[#6](-[CH3])~[#6](-[CH3])1')

    Best wishes

    Richard

  • RE: normalise hydrogen position

    Dear David,

    I'm afraid in my last post I omitted the essential step of normalising hydrogen positions before testing the hits.  This should be done by:

    for h in hits:
        h.molecule.normalise_hydrogens()

    before testing the distance.

    Sorry about this.

    Best wishes

    Richard

     

  • RE: normalise hydrogen position

    Dear David,

    I'm afraid the API does not currently support this feature.  If your search has constraints or measurements which depend on hydrogen positions, these will be taken from the structure without applying hydrogen position normalisation, so may well not find hits of interest.  If, on the other hand, your search is independent of the hydrogen bond positions, then the results of the search may be normalised after the search, when measurements may be taken, or constraints applied.

    For example, you could perform a search with rather looser distance constraints to a hydrogen, then for each hit, normalise the molecule's hydrogens, and test with a more accurate measurement.

    Here is a worked example, searching for COOH dimers:

    from ccdc import search
    searcher = search.SubstructureSearch()
    query = search.SubstructureQuery()
    sub1 = searcher.add_substructure(search.SMARTSSubstructure('C(=O)-O-H'))
    sub2 = searcher.add_substructure(search.SMARTSSubstructure('C(=O)-O-H'))

    Normally we would search for the hydrogen bonds using a vdw corrected distance of (-5, 0) but if we do not trust the positions of the hydrogens we can make the range larger:

    searcher.add_distance_constraint('Dist1', sub1, 1, sub2, 3, (-5, 5))
    searcher.add_distance_constraint('Dist1', sub2, 1, sub1, 3, (-5, 5))

    hits = searcher.search(max_hit_structures=10, max_hits_per_structure=1)

    This gives 10 hits.

    [h.identifier for h in hits]

    [u'AADAMC', u'ABADIS', u'ABADIS02', u'ABAFER', u'ABEKUO', u'ABENEB', u'ABEQUU', u'ABEXUD', u'ABIDAR', u'ABIDAR01']

    Now we can filter using the correct distances:

    from ccdc.descriptors import MolecularDescriptors

    def correct_distance(h):
        atoms = h.match_atoms()
        o1, h2 = atoms[1], atoms[7]
        o2, h1 = atoms[5], atoms[3]
        return (
            MolecularDescriptors.atom_distance(o1, h2) - o1.vdw_radius - h2.vdw_radius < 0 and
            MolecularDescriptors.atom_distance(o2, h1) - o2.vdw_radius - h1.vdw_radius < 0
        )

    hits = [h for h in hits if correct_distance(h)]

    And now we have 6 hits:

    [h.identifier for h in hits]

    [u'AADAMC', u'ABADIS', u'ABEKUO', u'ABENEB', u'ABEQUU', u'ABEXUD']

    I hope this is helpful.

    Best wishes

    Richard

     

     

     

     

     

  • RE: empty result for what seem a reasonable query

    Dear Pascal,

    I'm afraid the API does not support the 'Any' keyword to mean any bond.  Instead this becomes a specification for an explicit 'Unknown' bond.  This is an error, and will be rectified in the next release of the API.  Instead, use the form

    cps_substructure.add_bond(QueryBond(), c1, c2)

    which will do the right search.

    Best wishes

    Richard

  • RE: Filter components of molecules?

    Hi Ryan,

    the simplest way to do this is to use the attribute 'heaviest_component'.  This will work as long as the solvent molecules have molecular weights lower than that of the component of interest.

    All discrete components of the molecule are accessible via the attribute 'components'.  These will be sorted by molecular weight, so can be inspected to find the structure of interest.

    Hope this is clear and helpful.

    Best wishes

    Richard

     

  • RE: finding covalently bonded clusters in crystal structures

     HI Xiwen,

    You are right to remove the atoms with no fractional coordinates, a case I hadn't considered when writing the code.  This occurs where there is some disorder in the structure.

    When I run the code with VOLMAL and IXASUW I get results in under a minute, with the longest step being the calculation of to_be_bonded (around 35 - 40 seconds).  Where did you put your extra line?  I put mine in the loop which adds expansions:

        for mol in expansions:
            mol.remove_atoms(
                a for a in mol.atoms if (a.atomic_symbol, str(a.coordinates)) in present
            )
            mol.remove_atoms(
                a for a in mol.atoms if a.fractional_coordinates is None
            )
            atoms_not_of_interest=[
                a for a in mol.atoms
                if a.fractional_coordinates.x<0 or a.fractional_coordinates.x>2
                or a.fractional_coordinates.y<0 or a.fractional_coordinates.y>2
                or a.fractional_coordinates.z<0 or a.fractional_coordinates.z>2
            ]
            mol.remove_atoms(atoms_not_of_interest)
            present |= set((a.atomic_symbol, str(a.coordinates)) for a in mol.atoms)
            shell.add_molecule(mol)

    though it could be an extra case in the atoms_not_of_interest comprehension.

    By the way, mol.atoms.sort() won't do anything useful.  The atoms property of the molecule is calculated afresh every time you call mol.atoms.  Better would be to sort a list of atoms, for example:

    atoms_not_of_interest.sort(...)
    mol.remove_atoms(atoms_not_of_interest)

    I've attached the script I use for testing these things.  Can you have a look at it and see if I'm doing anything terribly wrong?

    Best wishes

    Richard

  • RE: finding covalently bonded clusters in crystal structures

     Hi Xiwen,

    I've had a look at the code, and found that it is the remove_atoms() calls that take a long time - around half an hour to remove 8000 atoms.  I think this is to do with reshuffling atom indices for every removal.  I have tried sorting the atoms in decreaising value if index, and that does make a huge difference (but is an ugly trick) so I've come up with a scheme to do the deletions on the smaller expansion molecules before appending to the shell:

        present = set()
        for mol in expansions:
            mol.remove_atoms(
                a for a in mol.atoms if (a.atomic_symbol, str(a.coordinates)) in present
            )
            atoms_not_of_interest=[
                a for a in mol.atoms
                if a.fractional_coordinates.x<0 or a.fractional_coordinates.x>2
                or a.fractional_coordinates.y<0 or a.fractional_coordinates.y>2
                or a.fractional_coordinates.z<0 or a.fractional_coordinates.z>2
            ]
            mol.remove_atoms(atoms_not_of_interest)
            present |= set((a.atomic_symbol, str(a.coordinates)) for a in mol.atoms)
            shell.add_molecule(mol)

    For LAQNOH this fragment runs in less than four seconds.

    Best wishes

    Richard