• RE: finding covalently bonded clusters in crystal structures

     Hi Xiwen,

    I think the approach I would take here is to identify the atoms of interest in the unit cell, remove those not of interest and place the molecule back into the crystal:

    For definiteness, I'll write an example from the CSD refcode AXOHOL, which is a Vanadium-Tellurium-Oxygen polymeric structure.  I've put extensive comments on the lines of code, so I hope everything will be clear.

    from ccdc import io # import the input-output module
    csd = io.EntryReader('csd') # This is the whole, current CSD
    axohol = csd.crystal('AXOHOL') # This pulls out the crystal given its refcode.  

    unit_mol = axohol.molecule
    heaviest = unit_mol.heaviest_component # This strips out the solvent molecule

    axohol.molecule = heaviest # Replace the crystal's molecule with the desolvated molecule

    Now we can make the symmetry expanded representation:

    from ccdc import molecule
    shell = molecule.Molecule('SHELL') # Create a new, empty molecule

    expansions = [
        for symmop in axohol.symmetry_operators
        for i in range(0,2)
        for j in range(0,2)
        for k in range(0,2)
    ] # This is exactly as you wrote

    Now add the molecules to the shell:

    for mol in expansions:

    Now the shell contains all the atoms of the super-cell.  However, because there may be atoms lying in special postions, which is the case with AXOHOL, we must take care to remove duplicate atoms from the super-cell.  We will determine uniquness by atoms of the same element lying in the same position:

    from collections import defaultdict
    unique_dict = defaultdict(list)
    for a in shell.atoms:
        unique_dict[(a.atomic_symbol, str(a.coordinates))].append(a) # I use str() on the coordinates to avoid rounding errors failing to catch duplicates

    Now any we can remove the duplicates:

        a for l in unique_dict.values() for a in l[1:]

    Now we can make any extra bonds according to your criterion.  We could remove all bonds first, but I think what is more reasonable is to create a bond between the symmetrically related components, where there are two atoms close enough.

    from ccdc.descriptors import MolecularDescriptors

    to_be_bonded = [
        (a, b) for a in shell.atoms for b in shell.atoms
        if a.index < b.index
        and a not in b.neighbours
        and MolecularDescriptors.atom_distance(a, b) <= 2.21 # This is the ideal bond length for V-O bonds

    shell.add_bonds(('Single', a, b) for a, b in to_be_bonded)

    And now we can get the heaviest component of the super-cell:

    super_heaviest = shell.heaviest_component


    Please let me know if this works for you, and do not hesitate to come back with further questions.

    Best wishes

  • RE: finding covalently bonded clusters in crystal structures

    Dear Xiwen,

    I'm not entirely sure what you mean by 'covalently bonded clusters', so forgive me if I have gone down the wrong track in what follows, and please post some explanation of the term.

    In the API we have a number of methods for finding covalently bonded structures from a crystal.  For example,

    from ccdc import io
    csd = io.EntryReader('csd')
    crystal = csd.crystal('ABACUF')
    molecule = crystal.molecule

    Here the molecule derived from the crystal is all covalently bonded, but in the presence of solvents, ions or co-crystals the resulting molecule will form more than one discretely bonded structure.  These can be separated using the molecule's component attribute:

    components = molecule.components
    heaviest = components[0]

    It is also possible to extract the moiety lying in the asymetric unit, i.e the minimal representation of the crystal:

    asymm_mol = crystal.asymmetric_unit_molecule

    We can build larger representations of the molecule by applying the crystal's symmetry operators:

    from ccdc.molecule import Molecule
    shell = Molecule('SHELL')
    for symmop in crystal.symmetry_operators:

    And again, if these form non-covalently bonded structures, the heaviest component can be extracted from this shell molecule:

    heaviest_of_shell = shell.heaviest_component

    The bonds of these various molecules have been deduced from the crystallographic data and are generally considered to be very good.  If, however, you wish to use a bespoke distance range to define your clusters, then you will have to analyse the lengths of atom-atom distances according to your criterion.  If this is what you wish to do, please post again and I will post an example for you to work with.

    I hope this is helpful.

    Best wishes


  • RE: Z/Density query in Python

    Hi Jan-Joris,

    I'm afraid we have not implemented this sort of search in the API.  The rather clumsy work-around is to scan the whole CSD with the required criteria, for example:

    from ccdc import io

    csd = io.CrystalReader('csd')
    refcodes = []
    for c in csd:
        if c.z_value == required_z and c.z_prime == required_z_prime and c.calculated_density is not None and low_density < c.calculated_density < high_density:
            m = c.molecule
            if len(m.components) == required_components:
                atoms_with_sites = [a for a in m.atoms if a.coordinates is not None]
                if len(atoms_with_sites) == required_sited_atoms:

    with open('hit_structures.gcd', 'w') as writer:

    I'm sorry that there is nothing faster available in the API: I shall raise this at the next scheduling meeting to see whether it can be implemented for a future release.

    Thank you for the question: it is feedback like this that helps to make the API better.

    Best wishes

  • RE: H-bond propensities

    Hi Geoff,

    the HBond propensity module has not yet been released, and is currently being implemented.  There will need to be substantial scientific and usability testing before it is ready for release, and so I'm afraid I have no release date for it.

    Best wishes


  • RE: Combine queries?

     HI Pascal,

    I'm afraid the substructure search over the list of identifiers will be slow, since it will not be able to make use of the screens for the database and so will attempt the substructure match on each of the 39000 structures.

    You may find it faster in this case to perform both searches on the full database and combine them by hand:

    substructure_hits = substructure_search.search()
    text_hits = text_numeric_search.search()
    text_ids = set(h.identifier for h in text_hits)
    both_hits = [h for h in substructure_hits if h.identifier in text_ids]

    Let me know if this is better.

    Best wishes


  • RE: Combine queries?

     Hi Pascal,

    yes, all the searches except TextNumericSearch will accept a list of identifiers, a molecule, a crystal, an entry or a file.  By default it will search the CSD.  You will find it faster to do the TextNumericSearch first, since that is much faster than the substructure search.

    Best wishes

  • RE: Combine queries?

    Hi Pascal,

    It is not currently possible to make a combined search directly, though this is under consideration for a future release.  Instead you can perform a second search on the results of the first search.  For example:

    text_hits = text_numeric_search.search()
    # this gives 3234 hits
    smarts_hits = s.search([h.identifier for h in text_hits])
    # This gives 231 hits, of which 112 are different structures

    I hope this is helpful.  Please get back in touch if anything is unclear.

    Best wishes



  • RE: calculating atom distance across the unit cell boundary

     Hi Xiwen,

    you can generate molecules outside the unit cell with the translate parameter of the symmetric_molecule() method if you want explicit control of the symmety operator, or you can generate expanded representations of the crystal through methods such as packing_shell() and molecular_shell().  So, for example, you could use the molecular shell:

    mol = crystal.molecular_shell()
    atoms_of_interest = [a for a in mol.atoms if a.label == 'Te1']
    min_dist = min(MolecularDescriptors.atom_distance(a, b) for a in atoms_of_interest for b in atoms_of_interest if a != b)

    or by hand you can calculate all translated symmetric molecules:

    expansions = [
        crystal.symmetric_molecule(symmop, (i, j, k))
        for symmop in cry.symmetry_operators
        for i in range(-2, 3)
        for j in range(-2, 3)
        for k in range(-2, 3)

    min_dist = min(
        MolecularDescriptors.atom_distance(expansions[i].atom('Te1'), expansions[j].atom('Te1'))
        for i in range(len(expansions)) for j in range(len(expansions))
        if i != j

    Is either of these solutions what you are after?

    Best wishes



  • RE: calculating atom distance across the unit cell boundary

    Hi Xiwen,

    for this I think you will need to know the symmetry operators of the crystal, and to know which of the symmetry operators you are interested in using.  For example,

    from ccdc import io, descriptors
    csd = io.EntryReader('csd')
    crystal = csd.crystal('AABHTZ')
    base_mol = crystal.molecule
    symm_mol = crystal.symmetric_molecule(crystal.symmetry_operators[1])
    print descriptors.MolecularDescriptors.atom_distance(base_mol.atom('N2'), symm_mol.atom('N2')


    Hope this is helpful; please let me know if you would like any more help.

    Best wishes



  • RE: Strange results from TextNumericSearch

     Thanks for the suggestions, Paul, I'll certainly consider them for the next release.

    Best wishes