• RE: finding covalently bonded clusters in crystal structures

     Hi Xiwen,

    I've had a look at the code, and found that it is the remove_atoms() calls that take a long time - around half an hour to remove 8000 atoms.  I think this is to do with reshuffling atom indices for every removal.  I have tried sorting the atoms in decreaising value if index, and that does make a huge difference (but is an ugly trick) so I've come up with a scheme to do the deletions on the smaller expansion molecules before appending to the shell:

        present = set()
        for mol in expansions:
                a for a in mol.atoms if (a.atomic_symbol, str(a.coordinates)) in present
                a for a in mol.atoms
                if a.fractional_coordinates.x<0 or a.fractional_coordinates.x>2
                or a.fractional_coordinates.y<0 or a.fractional_coordinates.y>2
                or a.fractional_coordinates.z<0 or a.fractional_coordinates.z>2
            present |= set((a.atomic_symbol, str(a.coordinates)) for a in mol.atoms)

    For LAQNOH this fragment runs in less than four seconds.

    Best wishes


  • RE: finding covalently bonded clusters in crystal structures

    HI Xiwen,

    sorry about the delay in answering.  I didn't notice that you had replied.

    Could you please post the code that you are running so I can try to see what's going wrong with it?



  • RE: finding covalently bonded clusters in crystal structures

    Hi Xiwen,

    This message occurs when an attempt is made to add a bond that has already been made in the molecule, so either the to_be_bonded_2 list contains duplicates, or it contains a pair that is already bonded in the shell molecule.  You can check for either case when adding the bonds, by adding them one-by-one with an appropriate check:

    for a, c in to_be_bonded_2:
        if a not in c.neighbours:
            shell.add_bond('Single', a, c)

    but it would be better to eliminate the duplicates when constructing this list.

    I'm afraid there was a bug in the function I wrote earlier,  which should read:

    def to_be_bonded(shell, distance, *atom_types):
        '''Pairs of atoms of the correct type with a distance criterion.'''
        return [
            (a, b) for a in shell.atoms for b in shell.atoms
            if set([a.atomic_symbol, b.atomic_symbol]) == set(atom_types)
            and a.index < b.index
            and a not in b.neighbours
            and MolecularDescriptors.atom_distance(a, b) <= distance

    to_be_bonded_1 = to_be_bonded(shell, 2.6, 'V', 'O')
    to_be_bonded_2 = to_be_bonded(shell, 2.8, 'O', 'Te')

    I have tested it this time!  This works correctly for AXOHOL, IKELOA, YOTHUM, VOLMAL and VOLMEP.

    Have a good weekend.

    Best wishes

  • RE: finding covalently bonded clusters in crystal structures

     Dear Xiwen,

    it's a pleasure for me to be able to help people like you to do interesting research with the API.  There are a couple of things that need to be changed in the code:

    Firstly, we don't know in which order the bonds will be defined, so we have to check for both cases O-V and V-O.  This is most easily done with a set comparison.  It's probably worth writing a little function for this, to save copy and paste errors, and to make life easier if you wish to study other crystal structures:

    def to_be_bonded(shell, *atom_types, distance=2.8):
        '''Pairs of atoms of the correct type with a distance criterion.'''
        return [
            (a, c) for a in shell.atoms for b in shell.atoms
            if set([a.atomic_symbol, c.atomic_symbol]) < set(atom_types)
            and a.index < c.index
            and a not in c.neighbours
            and MD.atom_distance(a, c) <= distance

    to_be_bonded_1 = to_be_bonded(shell, 'V', 'O', distance=2.6)
    to_be_bonded_2 = to_be_bonded(shell, 'O', 'Te', distance=2.8)

    The last line was accidentally left in the code.  When I wrote and tested the code, to check that I had done everything correctly I wrote the molecules out so I could see them in mercury.  You are right, it was a MoleculeWriter:

    with io.MoleculeWriter('/tmp/VOTe.mol2') as writer:

    I find this useful when developing molecular or crystallographic code.

    Your second expression for the expansions will not apply any translations to the symmetry operators, so is more concisely and clearly written as:

    expansions = [axohol.symmetric_molecule(symmop) for symmop in axohol.symmetry_operators]

    For counting the atoms in the unit cell and the supercell it is probably easier
    and clearer to use a generator expression rather than a for loop:

    max_unit_cell = max(len(c.atoms) for c in unit_cell.components)
    min_unit_cell = min(len(c.atoms) for c in unit_cell.components)

    This, though, is purely a matter of taste.

    May I ask what is the nature of the research you are doing?  

    Best wishes,

  • RE: finding covalently bonded clusters in crystal structures

     Hi Xiwen,

    I think the approach I would take here is to identify the atoms of interest in the unit cell, remove those not of interest and place the molecule back into the crystal:

    For definiteness, I'll write an example from the CSD refcode AXOHOL, which is a Vanadium-Tellurium-Oxygen polymeric structure.  I've put extensive comments on the lines of code, so I hope everything will be clear.

    from ccdc import io # import the input-output module
    csd = io.EntryReader('csd') # This is the whole, current CSD
    axohol = csd.crystal('AXOHOL') # This pulls out the crystal given its refcode.  

    unit_mol = axohol.molecule
    heaviest = unit_mol.heaviest_component # This strips out the solvent molecule

    axohol.molecule = heaviest # Replace the crystal's molecule with the desolvated molecule

    Now we can make the symmetry expanded representation:

    from ccdc import molecule
    shell = molecule.Molecule('SHELL') # Create a new, empty molecule

    expansions = [
        for symmop in axohol.symmetry_operators
        for i in range(0,2)
        for j in range(0,2)
        for k in range(0,2)
    ] # This is exactly as you wrote

    Now add the molecules to the shell:

    for mol in expansions:

    Now the shell contains all the atoms of the super-cell.  However, because there may be atoms lying in special postions, which is the case with AXOHOL, we must take care to remove duplicate atoms from the super-cell.  We will determine uniquness by atoms of the same element lying in the same position:

    from collections import defaultdict
    unique_dict = defaultdict(list)
    for a in shell.atoms:
        unique_dict[(a.atomic_symbol, str(a.coordinates))].append(a) # I use str() on the coordinates to avoid rounding errors failing to catch duplicates

    Now any we can remove the duplicates:

        a for l in unique_dict.values() for a in l[1:]

    Now we can make any extra bonds according to your criterion.  We could remove all bonds first, but I think what is more reasonable is to create a bond between the symmetrically related components, where there are two atoms close enough.

    from ccdc.descriptors import MolecularDescriptors

    to_be_bonded = [
        (a, b) for a in shell.atoms for b in shell.atoms
        if a.index < b.index
        and a not in b.neighbours
        and MolecularDescriptors.atom_distance(a, b) <= 2.21 # This is the ideal bond length for V-O bonds

    shell.add_bonds(('Single', a, b) for a, b in to_be_bonded)

    And now we can get the heaviest component of the super-cell:

    super_heaviest = shell.heaviest_component


    Please let me know if this works for you, and do not hesitate to come back with further questions.

    Best wishes

  • RE: finding covalently bonded clusters in crystal structures

    Dear Xiwen,

    I'm not entirely sure what you mean by 'covalently bonded clusters', so forgive me if I have gone down the wrong track in what follows, and please post some explanation of the term.

    In the API we have a number of methods for finding covalently bonded structures from a crystal.  For example,

    from ccdc import io
    csd = io.EntryReader('csd')
    crystal = csd.crystal('ABACUF')
    molecule = crystal.molecule

    Here the molecule derived from the crystal is all covalently bonded, but in the presence of solvents, ions or co-crystals the resulting molecule will form more than one discretely bonded structure.  These can be separated using the molecule's component attribute:

    components = molecule.components
    heaviest = components[0]

    It is also possible to extract the moiety lying in the asymetric unit, i.e the minimal representation of the crystal:

    asymm_mol = crystal.asymmetric_unit_molecule

    We can build larger representations of the molecule by applying the crystal's symmetry operators:

    from ccdc.molecule import Molecule
    shell = Molecule('SHELL')
    for symmop in crystal.symmetry_operators:

    And again, if these form non-covalently bonded structures, the heaviest component can be extracted from this shell molecule:

    heaviest_of_shell = shell.heaviest_component

    The bonds of these various molecules have been deduced from the crystallographic data and are generally considered to be very good.  If, however, you wish to use a bespoke distance range to define your clusters, then you will have to analyse the lengths of atom-atom distances according to your criterion.  If this is what you wish to do, please post again and I will post an example for you to work with.

    I hope this is helpful.

    Best wishes


  • RE: Z/Density query in Python

    Hi Jan-Joris,

    I'm afraid we have not implemented this sort of search in the API.  The rather clumsy work-around is to scan the whole CSD with the required criteria, for example:

    from ccdc import io

    csd = io.CrystalReader('csd')
    refcodes = []
    for c in csd:
        if c.z_value == required_z and c.z_prime == required_z_prime and c.calculated_density is not None and low_density < c.calculated_density < high_density:
            m = c.molecule
            if len(m.components) == required_components:
                atoms_with_sites = [a for a in m.atoms if a.coordinates is not None]
                if len(atoms_with_sites) == required_sited_atoms:

    with open('hit_structures.gcd', 'w') as writer:

    I'm sorry that there is nothing faster available in the API: I shall raise this at the next scheduling meeting to see whether it can be implemented for a future release.

    Thank you for the question: it is feedback like this that helps to make the API better.

    Best wishes

  • RE: H-bond propensities

    Hi Geoff,

    the HBond propensity module has not yet been released, and is currently being implemented.  There will need to be substantial scientific and usability testing before it is ready for release, and so I'm afraid I have no release date for it.

    Best wishes


  • RE: Combine queries?

     HI Pascal,

    I'm afraid the substructure search over the list of identifiers will be slow, since it will not be able to make use of the screens for the database and so will attempt the substructure match on each of the 39000 structures.

    You may find it faster in this case to perform both searches on the full database and combine them by hand:

    substructure_hits = substructure_search.search()
    text_hits = text_numeric_search.search()
    text_ids = set(h.identifier for h in text_hits)
    both_hits = [h for h in substructure_hits if h.identifier in text_ids]

    Let me know if this is better.

    Best wishes


  • RE: Combine queries?

     Hi Pascal,

    yes, all the searches except TextNumericSearch will accept a list of identifiers, a molecule, a crystal, an entry or a file.  By default it will search the CSD.  You will find it faster to do the TextNumericSearch first, since that is much faster than the substructure search.

    Best wishes