I'm very sorry about the long delay in getting back to you.
It sounds as if you need to do a substructure search on the CSD which for 22,000 structures is going to take some time.
This script will do something like what you want. It will process your target file, search the CSD for structures containing your target molecule and will write the hits to a results file. The hits will contain matches where your target molecule is a proper substructure of the matched structure; if you need exact matches, then please let me know.
from ccdc import io, search, utilities
target_file = '/path/to/your/file/of/structures.mol2'
hit_file = '/path/to/where/you/want/to/put/results.gcd'
searcher = search.SubstructureSearch()
hits = searcher.search()
print 'CSD contained %d hits for %s' % (len(hits), m.identifier)
return [h.identifier for h in hits]
reader = io.MoleculeReader(target_file)
start = time.time()
with open(hit_file, 'w') as gcd_writer:
for i, m in enumerate(reader):
utilities.Timer.progress(start, i+1, len(reader), '')
As I say, 20,000 structures is a lot of searching. It may be advisable to split the input file into smaller chunks and process them separately.
Once again, I am terribly sorry about the delay in answering.
If you need any more help, please get back in touch.
similarity searching of the CSD follows the implementation described in:
that is to say by comparing fingerprints of the probe molecule and of a structures from the CSD.
I hope this is enough for your project; if not, please get back to me and I will happily provide more information.
if you want to run scripts which take longer than three minutes, you can invoke python directly from a shell prompt:
C:\Program Files (x86)\CCDC\Python_AP I_2018\miniconda\Scripts\activate
This will not need the mercury interface.
There is no direct method in the API to search on spacegroup, so one has to fall back on enumeration of the database. I believe the following will do what you want:
from ccdc import io, utilities
csd = io.EntryReader('csd')
start = time.time()
hits = 
for i, e in enumerate(csd):
if i and i % 1000 == 0:
utilities.Timer.progress(start, i, len(csd), '%d hits' % len(hits))
if not e.has_3d_structure:
c = e.crystal
if not c.spacegroup_symbol.startswith('R'):
This works by rejecting structures with no 3d information (they won't support a void volume calculation) and rejecting structures which do not have a rhombohdral space group. If you like you can change this line to test different space groups.
At the end you will have a list of void_volume, refcode pairs which you can write in any format you care.
Hope this is helpful,
there is a workaround to this problem along the lines of:
from ccdc import search
smiles = 'something interesting'
searcher = search.SubstructureSearch()
hits = [h for h in searcher.search() if len(h.molecule.components) == 1 and h.molecule.smiles = smiles]
That's good to know, Puck. Feel free to raise any other issues you have with the API.
there are a couple of issues with this script:
Firstly, the no_disorder setting actually has three distinct values corresponding to False (don't check disorder at all), True (check for significant disorder, i.e of heavy atoms), or 'All' (check for any disorder). Since you have specified True, structures like AARBOX will pass as the disorder is only in the position of the hydrogens.
Secondly, you are checking the crystal for disorder, rather than the entry. This is a less specific test than that of the entry, as the editorial decisions made for the entry are lost when constructing the crystal, and this is why ABABUB is found.
There is a third problem for which I must apologise: there is a bug in the processing of the test for entries which, coincidentally I fixed today. This will appear in the next release of the API. This concerns entries with disorder but where the editors have not suppressed any atoms.
A better check would be:
for entry in io.EntryReader('csd'):
if settings.test(entry) and not entry.has_disorder and len(entry.molecule.components) == 1:
Please let me know how you get on with this.
I'm sorry to say you have come across a bug in the interaction between our licensing and our implementation. The orth method has inadvertently taken its definition from a library with restrictions. This will certainly be fixed in a forthcoming release, probably version 1.5.2 in about three months' time.
In the meantime I have provided a 'monkey patch' which will change the use of 'orth' to one in a library without these restrictions. This should be applied in any script you use to examine the coordinates of a molecule's atoms.
Please let me know if anything is unclear about this, and my apologies for allowing this bug to slip into the release.
# Monkey-patch around the licence restriction
from ccdc.utilities import _private_importer
PackingSimilarityLib.Point = ChemistryLib.Point
this did come as a bit of a surprise to me, but after looking at it a bit I think I am clear as to how it happened.
The symmetry operator returned by the substructure search returns a composite of the unit cell translations and the underlying symmetry operator. In this instance it represents the symmetry operator 1/3+y,2/3-x+y,2/3+z with a translation of (0, 0, 1). When putting the translation component in to the operator we end up with 1 + 2/3 +z, which is rounded to 1.66667. When applying the symetry operator to the crystal to create the symmetric_molecule, a check is made that the operator is one of the symmetry operators of the crystal, and 0.66667 is not exactly equal to 1/3.
The solution here is very simple: you can force the symmetry operator to be applied:
>>> crystal = hits.crystal
>>> symmop = hits.match_symmetry_operators()[-1]
>>> symmetric_molecule = crystal.symmetric_molecule(symmop, force=True)
I shall consider whether there may be a better representation of symmetry operators, with the translation kept separate from the operator for some future release.
Thank you for raising this interesting case.
this occurs because the search is performed over the all molecules of the crystal as can be seen by inspecting the symmetry operators of the match:
>>> print hits.match_symmetry_operators()[-1]
The atoms provided by match_atoms() are of the single, central molecule of the crystal. I do think that this is not ideal, and I would like to provide some method for capturing the actually matched atoms in some future release.
In the meantime you will can get the matched atoms in a rather roundabout way:
>>> sym_copy = hits.crystal.symmetric_molecule(hits.match_symmetry_operators()[-1])
>>> sym_Y = sym_copy.atom(Y.label)
>>> print ccdc.descriptors.MolecularDescriptors.atom_distance (X, sym_Y)
This will work as long as the atom labels of the molecule are distinct. If not the match will have to be done on the index of the atoms:
>>> sym_Y = [a for a in sym_copy.atoms if a.index == Y.index]