Hi,

I am searching for Cp* in the database via the API and I get zero results. The same search in ConQuest returns several hits.

Did I make a mistake somewhere?


#from mercury_interface import MercuryInterface
from ccdc.search import TextNumericSearch
from ccdc.io import EntryReader
from ccdc.search import SubstructureSearch, QuerySubstructure, ConnserSubstructure
          
s = SubstructureSearch()
cps_substructure = QuerySubstructure()
c1 = cps_substructure.add_atom('C')
c2 = cps_substructure.add_atom('C')
c3 = cps_substructure.add_atom('C')
c4 = cps_substructure.add_atom('C')
c5 = cps_substructure.add_atom('C')
b1 = cps_substructure.add_bond('Any', c1, c2)
b2 = cps_substructure.add_bond('Any', c2, c3)
b3 = cps_substructure.add_bond('Any', c3, c4)
b4 = cps_substructure.add_bond('Any', c4, c5)
b5 = cps_substructure.add_bond('Any', c5, c1)

c11 = cps_substructure.add_atom('C')
c12 = cps_substructure.add_atom('C')
c13 = cps_substructure.add_atom('C')
c14 = cps_substructure.add_atom('C')
c15 = cps_substructure.add_atom('C')
b11 = cps_substructure.add_bond('Single', c1, c11)
b12 = cps_substructure.add_bond('Single', c2, c12)
b13 = cps_substructure.add_bond('Single', c3, c13)
b14 = cps_substructure.add_bond('Single', c4, c14)
b15 = cps_substructure.add_bond('Single', c5, c15)

h11 = cps_substructure.add_atom('H')
h12 = cps_substructure.add_atom('H')
h13 = cps_substructure.add_atom('H')
h21 = cps_substructure.add_atom('H')
h22 = cps_substructure.add_atom('H')
h23 = cps_substructure.add_atom('H')
h31 = cps_substructure.add_atom('H')
h32 = cps_substructure.add_atom('H')
h33 = cps_substructure.add_atom('H')
h41 = cps_substructure.add_atom('H')
h42 = cps_substructure.add_atom('H')
h43 = cps_substructure.add_atom('H')
h51 = cps_substructure.add_atom('H')
h52 = cps_substructure.add_atom('H')
h53 = cps_substructure.add_atom('H')
bh11 = cps_substructure.add_bond('Single', c11, h11)
bh12 = cps_substructure.add_bond('Single', c11, h12)
bh13 = cps_substructure.add_bond('Single', c11, h13)
bh21 = cps_substructure.add_bond('Single', c12, h21)
bh22 = cps_substructure.add_bond('Single', c12, h22)
bh23 = cps_substructure.add_bond('Single', c12, h23)
bh31 = cps_substructure.add_bond('Single', c13, h31)
bh32 = cps_substructure.add_bond('Single', c13, h32)
bh33 = cps_substructure.add_bond('Single', c13, h33)
bh41 = cps_substructure.add_bond('Single', c14, h41)
bh42 = cps_substructure.add_bond('Single', c14, h42)
bh43 = cps_substructure.add_bond('Single', c14, h43)
bh51 = cps_substructure.add_bond('Single', c15, h51)
bh52 = cps_substructure.add_bond('Single', c15, h52)
bh53 = cps_substructure.add_bond('Single', c15, h53)

s.add_substructure(cps_substructure)    

hits = s.search()

print(hits)

Dear Pascal,

I'm afraid the API does not support the 'Any' keyword to mean any bond; it is instead interpreted as a specification for an explicit 'Unknown' bond. This is an error, and will be rectified in the next release of the API. In the meantime, import QueryBond from ccdc.search and use the form

cps_substructure.add_bond(QueryBond(), c1, c2)

which will do the right search.

Best wishes

Richard

Hi,

Thanks, it works. However, the search takes more than 2 hours. Is that expected?

 

Hi Pascal,

I think this is because the filters we use to screen out structures that cannot match the query are not doing a particularly good job here: the query structure contains nothing but carbon and hydrogen, so a lot of graph searching needs to be done. I can speed up the search quite a lot, approximately seven-fold, by recasting the query using fewer atoms and some extra constraints:

s = SubstructureSearch()
cps_substructure = QuerySubstructure()
c1 = cps_substructure.add_atom('C')
c2 = cps_substructure.add_atom('C')
c3 = cps_substructure.add_atom('C')
c4 = cps_substructure.add_atom('C')
c5 = cps_substructure.add_atom('C')
b1 = cps_substructure.add_bond(QueryBond(), c1, c2)
b2 = cps_substructure.add_bond(QueryBond(), c2, c3)
b3 = cps_substructure.add_bond(QueryBond(), c3, c4)
b4 = cps_substructure.add_bond(QueryBond(), c4, c5)
b5 = cps_substructure.add_bond(QueryBond(), c5, c1)
b1.cyclic = b2.cyclic = b3.cyclic = b4.cyclic = b5.cyclic = True

c11 = cps_substructure.add_atom('C')
c12 = cps_substructure.add_atom('C')
c13 = cps_substructure.add_atom('C')
c14 = cps_substructure.add_atom('C')
c15 = cps_substructure.add_atom('C')
c11.num_hydrogens = c12.num_hydrogens = c13.num_hydrogens = c14.num_hydrogens = c15.num_hydrogens = 3

b11 = cps_substructure.add_bond('Single', c1, c11)
b12 = cps_substructure.add_bond('Single', c2, c12)
b13 = cps_substructure.add_bond('Single', c3, c13)
b14 = cps_substructure.add_bond('Single', c4, c14)
b15 = cps_substructure.add_bond('Single', c5, c15)
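
The same recast query can also be written with loops, which keeps the ring and methyl definitions in one place. This is only a sketch: build_cp_star is a hypothetical helper name, and it assumes that add_atom and add_bond return objects whose num_hydrogens and cyclic attributes can be set, as in the code above. The query object and the bond class are passed in as arguments, so the helper itself does not import ccdc.

```python
def build_cp_star(query, any_bond, ring_size=5):
    """Add a methyl-substituted carbon ring to `query` and return it.

    `query` is expected to behave like ccdc.search.QuerySubstructure and
    `any_bond` like the ccdc.search.QueryBond class (called with no
    arguments to get a bond that matches any bond type).
    build_cp_star is a hypothetical name used for illustration only.
    """
    ring = [query.add_atom('C') for _ in range(ring_size)]

    # Close the ring: atom i is bonded to atom i + 1, the last to the first.
    for i, atom in enumerate(ring):
        bond = query.add_bond(any_bond(), atom, ring[(i + 1) % ring_size])
        bond.cyclic = True

    # One methyl carbon (three hydrogens) singly bonded to each ring atom.
    for atom in ring:
        methyl = query.add_atom('C')
        methyl.num_hydrogens = 3
        query.add_bond('Single', atom, methyl)

    return query

# Intended use with the CSD Python API (not run here):
#   from ccdc.search import SubstructureSearch, QuerySubstructure, QueryBond
#   s = SubstructureSearch()
#   s.add_substructure(build_cp_star(QuerySubstructure(), QueryBond))
#   hits = s.search()
```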

Slightly faster still is to use a SMARTS query:

smarts = SMARTSSubstructure('[#6]1(-[CH3])~[#6](-[CH3])~[#6](-[CH3])~[#6](-[CH3])~[#6](-[CH3])1')
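
To make the pattern easier to read, here is a small sketch that assembles the same string from its repeating unit. The function name cp_star_smarts is made up for this illustration; the token meanings in the comments follow the Daylight SMARTS conventions.

```python
def cp_star_smarts(ring_size=5):
    """Build the SMARTS string for a ring of methyl-substituted carbons.

    Token meanings (Daylight SMARTS):
      [#6]     any carbon atom (atomic number 6)
      ~        any bond type
      -[CH3]   single bond to an aliphatic carbon with three hydrogens
      1 ... 1  ring-closure digits joining the first and last ring atoms
    """
    first = '[#6]1(-[CH3])'                      # opens the ring
    rest = '~[#6](-[CH3])' * (ring_size - 1)     # remaining ring atoms
    return first + rest + '1'                    # closes the ring

pattern = cp_star_smarts()
# With the CSD Python API this string would be passed on as
#   SMARTSSubstructure(pattern)
```

With the default ring size of five, the function reproduces the string used above character for character.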

Best wishes

Richard

Thanks!

How did you get the SMARTS string? I tried to generate one myself but could not find any tool that produces the string from a 2D drawing.

Is there any way to save the results of a search to a file and load them later? Something similar to pickle, but for a ccdc object.

Hi Pascal,

I worked out the SMARTS pattern by hand, since I've had a reasonable amount of experience with Daylight SMARTS.  The best reference for it is:

http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html

There is a sketcher which I believe will display SMARTS, though I've not used it:

https://pubchem.ncbi.nlm.nih.gov/edit2/index.html

It appears to support setting variable atom and bond types.

I'm afraid the CSD Python API does not support pickling of classes and instances. Underneath the Python is a set of bindings to native C++ objects, and these cannot currently be serialised. If I wish to save the results of a CSD search, I will normally save the identifiers in a GCD file:

with open('search_hits.gcd', 'w') as writer:
    writer.write('\n'.join(h.identifier for h in hits))
    writer.write('\n')

Then if I wish to get the matching atoms, I will search the GCD file:

hits = searcher.search('search_hits.gcd')
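
If you only need the identifiers back in Python, without re-running the search, the file can be read as plain text. The following is a minimal sketch using only the standard library; save_identifiers and load_identifiers are hypothetical helper names, and the only assumption about the file is the one-identifier-per-line layout written above.

```python
def save_identifiers(path, identifiers):
    """Write one identifier per line, the layout used in the snippet above."""
    with open(path, 'w') as writer:
        for identifier in identifiers:
            writer.write(identifier + '\n')


def load_identifiers(path):
    """Read the identifiers back in file order, skipping blank lines."""
    with open(path) as reader:
        return [line.strip() for line in reader if line.strip()]

# Typical use with search results (not run here):
#   save_identifiers('search_hits.gcd', [h.identifier for h in hits])
#   identifiers = load_identifiers('search_hits.gcd')
```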

Alternatively, you could use a CSV file, to record the identifier and atom numbers of each hit.
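
A minimal sketch of the CSV approach, using only the standard library, might look like this. The helper names and the two-column layout (identifier, then space-separated atoms) are choices made for this example, not something the API prescribes. How you obtain the matched atoms from each hit is left to you; something like `[a.label for a in h.match_atoms()]` is an assumption to check against the API documentation.

```python
import csv


def write_hit_atoms(path, rows):
    """Write (identifier, atoms) pairs, one hit per row.

    `atoms` may hold atom labels or atom numbers; each item is stored
    as text, separated by spaces, so that a hit stays on a single row.
    """
    with open(path, 'w', newline='') as handle:
        writer = csv.writer(handle)
        writer.writerow(['identifier', 'atoms'])
        for identifier, atoms in rows:
            writer.writerow([identifier, ' '.join(str(a) for a in atoms)])


def read_hit_atoms(path):
    """Return a list of (identifier, atoms) pairs; atoms come back as strings."""
    with open(path, newline='') as handle:
        reader = csv.DictReader(handle)
        return [(row['identifier'], row['atoms'].split()) for row in reader]
```

A structure with several matches simply appears on several rows, so a list of pairs is returned rather than a dictionary keyed by identifier.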

Best wishes

Richard

 

Ok, all sorted. I have attached the Python script for the curious. Beware, I am crawling the IUCr website and that is probably against their T&C.

Here is what I am doing. Not nicely written but does the job:

I am using the CSD to search for structures containing a Cp* ligand. For each hit, I look up the publication link.
Then I crawl the website (I only keep Acta papers) to download the CIF file.
From the matched atoms, I also write a script file for CRYSTALS to calculate a TLS model on these atoms.

The idea is to support the common-sense view that Cp* ligands are rigid and can be refined as such.

So if there are plans to include ADPs in the database that would be great :)

 

Hi Pascal,

I'm glad you've got it all sorted out.

You'll be pleased to hear that the forthcoming release of the CSD (due later this month) will contain ADPs for many structures - around a quarter of them.  These will be accessible through the API as properties of the atoms of a molecule.

Best wishes

Richard
