I'm trying to search for molecules containing a Cu bonded to 6 O atoms. When I build this structure in ConQuest and search CSD v5.36 plus the v5.36 updates with the filters "3D coordinates determined", "Not disordered" and "No errors", I get 630 hits. But when I build the same query using the API, I get only 355 hits. Am I going about this the right way?

from __future__ import print_function
from datetime import datetime
import numpy as np
import ccdc.io      # explicit submodule imports: the script uses ccdc.io and ccdc.search
import ccdc.search
import os
import glob

csd_dir = ccdc.io.csd_directory()
csd_and_updates = glob.glob(os.path.join(csd_dir, '*.inf'))
csd_and_updates_reader = ccdc.io.EntryReader(csd_and_updates)

cuo6 = ccdc.search.QuerySubstructure()
cu = cuo6.add_atom('Cu')
o, b = [], []
for i in range(6):
    o.append(cuo6.add_atom('O'))
    b.append(cuo6.add_bond('Single', cu, o[-1]))
substructure_search = ccdc.search.SubstructureSearch()
sub_id = substructure_search.add_substructure(cuo6)

substructure_search.settings.has_3d_coordinates = True
substructure_search.settings.no_disorder = True
substructure_search.settings.no_errors = True
#substructure_search.settings.max_r_factor = 0.05

start = datetime.now()
hits = substructure_search.search(csd_and_updates_reader, max_hits_per_structure=1)
end = datetime.now()

for hit in hits:
    print(hit.identifier)

print('{} hits in {:.1f} secs'.format(len(hits), (end-start).total_seconds()))

Why is my program not picking up, e.g., ABUXAW? I also don't understand why, when I set the maximum R-factor to 0.05, I get 388 hits from ConQuest but none at all from the API.

 

Hi Christian,

You are certainly going about things the right way.

The API currently uses different criteria from ConQuest for identifying error-flagged structures - this is something we intend to review and fix in the next release. The error flag used by ConQuest is much more appropriate for regular use.

The max_r_factor value should be expressed in percent, i.e.

settings.max_r_factor = 5.0

I'll change the documentation to reflect this.
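
As a minimal sketch of the unit conversion, assuming you start from a fractional R-factor as in the original script: the helper name r_factor_as_percent is hypothetical and not part of the API. It only turns 0.05 into 5.0 before the value is assigned to settings.max_r_factor.

```python
def r_factor_as_percent(fraction):
    """Convert a fractional R-factor (e.g. 0.05) into a percent value (5.0).

    Hypothetical helper for illustration; it is not part of the CSD Python API.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError('expected a fraction between 0 and 1, got {!r}'.format(fraction))
    # Round to suppress floating-point noise from the multiplication.
    return round(fraction * 100.0, 6)


max_r_factor = r_factor_as_percent(0.05)
print(max_r_factor)

# In the search script this value would then be used as:
# substructure_search.settings.max_r_factor = max_r_factor
```

The range check is there to catch a value that is already in percent being converted a second time.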

There do seem to be some discrepancies between the ConQuest and the API results, even when errors are ignored. I shall look further into this.

Cheers

Richard

 

Hi Christian,

I've looked a little further into the discrepancy between ConQuest searching and API searching. Once the difference between ConQuest's and the API's notions of an error is straightened out, ConQuest finds nine extra structures:

'BUDFEN', 'DENLOY', 'HOHKIA', 'HOHLEX', 'HOHLIB', 'IYEXUF', 'MURCIL', 'MURCOR', 'YAFRAY01'

These structures all contain 'Unknown' bond types between a Cu and an O, and so do not get selected by the API search, which asks for 'Single' bonds. The structures will be found by the ConQuest search which, in the absence of a 3D parameter, will perform a 2D search.
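
If you want to check for yourself which refcodes differ, one simple approach is a set difference between the two hit lists. This is only a sketch: the function name and the placeholder identifiers below are made up for illustration, and it assumes you have exported the ConQuest refcodes to a list and collected hit.identifier from the API hits.

```python
def missing_from_api(conquest_refcodes, api_refcodes):
    """Return the refcodes found by ConQuest but not by the API, sorted.

    Hypothetical helper for illustration; both arguments are plain
    iterables of identifier strings.
    """
    return sorted(set(conquest_refcodes) - set(api_refcodes))


# Placeholder identifiers, not real search results.
conquest_hits = ['EXAMPLE1', 'EXAMPLE2', 'EXAMPLE3']
api_hits = ['EXAMPLE1']

extra = missing_from_api(conquest_hits, api_hits)
print('{} structures found only by ConQuest: {}'.format(len(extra), extra))
```

In the real script, api_hits would be built with [hit.identifier for hit in hits].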

There is a good case to be made that a 2D search mode would be useful in the API. I shall raise this for consideration for the next API release.

In the meantime, if you want these structures, use ConQuest; if you can live without them, continue to use the API.

 

Best wishes

Richard

Many thanks, Richard -- for my purposes I'm not too worried about a few missing structures, so I shall continue using the API for this. Great work, by the way!

Cheers,

Christian

Thanks, Christian.

Please carry on raising any difficulties you have, and making suggestions for ways in which we may improve the API.

Cheers

Richard

 
