I'm a newcomer to the CSD API, so this may be a newbie question. On the other hand, the code below is simple enough that it should either work or tell the user they have done something stupid!

I'm searching the CSD specifically for structures containing disorder. There doesn't seem to be a one-step way to do this, so I'm having to subtract a list of structures that don't contain disorder from a list of all matching structures.

from ccdc.search import TextNumericSearch, Search
search_settings = Search.Settings()
search_settings.has_3d_coordinates = True
search_settings.max_r_factor = 10.0
search_settings.no_metals = True
search_settings.not_polymeric = True
search_settings.only_organic = True
search_settings.no_errors = True
search_settings.no_disorder = True
search = TextNumericSearch()
search.settings = search_settings
search.add_citation(year=range(1970, 2017))
nodishits = [hit.identifier for hit in search.search()]
print("Found %i hits without disorder" % len(nodishits))
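The subtraction step described above amounts to a set difference on entry identifiers. Here is a minimal pure-Python sketch, assuming both searches have already been reduced to lists of identifier strings (as nodishits is above) and were run with identical settings apart from no_disorder. The function name disordered_identifiers and the REF00n identifiers are hypothetical placeholders, not CSD API names or real refcodes.

```python
def disordered_identifiers(all_ids, ordered_ids):
    """Return identifiers in all_ids that do not appear in ordered_ids.

    Both arguments are iterables of entry identifiers (strings).
    The order of all_ids is preserved and duplicates are dropped.
    """
    ordered = set(ordered_ids)  # a set gives fast membership tests
    seen = set()
    result = []
    for ident in all_ids:
        if ident not in ordered and ident not in seen:
            seen.add(ident)
            result.append(ident)
    return result


if __name__ == "__main__":
    # Placeholder identifiers, not real CSD refcodes
    all_hits = ["REF001", "REF002", "REF003", "REF004"]
    no_disorder_hits = ["REF001", "REF003"]
    print(disordered_identifiers(all_hits, no_disorder_hits))  # ['REF002', 'REF004']
```

The difference only means "entries with disorder" if the two searches share every other criterion; otherwise entries excluded for unrelated reasons end up in the result.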

 

Problem 1: How to list all structures? I settled on TextNumericSearch with a bogus year range, which will essentially give me everything. Without the add_citation call, the search silently returns no results. I wanted to specify "no atoms heavier than Cl", but didn't see a way to do this in the API, so settled on no_metals instead. Oddly, TextNumericSearch didn't seem to allow settings to be passed to the constructor (which the other search classes do).

Problem 2: This search only returns 656 results, which makes no sense!

Any ideas?

Paul Hodgkinson

 

Hi Paul,

Firstly, you are right to spot that TextNumericSearch doesn't take a settings parameter. I shall fix this for the next release.

Secondly, you can exclude entries containing specific elements using the search settings object:

search_settings.must_not_have_elements = [
    'Ar', 'K', ...
]

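or, programmatically:

from ccdc.molecule import Atom
# Atomic numbers 18 (Ar) to 92 (U), i.e. everything heavier than Cl
ats = []
for i in range(18, 93):
    ats.append(Atom())
    ats[-1].atomic_number = i
# Convert each atomic number to its symbol via the Atom object
search_settings.must_not_have_elements = [a.atomic_symbol for a in ats]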


This is slightly clumsy because there is no atomic_number keyword for Atom creation.
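If creating throwaway Atom objects feels clumsy, the same list of symbols can be built from a hard-coded periodic table. This is a minimal pure-Python sketch, not part of the CSD API: the helper name elements_heavier_than is hypothetical, and the table stops at uranium (Z = 92) to match the range(18, 93) loop above.

```python
# Element symbols for atomic numbers 1 (H) to 92 (U); list index 0 is Z = 1.
SYMBOLS = (
    "H He Li Be B C N O F Ne "
    "Na Mg Al Si P S Cl Ar K Ca "
    "Sc Ti V Cr Mn Fe Co Ni Cu Zn "
    "Ga Ge As Se Br Kr Rb Sr Y Zr "
    "Nb Mo Tc Ru Rh Pd Ag Cd In Sn "
    "Sb Te I Xe Cs Ba La Ce Pr Nd "
    "Pm Sm Eu Gd Tb Dy Ho Er Tm Yb "
    "Lu Hf Ta W Re Os Ir Pt Au Hg "
    "Tl Pb Bi Po At Rn Fr Ra Ac Th "
    "Pa U"
).split()


def elements_heavier_than(symbol, max_z=len(SYMBOLS)):
    """Return symbols of all elements with a higher atomic number than
    `symbol`, up to and including atomic number max_z."""
    z = SYMBOLS.index(symbol) + 1  # raises ValueError for an unknown symbol
    return SYMBOLS[z:max_z]


if __name__ == "__main__":
    heavy = elements_heavier_than("Cl")
    print(heavy[0], heavy[-1], len(heavy))  # Ar U 75
    # With the CSD API installed, this list could be assigned to
    # search_settings.must_not_have_elements
```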

Thirdly, you don't have to use a bogus search to extract all entries in a database that match specific search criteria. You can set up search_settings as above, then iterate over the CSD:

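from ccdc import io
csd = io.EntryReader('csd')
for e in csd:
    # Keep only entries that satisfy every criterion in search_settings
    if search_settings.test(e):
        do_something_interesting_with_entry(e)  # placeholder for your own code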

Fourthly, the year argument is a pair of numbers interpreted as an inclusive range, rather than the sequence of values you have given. The first two values of your range have been used as the bounds, so you are only getting hits from 1970 to 1971. The query should be written:

search.add_citation(year=(1970, 2017))

It would have been helpful if the API had made this clear.
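The truncation is easy to reproduce in plain Python, because a range object is a sequence and its first two items are consecutive years rather than the intended end points. The sketch below only mimics the behaviour described above; first_two_as_bounds and as_inclusive_years are hypothetical helpers, not part of the CSD API. Note also that range(1970, 2017) stops at 2016, whereas the pair (1970, 2017) includes 2017.

```python
def first_two_as_bounds(year):
    """Mimic the reported behaviour: the first two items become (start, end)."""
    return year[0], year[1]


def as_inclusive_years(year):
    """Convert a range, or an explicit (start, end) pair, into an inclusive pair."""
    if isinstance(year, range):
        if len(year) == 0 or year.step != 1:
            raise ValueError("expected a non-empty range with step 1")
        return year[0], year[-1]  # a range excludes its stop value
    start, end = year
    return start, end


if __name__ == "__main__":
    print(first_two_as_bounds(range(1970, 2017)))  # (1970, 1971)
    print(as_inclusive_years(range(1970, 2017)))   # (1970, 2016)
    print(as_inclusive_years((1970, 2017)))        # (1970, 2017)
```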

Lastly, it is perhaps counter-intuitive that a TextNumericSearch with no criteria returns no hits. It would probably be better to raise an exception, as the other search classes do. I shall consider this for the next release.
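The fail-loudly behaviour suggested above can be sketched with a toy query object that records its criteria in a list. QuerySketch, add_criterion and run are hypothetical names for illustration only; they do not reflect how TextNumericSearch is implemented.

```python
class QuerySketch:
    """Toy query that refuses to run without any criteria."""

    def __init__(self):
        self.criteria = []

    def add_criterion(self, predicate):
        # predicate: a callable taking a record and returning True or False
        self.criteria.append(predicate)

    def run(self, records):
        if not self.criteria:
            # Fail loudly instead of silently returning no hits
            raise RuntimeError("no search criteria have been added")
        return [r for r in records if all(p(r) for p in self.criteria)]


if __name__ == "__main__":
    records = [{"year": 1969}, {"year": 1985}, {"year": 2017}]
    q = QuerySketch()
    try:
        q.run(records)
    except RuntimeError as exc:
        print("error:", exc)
    q.add_criterion(lambda r: 1970 <= r["year"] <= 2017)
    print(q.run(records))  # [{'year': 1985}, {'year': 2017}]
```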


Thank you for your questions; it is feedback like this that helps me to make the API better.

Best wishes

Richard

 

 

Thanks Richard. That's very helpful.

It would be useful to have a "no elements heavier than" option (even if only as syntactic sugar), since this is a common operation that is available in ConQuest. The suggested workarounds are not pretty!

Thanks for the "iterating over the CSD" idiom. That will provide a more elegant solution. It would be useful to include it in the documentation and script examples.

Paul

 

P.S. Another helpful warning would be one raised when the max R-factor is set below 1. Properly speaking, 5% should be entered as 0.05, whereas the API clearly expects the R-factor multiplied by 100, i.e. as a percentage. A search with 0.05 returned 1 result, where the depositors had clearly made a mistake! An R-factor below 1% would be physically ludicrous, so there's no risk of generating spurious warnings.
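The proposed check is easy to express. A minimal sketch, assuming max_r_factor is given as a percentage, as the API expects; check_max_r_factor is a hypothetical helper, not an existing CSD API function, and it only warns rather than silently rescaling the value.

```python
import warnings


def check_max_r_factor(value):
    """Return value unchanged, warning if it looks like a fraction rather than a percentage."""
    if value < 1.0:
        warnings.warn(
            f"max_r_factor={value} is below 1%; did you mean {value * 100:g}?",
            UserWarning,
            stacklevel=2,
        )
    return value


if __name__ == "__main__":
    check_max_r_factor(10.0)  # no warning
    check_max_r_factor(0.05)  # warns: "... did you mean 5?"
```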

Thanks for the suggestions, Paul. I'll certainly consider them for the next release.

Best wishes
Richard

 
