I'm interested in finding structures in the CSD that contain two or more molecules differing only in protonation state, e.g. structures that contain both the neutral molecule of an acid and its mono-anion. However, I don't want to limit the search to acids; it should include all compounds. So the molecules belonging to the same structure should have the same connectivity but differ by one hydrogen atom.

Does anyone have a suggestion for how this search could be done?

Thank you in advance,

Hi Marko,

I've recently tackled a similar problem, but I was only interested in two-component structures in the CSD. The approach I followed made use of the SubstructureSearch functionality and is summarised below:

  1. Search the CSD to find all two-component structures that are organic and have 3D coordinates, and store the resulting refcodes in a .gcd file for future use
  2. Split each entry into its heaviest component and its smallest component
  3. Create and save a Substructure Screen of all heaviest components (please note that this functionality will only be available with the November release of the CSD Python API; it helped to speed up the search)
  4. Create a dictionary of the smallest components (key: CSD identifier, value: Molecule object of the component). A Substructure Screen can be used here, too.
  5. Perform the substructure search:
    1. Start by using the heaviest component as a query
    2. Loop over all the “heavy_hits” and check that the smallest component is the same as the one in the query. Use the dictionary created above in this step.
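
The bookkeeping in steps 2, 4 and 5 can be sketched in plain Python, independently of the CSD Python API. This is only an illustration, not the script I used: `Component`, `split_entry`, `matching_pairs` and the `heavy_search` callback are hypothetical names, and `heavy_search` stands in for the real substructure search on the heaviest components.

```python
from typing import Callable, Dict, Iterable, List, NamedTuple, Tuple


class Component(NamedTuple):
    """Hypothetical stand-in for one molecule of a CSD entry."""
    weight: float  # molecular weight, used to rank the components
    key: str       # any identity key for the molecule, e.g. a SMILES string


def split_entry(components: List[Component]) -> Tuple[Component, Component]:
    """Step 2: return (heaviest, smallest) component of an entry."""
    ordered = sorted(components, key=lambda c: c.weight)
    return ordered[-1], ordered[0]


def matching_pairs(
    entries: Dict[str, List[Component]],
    heavy_search: Callable[[Component, Dict[str, Component]], Iterable[str]],
) -> List[Tuple[str, str]]:
    """Steps 4 and 5: pair up entries whose heaviest components match
    and whose smallest components are identical."""
    heaviest: Dict[str, Component] = {}
    smallest: Dict[str, Component] = {}  # step 4: identifier -> smallest
    for refcode, components in entries.items():
        heaviest[refcode], smallest[refcode] = split_entry(components)

    pairs = []
    for query_id, query in heaviest.items():
        # Step 5.1: placeholder for the substructure search
        for hit_id in heavy_search(query, heaviest):
            if hit_id == query_id:
                continue  # an entry always matches itself
            # Step 5.2: the smallest component must be the same
            if smallest[hit_id].key == smallest[query_id].key:
                pairs.append((query_id, hit_id))
    return pairs
```

Pairs come out ordered as (query, hit), so a symmetric match is reported twice; deduplicate afterwards if that matters.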

N.B. I check that query and hit have the same number of heavy atoms. However, you are likely to get false positives, as in some cases query and hit differ in stereochemistry rather than in the number of hydrogen atoms.
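
For the original question (components of the same structure that differ by one hydrogen), a cheap pre-filter on the component formulas might look like the sketch below. It assumes each component's formula is available as a space-separated string such as "C7 H6 O2", with any charge as a separate token like "1-"; that format is an assumption, and `parse_formula`, `differ_by_one_hydrogen` and `candidate_pairs` are hypothetical helper names. Matching formulas are a necessary condition only: isomers pass this test too, so connectivity still has to be checked with a substructure search.

```python
import re
from collections import Counter
from itertools import combinations

# One formula token: element symbol followed by an optional count
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula):
    """Count atoms per element; tokens that are not element counts
    (e.g. a charge such as '1-') are ignored."""
    counts = Counter()
    for token in formula.split():
        match = _TOKEN.fullmatch(token)
        if match is None:
            continue
        element, number = match.groups()
        counts[element] += int(number) if number else 1
    return counts


def differ_by_one_hydrogen(formula_a, formula_b):
    """True if heavy-atom counts are identical and H counts differ by 1."""
    a, b = parse_formula(formula_a), parse_formula(formula_b)
    hydrogens_a, hydrogens_b = a.pop("H", 0), b.pop("H", 0)
    return a == b and abs(hydrogens_a - hydrogens_b) == 1


def candidate_pairs(component_formulas):
    """Index pairs of components within one structure that pass the filter."""
    return [
        (i, j)
        for (i, fa), (j, fb) in combinations(enumerate(component_formulas), 2)
        if differ_by_one_hydrogen(fa, fb)
    ]
```

Calling `candidate_pairs` with the formulas of all components of one entry returns the index pairs worth a closer look; an empty list means the entry can be skipped.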

I hope that helps.


