• RE: Differing results between mogul and API

    Hi Dave,

    Sorry for the delay in getting back to you; I've only just seen this post.

    There are a few things to clarify here.

    1) CSD data

    The CSD Python API, by default, uses the November release of the database, but it is possible to add further libraries to the calculation (e.g. the CSD updates) if you want to.

    To get consistent results, I changed the Mogul settings in Mercury via CSD-System --> Mogul Settings..., selecting only CSD 5.37 in the Include library column.

    2) Standardisation of aromatic and delocalised bonds

    It is important to standardise your input molecule according to CSD conventions. Mogul does this automatically, while in the API you need two additional lines of code:

    -------------------------------------------------------------------------------------------------------------------------------------

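    from ccdc.io import MoleculeReader
    from ccdc.conformer import GeometryAnalyser

    # Read the first molecule from the input file
    mol = MoleculeReader('test2_3d_opt.sdf')[0]

    # Standardise bond types to CSD conventions (Mogul does this automatically)
    mol.standardise_aromatic_bonds()
    mol.standardise_delocalised_bonds()

    # Create the geometry analysis engine and run it on the molecule
    engine = GeometryAnalyser()
    geometry_analysed_mol = engine.analyse_molecule(mol)

    # Print the statistics for each analysed bond
    for b in geometry_analysed_mol.analysed_bonds:
        print(b.atom_labels, b.atom_indices, b.value, b.mean, b.median, b.z_score, b.generalised, b.nhits)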

    -------------------------------------------------------------------------------------------------------------------------------------
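
    Once both sets of numbers are available, a small helper can flag the bonds where the API and Mogul still disagree. This is only a sketch: `mean_discrepancies`, `AnalysedBond` and the `mogul_means` dictionary (values copied by hand from Mogul) are hypothetical names, and the figures are made up. It assumes each analysed bond exposes `atom_labels` as a sequence of strings and `mean` as a number, as in the loop above.

```python
from collections import namedtuple

# Hypothetical stand-in exposing a few of the attributes printed in the loop above,
# so that the sketch runs without a CSD installation.
AnalysedBond = namedtuple("AnalysedBond", ["atom_labels", "mean", "nhits"])


def mean_discrepancies(analysed_bonds, mogul_means, tolerance=0.001):
    """Return {bond label: (api_mean, mogul_mean)} for bonds whose means disagree."""
    differences = {}
    for bond in analysed_bonds:
        label = "-".join(bond.atom_labels)
        if label not in mogul_means or bond.mean is None:
            continue  # nothing to compare against for this bond
        if abs(bond.mean - mogul_means[label]) > tolerance:
            differences[label] = (bond.mean, mogul_means[label])
    return differences


# Made-up numbers, for illustration only
api_bonds = [
    AnalysedBond(("C1", "C2"), 1.520, 120),
    AnalysedBond(("C2", "O1"), 1.430, 85),
]
mogul_means = {"C1-C2": 1.520, "C2-O1": 1.410}
discrepancies = mean_discrepancies(api_bonds, mogul_means)
```

    If the same libraries are selected and the molecule has been standardised, I would expect the returned dictionary to be empty or close to it.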

    I hope that helps.

    Best wishes,

    Ilenia

  • RE: Finding structures that contain two molecules that are different in protonation state only

    Hi Marko,

    I've recently tackled a similar problem, but I was only interested in two-component structures in the CSD. The approach I followed made use of the SubstructureSearch functionality and is summarised below:

    1. Search the CSD to find all two-component structures that are organic and have 3D coordinates, and store the resulting refcodes in a .gcd file for future use
    2. Split each entry into its heaviest component and smallest component
    3. Create and save a Substructure Screen of all heaviest components (please note that this functionality will be available with the November release of the CSD Python API, but it helped to speed up the search)
    4. Create a dictionary of the smallest components (key: CSD identifier, value: Molecule object of the component). A Substructure Screen can be used here, too.
    5. Perform the substructure search:
      1. Start by using the heaviest component as a query
      2. Loop over all the “heavy_hits” and check that the smallest component is the same as the one in the query. Use the dictionary created above in this step.
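
    As a rough illustration of steps 2 and 4, the splitting and the dictionary could be sketched as below. This is not the code I ran: `Component`, `split_entry` and `build_smallest_lookup` are hypothetical names, the identifiers and weights are invented, and "smallest" is assumed to mean lowest molecular weight. The functions only rely on a `molecular_weight` attribute, so they should work with any component object that provides one.

```python
from collections import namedtuple

# Hypothetical stand-in for a molecular component, so the sketch runs without the CSD.
Component = namedtuple("Component", ["label", "molecular_weight"])


def split_entry(components):
    """Return (heaviest, smallest) for a two-component entry, ranked by molecular weight."""
    if len(components) != 2:
        raise ValueError("expected exactly two components")
    smallest, heaviest = sorted(components, key=lambda c: c.molecular_weight)
    return heaviest, smallest


def build_smallest_lookup(entries):
    """Map identifier -> smallest component for (identifier, components) pairs."""
    return {identifier: split_entry(components)[1] for identifier, components in entries}


# Invented identifiers and weights, for illustration only
entries = [
    ("ENTRY01", [Component("acid", 180.2), Component("water", 18.0)]),
    ("ENTRY02", [Component("methanol", 32.0), Component("base", 151.2)]),
]
smallest_by_id = build_smallest_lookup(entries)
heaviest, smallest = split_entry(entries[1][1])
```

    The lookup is then used in step 5 to compare the smallest component of each hit with that of the query.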

    N.B. I check that query and hit have the same number of heavy atoms. However, it’s likely that you get false positives, as in some cases query and hit differ in stereochemistry rather than in the number of hydrogen atoms.
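
    One possible way to reduce those false positives, which was not part of the approach described above, is to compare the hydrogen counts as well. The sketch below is only an assumption of how that might look: `Mol`, `make_mol`, `hydrogen_count` and `classify_pair` are hypothetical names, and the stand-in objects only mimic an `atoms` and a `heavy_atoms` list. It also assumes all hydrogen atoms are present in both structures, which is not always the case.

```python
from collections import namedtuple

# Hypothetical stand-in for a molecule: element symbols only, so the sketch runs without the CSD.
Mol = namedtuple("Mol", ["atoms", "heavy_atoms"])


def make_mol(symbols):
    """Build a stand-in molecule from a list of element symbols."""
    symbols = list(symbols)
    return Mol(atoms=symbols, heavy_atoms=[s for s in symbols if s != "H"])


def hydrogen_count(mol):
    """Number of hydrogen atoms, taken as all atoms minus heavy atoms."""
    return len(mol.atoms) - len(mol.heavy_atoms)


def classify_pair(query, hit):
    """Label a query/hit pair using heavy-atom and hydrogen counts."""
    if len(query.heavy_atoms) != len(hit.heavy_atoms):
        return "rejected: different number of heavy atoms"
    if hydrogen_count(query) != hydrogen_count(hit):
        return "candidate: differs in hydrogen count"
    return "check manually: same hydrogen count"


# Acetic acid versus acetate, written out as element symbols
acetic_acid = make_mol(["C", "C", "O", "O", "H", "H", "H", "H"])
acetate = make_mol(["C", "C", "O", "O", "H", "H", "H"])
result = classify_pair(acetic_acid, acetate)
```

    Pairs that fall into the last category have the same composition, so they may be stereoisomers or duplicates and would need checking by hand.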

    I hope that helps.

    Regards,

    Ilenia