How to use SMARTS and SMILES in Mercury and the CSD Python API

Here we look at how you can use SMARTS and SMILES in Mercury and the CSD Python API to perform substructure searches and to generate 3D molecules from strings to support your cheminformatics work, including some new functions added in the 2021.3 release.

Using logical operators in SMARTS

In this release we have improved the handling of logical operators when working with SMARTS in Mercury. It is now possible to use high and low priority AND statements, and to mix these with OR statements.

The logical operators used in SMARTS are:

  • ! exclamation = not
  • & ampersand = and (high priority)
  • ; semicolon = and (low priority)
  • , comma = or
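To see how these priorities interact, the short sketch below groups an atom expression in the standard SMARTS precedence order: ! binds tightest, then & (and the implicit AND between adjacent primitives), then the comma, and finally the semicolon. It is plain Python for illustration only and is not part of Mercury or the CSD Python API. The function name group_atom_expression is hypothetical, and the tokenizer assumes every primitive is a single letter with an optional number, so it covers only a small subset of SMARTS.

```python
import re

def group_atom_expression(expr):
    """Group a simple SMARTS atom expression by operator priority.

    Returns a list of low-priority AND groups (split on ';'); each group is a
    list of OR alternatives (split on ','); each alternative is a list of
    primitives joined by high-priority AND ('&' or simple adjacency).
    Assumption: primitives look like 'C', 'D3', 'x2' or '!R'.
    """
    groups = []
    for low_and_part in expr.split(";"):
        alternatives = []
        for or_part in low_and_part.split(","):
            primitives = []
            for chunk in or_part.split("&"):
                # An optional '!' stays attached to the primitive it negates
                primitives.extend(re.findall(r"!?[A-Za-z]\d*", chunk))
            alternatives.append(primitives)
        groups.append(alternatives)
    return groups

print(group_atom_expression("C;D3!R,x2H1"))
```

Read from the outside in, the result says: aliphatic carbon AND ((D3 AND not R) OR (x2 AND H1)).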

Usability improvements to SMARTS and SMILES in Mercury and the CSD Python API

We have made a number of improvements to how SMARTS and SMILES are handled in the CSD software suite.

Non-standard aromatic atoms - the SMARTS language only lets certain elements be designated aromatic by writing them in lower case. One common element that cannot be written this way is boron, B. In this update you can specify an aromatic bond to boron with a colon (:) symbol. SMILES strings generated for non-standard aromatic atoms use the colon in the same way.

Double bond E/Z or cis/trans constraints can be specified in SMARTS with / and \ bond markers around a double bond, so searches can specify if cis or trans is required.

E and Z markers on double bonds are now accessible via the CSD Python API.
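As a rough illustration of how the / and \ markers encode geometry, the sketch below reads the two markers on either side of the first double bond in a string. It is plain Python, not a CSD Python API call, and the helper name double_bond_geometry is hypothetical. It assumes the simple linear form A/C=C/B, where matching markers mean trans and opposite markers mean cis. It does not handle markers written inside branches.

```python
import re

def double_bond_geometry(pattern):
    """Return 'trans', 'cis' or None for the first marked double bond.

    Assumption: the bond is written in the linear form A/C=C/B, with one
    marker directly before and one directly after the double-bonded atoms.
    """
    match = re.search(r"([/\\])[^/\\=]*=[^/\\=]*([/\\])", pattern)
    if match is None:
        return None  # no directional markers around a double bond
    # Same direction on both sides means trans; opposite directions mean cis
    return "trans" if match.group(1) == match.group(2) else "cis"

print(double_bond_geometry("F/C=C/F"))
print(double_bond_geometry("F/C=C\\F"))
```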

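Stereochemistry operators can now be used in searches. Looking from the first neighbour towards the chiral atom, @ means the remaining neighbours are listed anticlockwise and @@ means they are listed clockwise. For example, this might be used to specify the D or L form of an amino acid.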
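Because @ and @@ are mirror images of each other, a query for the opposite enantiomer can be produced by swapping every @ for @@ and vice versa. The sketch below does this with a regular expression. It is plain Python rather than a CSD Python API function, and the name invert_chirality is hypothetical. It assumes only the plain @ and @@ markers are used (no extended chirality classes) and takes the common Daylight convention that N[C@@H](C)C(=O)O is L-alanine.

```python
import re

def invert_chirality(pattern):
    """Swap every '@' with '@@' (and vice versa) in a SMILES or SMARTS string.

    Assumption: only the plain '@' and '@@' tetrahedral markers are present.
    Double-bond markers ('/' and '\\') are left unchanged.
    """
    return re.sub(r"@@?", lambda m: "@" if m.group(0) == "@@" else "@@", pattern)

l_alanine = "N[C@@H](C)C(=O)O"
d_alanine = invert_chirality(l_alanine)
print(d_alanine)
```

The resulting string could then be used as the query for the other enantiomer in either Mercury or the CSD Python API.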

How to use SMARTS and SMILES in Mercury

  • Select by SMARTS - use this search in Mercury to identify which atoms in a structure meet the search criteria. For example, searching refcode AABHTZ for the string [C;D3!R,x2H1] returns 3 atoms, shown in the image below. These are aliphatic carbons that are either bonded to 3 non-hydrogen atoms and not members of a ring, or have two ring bonds and one hydrogen attached.
  • SMILES to 3D molecule - generate a 3D molecule from a SMILES string, with its conformation informed by the empirical data in the CSD. Find this function under the “File” menu. Learn more about this functionality here.

 

How to use SMARTS and SMILES in the CSD Python API

  • Generate SMILES string for a molecule - quickly write a SMILES string for a molecule via the CSD Python API. This example has aromatic boron atoms and trans double bonds.
>>> from ccdc import io
>>> # Read ABEHUK from the CSD and write its second component as SMILES
>>> molecule = io.MoleculeReader("csd").molecule("ABEHUK")
>>> molecule.components[1].to_string('smiles')
'c1cc:[B-](:cc1)/C=C/c1ccc(cc1)/C=C/[B-]1:ccccc:1'
  • Substructure search - search the CSD for structures matching a specific SMARTS query. This example matches the atom chirality around C7 in CSD entry AACFAZ10.
>>> import ccdc.search
>>> # Build a search from a SMARTS string with a chiral atom (@@) and / bond markers
>>> search = ccdc.search.SubstructureSearch()
>>> search.add_substructure(ccdc.search.SMARTSSubstructure("c[C@@]1(H)OCC=C1/C=N/N"))
>>> # Run the search against the CSD and report the refcode of the first hit
>>> hits = search.search()
>>> hits[0].identifier
'AACFAZ10'

Learn more

If you are not familiar with SMARTS, Daylight has some great tutorials and examples to get started.

See what else is new in the 2021.3 release here.

Learn more about the CSD here, or about Mercury here.