Overview
SMILES (Simplified Molecular Input Line Entry System) is a line notation that encodes a chemical structure as a one-dimensional string. It is a linearized, serialized representation of the molecular graph (atom connectivity rather than 3D coordinates), and a SMILES string reads like a depth-first traversal of that graph. Like a connection table, SMILES identifies the nodes (atoms) and edges (bonds) of the molecular graph.
Key Characteristics
- Human-readable: Designed to be read and written by people (InChI, by contrast, uses a layered, hierarchical format aimed at machine processing)
- Compact: Shorter than other representations such as 3D coordinate files or connection tables
- Simple syntax: A small set of rules that chemists and researchers can learn quickly
- Flexible: Both linear and cyclic structures can be written in many different valid ways; ethanol, for example, can be written as CCO, OCC, or C(O)C
Basic Syntax
Atomic Symbols
SMILES uses standard atomic symbols with implied hydrogen atoms:
- C (methane, CH4)
- N (ammonia, NH3)
- O (water, H2O)
- P (phosphine, PH3)
- S (hydrogen sulfide, H2S)
- Cl (hydrogen chloride, HCl)
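The implied hydrogen counts in the list above follow from each element's normal valence. The sketch below assumes the commonly documented valence table for the organic subset and the rule "use the lowest normal valence that is at least the sum of the explicit bond orders". The names `NORMAL_VALENCES` and `implicit_hydrogens` are illustrative, not part of any toolkit.

```python
# Normal valences for organic-subset atoms (assumed table, for illustration).
NORMAL_VALENCES = {
    "B": (3,), "C": (4,), "N": (3, 5), "O": (2,),
    "P": (3, 5), "S": (2, 4, 6),
    "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,),
}


def implicit_hydrogens(symbol, bond_order_sum=0):
    """Return the number of implied hydrogens on an unbracketed atom.

    bond_order_sum is the total order of the bonds written explicitly
    (single = 1, double = 2, triple = 3).
    """
    for valence in NORMAL_VALENCES[symbol]:
        if valence >= bond_order_sum:
            return valence - bond_order_sum
    # More bonds than any normal valence allows: no hydrogens are implied.
    return 0


for symbol in ("C", "N", "O", "P", "S", "Cl"):
    print(symbol, implicit_hydrogens(symbol))
```

With no explicit bonds, the loop reproduces the molecules listed above (four hydrogens on C, three on N, and so on).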
Bracket notation: Elements outside the organic subset must be shown in brackets, e.g., [Pt] for elemental platinum. The organic subset (B, C, N, O, P, S, F, Cl, Br, and I) can omit brackets.
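To make the distinction between bracket atoms and organic-subset atoms concrete, here is a minimal tokenizer sketch using only the Python standard library. It covers a simplified subset of the grammar (no wildcard atom, for instance), and `TOKEN_RE` and `tokenize` are hypothetical names chosen for this example.

```python
import re

# Illustrative token pattern; not a complete SMILES grammar.
TOKEN_RE = re.compile(
    r"\[[^\]]+\]"      # bracket atom, e.g. [Pt] or [13C]
    r"|Br|Cl"          # two-letter organic-subset atoms (tried first)
    r"|[BCNOPSFI]"     # one-letter organic-subset atoms
    r"|[bcnops]"       # aromatic (lowercase) atoms
    r"|%\d{2}|\d"      # ring-closure labels
    r"|[-=#:/\\.]"     # bond symbols and the dot
    r"|[()]"           # branch open / close
)


def tokenize(smiles):
    """Split a SMILES string into tokens, or raise ValueError."""
    tokens = []
    pos = 0
    while pos < len(smiles):
        match = TOKEN_RE.match(smiles, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}: {smiles[pos]!r}")
        tokens.append(match.group(0))
        pos = match.end()
    return tokens


print(tokenize("CC(=O)O"))
print(tokenize("[Pt]"))
```

Writing platinum without brackets (`Pt`) fails in this sketch: `P` is read as phosphorus and the trailing `t` matches nothing, which is why the brackets are required.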
Bond Representation
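Bonds are represented by symbols:
- Single bond: - (usually omitted)
- Double bond: =
- Triple bond: #
- Aromatic bond: : (usually omitted, because lowercase atom symbols already mark aromatic atoms; note that * is not a bond symbol but a wildcard atom)
- No bond (disconnection): . separates parts that are not bonded to each other, e.g., [Na+].[Cl-] for sodium chloride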
Structural Features
- Branches: Enclosed in parentheses and can be nested, e.g., CC(C)C(=O)O for isobutyric acid
- Cyclic structures: Written by breaking one bond in each ring and labeling the two atoms of the broken bond with the same number, e.g., C1CCCCC1 for cyclohexane
- Aromaticity: Lowercase letters denote aromatic atoms in rings, e.g., c1ccccc1 for benzene
- Formal charges: Indicated by placing the charge inside the brackets after the atom symbol, e.g., [C+], [C-], or [C-2]
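The branch and ring-closure rules above can be sketched as a small parser that turns a SMILES string into a list of atoms and bonds. This is a deliberately reduced illustration: it handles only unbracketed atoms, the bond symbols -, =, and #, parentheses, and single-digit ring labels, and it records bonds between aromatic atoms with order 1 as a simplification. `parse_smiles` is a hypothetical helper, not a toolkit function.

```python
BOND_ORDER = {"-": 1, "=": 2, "#": 3}
ATOMS = {"B", "C", "N", "O", "P", "S", "F", "I", "Cl", "Br",
         "b", "c", "n", "o", "p", "s"}


def parse_smiles(smiles):
    """Return (atoms, bonds); bonds are (i, j, order) index tuples."""
    atoms, bonds = [], []
    prev = None        # atom the next atom will bond to
    pending = None     # explicit bond order waiting for its second atom
    stack = []         # atoms to return to when a branch closes
    open_rings = {}    # ring label -> (atom index, bond order or None)
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "(":
            stack.append(prev)
        elif ch == ")":
            if not stack:
                raise ValueError("unmatched ')'")
            prev = stack.pop()
        elif ch in BOND_ORDER:
            pending = BOND_ORDER[ch]
        elif ch.isdigit():
            if prev is None:
                raise ValueError("ring label before any atom")
            if ch in open_rings:
                # Second occurrence of the label: close the ring.
                start, order = open_rings.pop(ch)
                bonds.append((start, prev, pending or order or 1))
            else:
                open_rings[ch] = (prev, pending)
            pending = None
        else:
            two = smiles[i:i + 2]
            symbol = two if two in ("Cl", "Br") else ch
            if symbol not in ATOMS:
                raise ValueError(f"unsupported symbol {symbol!r}")
            atoms.append(symbol)
            index = len(atoms) - 1
            if prev is not None:
                bonds.append((prev, index, pending or 1))
            prev, pending = index, None
            i += len(symbol) - 1
        i += 1
    if stack or open_rings:
        raise ValueError("unclosed branch or ring")
    return atoms, bonds


print(parse_smiles("CC(C)C(=O)O"))
print(parse_smiles("C1CCCCC1"))
```

The stack restores the branch point when a parenthesis closes, and the dictionary of open ring labels pairs each digit with its partner, which is what lets a linear string describe a cyclic graph.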
Stereochemistry and Isomers
Isotope Notation
Isotope notation specifies the exact isotope of an element. The mass number comes before the element symbol within square brackets, e.g., [13C] for carbon-13.
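The order of fields inside a bracket atom (isotope, symbol, chirality mark, hydrogen count, charge) can be illustrated with a regular expression. This is a sketch under simplifying assumptions: it ignores atom classes, repeated-sign charges such as ++, and the extended chirality tags. `BRACKET_RE` and `parse_bracket_atom` are names made up for this example.

```python
import re

BRACKET_RE = re.compile(
    r"\[(?P<isotope>\d+)?"                    # mass number, e.g. 13
    r"(?P<symbol>[A-Z][a-z]?|[a-z]{1,2})"     # element symbol
    r"(?P<chiral>@@?)?"                       # @ or @@
    r"(?P<hcount>H\d?)?"                      # explicit hydrogens
    r"(?P<charge>[+-]\d*)?"                   # formal charge
    r"\]"
)


def parse_bracket_atom(token):
    """Return the fields of a bracket atom as a dict."""
    match = BRACKET_RE.fullmatch(token)
    if match is None:
        raise ValueError(f"not a supported bracket atom: {token!r}")

    hcount_text = match.group("hcount")
    if hcount_text is None:
        hcount = 0
    else:
        hcount = int(hcount_text[1:]) if len(hcount_text) > 1 else 1

    charge_text = match.group("charge")
    if charge_text is None:
        charge = 0
    else:
        sign = 1 if charge_text[0] == "+" else -1
        magnitude = int(charge_text[1:]) if len(charge_text) > 1 else 1
        charge = sign * magnitude

    isotope = match.group("isotope")
    return {
        "isotope": int(isotope) if isotope else None,
        "symbol": match.group("symbol"),
        "chirality": match.group("chiral"),
        "hcount": hcount,
        "charge": charge,
    }


print(parse_bracket_atom("[13C]"))
print(parse_bracket_atom("[NH4+]"))
```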
Double Bond Stereochemistry
Directional bonds, written with the \ and / symbols, indicate the configuration around a double bond: C/C=C\C (cis, Z) vs C/C=C/C (trans, E).
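For the simple pattern used in these examples, where one directional bond precedes the first double-bond carbon and the other follows the second, matching symbols mean trans and differing symbols mean cis. The sketch below encodes only that special case. `double_bond_geometry` is a hypothetical helper and does not handle branches, ring closures, or atoms other than carbon on the double bond.

```python
import re

# Matches "<dir>C=C<dir>", where <dir> is / or \.
PATTERN = re.compile(r"([/\\])C=C([/\\])")


def double_bond_geometry(smiles):
    """Return 'cis', 'trans', or None if the simple pattern is absent."""
    match = PATTERN.search(smiles)
    if match is None:
        return None
    first, second = match.groups()
    # Same direction on both sides: substituents on opposite sides (trans).
    return "trans" if first == second else "cis"


print(double_bond_geometry(r"C/C=C\C"))
print(double_bond_geometry("C/C=C/C"))
```

A string such as CC=CC, which carries no directional bonds, returns None here, reflecting that its double-bond geometry is unspecified.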
Tetrahedral Chirality
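Chirality around tetrahedral centers uses the @ and @@ symbols, e.g., N[C@](C)(F)C(=O)O and N[C@@](F)(C)C(=O)O. These two strings describe the same stereoisomer: the methyl and fluorine neighbors are listed in swapped order and the symbol is flipped, and the two changes cancel.
- @ means the remaining neighbors are listed anti-clockwise when viewed from the first neighbor; @@ means they are listed clockwise
- @ and @@ are shorthand for @TH1 and @TH2, respectively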
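Because @ and @@ are defined relative to the order in which neighbors appear in the string, two spellings of a center can be compared through the parity of the permutation between their neighbor lists: an odd permutation with different symbols, or an even permutation with the same symbol, means the same configuration. The following sketch assumes each neighbor has a unique label; `permutation_parity` and `same_configuration` are illustrative names.

```python
def permutation_parity(order_a, order_b):
    """Return 0 if order_b is an even permutation of order_a, else 1."""
    position = {label: i for i, label in enumerate(order_a)}
    perm = [position[label] for label in order_b]
    # Count inversions; their number has the same parity as the permutation.
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return inversions % 2


def same_configuration(neighbors_a, mark_a, neighbors_b, mark_b):
    """Compare two descriptions of one tetrahedral center.

    neighbors_* list the neighbor labels in string order;
    mark_* is "@" or "@@".
    """
    odd = permutation_parity(neighbors_a, neighbors_b) == 1
    marks_differ = mark_a != mark_b
    return odd == marks_differ


# N[C@](C)(F)C(=O)O  versus  N[C@@](F)(C)C(=O)O
print(same_configuration(
    ["N", "CH3", "F", "COOH"], "@",
    ["N", "F", "CH3", "COOH"], "@@",
))
```

Keeping the neighbor order fixed and changing only the symbol, by contrast, describes the opposite configuration.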

Advanced Stereochemistry
More general notation for other stereocenters:
- @AL1, @AL2 for allene-type stereocenters
- @SP1, @SP2, @SP3 for square-planar stereocenters
- @TB1 … @TB20 for trigonal bipyramidal stereocenters
- @OH1 … @OH30 for octahedral stereocenters
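SMILES allows partial specification of stereochemistry because it describes chirality locally (relative to the order of neighbors in the string) instead of with absolute descriptors, so some stereocenters can be specified while others are left undefined.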
Variants and Standards
Canonical SMILES
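Because one molecule can be written as many different SMILES strings, canonical SMILES uses a canonicalization algorithm to pick a single, unique string per molecule. The result is unique only for a given algorithm: different software packages can produce different canonical SMILES for the same molecule, so canonical strings should be compared only when they come from the same toolkit.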
OpenSMILES vs. Proprietary
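- Proprietary: The original SMILES specification is proprietary rather than an open standard, and implementations differ in details, which can cause compatibility issues between different groups/labs
- OpenSMILES: A community-developed open specification of SMILES intended to address these compatibility concerns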
Isomeric SMILES
Isomeric SMILES incorporates isotopes and stereochemistry information, providing more detailed molecular representations.