Overview
SMILES (Simplified Molecular Input Line Entry System) is a line notation that represents a chemical structure as a one-dimensional string. It linearizes the molecular graph (atoms and their connectivity, not 3D coordinates), and a SMILES string can be read as a depth-first traversal of that graph. Like a connection table, SMILES identifies the nodes (atoms) and edges (bonds) of the molecular graph.
For example, the simple molecule ethanol (C2H6O) can be represented as `CCO`, while the more complex caffeine molecule becomes `CN1C=NC2=C1C(=O)N(C(=O)N2C)C`.
Key Characteristics
- Human-readable: Designed to be read and written by people, whereas InChI is a layered (hierarchical) identifier aimed mainly at machine readability
- Compact: More compact than other representations (3D coordinates, connection tables)
- Simple syntax: A small set of rules that chemists and researchers can learn and use relatively easily
- Flexible: Both linear and cyclic structures can be represented in many different valid ways
Limitations
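- Non-uniqueness: Different SMILES strings can represent the same molecule (e.g., `CCO`, `OCC`, and `C(O)C` all denote ethanol, and an aromatic ring can be written in different Kekulé forms).
- Non-robustness: SMILES strings can be written that do not correspond to any valid molecular structure:
  - Strings that cannot be parsed into a molecular structure (e.g., an unclosed branch or ring).
  - Strings that violate basic chemical rules (e.g., an atom with more bonds than is physically possible).
- Information loss: A SMILES string encodes connectivity and, optionally, stereochemistry, but not 3D coordinates or conformations. If such 3D structural information exists, it cannot be encoded.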
For a more robust alternative in which every string corresponds to a valid molecule, see SELFIES (Self-Referencing Embedded Strings).
Basic Syntax
Atomic Symbols
SMILES uses standard atomic symbols with implied hydrogen atoms:
- `C` (methane, CH4)
- `N` (ammonia, NH3)
- `O` (water, H2O)
- `P` (phosphine, PH3)
- `S` (hydrogen sulfide, H2S)
- `Cl` (hydrogen chloride, HCl)
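The implied hydrogen counts in the list above follow from each element's normal valence. A minimal sketch of that rule, assuming the normal valences given in the OpenSMILES specification; the names `NORMAL_VALENCES` and `implicit_hydrogens` are illustrative and not part of any library:

```python
# Normal valences of the organic-subset elements (assumed from the OpenSMILES spec).
NORMAL_VALENCES = {
    "B": (3,), "C": (4,), "N": (3, 5), "O": (2,), "P": (3, 5),
    "S": (2, 4, 6), "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,),
}


def implicit_hydrogens(symbol, bond_order_sum=0):
    """Return the number of implied hydrogens on an unbracketed atom.

    bond_order_sum is the sum of the orders of the bonds written explicitly
    in the string (single = 1, double = 2, triple = 3).
    """
    for valence in NORMAL_VALENCES[symbol]:
        # Fill up to the smallest normal valence that can hold the explicit bonds.
        if valence >= bond_order_sum:
            return valence - bond_order_sum
    # More explicit bonds than any normal valence: no hydrogens are implied.
    return 0


# Ethanol, CCO: the atoms carry 1, 2, and 1 explicit single bonds respectively.
ethanol_h = (implicit_hydrogens("C", 1)
             + implicit_hydrogens("C", 2)
             + implicit_hydrogens("O", 1))
```

With these assumptions, a lone `C` gets four hydrogens (methane) and the three atoms of `CCO` together get six, matching C2H6O.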
Bracket notation: Elements outside the organic subset must be shown in brackets, e.g., `[Pt]` for elemental platinum. The organic subset (`B`, `C`, `N`, `O`, `P`, `S`, `F`, `Cl`, `Br`, and `I`) can omit brackets.
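The split between bracket atoms and organic-subset atoms can be illustrated with a small tokenizer. This is only a sketch built on a regular expression: it recognizes token shapes but does not validate chemistry, and `TOKEN_RE` and `tokenize` are hypothetical names chosen for this example.

```python
import re

# Alternatives are tried left to right, so bracket atoms and the two-letter
# halogens must come before the one-letter symbols.
TOKEN_RE = re.compile(
    r"(\[[^\]]+\]"        # bracket atom, e.g. [Pt] or [13C]
    r"|Br|Cl"             # two-letter organic-subset atoms
    r"|[BCNOPSFI]"        # one-letter organic-subset atoms
    r"|[bcnops]"          # aromatic organic-subset atoms
    r"|\*"                # wildcard atom
    r"|%\d{2}|\d"         # ring-closure labels
    r"|[-=#$:/\\.()])"    # bond symbols, dot, and branch parentheses
)


def tokenize(smiles):
    """Split a SMILES string into tokens, raising ValueError on unknown text."""
    tokens = TOKEN_RE.findall(smiles)
    # findall silently skips unmatched characters, so check nothing was lost.
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in {smiles!r}")
    return tokens
```

For instance, `tokenize("ClCC[Pt]")` yields `['Cl', 'C', 'C', '[Pt]']`, while an unbracketed `Pt` is rejected because `t` is not a valid symbol outside brackets.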
Bond Representation
Bonds are represented by symbols:
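- Single bond: `-` (usually omitted)
- Double bond: `=`
- Triple bond: `#`
- Aromatic bond: `:` (usually omitted, because lowercase aromatic atom symbols already imply it)
- Disconnection: `.` (not a bond; separates parts that are not bonded to each other, such as the ions in `[Na+].[Cl-]`)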
Structural Features
- Branches: Enclosed in parentheses and can be nested. For example, `CC(C)C` represents isobutane (propane with a methyl branch on the central carbon).
- Cyclic structures: Written by breaking one bond in each ring and marking the two atoms it joined with the same number. For example, `C1CCCCC1` represents cyclohexane (the `1` connects the first and last carbon).
- Aromaticity: Lowercase letters are used for atoms in aromatic rings. For example, benzene is written as `c1ccccc1`.
- Formal charges: Indicated by placing the charge inside the brackets after the atom symbol, e.g., `[C+]`, `[C-]`, or `[C-2]`.
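Branches and ring closures both rely on paired markers, so a quick structural check can catch some strings that cannot correspond to any molecule. The following is a minimal sketch under that assumption; `check_structure` is a hypothetical helper that only checks pairing and does not verify valences or atom symbols.

```python
def check_structure(smiles):
    """Return a list of pairing problems found in a SMILES string (empty if none)."""
    problems = []
    depth = 0            # current nesting depth of branch parentheses
    open_rings = set()   # ring-closure labels seen an odd number of times
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            # Skip bracket atoms: digits inside them are isotopes or charges.
            end = smiles.find("]", i)
            if end == -1:
                problems.append("unclosed bracket atom")
                break
            i = end + 1
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                problems.append("unmatched ')'")
            else:
                depth -= 1
        elif ch.isdigit() or ch == "%":
            # "%nn" is a two-digit ring label; a bare digit is a one-digit label.
            label = smiles[i + 1:i + 3] if ch == "%" else ch
            if ch == "%":
                i += 2
            # The first occurrence opens a ring, the second closes it.
            open_rings.symmetric_difference_update({label})
        i += 1
    if depth > 0:
        problems.append("unclosed '('")
    if open_rings:
        problems.append("unclosed ring label(s): " + ", ".join(sorted(open_rings)))
    return problems
```

Here `check_structure("C1CCCCC1")` returns an empty list, whereas `check_structure("C1CCCC")` reports the unclosed ring label `1`.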
Stereochemistry and Isomers
Isotope Notation
Isotope notation specifies the exact isotope of an element. The mass number comes before the element symbol within square brackets, e.g., `[13C]` for carbon-13.
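Inside a bracket atom the fields appear in a fixed order: isotope, element symbol, chirality, hydrogen count, then charge. A rough sketch of a parser for that layout follows. The regular expression and the name `parse_bracket_atom` are assumptions made for illustration; the sketch ignores atom classes and the older repeated-sign charge forms such as `[C--]`.

```python
import re

BRACKET_ATOM_RE = re.compile(
    r"\[(?P<isotope>\d+)?"                      # optional mass number, e.g. 13
    r"(?P<symbol>[A-Z][a-z]?|[a-z]{1,2}|\*)"    # element, aromatic symbol, or wildcard
    r"(?P<chirality>@[A-Z]{2}\d+|@@?)?"         # e.g. @, @@, @TH1, @OH30
    r"(?:H(?P<hcount>\d*))?"                    # explicit hydrogens: H, H2, ...
    r"(?:(?P<sign>[+-])(?P<magnitude>\d*))?"    # charge: +, -, +2, -2, ...
    r"\]"
)


def parse_bracket_atom(token):
    """Parse one bracket atom such as '[13C]' or '[NH4+]' into a dict."""
    m = BRACKET_ATOM_RE.fullmatch(token)
    if m is None:
        raise ValueError(f"not a supported bracket atom: {token!r}")
    hcount = m.group("hcount")   # None if no H, "" for a bare H, else digits
    charge = 0
    if m.group("sign"):
        charge = int(m.group("magnitude") or 1)
        if m.group("sign") == "-":
            charge = -charge
    return {
        "isotope": int(m.group("isotope")) if m.group("isotope") else None,
        "symbol": m.group("symbol"),
        "chirality": m.group("chirality"),
        "hydrogens": 0 if hcount is None else int(hcount or 1),
        "charge": charge,
    }
```

Under these assumptions, `parse_bracket_atom("[13C]")` reports isotope 13 with no hydrogens and no charge, and `parse_bracket_atom("[C-2]")` reports a charge of -2.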
Double Bond Stereochemistry
Directional bonds, written with the `/` and `\` symbols, indicate the stereochemistry of double bonds:
- `C/C=C/C` represents (E)-2-butene (trans configuration)
- `C/C=C\C` represents (Z)-2-butene (cis configuration)
The direction of the slashes indicates which side of the double bond each substituent is on: matching slashes place the two substituents on opposite sides, while mixed slashes place them on the same side.
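The slash rule can be sketched for the simplest case, a single C=C bond with one marked substituent on each side. The pattern and the name `double_bond_geometry` are hypothetical; the sketch reports cis/trans only and does not assign E/Z, which additionally depends on substituent priorities.

```python
import re

# Matches strings of the form X/C=C/Y, where X and Y are simple atom symbols
# and each slash may be "/" or "\".
SIMPLE_ALKENE_RE = re.compile(r"^[A-Za-z]+([/\\])C=C([/\\])[A-Za-z]+$")


def double_bond_geometry(smiles):
    """Return 'trans' or 'cis' for a string shaped like X/C=C/Y."""
    m = SIMPLE_ALKENE_RE.match(smiles)
    if m is None:
        raise ValueError(f"unsupported pattern: {smiles!r}")
    # Same slash on both sides: substituents on opposite sides (trans).
    # Different slashes: substituents on the same side (cis).
    return "trans" if m.group(1) == m.group(2) else "cis"
```

For example, `double_bond_geometry("F/C=C/F")` gives `'trans'` and `double_bond_geometry("F/C=C\\F")` gives `'cis'` (the backslash is doubled only because of Python string escaping).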
Tetrahedral Chirality
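Chirality around tetrahedral centers uses the `@` and `@@` symbols:
- `N[C@](C)(F)C(=O)O` and `N[C@@](F)(C)C(=O)O` describe the same stereocenter, because swapping two neighbors and switching between `@` and `@@` cancel each other out
- Looking from the first neighbor toward the chiral atom, `@` means the remaining neighbors are listed anti-clockwise and `@@` means they are listed clockwise
- `@` and `@@` are shorthand for `@TH1` and `@TH2`, respectively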
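Whether two tetrahedral specifications agree can be decided from permutation parity: an even reordering of the neighbors keeps the symbol, and an odd reordering flips it. A minimal sketch, assuming each neighbor has a distinct label; `permutation_is_even` and `same_stereocenter` are illustrative names, not library functions.

```python
def permutation_is_even(order_a, order_b):
    """Return True if order_b is an even permutation of order_a."""
    position = {name: i for i, name in enumerate(order_a)}
    perm = [position[name] for name in order_b]
    # Count inversions: pairs that appear in the opposite relative order.
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return inversions % 2 == 0


def same_stereocenter(order_a, symbol_a, order_b, symbol_b):
    """Compare two (neighbor order, '@' or '@@') descriptions of one center."""
    even = permutation_is_even(order_a, order_b)
    # Even permutation: the symbols must match. Odd: they must differ.
    return (symbol_a == symbol_b) == even


# Neighbor orders as written in N[C@](C)(F)C(=O)O and N[C@@](F)(C)C(=O)O.
first = ["N", "CH3", "F", "COOH"]
second = ["N", "F", "CH3", "COOH"]
equivalent = same_stereocenter(first, "@", second, "@@")
```

Here `equivalent` is `True`, since the two orders differ by a single swap and the symbols differ as well.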

Advanced Stereochemistry
More general notation for other stereocenters:
- `@AL1`, `@AL2` for allene-type stereocenters
- `@SP1`, `@SP2`, `@SP3` for square-planar stereocenters
- `@TB1` … `@TB20` for trigonal bipyramidal stereocenters
- `@OH1` … `@OH30` for octahedral stereocenters
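SMILES allows partial specification of stereochemistry because it relies on local chirality (defined relative to the order in which neighbors are written) instead of absolute chirality, so some stereocenters can be specified while others are left undefined.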
Practical Applications
SMILES notation is widely used in:
- Chemical databases: Storage and retrieval of molecular structures
- Machine learning: Input representation for molecular property prediction
- Chemical informatics: Substructure searching and similarity analysis
- Drug discovery: High-throughput virtual screening
- Chemical reaction databases: Representing reactants and products
For a hands-on tutorial on visualizing SMILES strings as 2D molecular images, see Converting SMILES Strings to 2D Molecular Images.
Variants and Standards
Canonical SMILES
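Canonical SMILES seeks a unique representation of each molecule: a canonicalization algorithm selects one string from the many valid SMILES, so the same molecule always yields the same string. This uniqueness holds within a given toolkit, but different software implementations may use different algorithms and therefore produce different canonical strings.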
OpenSMILES vs. Proprietary
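- Proprietary: The original SMILES specification was developed by Daylight Chemical Information Systems and is not an open standard, which can cause compatibility issues between different groups/labs
- OpenSMILES: A community-developed open specification that standardizes the language to address these compatibility concerns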
Isomeric SMILES
Isomeric SMILES incorporates isotope and stereochemistry information, providing a more detailed molecular representation.