Lately, I’ve spent a lot of time staring at datasets full of SMILES strings.
With time, I find I get better at recognizing functional groups and substructures like C(=O)O (carboxylic acid) or c1ccccc1 (benzene ring).
But anything really complex is beyond my personal visualization capabilities.
I ran into this recently while debugging a generative model. Sometimes the grammar of the SMILES string provides the clue as to what is going wrong. Other times, actually seeing the molecule is what helps. I had a terminal full of generated strings and needed to verify their structures visually. Manually pasting strings into web tools or opening heavy desktop applications felt inefficient. I needed a lightweight script to turn that text into a properly formatted image directly from the terminal.
What Are SMILES Strings?
SMILES is a compact way to represent molecular structures as text strings. (Check out my SMILES Reference Note for my personal notes on the topic).
For example:
- O=C=O represents carbon dioxide
- OC[C@H]1OC(O)[C@H](O)[C@@H](O)[C@@H]1O represents glucose
- C12C3C4C1C5C4C3C25 represents cubane
The notation uses simple rules: C for carbon, O for oxygen, = for double bonds, parentheses for branches, and digits for ring closures. It is widely used in cheminformatics because it is compact, machine-parseable, and, for smaller structures, reasonably human-readable.
Technically, it functions as a depth-first traversal of the molecular graph (with its traversal influenced by chemical rules).
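Because these rules are so mechanical, a grammar-level check can flag a malformed generated string before any rendering is attempted. The following is a rough sketch, not a real SMILES parser and not part of RDKit. The helper name check_smiles_syntax is hypothetical. It only verifies that branch parentheses balance and that every ring-closure label is both opened and closed. It skips bracket atoms such as [C@H] so that digits inside them are not mistaken for ring closures.

```python
def check_smiles_syntax(smiles):
    """Return a list of structural problems found in a SMILES string.

    Checks only bracket atoms, branch parentheses, and ring-closure pairing.
    It does not validate element symbols, valence, or aromaticity.
    """
    problems = []
    open_rings = set()  # ring-closure labels seen an odd number of times
    depth = 0           # current nesting depth of branch parentheses
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            # Skip the whole bracket atom: digits inside are isotopes,
            # hydrogen counts, or charges, never ring closures.
            end = smiles.find("]", i)
            if end == -1:
                problems.append("unclosed '['")
                break
            i = end + 1
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                problems.append("unmatched ')'")
                depth = 0
        elif ch == "%":
            # Two-digit ring closure such as %12
            label = smiles[i + 1:i + 3]
            if len(label) == 2 and label.isdigit():
                open_rings ^= {int(label)}
                i += 3
                continue
            problems.append("malformed '%' ring closure")
        elif ch.isdigit():
            # A single digit opens a ring the first time and closes it the second
            open_rings ^= {int(ch)}
        i += 1
    if depth > 0:
        problems.append(f"{depth} unclosed '('")
    for label in sorted(open_rings):
        problems.append(f"ring closure {label} is never closed")
    return problems


print(check_smiles_syntax("c1ccccc1"))  # well-formed benzene
print(check_smiles_syntax("C1CCCC"))    # ring 1 is opened but never closed
```

An empty list means only that the string passed these structural checks. Chem.MolFromSmiles remains the real test of validity.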
The Quick Win: Native RDKit
If you just need a quick image and don’t care about the image dimensions or adding a legend, RDKit can do this in a few lines:
from rdkit import Chem
from rdkit.Chem import Draw
mol = Chem.MolFromSmiles("CCO")
Draw.MolToFile(mol, "ethanol.png")
This works well for quick checks. However, presentations and publications require specific dimensions, high resolution, and a clean legend with the molecular formula.
For that level of quality, we need a custom tool.
The Professional Tool: Building a Custom Renderer
While the native RDKit method is fast, it has limitations. It lacks support for formula subscripts (like rendering the 2 in $\text{CO}_2$ as a subscript after the O), and precise layout control is difficult.
Let me build a more robust tool using RDKit for chemical processing and PIL for high-quality image manipulation.
Core Dependencies
from rdkit import Chem
from rdkit.Chem import Draw, rdDepictor, rdMolDescriptors
from PIL import Image, ImageDraw, ImageFont
RDKit handles the chemical logic, while PIL gives us fine control over the final image appearance.
The Main Conversion Function
Here is the entire conversion logic in one function:
def smiles_to_png(smiles, output_file, size=500):
    """Generates a 2D molecule image with a chemical formula legend."""
    mol = Chem.MolFromSmiles(smiles)
    if not mol:
        raise ValueError(f"Invalid SMILES string: {smiles}")

    # Generate 2D coordinates and formula
    rdDepictor.Compute2DCoords(mol)
    formula = rdMolDescriptors.CalcMolFormula(mol)

    # Render the molecule
    img = Draw.MolToImage(mol, size=(size, size)).convert("RGBA")

    # Create a canvas with extra space at the bottom for the legend
    legend_height = int(size * 0.1)
    canvas = Image.new("RGBA", (size, size + legend_height), "white")
    canvas.paste(img, (0, 0))
    draw = ImageDraw.Draw(canvas)

    # Define dynamic font sizes
    font_reg = get_font(int(size * 0.03))
    font_sub = get_font(int(size * 0.02))

    # Draw the legend
    x = int(size * 0.02)
    y = size + int(size * 0.02)

    # Draw "Formula: " label
    draw.text((x, y), "Formula: ", fill="black", font=font_reg)
    x += draw.textlength("Formula: ", font=font_reg)

    # Draw formula with subscript handling for numbers
    for char in formula:
        # Use smaller font and lower y-offset for numbers (subscripts)
        font = font_sub if char.isdigit() else font_reg
        y_offset = int(size * 0.005) if char.isdigit() else 0
        draw.text((x, y + y_offset), char, fill="black", font=font)
        x += draw.textlength(char, font=font)

    # Draw original SMILES string
    draw.text((x, y), f" | SMILES: {smiles}", fill="black", font=font_reg)

    canvas.save(output_file)
    print(f"Saved: {output_file}")
This function handles everything: validation, coordinate generation, image creation, and legend drawing. It uses rdDepictor for the coordinates and rdMolDescriptors to calculate the molecular formula in Hill notation.
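The character loop subscripts every digit, which is what you want for a neutral formula such as C2H6O. One case it does not cover is a charged species. Assuming the formula string carries the charge as a trailing + or - with an optional count (for example O4S-2 for sulfate), those trailing digits would be drawn as subscripts too. Here is a minimal sketch of one way to separate the pieces before drawing. The helper split_formula and its role labels are hypothetical, and the charge format is an assumption worth checking against your RDKit version.

```python
import re


def split_formula(formula):
    """Split a formula string into (text, role) runs for drawing.

    Roles are "normal", "subscript", or "charge".
    Assumption: any charge appears as a trailing '+' or '-' followed by
    an optional count, e.g. "H4N+" or "O4S-2".
    """
    match = re.fullmatch(r"(.*?)([+-]\d*)?", formula)
    body, charge = match.group(1), match.group(2)

    runs = []
    # Alternate between runs of digits (atom counts) and everything else
    for token in re.findall(r"\d+|\D+", body):
        role = "subscript" if token.isdigit() else "normal"
        runs.append((token, role))
    if charge:
        runs.append((charge, "charge"))
    return runs


print(split_formula("C9H8O4"))
print(split_formula("O4S-2"))
```

Each run can then be drawn with font_sub when the role is "subscript" and font_reg otherwise, with the charge run kept on the baseline or raised, depending on taste.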
Font Handling
We need a helper to handle fonts robustly across systems:
def get_font(size, font_name="arial.ttf"):
    """Attempts to load a TTF font, falls back to default if unavailable."""
    try:
        return ImageFont.truetype(font_name, size)
    except IOError:
        return ImageFont.load_default()
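I aimed to keep it simple and barebones. One caveat: the fallback call to ImageFont.load_default() does not receive the requested size, so on systems without arial.ttf the legend text will not scale with the image.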
Examples in Action
Let me show the tool in action with some common molecules:
Simple Molecules
The ethanol example shows how even simple molecules benefit from clear visualization. The SMILES string CCO becomes an immediately recognizable structure.
Aromatic Compounds
Benzene demonstrates how the tool handles ring structures. The SMILES notation C1=CC=CC=C1 uses numbers to indicate ring closure points.
Complex Pharmaceuticals
Aspirin shows the tool’s ability to handle complex molecules with multiple functional groups. The resulting image clearly shows the benzene ring, carboxyl group, and acetyl ester.
Substituted Aromatics
This example shows how the tool handles substituted aromatic compounds with bulky groups like tert-butyl.
Going Further: Vector Graphics (SVG)
For true publication-quality figures, you often want vector graphics (SVG/PDF) rather than raster images (PNG). Vector graphics scale infinitely without pixelation.
RDKit handles this natively with rdMolDraw2D:
from rdkit import Chem
from rdkit.Chem.Draw import rdMolDraw2D
mol = Chem.MolFromSmiles("CCO")
d = rdMolDraw2D.MolDraw2DSVG(500, 500)
d.DrawMolecule(mol)
d.FinishDrawing()
with open("ethanol.svg", "w") as f:
    f.write(d.GetDrawingText())
This gives you a perfect vector image, though you lose the custom PIL-based legend we built above. Choose the right tool for the job: PNG for quick checks and slides, SVG for journal submissions.
Command-Line Interface
The tool uses Python’s standard argparse library for a robust command-line interface:
# Basic usage
python smiles2image.py "CCO"
# Specify output filename and size
python smiles2image.py "CCO" -o ethanol.png --size 800
# Generate SVG for publication
python smiles2image.py "CCO" -o ethanol.svg
# OR
python smiles2image.py "CCO" --svg
This allows you to easily specify the output filename and customize the image dimensions directly from the terminal. By simply changing the extension to .svg (or using the --svg flag), the script automatically switches to the vector graphics renderer.
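The format-switching rule is easy to isolate and test on its own. The sketch below mirrors the behavior described above using only the standard library. The helper names resolve_output and build_parser are hypothetical, and the argument list is passed explicitly so the example runs without a real command line.

```python
import argparse
import os


def resolve_output(output, force_svg=False):
    """Return (path, fmt), where fmt is "svg" or "png".

    SVG is chosen when the flag is set or the filename ends in .svg.
    If the flag is set but the extension differs, the extension is replaced.
    """
    root, ext = os.path.splitext(output)
    has_svg_ext = ext.lower() == ".svg"
    if not (force_svg or has_svg_ext):
        return output, "png"
    if not has_svg_ext:
        output = root + ".svg"
    return output, "svg"


def build_parser():
    """Build a parser with the same arguments as the commands shown above."""
    parser = argparse.ArgumentParser(
        description="Convert a SMILES string to a 2D molecular image.")
    parser.add_argument("smiles")
    parser.add_argument("-o", "--output", default="molecule.png")
    parser.add_argument("--size", type=int, default=500)
    parser.add_argument("--svg", action="store_true")
    return parser


# Simulate: python smiles2image.py "CCO" --svg
args = build_parser().parse_args(["CCO", "--svg"])
print(resolve_output(args.output, args.svg))
```

On a real command line, keep the SMILES string in quotes, since parentheses, brackets, and # are special characters in most shells.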
Download the Complete Script
You can copy the complete smiles2image.py script directly from the code block below.
Installation and Setup
Before using the script, install the required dependencies:
pip install rdkit pillow
Complete Script
import argparse
import sys
import os
from rdkit import Chem
from rdkit.Chem import Draw, rdDepictor, rdMolDescriptors
from rdkit.Chem.Draw import rdMolDraw2D
from PIL import Image, ImageDraw, ImageFont


def get_font(size, font_name="arial.ttf"):
    """Attempts to load a TTF font, falls back to default if unavailable."""
    try:
        return ImageFont.truetype(font_name, size)
    except IOError:
        return ImageFont.load_default()


def smiles_to_svg(smiles, output_file, size=500):
    """Generates a 2D molecule SVG image."""
    mol = Chem.MolFromSmiles(smiles)
    if not mol:
        raise ValueError(f"Invalid SMILES string: {smiles}")

    rdDepictor.Compute2DCoords(mol)
    d = rdMolDraw2D.MolDraw2DSVG(size, size)
    d.DrawMolecule(mol)
    d.FinishDrawing()

    with open(output_file, "w") as f:
        f.write(d.GetDrawingText())
    print(f"Saved: {output_file}")


def smiles_to_png(smiles, output_file, size=500):
    """Generates a 2D molecule image with a chemical formula legend."""
    mol = Chem.MolFromSmiles(smiles)
    if not mol:
        raise ValueError(f"Invalid SMILES string: {smiles}")

    # Generate 2D coordinates and formula
    rdDepictor.Compute2DCoords(mol)
    formula = rdMolDescriptors.CalcMolFormula(mol)

    # Render the molecule
    img = Draw.MolToImage(mol, size=(size, size)).convert("RGBA")

    # Create a canvas with extra space at the bottom for the legend
    legend_height = int(size * 0.1)
    canvas = Image.new("RGBA", (size, size + legend_height), "white")
    canvas.paste(img, (0, 0))
    draw = ImageDraw.Draw(canvas)

    # Define dynamic font sizes
    font_reg = get_font(int(size * 0.03))
    font_sub = get_font(int(size * 0.02))

    # Draw the legend
    x = int(size * 0.02)
    y = size + int(size * 0.02)

    # Draw "Formula: " label
    draw.text((x, y), "Formula: ", fill="black", font=font_reg)
    x += draw.textlength("Formula: ", font=font_reg)

    # Draw formula with subscript handling for numbers
    for char in formula:
        # Use smaller font and lower y-offset for numbers (subscripts)
        font = font_sub if char.isdigit() else font_reg
        y_offset = int(size * 0.005) if char.isdigit() else 0
        draw.text((x, y + y_offset), char, fill="black", font=font)
        x += draw.textlength(char, font=font)

    # Draw original SMILES string
    draw.text((x, y), f" | SMILES: {smiles}", fill="black", font=font_reg)

    canvas.save(output_file)
    print(f"Saved: {output_file}")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Convert a SMILES string to a 2D molecular image.")
    parser.add_argument("smiles", help="The SMILES string to convert")
    parser.add_argument("-o", "--output", default="molecule.png", help="Output filename (default: molecule.png)")
    parser.add_argument("--size", type=int, default=500, help="Image width/height in pixels (default: 500)")
    parser.add_argument("--svg", action="store_true", help="Force SVG output (overrides filename extension)")
    args = parser.parse_args()

    try:
        # Determine format based on flag or file extension
        is_svg = args.svg or args.output.lower().endswith(".svg")
        if is_svg:
            # Ensure extension is correct if not present
            if not args.output.lower().endswith(".svg"):
                args.output = os.path.splitext(args.output)[0] + ".svg"
            smiles_to_svg(args.smiles, args.output, args.size)
        else:
            smiles_to_png(args.smiles, args.output, args.size)
    except Exception as e:
        print(f"Error: {e}")
        sys.exit(1)
Quick Start
- Save the script as smiles2image.py
- Install dependencies: pip install rdkit pillow
- Run: python smiles2image.py "CCO" -o ethanol.png or python smiles2image.py "CCO" --svg
