Lately, I’ve spent a lot of time staring at datasets full of SMILES strings.
With time, I find I get better at recognizing functional groups and substructures like C(=O)O (carboxylic acid) or c1ccccc1 (benzene ring).
But anything really complex is beyond my personal visualization capabilities.
I ran into this recently while debugging a generative model. Sometimes the grammar of the SMILES string provides the clue as to what is going wrong. Other times, actually seeing the molecule is what helps. I had a terminal full of generated strings and just needed to see what they looked like. I didn’t want to copy-paste them one by one into a web tool or open a heavy desktop application. I just wanted a quick, reliable script to turn that text into a properly formatted image.
The solution relies on RDKit and PIL. Here is a clean reference implementation for the interested.
What Are SMILES Strings?
SMILES is a compact way to represent molecular structures as text strings. (Check out my SMILES Reference Note for my personal notes on the topic.)
For example:
- O=C=O represents carbon dioxide
- OC[C@H]1OC(O)[C@H](O)[C@@H](O)[C@@H]1O represents glucose
- C12C3C4C1C5C4C3C25 represents cubane
The notation uses simple rules: C for carbon, O for oxygen, = for double bonds, parentheses for branches, and numbers for ring closures. It is widely used in cheminformatics because it is compact, machine-parseable, and, for smaller structures, relatively human-readable.
Technically, a SMILES string is the record of a depth-first traversal of the molecular graph, with chemical rules influencing how that traversal is written out.
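To make those rules concrete, here is a minimal sketch of a tokenizer that splits a SMILES string into atoms, bonds, branches, and ring-closure digits. It is a deliberately simplified illustration using only the standard library: the name tokenize_smiles and the token labels are my own, it covers only the common organic-subset symbols, and it does no chemical validation. RDKit's parser is the real thing.

```python
import re

# Simplified token patterns (an assumption for illustration, not the full
# SMILES grammar). Two-letter symbols must come before one-letter ones.
TOKEN_PATTERN = re.compile(
    r"(?P<bracket_atom>\[[^\]]+\])"          # e.g. [C@H], [Na+]
    r"|(?P<atom>Cl|Br|[BCNOPSFI]|[bcnops])"  # organic subset, aromatic lowercase
    r"|(?P<bond>[-=#:/\\])"                  # single, double, triple, aromatic, stereo
    r"|(?P<branch_open>\()"
    r"|(?P<branch_close>\))"
    r"|(?P<ring_closure>%\d{2}|\d)"          # ring-closure labels
    r"|(?P<dot>\.)"                          # disconnected fragments
)


def tokenize_smiles(smiles):
    """Split a SMILES string into (kind, text) tokens, left to right."""
    tokens = []
    pos = 0
    while pos < len(smiles):
        match = TOKEN_PATTERN.match(smiles, pos)
        if match is None:
            raise ValueError(
                f"Unexpected character {smiles[pos]!r} at position {pos}"
            )
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


for kind, text in tokenize_smiles("C(=O)O"):
    print(f"{kind:>12}  {text}")
```

Reading the tokens left to right is the traversal: each atom attaches to the previous one, an opening parenthesis starts a side branch, and a repeated digit closes a ring back to the atom where that digit first appeared.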
Building the Solution
Let me build a tool using RDKit for chemical processing and PIL for image manipulation.
Core Dependencies
from rdkit import Chem
from rdkit.Chem import Draw, rdDepictor, rdMolDescriptors
from PIL import Image, ImageDraw, ImageFont
RDKit handles the chemical logic, while PIL gives us fine control over the final image appearance.
The Main Conversion Function
Here is the entire conversion logic in one function:
def smiles_to_png(smiles, output_file, size=500):
    """Generates a 2D molecule image with a chemical formula legend."""
    mol = Chem.MolFromSmiles(smiles)
    if not mol:
        raise ValueError(f"Invalid SMILES string: {smiles}")

    # Generate 2D coordinates and formula
    rdDepictor.Compute2DCoords(mol)
    formula = rdMolDescriptors.CalcMolFormula(mol)

    # Render the molecule
    img = Draw.MolToImage(mol, size=(size, size)).convert("RGBA")

    # Create a canvas with extra space at the bottom for the legend
    legend_height = int(size * 0.1)
    canvas = Image.new("RGBA", (size, size + legend_height), "white")
    canvas.paste(img, (0, 0))
    draw = ImageDraw.Draw(canvas)

    # Define dynamic font sizes
    font_reg = get_font(int(size * 0.03))
    font_sub = get_font(int(size * 0.02))

    # Draw the legend
    x = int(size * 0.02)
    y = size + int(size * 0.02)

    # Draw "Formula: " label
    draw.text((x, y), "Formula: ", fill="black", font=font_reg)
    x += draw.textlength("Formula: ", font=font_reg)

    # Draw formula with subscript handling for numbers
    for char in formula:
        # Use smaller font and lower y-offset for numbers (subscripts)
        font = font_sub if char.isdigit() else font_reg
        y_offset = int(size * 0.005) if char.isdigit() else 0
        draw.text((x, y + y_offset), char, fill="black", font=font)
        x += draw.textlength(char, font=font)

    # Draw original SMILES string
    draw.text((x, y), f" | SMILES: {smiles}", fill="black", font=font_reg)

    canvas.save(output_file)
    print(f"Saved: {output_file}")
This function handles everything: validation, coordinate generation, image creation, and legend drawing. It uses rdDepictor for the 2D coordinates and rdMolDescriptors to calculate the molecular formula in Hill notation (carbon first, then hydrogen, then the remaining elements alphabetically).
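The subscript handling is the least obvious part of the function, so here is a small sketch that isolates just that decision, with no drawing involved. The helper name legend_plan is hypothetical; it reuses the same scale factors as smiles_to_png (3% of the image size for regular text, 2% for subscripts, 0.5% for the downward shift) and returns what would be drawn for each character.

```python
def legend_plan(formula, size=500):
    """Return (char, font_px, y_offset_px) for each character of a formula.

    Mirrors the per-character loop in smiles_to_png: digits get the smaller
    font and are pushed down slightly, everything else is drawn normally.
    """
    font_reg = int(size * 0.03)   # regular text height in pixels
    font_sub = int(size * 0.02)   # subscript text height in pixels
    drop = int(size * 0.005)      # how far subscripts sit below the baseline

    plan = []
    for char in formula:
        if char.isdigit():
            plan.append((char, font_sub, drop))
        else:
            plan.append((char, font_reg, 0))
    return plan


for char, font_px, y_offset in legend_plan("C2H6O"):
    print(char, font_px, y_offset)
```

Note that the rule is purely character-based: any digit in the string is treated as a subscript, so a formula string that used digits for something else (a charge, say) would be rendered the same way.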
Font Handling
We need a helper to handle fonts robustly across systems:
def get_font(size, font_name="arial.ttf"):
    """Attempts to load a TTF font, falls back to default if unavailable."""
    try:
        return ImageFont.truetype(font_name, size)
    except IOError:
        return ImageFont.load_default()
I aimed to keep it simple and barebones.
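One thing to be aware of: the fallback calls ImageFont.load_default() without a size, so when arial.ttf is missing the legend text may not scale with the image. If that matters, one possible extension is to try several font names before giving up. The sketch below is an assumption on my part rather than part of the script: first_available_font is a hypothetical helper, the loader argument stands in for something like ImageFont.truetype, and the candidate names in the comment are only examples.

```python
def first_available_font(candidates, size, loader):
    """Return the first font that `loader` can open, or None if all fail.

    `loader` is any callable taking (font_name, size) and raising OSError
    when the font cannot be opened, as ImageFont.truetype does.
    """
    for name in candidates:
        try:
            return loader(name, size)
        except OSError:
            continue  # try the next candidate
    return None


# Possible use with Pillow (not executed here; font names are examples only):
#
#   from PIL import ImageFont
#   font = first_available_font(
#       ["arial.ttf", "DejaVuSans.ttf"], 15, ImageFont.truetype
#   ) or ImageFont.load_default()
```

Passing the loader in as an argument keeps the helper independent of Pillow, which also makes it easy to check without any fonts installed.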
Examples in Action
Let me show the tool in action with some common molecules:
Simple Molecules
The ethanol example shows how even simple molecules benefit from clear visualization. The SMILES string CCO becomes an immediately recognizable structure.
Aromatic Compounds
Benzene demonstrates how the tool handles ring structures. The SMILES notation C1=CC=CC=C1 uses numbers to indicate ring closure points.
Complex Pharmaceuticals
Aspirin shows the tool’s ability to handle complex molecules with multiple functional groups. The resulting image clearly shows the benzene ring, carboxyl group, and acetyl ester.
Substituted Aromatics
This example shows how the tool handles substituted aromatic compounds with bulky groups like tert-butyl.
Command-Line Interface
The tool includes a basic CLI for easy use:
# Basic usage
python smiles2png.py "CCO"
# Specify output filename
python smiles2png.py "CCO" ethanol.png
It reads its arguments directly from sys.argv, so the interface needs nothing beyond the Python standard library.
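If the interface ever needs to grow, argparse from the standard library is a natural next step. The following is only a sketch of what that could look like, not what the script does: build_parser is a hypothetical name, and the --size flag is my own addition that would map onto the size parameter of smiles_to_png.

```python
import argparse


def build_parser():
    """Build a parser matching the script's current positional arguments."""
    parser = argparse.ArgumentParser(
        prog="smiles2png.py",
        description="Render a SMILES string as a PNG image.",
    )
    parser.add_argument("smiles", help="SMILES string to render, e.g. CCO")
    parser.add_argument(
        "output",
        nargs="?",
        default="molecule.png",
        help="output filename (default: molecule.png)",
    )
    # Hypothetical flag, not present in the original script.
    parser.add_argument(
        "--size",
        type=int,
        default=500,
        help="image width and height in pixels (default: 500)",
    )
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["CCO", "ethanol.png", "--size", "800"])
print(args.smiles, args.output, args.size)
```

In the script itself, the parsed values would be passed straight through as smiles_to_png(args.smiles, args.output, size=args.size).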
Download the Complete Script
You can copy the complete smiles2png.py script directly from the code block below.
Installation and Setup
Before using the script, install the required dependencies:
pip install rdkit pillow
Complete Script
import sys

from rdkit import Chem
from rdkit.Chem import Draw, rdDepictor, rdMolDescriptors
from PIL import Image, ImageDraw, ImageFont


def get_font(size, font_name="arial.ttf"):
    """Attempts to load a TTF font, falls back to default if unavailable."""
    try:
        return ImageFont.truetype(font_name, size)
    except IOError:
        return ImageFont.load_default()


def smiles_to_png(smiles, output_file, size=500):
    """Generates a 2D molecule image with a chemical formula legend."""
    mol = Chem.MolFromSmiles(smiles)
    if not mol:
        raise ValueError(f"Invalid SMILES string: {smiles}")

    # Generate 2D coordinates and formula
    rdDepictor.Compute2DCoords(mol)
    formula = rdMolDescriptors.CalcMolFormula(mol)

    # Render the molecule
    img = Draw.MolToImage(mol, size=(size, size)).convert("RGBA")

    # Create a canvas with extra space at the bottom for the legend
    legend_height = int(size * 0.1)
    canvas = Image.new("RGBA", (size, size + legend_height), "white")
    canvas.paste(img, (0, 0))
    draw = ImageDraw.Draw(canvas)

    # Define dynamic font sizes
    font_reg = get_font(int(size * 0.03))
    font_sub = get_font(int(size * 0.02))

    # Draw the legend
    x = int(size * 0.02)
    y = size + int(size * 0.02)

    # Draw "Formula: " label
    draw.text((x, y), "Formula: ", fill="black", font=font_reg)
    x += draw.textlength("Formula: ", font=font_reg)

    # Draw formula with subscript handling for numbers
    for char in formula:
        # Use smaller font and lower y-offset for numbers (subscripts)
        font = font_sub if char.isdigit() else font_reg
        y_offset = int(size * 0.005) if char.isdigit() else 0
        draw.text((x, y + y_offset), char, fill="black", font=font)
        x += draw.textlength(char, font=font)

    # Draw original SMILES string
    draw.text((x, y), f" | SMILES: {smiles}", fill="black", font=font_reg)

    canvas.save(output_file)
    print(f"Saved: {output_file}")


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python smiles2png.py <SMILES_STRING> [OUTPUT_FILENAME]")
        sys.exit(1)

    smiles_input = sys.argv[1]
    filename = sys.argv[2] if len(sys.argv) > 2 else "molecule.png"
    smiles_to_png(smiles_input, filename)
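Since my original problem was a terminal full of generated strings rather than a single molecule, a batch wrapper is the obvious follow-up. The sketch below covers only the bookkeeping and is an assumption layered on top of the script: output_name_for and plan_batch are hypothetical helpers, and the short hash in the filename is there because raw SMILES contain characters such as / and \ that do not belong in filenames.

```python
import hashlib


def output_name_for(smiles, index, prefix="mol"):
    """Build a filesystem-safe PNG name from a running index and a short hash."""
    digest = hashlib.sha1(smiles.encode("utf-8")).hexdigest()[:8]
    return f"{prefix}_{index:04d}_{digest}.png"


def plan_batch(lines):
    """Pair each non-empty SMILES line with an output filename."""
    jobs = []
    for raw in lines:
        smiles = raw.strip()
        if not smiles:
            continue  # skip blank lines
        jobs.append((smiles, output_name_for(smiles, len(jobs))))
    return jobs


# Possible use with the script above (not executed here; the input file
# name is an example only):
#
#   with open("generated.smi") as handle:
#       for smiles, name in plan_batch(handle):
#           try:
#               smiles_to_png(smiles, name)
#           except ValueError as err:
#               print(err)  # invalid SMILES: report it and keep going

for smiles, name in plan_batch(["CCO\n", "\n", "c1ccccc1\n"]):
    print(smiles, "->", name)
```

Catching the ValueError per molecule matters for generative-model output in particular, where a share of the strings is expected to be invalid.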
Quick Start
- Save the script as smiles2png.py
- Install dependencies: pip install rdkit pillow
- Run: python smiles2png.py "CCO" ethanol.png
