Lately, I’ve spent a lot of time staring at datasets full of SMILES strings. With time, I find I get better at recognizing functional groups and substructures like C(=O)O (carboxylic acid) or c1ccccc1 (benzene ring). But anything really complex is beyond my personal visualization capabilities.

I ran into this recently while debugging a generative model. Sometimes the grammar of the SMILES string provides the clue as to what is going wrong. Other times, actually seeing the molecule is what helps. I had a terminal full of generated strings and needed to verify their structures visually. Manually pasting strings into web tools or opening heavy desktop applications felt inefficient. I needed a lightweight script to turn that text into a properly formatted image directly from the terminal.

What Are SMILES Strings?

SMILES (Simplified Molecular Input Line Entry System) is a compact way to represent molecular structures as text strings. (Check out my SMILES Reference Note for my personal notes on the topic.)

For example:

  • O=C=O represents carbon dioxide
  • OC[C@H]1OC(O)[C@H](O)[C@@H](O)[C@@H]1O represents glucose
  • C12C3C4C1C5C4C3C25 represents cubane

The notation uses simple rules: C for carbon, O for oxygen, = for double bonds, parentheses for branches, and numbers for ring closures. It is widely used in cheminformatics because it is compact, machine-parseable, and relatively human-readable, at least for smaller structures. Technically, a SMILES string is a depth-first traversal of the molecular graph, with the traversal order influenced by chemical rules.
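To make these rules concrete, here is a minimal sketch that parses the three example strings with RDKit and reports heavy-atom and bond counts. The helper name heavy_atom_and_bond_counts is my own, not part of RDKit; the sketch relies on Chem.MolFromSmiles returning None when a string cannot be parsed.

```python
from rdkit import Chem

examples = {
    "carbon dioxide": "O=C=O",
    "glucose": "OC[C@H]1OC(O)[C@H](O)[C@@H](O)[C@@H]1O",
    "cubane": "C12C3C4C1C5C4C3C25",
}

def heavy_atom_and_bond_counts(smiles):
    """Return (heavy atoms, bonds) for a SMILES string, or None if it does not parse."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    # Implicit hydrogens are not counted by either call
    return mol.GetNumAtoms(), mol.GetNumBonds()

for name, smi in examples.items():
    print(name, heavy_atom_and_bond_counts(smi))

# A ring-closure digit that is opened but never closed is a syntax error
print(heavy_atom_and_bond_counts("C1CC"))
```

Cubane is a nice check on the ring-closure digits: eight atoms joined by twelve bonds, five of which come from the closures 1 through 5.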

The Quick Win: Native RDKit

If you just need a quick image and don’t care about the image dimensions or adding a legend, RDKit can do this in a few lines:

from rdkit import Chem
from rdkit.Chem import Draw

mol = Chem.MolFromSmiles("CCO")
Draw.MolToFile(mol, "ethanol.png")

This works well for quick checks. However, presentations and publications require specific dimensions, high resolution, and a clean legend with the molecular formula.

For that level of quality, we need a custom tool.

The Professional Tool: Building a Custom Renderer

While the native RDKit method is fast, it has limitations. It lacks support for formula subscripts (like rendering the 2 in $\text{CO}_2$ below the baseline), and precise layout control is difficult.

Let me build a more robust tool using RDKit for chemical processing and PIL for high-quality image manipulation.

Core Dependencies

from rdkit import Chem
from rdkit.Chem import Draw, rdDepictor, rdMolDescriptors
from PIL import Image, ImageDraw, ImageFont

RDKit handles the chemical logic, while PIL gives us fine control over the final image appearance.

The Main Conversion Function

Here is the entire conversion logic in one function:

def smiles_to_png(smiles, output_file, size=500):
    """Generates a 2D molecule image with a chemical formula legend."""
    mol = Chem.MolFromSmiles(smiles)
    if not mol:
        raise ValueError(f"Invalid SMILES string: {smiles}")

    # Generate 2D coordinates and formula
    rdDepictor.Compute2DCoords(mol)
    formula = rdMolDescriptors.CalcMolFormula(mol)
    
    # Render the molecule
    img = Draw.MolToImage(mol, size=(size, size)).convert("RGBA")

    # Create a canvas with extra space at the bottom for the legend
    legend_height = int(size * 0.1)
    canvas = Image.new("RGBA", (size, size + legend_height), "white")
    canvas.paste(img, (0, 0))
    
    draw = ImageDraw.Draw(canvas)
    
    # Define dynamic font sizes
    font_reg = get_font(int(size * 0.03))
    font_sub = get_font(int(size * 0.02))
    
    # Draw the legend
    x = int(size * 0.02)
    y = size + int(size * 0.02)
    
    # Draw "Formula: " label
    draw.text((x, y), "Formula: ", fill="black", font=font_reg)
    x += draw.textlength("Formula: ", font=font_reg)
    
    # Draw formula with subscript handling for numbers
    for char in formula:
        # Use smaller font and lower y-offset for numbers (subscripts)
        font = font_sub if char.isdigit() else font_reg
        y_offset = int(size * 0.005) if char.isdigit() else 0
        
        draw.text((x, y + y_offset), char, fill="black", font=font)
        x += draw.textlength(char, font=font)

    # Draw original SMILES string
    draw.text((x, y), f" | SMILES: {smiles}", fill="black", font=font_reg)

    canvas.save(output_file)
    print(f"Saved: {output_file}")

This function handles everything: validation, coordinate generation, image creation, and legend drawing. It uses rdDepictor for the coordinates and rdMolDescriptors to calculate the molecular formula in Hill notation.
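The subscript loop above draws one character at a time. As an alternative sketch (not what the script does), the formula can first be split into runs of digits and non-digits, so that a multi-digit count like the 10 in C10H14O is drawn with a single call. The helper name formula_runs is hypothetical.

```python
from itertools import groupby

def formula_runs(formula):
    """Split a formula such as 'C10H14O' into (text, is_subscript) runs."""
    return [
        ("".join(chars), is_digit)
        for is_digit, chars in groupby(formula, key=str.isdigit)
    ]

print(formula_runs("C10H14O"))
# [('C', False), ('10', True), ('H', False), ('14', True), ('O', False)]
```

Each run could then be drawn with font_sub when is_subscript is true and font_reg otherwise. One caveat I would flag as an assumption rather than a tested fact: for charged species the formula string can carry a trailing charge, and any digit in that charge would also be treated as a subscript by a plain isdigit() check.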

Font Handling

We need a helper to handle fonts robustly across systems:

def get_font(size, font_name="arial.ttf"):
    """Attempts to load a TTF font, falls back to default if unavailable."""
    try:
        return ImageFont.truetype(font_name, size)
    except IOError:
        return ImageFont.load_default()

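I kept it deliberately simple and barebones. If arial.ttf cannot be found (common on Linux), the helper falls back to Pillow’s built-in default font, and that fallback does not honor the requested size, so the legend text may come out smaller than intended.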
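If the fallback matters on your machine, one option is to try several font files before giving up. This is a sketch under assumptions: the candidate file names below are guesses at what is commonly installed, not a list Pillow provides, and get_font_with_fallbacks is a hypothetical name.

```python
from PIL import ImageFont

# Hypothetical candidate list; adjust to the fonts installed on your system
CANDIDATE_FONTS = (
    "arial.ttf",
    "Arial.ttf",
    "DejaVuSans.ttf",
    "LiberationSans-Regular.ttf",
)

def get_font_with_fallbacks(size, candidates=CANDIDATE_FONTS):
    """Return the first loadable TrueType font, else Pillow's built-in default."""
    for name in candidates:
        try:
            return ImageFont.truetype(name, size)
        except OSError:
            # Font file not found or unreadable; try the next candidate
            continue
    return ImageFont.load_default()
```

It is a drop-in replacement for get_font(size) in the renderer, at the cost of a few extra file lookups per call.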

Examples in Action

Let me show the tool in action with some common molecules:

Simple Molecules

Figure: Ethanol (CCO). A simple alcohol showing the basic structure with proper bond representation and molecular formula $\text{C}_2\text{H}_6\text{O}$.

The ethanol example shows how even simple molecules benefit from clear visualization. The SMILES string CCO becomes an immediately recognizable structure.

Aromatic Compounds

Figure: Benzene (C1=CC=CC=C1), formula $\text{C}_6\text{H}_6$. The classic aromatic ring structure with alternating double bonds clearly depicted.

Benzene demonstrates how the tool handles ring structures. The SMILES notation C1=CC=CC=C1 uses numbers to indicate ring closure points.

Complex Pharmaceuticals

Figure: Aspirin (CC(=O)OC1=CC=CC=C1C(=O)O), formula $\text{C}_9\text{H}_8\text{O}_4$. A more complex molecule showing how the tool handles branched structures and multiple functional groups.

Aspirin shows the tool’s ability to handle complex molecules with multiple functional groups. The resulting image clearly shows the benzene ring, carboxyl group, and acetyl ester.

Substituted Aromatics

Figure: 4-tert-butylphenol (CC(C)(C)C1=CC=C(C=C1)O), formula $\text{C}_{10}\text{H}_{14}\text{O}$. Demonstrates handling of bulky substituents and substitution patterns.

This example shows how the tool handles substituted aromatic compounds with bulky groups like tert-butyl.
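As a quick sanity check on the four examples, the formulas shown in the legends can be recomputed without rendering anything. This sketch uses the same CalcMolFormula call as the renderer; the EXAMPLES dictionary and the formula_of helper are just illustrative names.

```python
from rdkit import Chem
from rdkit.Chem import rdMolDescriptors

EXAMPLES = {
    "ethanol": "CCO",
    "benzene": "C1=CC=CC=C1",
    "aspirin": "CC(=O)OC1=CC=CC=C1C(=O)O",
    "4-tert-butylphenol": "CC(C)(C)C1=CC=C(C=C1)O",
}

def formula_of(smiles):
    """Return the molecular formula for a SMILES string."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Invalid SMILES string: {smiles}")
    return rdMolDescriptors.CalcMolFormula(mol)

for name, smi in EXAMPLES.items():
    print(f"{name}: {formula_of(smi)}")
```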

Going Further: Vector Graphics (SVG)

For true publication-quality figures, you often want vector graphics (SVG/PDF) rather than raster images (PNG). Vector graphics scale infinitely without pixelation.

RDKit handles this natively with rdMolDraw2D:

from rdkit import Chem
from rdkit.Chem.Draw import rdMolDraw2D

mol = Chem.MolFromSmiles("CCO")
d = rdMolDraw2D.MolDraw2DSVG(500, 500)
d.DrawMolecule(mol)
d.FinishDrawing()

with open("ethanol.svg", "w") as f:
    f.write(d.GetDrawingText())

This gives you a clean, resolution-independent vector image, though you lose the custom PIL-based legend we built above. Choose the right tool for the job: PNG for quick checks and slides, SVG for journal submissions.
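If a plain-text legend is good enough for the SVG, DrawMolecule accepts a legend argument. The sketch below passes the molecular formula as that legend. It will not produce subscripts, so it is a partial substitute for the PIL legend rather than an equivalent, and smiles_to_svg_text is a hypothetical helper name.

```python
from rdkit import Chem
from rdkit.Chem import rdMolDescriptors
from rdkit.Chem.Draw import rdMolDraw2D

def smiles_to_svg_text(smiles, size=500):
    """Return SVG markup for a molecule, using its formula as a plain-text legend."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Invalid SMILES string: {smiles}")
    formula = rdMolDescriptors.CalcMolFormula(mol)
    drawer = rdMolDraw2D.MolDraw2DSVG(size, size)
    # The legend is drawn as ordinary text beneath the structure
    drawer.DrawMolecule(mol, legend=formula)
    drawer.FinishDrawing()
    return drawer.GetDrawingText()

svg = smiles_to_svg_text("CCO")
print(len(svg), "characters of SVG")
```

The returned string can be written to a .svg file exactly as in the snippet above.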

Command-Line Interface

The tool uses Python’s standard argparse library for a robust command-line interface:

# Basic usage
python smiles2image.py "CCO"

# Specify output filename and size
python smiles2image.py "CCO" -o ethanol.png --size 800

# Generate SVG for publication
python smiles2image.py "CCO" -o ethanol.svg
# OR
python smiles2image.py "CCO" --svg

This allows you to easily specify the output filename and customize the image dimensions directly from the terminal. By simply changing the extension to .svg (or using the --svg flag), the script automatically switches to the vector graphics renderer.
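The format-selection rule is small enough to isolate and test on its own. The sketch below mirrors the logic in the complete script; the function name resolve_output is mine, since the script does this inline.

```python
import os

def resolve_output(output, force_svg=False):
    """Return (path, is_svg) following the extension/flag rule described above."""
    is_svg = force_svg or output.lower().endswith(".svg")
    if is_svg and not output.lower().endswith(".svg"):
        # --svg was passed with a non-SVG name: swap the extension
        output = os.path.splitext(output)[0] + ".svg"
    return output, is_svg

print(resolve_output("ethanol.png"))                   # ('ethanol.png', False)
print(resolve_output("ethanol.svg"))                   # ('ethanol.svg', True)
print(resolve_output("molecule.png", force_svg=True))  # ('molecule.svg', True)
```

Note the last case: running with --svg and no -o writes molecule.svg, because the default filename molecule.png has its extension replaced.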

Download the Complete Script

You can copy the complete smiles2image.py script directly from the code block below.

Installation and Setup

Before using the script, install the required dependencies:

pip install rdkit pillow

Complete Script

import argparse
import sys
import os
from rdkit import Chem
from rdkit.Chem import Draw, rdDepictor, rdMolDescriptors
from rdkit.Chem.Draw import rdMolDraw2D
from PIL import Image, ImageDraw, ImageFont

def get_font(size, font_name="arial.ttf"):
    """Attempts to load a TTF font, falls back to default if unavailable."""
    try:
        return ImageFont.truetype(font_name, size)
    except IOError:
        return ImageFont.load_default()

def smiles_to_svg(smiles, output_file, size=500):
    """Generates a 2D molecule SVG image."""
    mol = Chem.MolFromSmiles(smiles)
    if not mol:
        raise ValueError(f"Invalid SMILES string: {smiles}")

    rdDepictor.Compute2DCoords(mol)
    
    d = rdMolDraw2D.MolDraw2DSVG(size, size)
    d.DrawMolecule(mol)
    d.FinishDrawing()

    with open(output_file, "w") as f:
        f.write(d.GetDrawingText())
    print(f"Saved: {output_file}")

def smiles_to_png(smiles, output_file, size=500):
    """Generates a 2D molecule image with a chemical formula legend."""
    mol = Chem.MolFromSmiles(smiles)
    if not mol:
        raise ValueError(f"Invalid SMILES string: {smiles}")

    # Generate 2D coordinates and formula
    rdDepictor.Compute2DCoords(mol)
    formula = rdMolDescriptors.CalcMolFormula(mol)
    
    # Render the molecule
    img = Draw.MolToImage(mol, size=(size, size)).convert("RGBA")

    # Create a canvas with extra space at the bottom for the legend
    legend_height = int(size * 0.1)
    canvas = Image.new("RGBA", (size, size + legend_height), "white")
    canvas.paste(img, (0, 0))
    
    draw = ImageDraw.Draw(canvas)
    
    # Define dynamic font sizes
    font_reg = get_font(int(size * 0.03))
    font_sub = get_font(int(size * 0.02))
    
    # Draw the legend
    x = int(size * 0.02)
    y = size + int(size * 0.02)
    
    # Draw "Formula: " label
    draw.text((x, y), "Formula: ", fill="black", font=font_reg)
    x += draw.textlength("Formula: ", font=font_reg)
    
    # Draw formula with subscript handling for numbers
    for char in formula:
        # Use smaller font and lower y-offset for numbers (subscripts)
        font = font_sub if char.isdigit() else font_reg
        y_offset = int(size * 0.005) if char.isdigit() else 0
        
        draw.text((x, y + y_offset), char, fill="black", font=font)
        x += draw.textlength(char, font=font)

    # Draw original SMILES string
    draw.text((x, y), f" | SMILES: {smiles}", fill="black", font=font_reg)

    canvas.save(output_file)
    print(f"Saved: {output_file}")

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Convert a SMILES string to a 2D molecular image.")
    parser.add_argument("smiles", help="The SMILES string to convert")
    parser.add_argument("-o", "--output", default="molecule.png", help="Output filename (default: molecule.png)")
    parser.add_argument("--size", type=int, default=500, help="Image width/height in pixels (default: 500)")
    parser.add_argument("--svg", action="store_true", help="Force SVG output (overrides filename extension)")
    
    args = parser.parse_args()
    
    try:
        # Determine format based on flag or file extension
        is_svg = args.svg or args.output.lower().endswith(".svg")
        
        if is_svg:
            # Ensure extension is correct if not present
            if not args.output.lower().endswith(".svg"):
                args.output = os.path.splitext(args.output)[0] + ".svg"
            smiles_to_svg(args.smiles, args.output, args.size)
        else:
            smiles_to_png(args.smiles, args.output, args.size)
            
    except Exception as e:
        # Report failures on stderr so stdout stays clean for piping
        print(f"Error: {e}", file=sys.stderr)
        sys.exit(1)

Quick Start

  1. Save the script as smiles2image.py
  2. Install dependencies: pip install rdkit pillow
  3. Run: python smiles2image.py "CCO" -o ethanol.png or python smiles2image.py "CCO" --svg
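For the situation that started all this, a terminal full of generated strings, a thin batch wrapper may be handy. This is a sketch, not part of the script: render_all and its mol_000.png naming scheme are hypothetical, and it assumes the renderer raises ValueError on an invalid SMILES string, as smiles_to_png does.

```python
import os

def render_all(smiles_list, render, out_dir="renders"):
    """Call render(smiles, path) for each string; return (saved paths, failed strings)."""
    os.makedirs(out_dir, exist_ok=True)
    saved, failed = [], []
    for i, smi in enumerate(smiles_list):
        path = os.path.join(out_dir, f"mol_{i:03d}.png")
        try:
            render(smi, path)
            saved.append(path)
        except ValueError:
            # Keep going; invalid strings are collected for later inspection
            failed.append(smi)
    return saved, failed

# Example usage, assuming smiles2image.py is importable from the current directory:
#   from smiles2image import smiles_to_png
#   saved, failed = render_all(["CCO", "C1CC"], smiles_to_png)
```

Passing the renderer in as an argument keeps the wrapper independent of RDKit, so the SVG function can be swapped in as well (the hard-coded .png extension would then need adjusting).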