Lately, I’ve spent a lot of time staring at datasets full of SMILES strings. Over time, I’ve gotten better at recognizing functional groups and substructures like C(=O)O (carboxylic acid) or c1ccccc1 (benzene ring). But anything truly complex is beyond what I can visualize in my head.

I ran into this recently while debugging a generative model. Sometimes the grammar of the SMILES string provides the clue as to what is going wrong. Other times, actually seeing the molecule is what helps. I had a terminal full of generated strings and just needed to see what they looked like. I didn’t want to copy-paste them one by one into a web tool or open a heavy desktop application. I just wanted a quick, reliable script to turn that text into a properly formatted image.

The solution relies on RDKit and PIL (installed as Pillow, its maintained fork). Here is a clean reference implementation for anyone interested.

What Are SMILES Strings?

SMILES (Simplified Molecular Input Line Entry System) is a compact way to represent molecular structures as text strings. (Check out my SMILES Reference Note for my personal notes on the topic.)

For example:

  • O=C=O represents carbon dioxide
  • OC[C@H]1OC(O)[C@H](O)[C@@H](O)[C@@H]1O represents glucose
  • C12C3C4C1C5C4C3C25 represents cubane

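The notation uses simple rules: element symbols for atoms (C for carbon, O for oxygen), lowercase letters for aromatic atoms (as in c1ccccc1), = for double bonds, parentheses for branches, and digits for ring closures. It is widely used in cheminformatics because it is compact, machine-parseable, and, for smaller structures, reasonably human-readable. Technically, a SMILES string records a depth-first traversal of the molecular graph. Atoms are written in the order they are visited, parentheses enclose side branches, and each pair of matching ring-closure digits marks a bond back to an atom visited earlier. The traversal can start at any atom and visit neighbors in different orders, so one molecule usually has many valid SMILES strings.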
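The rules above are regular enough that a rough tokenizer fits in a single regular expression. The sketch below is a simplified illustration and not how RDKit parses SMILES. It covers only the organic-subset atoms, bracket atoms, bonds, branches, and ring-closure labels. It does not check valence or whether rings actually close. The names `SMILES_TOKEN`, `tokenize_smiles`, and `ring_closure_count` are hypothetical.

```python
import re

# Simplified SMILES token pattern (a sketch, not the full grammar).
# Bracket atoms like [C@H] are kept whole, and two-letter atoms (Br, Cl)
# are tried before one-letter ones so "Cl" is not split into "C" + "l".
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]"        # bracket atom: isotope, chirality, H count, charge
    r"|Br|Cl"            # two-letter organic-subset atoms
    r"|[BCNOPSFI]"       # one-letter organic-subset atoms
    r"|[bcnops]"         # aromatic atoms (lowercase)
    r"|\*"               # wildcard atom
    r"|%\d\d|\d"         # ring-closure labels: %10-%99 or a single digit
    r"|[-=#$:/\\]"       # bond symbols
    r"|[().]"            # branch open/close and fragment separator
)


def tokenize_smiles(smiles):
    """Split a SMILES string into tokens; raise if any character is unmatched."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"Could not tokenize: {smiles}")
    return tokens


def ring_closure_count(smiles):
    """Count ring bonds: every ring bond uses exactly two label tokens."""
    labels = [t for t in tokenize_smiles(smiles) if t.isdigit() or t.startswith("%")]
    return len(labels) // 2


print(tokenize_smiles("CC(=O)O"))  # ['C', 'C', '(', '=', 'O', ')', 'O']
print(ring_closure_count("C12C3C4C1C5C4C3C25"))  # 5
```

Cubane needs five ring-closure pairs because a spanning tree over its 8 atoms covers only 7 of its 12 bonds. Each of the remaining 5 bonds has to be written as a ring closure.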

Building the Solution

Let me build a tool using RDKit for chemical processing and PIL for image manipulation.

Core Dependencies

from rdkit import Chem
from rdkit.Chem import Draw, rdDepictor, rdMolDescriptors
from PIL import Image, ImageDraw, ImageFont

RDKit handles the chemical logic, while PIL gives us fine control over the final image appearance.
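For comparison, RDKit alone can already write a PNG through `Draw.MolToFile`. PIL is only needed for the custom legend added below. The following minimal baseline uses a hypothetical helper name, `plain_png`, and is not part of smiles2png.py.

```python
import os
import tempfile

from rdkit import Chem
from rdkit.Chem import Draw


def plain_png(smiles, output_file, size=300):
    """Render a molecule with RDKit alone: no legend, no PIL post-processing."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # MolFromSmiles returns None for unparseable input
        raise ValueError(f"Invalid SMILES string: {smiles}")
    Draw.MolToFile(mol, output_file, size=(size, size))
    return output_file


out = plain_png("CCO", os.path.join(tempfile.gettempdir(), "ethanol_plain.png"))
print(out)
```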

The Main Conversion Function

Here is the entire conversion logic in one function:

def smiles_to_png(smiles, output_file, size=500):
    """Generates a 2D molecule image with a chemical formula legend."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # MolFromSmiles returns None on parse failure
        raise ValueError(f"Invalid SMILES string: {smiles}")

    # Generate 2D coordinates and formula
    rdDepictor.Compute2DCoords(mol)
    formula = rdMolDescriptors.CalcMolFormula(mol)
    
    # Render the molecule
    img = Draw.MolToImage(mol, size=(size, size)).convert("RGBA")

    # Create a canvas with extra space at the bottom for the legend
    legend_height = int(size * 0.1)
    canvas = Image.new("RGBA", (size, size + legend_height), "white")
    canvas.paste(img, (0, 0))
    
    draw = ImageDraw.Draw(canvas)
    
    # Define dynamic font sizes
    font_reg = get_font(int(size * 0.03))
    font_sub = get_font(int(size * 0.02))
    
    # Draw the legend
    x = int(size * 0.02)
    y = size + int(size * 0.02)
    
    # Draw "Formula: " label
    draw.text((x, y), "Formula: ", fill="black", font=font_reg)
    x += draw.textlength("Formula: ", font=font_reg)
    
    # Draw formula with subscript handling for numbers
    for char in formula:
        # Use smaller font and lower y-offset for numbers (subscripts)
        font = font_sub if char.isdigit() else font_reg
        y_offset = int(size * 0.005) if char.isdigit() else 0
        
        draw.text((x, y + y_offset), char, fill="black", font=font)
        x += draw.textlength(char, font=font)

    # Draw original SMILES string
    draw.text((x, y), f" | SMILES: {smiles}", fill="black", font=font_reg)

    canvas.save(output_file)
    print(f"Saved: {output_file}")
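A caveat about the subscript loop in `smiles_to_png`: it subscripts every digit, which is wrong for charged molecules. My understanding is that `CalcMolFormula` appends a net charge as a trailing sign with an optional magnitude (for example `-2` for sulfate). Treat that as an assumption and check it on your RDKit version. Under that assumption, the hypothetical helper `formula_runs` below splits the formula into styled runs so that the charge could be drawn as a superscript instead.

```python
import re

# Element symbol (capital + optional lowercase letter) followed by an optional count
ELEMENT_COUNT = re.compile(r"([A-Z][a-z]?)(\d*)")
# Assumed charge format: trailing sign with optional magnitude, e.g. "+", "-", "-2"
TRAILING_CHARGE = re.compile(r"([+-])(\d*)$")


def formula_runs(formula):
    """Split a formula string into (text, style) runs for rendering.

    style is "normal" for element symbols, "sub" for atom counts and
    "sup" for a trailing net charge (written magnitude-then-sign, e.g. "2-").
    """
    charge = TRAILING_CHARGE.search(formula)
    body = formula[: charge.start()] if charge else formula
    pairs = ELEMENT_COUNT.findall(body)
    # Refuse input containing anything the element pattern did not account for
    if "".join(symbol + count for symbol, count in pairs) != body:
        raise ValueError(f"Unexpected formula format: {formula}")
    runs = []
    for symbol, count in pairs:
        runs.append((symbol, "normal"))
        if count:
            runs.append((count, "sub"))
    if charge:
        sign, magnitude = charge.groups()
        runs.append((magnitude + sign, "sup"))
    return runs


print(formula_runs("O4S-2"))  # [('O', 'normal'), ('4', 'sub'), ('S', 'normal'), ('2-', 'sup')]
```

In the drawing loop, "sub" runs would use `font_sub` with a positive y offset, as now. "sup" runs would use `font_sub` with a negative offset.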

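smiles_to_png handles everything: SMILES validation, 2D coordinate generation, image creation, and legend drawing. It uses rdDepictor to compute the 2D coordinates and rdMolDescriptors.CalcMolFormula to produce the molecular formula in Hill notation. It also relies on get_font, a small helper defined in the next section.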
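Hill notation is the reason ethanol's formula reads C2H6O rather than, say, OC2H6. The sketch below shows only the ordering rule. It is not RDKit's implementation and ignores charges and isotopes. The function name `hill_formula` is hypothetical.

```python
def hill_formula(counts):
    """Format {element: count} in Hill order (sketch; ignores charge and isotopes).

    With carbon present: C first, then H, then the rest alphabetically.
    Without carbon: every element, H included, alphabetically.
    A count of 1 is left implicit.
    """
    if "C" in counts:
        rest = sorted(el for el in counts if el not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        order = sorted(counts)
    return "".join(el + (str(counts[el]) if counts[el] != 1 else "") for el in order)


print(hill_formula({"O": 4, "H": 8, "C": 9}))  # C9H8O4
print(hill_formula({"Na": 1, "Cl": 1}))        # ClNa
```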

Font Handling

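smiles_to_png calls get_font to load its fonts. The helper tries a TrueType font first and falls back to Pillow's built-in font when that file is not available (Arial, for example, is often missing on Linux):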

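def get_font(size, font_name="arial.ttf"):
    """Loads a TrueType font, falling back to Pillow's default font if unavailable."""
    try:
        return ImageFont.truetype(font_name, size)
    except OSError:
        # Pillow >= 10.1 can scale its default font; older versions ignore size
        try:
            return ImageFont.load_default(size=size)
        except TypeError:
            return ImageFont.load_default()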

I aimed to keep it simple and barebones.
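If you want the legend to look the same across machines without bundling a font, one option is to try several common font names before falling back. The candidate list below is an assumption on my part: DejaVu Sans and Liberation Sans ship with many Linux distributions. The helper name `get_font_from_candidates` is hypothetical. `ImageFont.truetype` also searches the system font directories when given a bare file name.

```python
from PIL import ImageFont

# Hypothetical candidate list: common names on Windows/macOS and many Linux distros
FONT_CANDIDATES = ("arial.ttf", "Arial.ttf", "DejaVuSans.ttf", "LiberationSans-Regular.ttf")


def get_font_from_candidates(size, candidates=FONT_CANDIDATES):
    """Return the first TrueType font that loads, else Pillow's default font."""
    for name in candidates:
        try:
            # Bare file names are also looked up in the system font directories
            return ImageFont.truetype(name, size)
        except OSError:
            continue
    try:
        return ImageFont.load_default(size=size)  # size argument: Pillow >= 10.1
    except TypeError:
        return ImageFont.load_default()  # older Pillow: fixed-size bitmap font


font = get_font_from_candidates(15)
```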

Examples in Action

Let me show the tool in action with some common molecules:

Simple Molecules

Ethanol (CCO): A simple alcohol showing the basic structure with proper bond representation and molecular formula C₂H₆O.

The ethanol example shows how even simple molecules benefit from clear visualization. The SMILES string CCO becomes an immediately recognizable structure.

Aromatic Compounds

Benzene (C1=CC=CC=C1): The classic aromatic ring structure with alternating double bonds clearly depicted.

Benzene demonstrates how the tool handles ring structures. In C1=CC=CC=C1, the digit 1 appears twice: the first occurrence opens a ring bond and the second closes it, bonding the last carbon back to the first.
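The Kekulé form used here (C1=CC=CC=C1) and the aromatic form from the introduction (c1ccccc1) describe the same molecule. When comparing generated strings, canonicalizing them with RDKit before comparing is a simple way to see this. The helper name `canonical` is hypothetical. Canonical output can differ between RDKit versions, so only compare strings produced by the same version.

```python
from rdkit import Chem


def canonical(smiles):
    """Return RDKit's canonical SMILES for a string, or raise if it does not parse."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Invalid SMILES string: {smiles}")
    return Chem.MolToSmiles(mol)


# Kekulé form (explicit alternating double bonds) vs aromatic form (lowercase atoms)
print(canonical("C1=CC=CC=C1"))  # c1ccccc1
print(canonical("c1ccccc1"))     # c1ccccc1
```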

Complex Pharmaceuticals

Aspirin (CC(=O)OC1=CC=CC=C1C(=O)O): A more complex molecule showing how the tool handles branched structures and multiple functional groups.

Aspirin shows that the tool copes with branched molecules carrying several functional groups. The resulting image clearly shows the benzene ring, the carboxylic acid group, and the acetyl ester.

Substituted Aromatics

4-tert-butylphenol (CC(C)(C)C1=CC=C(C=C1)O): Demonstrates handling of bulky substituents and substitution patterns.

This example shows how the tool handles substituted aromatic compounds with bulky groups like tert-butyl.
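All four examples were rendered one molecule at a time. When debugging a generative model, you usually have a whole batch of strings, and some of them fail to parse. The sketch below is not part of smiles2png.py. It puts the valid molecules into one grid image with RDKit's own `Draw.MolsToGridImage` and returns the strings that failed. `MolsToGridImage` draws its own plain-text legends, so the subscript formatting from `smiles_to_png` is not used here. The name `smiles_grid` and its parameters are hypothetical.

```python
import os
import tempfile

from rdkit import Chem, RDLogger
from rdkit.Chem import Draw, rdMolDescriptors

# Silence RDKit's parse-error messages; failures are reported via the return value
RDLogger.DisableLog("rdApp.*")


def smiles_grid(smiles_list, output_file, mols_per_row=4, cell_size=250):
    """Render many SMILES strings into one grid PNG; return the ones that failed to parse."""
    mols, legends, invalid = [], [], []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            invalid.append(smi)
            continue
        mols.append(mol)
        legends.append(f"{rdMolDescriptors.CalcMolFormula(mol)} | {smi}")
    if mols:
        img = Draw.MolsToGridImage(
            mols,
            molsPerRow=mols_per_row,
            subImgSize=(cell_size, cell_size),
            legends=legends,
        )
        img.save(output_file)
    return invalid


grid_path = os.path.join(tempfile.gettempdir(), "batch.png")
# "C1CC" has an unclosed ring, a typical generative-model failure
bad = smiles_grid(["CCO", "C1=CC=CC=C1", "CC(=O)OC1=CC=CC=C1C(=O)O", "C1CC"], grid_path)
print("Failed to parse:", bad)
```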

Command-Line Interface

The tool includes a basic CLI for easy use:

# Basic usage
python smiles2png.py "CCO"

# Specify output filename
python smiles2png.py "CCO" ethanol.png

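It reads its arguments directly from sys.argv, so it needs nothing beyond the standard library (not even argparse). The first argument is the SMILES string. The optional second argument is the output path, which defaults to molecule.png. Quote the SMILES string so the shell does not interpret characters such as parentheses and brackets.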
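If the CLI ever grows beyond two positional arguments, argparse is the usual next step. The sketch below keeps the same two arguments and default output name. It also adds a `--size` flag, which is hypothetical because the original script has no such option. The parsed value would simply be passed on as `smiles_to_png(args.smiles, args.output, size=args.size)`.

```python
import argparse


def build_parser():
    """Argument parser mirroring the sys.argv interface, plus a hypothetical --size flag."""
    parser = argparse.ArgumentParser(
        description="Render a SMILES string to a PNG with a formula legend."
    )
    parser.add_argument("smiles", help="SMILES string to render (quote it in the shell)")
    parser.add_argument(
        "output", nargs="?", default="molecule.png",
        help="output PNG path (default: molecule.png)",
    )
    # Hypothetical option, not part of the original script
    parser.add_argument("--size", type=int, default=500, help="image width in pixels (default: 500)")
    return parser


args = build_parser().parse_args(["CCO", "ethanol.png", "--size", "300"])
print(args.smiles, args.output, args.size)
```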

Download the Complete Script

You can copy the complete smiles2png.py script directly from the code block below.

Installation and Setup

Before using the script, install the required dependencies:

pip install rdkit pillow

Complete Script

import sys
from rdkit import Chem
from rdkit.Chem import Draw, rdDepictor, rdMolDescriptors
from PIL import Image, ImageDraw, ImageFont

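def get_font(size, font_name="arial.ttf"):
    """Loads a TrueType font, falling back to Pillow's default font if unavailable."""
    try:
        return ImageFont.truetype(font_name, size)
    except OSError:
        # Pillow >= 10.1 can scale its default font; older versions ignore size
        try:
            return ImageFont.load_default(size=size)
        except TypeError:
            return ImageFont.load_default()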

def smiles_to_png(smiles, output_file, size=500):
    """Generates a 2D molecule image with a chemical formula legend."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # MolFromSmiles returns None on parse failure
        raise ValueError(f"Invalid SMILES string: {smiles}")

    # Generate 2D coordinates and formula
    rdDepictor.Compute2DCoords(mol)
    formula = rdMolDescriptors.CalcMolFormula(mol)
    
    # Render the molecule
    img = Draw.MolToImage(mol, size=(size, size)).convert("RGBA")

    # Create a canvas with extra space at the bottom for the legend
    legend_height = int(size * 0.1)
    canvas = Image.new("RGBA", (size, size + legend_height), "white")
    canvas.paste(img, (0, 0))
    
    draw = ImageDraw.Draw(canvas)
    
    # Define dynamic font sizes
    font_reg = get_font(int(size * 0.03))
    font_sub = get_font(int(size * 0.02))
    
    # Draw the legend
    x = int(size * 0.02)
    y = size + int(size * 0.02)
    
    # Draw "Formula: " label
    draw.text((x, y), "Formula: ", fill="black", font=font_reg)
    x += draw.textlength("Formula: ", font=font_reg)
    
    # Draw formula with subscript handling for numbers
    for char in formula:
        # Use smaller font and lower y-offset for numbers (subscripts)
        font = font_sub if char.isdigit() else font_reg
        y_offset = int(size * 0.005) if char.isdigit() else 0
        
        draw.text((x, y + y_offset), char, fill="black", font=font)
        x += draw.textlength(char, font=font)

    # Draw original SMILES string
    draw.text((x, y), f" | SMILES: {smiles}", fill="black", font=font_reg)

    canvas.save(output_file)
    print(f"Saved: {output_file}")

if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python smiles2png.py <SMILES_STRING> [OUTPUT_FILENAME]")
        sys.exit(1)
        
    smiles_input = sys.argv[1]
    filename = sys.argv[2] if len(sys.argv) > 2 else "molecule.png"
    
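    try:
        smiles_to_png(smiles_input, filename)
    except ValueError as err:
        # Report invalid SMILES without a traceback
        print(err, file=sys.stderr)
        sys.exit(1)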

Quick Start

  1. Save the script as smiles2png.py
  2. Install dependencies: pip install rdkit pillow
  3. Run: python smiles2png.py "CCO" ethanol.png