Optical Chemical Structure Recognition
Diagram of the ChemInk sketch recognition system converting freehand chemical drawings into structured molecular data

ChemInk: Real-Time Recognition for Chemical Drawings

ChemInk is a sketch recognition system for chemical diagrams that combines multi-level visual features in a joint Conditional Random Field (CRF). It achieves 97.4% accuracy, and users draw diagrams faster with it than with CAD tools.

Diagram of the CLiDE Pro system for segmenting document images and reconstructing chemical connection tables

CLiDE Pro: Optical Chemical Structure Recognition Tool

This paper introduces CLiDE Pro, an advanced OCSR system that segments document images and reconstructs chemical connection tables. It features novel handling for crossing bonds and generic structures, and its performance is validated on a publicly released benchmark of 454 scanned images.

Imago: Open-Source Chemical Structure Recognition (2011)

Imago is an open-source, cross-platform C++ toolkit designed to recognize 2D chemical structure images from scientific papers and convert them into machine-readable molecule formats using a rule-based pipeline.

Kekulé-1 System for Chemical Structure Recognition

This paper introduces Kekulé-1, one of the first successful Optical Chemical Structure Recognition (OCSR) systems. It details a hybrid approach using neural networks for character recognition and heuristic vectorization for bond detection, achieving 98.9% accuracy on a test set of 524 structures.

OSRA at TREC-CHEM 2011: Optical Structure Recognition

This paper details the algorithmic pipeline of OSRA, an open-source tool that converts raster images of chemical diagrams into connection tables (SMILES/SDF). It outlines specific heuristics for page segmentation, vectorization, and atom recognition used in the TREC-CHEM Image2Structure task.

Structural Analysis of Handwritten Chemical Formulas

This paper proposes a strategy for interpreting handwritten chemical formulas by converting bitmap images into a dynamic structural graph of quadrilaterals. It achieves ~97% recognition on graphical elements by using recursive ‘specialists’ to identify chemical bonds and rings.

Three-phase pipeline converting scanned chemical diagrams into connection tables via primitive recognition and semantic interpretation

Chemical Literature Data Extraction: The CLiDE Project

The CLiDE project presents a foundational architecture for Optical Chemical Structure Recognition (OCSR). It details a three-phase pipeline to convert bitmapped journal pages into chemically significant connection tables, handling complex features like stereochemistry.

Visualization of Gabor wavelets and Kohonen networks for chemical image classification

Chemical Machine Vision

This 2003 paper introduces a machine vision approach for extracting chemical metadata from raster images. By using Gabor wavelets for feature extraction and Kohonen networks for classification, it distinguishes between chemical and non-chemical images, as well as ring and non-ring systems, without requiring high-resolution inputs.

Overview of the ChemReader pipeline for extracting chemical structures from raster images using Hough transform and OCR

ChemReader: Automated Structure Extraction

This paper presents ChemReader, a fully automated optical structure recognition tool that converts raster images of chemical diagrams into machine-readable formats. It introduces a modified Hough transform for bond detection and a chemical spell checker that improves OCR accuracy from 66% to 87%.

Hand-Drawn Chemical Diagram Recognition (AAAI 2007)

This early method paper (AAAI 2007) proposes a multi-stage sketch recognition pipeline. It introduces a domain verification step that uses chemical rules to refine ink parsing, achieving a 27% error reduction over geometric-only baselines.

IMG2SMI: Translating Molecular Structure Images to SMILES

This 2021 image-to-text approach treats OCSR as an image captioning task. It uses Transformers with the SELFIES representation to convert molecular structure diagrams into SMILES strings, enabling extraction of visual chemical knowledge from scientific literature.

OCSR Methods: A Taxonomy of Approaches

A comprehensive categorization of OCSR methods, organizing techniques by their fundamental approach: deep learning, traditional ML, and rule-based systems.