The earliest OCSR systems converted raster images into vector primitives (lines, arcs, characters) and applied hand-coded chemical rules to assemble those primitives into molecular graphs. Pioneering tools such as the Contreras system (1990), Kekulé (1992), and CLiDE (1993) established the core pipeline: binarize, thin, vectorize, classify atoms and bonds, then compile a connection table. Later systems such as OSRA, ChemReader, and CLiDE Pro refined each stage with better segmentation, chemical spell-checking, and support for superatom labels. The approach dominated the field for over two decades, but its brittleness in the face of diverse drawing styles, noisy scans, and edge-case notation ultimately motivated the shift to learned representations.

| Year | Paper | Key Idea |
|---|---|---|
| 1990 | Graph Perception for Chemical Structure OCR | Early OCR system for digitizing chemical structures using C and Prolog |
| 1992 | Kekulé: OCR-Optical Chemical Recognition | Seminal OCSR system using neural networks and heuristic graph compilation |
| 1993 | Chemical Literature Data Extraction: The CLiDE Project | Seminal system converting scanned diagrams into connection tables |
| 1993 | Optical Recognition of Chemical Graphics | Prototype using vectorization and heuristic-based structure recognition |
| 1996 | Kekulé-1 System for Chemical Structure Recognition | Neural OCR with chemical rule-based post-processing for OCSR |
| 2003 | Chemical Machine Vision | Gabor wavelets and Kohonen networks for chemical image classification |
| 2007 | Automatic Recognition of Chemical Images | Rule-based extraction validated against commercial baselines |
| 2007 | Reconstruction of Chemical Molecules from Images | 5-module system converting raster images to SDF with custom vectorization |
| 2009 | ChemReader: Automated Structure Extraction | Modified Hough transform and chemical spell checking for OCSR |
| 2009 | CLiDE Pro: Optical Chemical Structure Recognition Tool | OCSR system reconstructing chemical graphs with ~90% accuracy |
| 2009 | OSRA: Open Source Optical Structure Recognition | First open-source utility for converting chemical images to SMILES/SD |
| 2011 | ChemInfty: Chemical Structure Recognition in Patent Images | Segment-based approach for challenging Japanese patent images |
| 2012 | Chemical Structure Recognition (Rule-Based) | Rule-based expert system (MolRec) using 18 geometric rewrite rules |
| 2014 | Probabilistic OCSR with Markov Logic Networks | Markov Logic Networks replacing brittle heuristics for robustness |
| 2015 | Research on Chemical Expression Images Recognition | Improved handling of adhesive symbols and wedge bonds |
| 2026 | GraphReco: Probabilistic Structure Recognition | Markov networks for probabilistic atom/bond ambiguity resolution |
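
To make the pipeline described above more concrete, here is a minimal sketch of its first and last stages: binarization, and compiling a connection table from already-vectorized primitives. It is an illustration under simplifying assumptions, not the method of any system listed in the table. The function names (`binarize`, `compile_connection_table`), the input formats, and the distance tolerance `tol` are all hypothetical. Thinning, vectorization, and character recognition are assumed to have run already.

```python
from math import hypot


def binarize(gray, threshold=128):
    """Map a grayscale grid (0-255) to 1 for ink and 0 for background."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def compile_connection_table(segments, labels, tol=5.0):
    """Assemble vector primitives into (atoms, bonds).

    segments: list of ((x1, y1), (x2, y2), order) line primitives
    labels:   list of (symbol, (x, y)) recognized characters
    tol:      hypothetical max distance for two points to share an atom
    """
    nodes = []  # each node is [x, y, symbol]

    def node_at(pt):
        # Reuse an existing node when the endpoint falls within tol of it.
        for i, (x, y, _) in enumerate(nodes):
            if hypot(pt[0] - x, pt[1] - y) <= tol:
                return i
        # Hand-coded rule: an unlabeled vertex is an implicit carbon.
        nodes.append([pt[0], pt[1], "C"])
        return len(nodes) - 1

    bonds = []
    for p, q, order in segments:
        a, b = node_at(p), node_at(q)
        if a != b:  # skip degenerate segments that collapse to one node
            bonds.append((min(a, b), max(a, b), order))

    for symbol, pt in labels:
        def dist(i):
            return hypot(pt[0] - nodes[i][0], pt[1] - nodes[i][1])

        best = min(range(len(nodes)), key=dist, default=None)
        # Bond lines usually stop short of the label, so use a looser radius.
        if best is not None and dist(best) <= 2 * tol:
            nodes[best][2] = symbol
        else:
            nodes.append([pt[0], pt[1], symbol])  # isolated atom label

    atoms = [symbol for _, _, symbol in nodes]
    return atoms, bonds


if __name__ == "__main__":
    # Toy zigzag drawing of ethanol: two lines and an "O" label at one end.
    segments = [((0, 0), (20, 10), 1), ((21, 11), (40, 0), 1)]
    labels = [("O", (46, 0))]
    print(compile_connection_table(segments, labels))
```

On the toy input, the two nearly touching endpoints merge into one carbon and the `O` label attaches to the free end of the second line, giving atoms `C, C, O` joined by two single bonds. Fixed thresholds like `tol` are exactly the kind of hand-tuned parameter that makes such pipelines brittle on unfamiliar drawing styles.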







