The earliest OCSR systems converted raster images into vector primitives (lines, arcs, characters) and applied hand-coded chemical rules to assemble those primitives into molecular graphs. Pioneering tools such as the Contreras system (1990), Kekulé (1992), and CLiDE (1993) established the core pipeline: binarize, thin, vectorize, classify atoms and bonds, then compile a connection table. Later systems such as OSRA, ChemReader, and CLiDE Pro refined each stage with better segmentation, chemical spell-checking, and support for superatom labels. The approach dominated the field for over two decades, but its brittleness when confronted with diverse drawing styles, noisy scans, and edge-case notation ultimately motivated the shift to learned representations.

| Year | Paper | Key Idea |
|------|-------|----------|
| 1990 | Graph Perception for Chemical Structure OCR | Early OCR system for digitizing chemical structures using C and Prolog |
| 1992 | Kekulé: OCR-Optical Chemical Recognition | Seminal OCSR system using neural networks and heuristic graph compilation |
| 1993 | Chemical Literature Data Extraction: The CLiDE Project | Seminal system converting scanned diagrams into connection tables |
| 1993 | Optical Recognition of Chemical Graphics | Prototype using vectorization and heuristic-based structure recognition |
| 1996 | Kekulé-1 System for Chemical Structure Recognition | Neural OCR with chemical rule-based post-processing for OCSR |
| 2003 | Chemical Machine Vision | Gabor wavelets and Kohonen networks for chemical image classification |
| 2007 | Automatic Recognition of Chemical Images | Rule-based extraction validated against commercial baselines |
| 2007 | Reconstruction of Chemical Molecules from Images | 5-module system converting raster images to SDF with custom vectorization |
| 2009 | ChemReader: Automated Structure Extraction | Modified Hough transform and chemical spell checking for OCSR |
| 2009 | CLiDE Pro: Optical Chemical Structure Recognition Tool | OCSR system reconstructing chemical graphs with ~90% accuracy |
| 2009 | OSRA: Open Source Optical Structure Recognition | First open-source utility for converting chemical images to SMILES/SD |
| 2011 | ChemInfty: Chemical Structure Recognition in Patent Images | Segment-based approach for challenging Japanese patent images |
| 2012 | Chemical Structure Recognition (Rule-Based) | Rule-based expert system (MolRec) using 18 geometric rewrite rules |
| 2014 | Probabilistic OCSR with Markov Logic Networks | Markov Logic Networks replacing brittle heuristics for robustness |
| 2015 | Research on Chemical Expression Images Recognition | Improved handling of adhesive symbols and wedge bonds |
| 2026 | GraphReco: Probabilistic Structure Recognition (2026) | Markov networks for probabilistic atom/bond ambiguity resolution |