MolRec at CLEF 2012
Describes the MolRec system’s performance in the CLEF 2012 Chemical Structure Recognition task, detailing its rule-based vectorization engine and analyzing failure modes like touching characters and complex bond types.
Describes the MolRec system’s performance in the CLEF 2012 Chemical Structure Recognition task, detailing its rule-based vectorization engine and analyzing failure modes like touching characters and complex bond types.
Benchmarks OSRA on CLEF-IP 2012 patent data, demonstrating that native image processing significantly outperforms external splitting tools. Introduces a pairwise distance algorithm for segmentation that handles overlapping molecules better than bounding boxes.
This resource paper details the third TREC Chemical IR campaign, introducing a novel Image-to-Structure task and analyzing 36 runs from 9 groups to benchmark chemical information retrieval.
This paper introduces MLOCSR, a system that pipelines low-level image vectorization with a high-level probabilistic Markov Logic Network to recognize chemical structures. It replaces brittle heuristics with weighted logic rules, significantly outperforming state-of-the-art systems like OSRA on degraded or low-resolution images.

Proposes a new OCSR workflow that improves recognition rates by separating adhesive chemical symbols and specifically handling virtual/real wedge bonds using vectorization, achieving 90% exact match vs 82.2% for OSRA baseline.
This paper introduces MolRec, a rule-based system for Optical Chemical Structure Recognition (OCSR). It defines a set of 18 geometric rewrite rules to disambiguate bonds and atoms in vectorised diagram images, demonstrating higher accuracy than the contemporary state-of-the-art (OSRA).

This paper introduces CLiDE Pro, an advanced OCSR system that segments document images and reconstructs chemical connection tables. It features novel handling for crossing bonds and generic structures, validating performance on a publicly released benchmark of 454 scanned images.
Imago is an open-source, cross-platform C++ toolkit designed to recognize 2D chemical structure images from scientific papers and convert them into machine-readable molecule formats using a rule-based pipeline.
This paper introduces Kekulé-1, one of the first successful Optical Chemical Structure Recognition (OCSR) systems. It details a hybrid approach using neural networks for character recognition and heuristic vectorization for bond detection, achieving 98.9% accuracy on a test set of 524 structures.
This paper details the algorithmic pipeline of OSRA, an open-source tool that converts raster images of chemical diagrams into connection tables (SMILES/SDF). It outlines specific heuristics for page segmentation, vectorization, and atom recognition used in the TREC-CHEM Image2Structure task.
This paper proposes a strategy for interpreting handwritten chemical formulas by converting bitmap images into a dynamic structural graph of quadrilaterals. It achieves ~97% recognition on graphical elements by using recursive ‘specialists’ to identify chemical bonds and rings.

This methodological paper presents a system for digitizing chemical images into SDF files. It utilizes a custom vectorization algorithm and chemical rule validation, achieving 94% accuracy on benchmark datasets compared to 50% for commercial tools.