Recognition of On-line Handwritten Chemical Expressions
Proposes a novel two-level algorithm for on-line handwritten chemical expression recognition, combining substance-level matching with character-level segmentation to achieve 96% accuracy.
This paper reviews three decades of OCSR development, transitioning from rule-based heuristics to early deep learning approaches. It includes a benchmark study comparing the performance of three open-source tools (OSRA, Imago, MolVec) on four diverse datasets.
This paper proposes a double-stage architecture using SVM for rough classification and HMM for fine recognition, featuring a novel Point Sequence Reordering (PSR) algorithm that significantly improves accuracy on organic ring structures.

Proposes a unified statistical framework for recognizing both inorganic and organic handwritten chemical expressions. Introduces the Chemical Expression Structure Graph (CESG) and uses a weighted direction graph search for structural analysis, achieving 83.1% top-5 accuracy on a large proprietary dataset.

Describes chemoCR, a system that converts bitmap chemical diagrams into connection tables using a pipeline of texture-based vectorization, OCR, and a rule-based expert system, achieving 65.6% perfect recall on the TREC 2011 task.

ChemReader achieved 93% accuracy on the TREC 2011 Image-to-Structure task, with detailed error analysis revealing the need for improved chemical intelligence in bond recognition and node merging algorithms.

A resource paper detailing the CLEF-IP 2012 benchmarking lab. It introduces three specific IR tasks (claims-based passage retrieval, flowchart recognition, and chemical structure recognition) along with the construction of their respective ground-truth datasets and evaluation metrics.
Describes the MolRec system’s performance in the CLEF 2012 Chemical Structure Recognition task, detailing its rule-based vectorization engine and analyzing failure modes like touching characters and complex bond types.
Benchmarks OSRA on CLEF-IP 2012 patent data, demonstrating that native image processing significantly outperforms external splitting tools. Introduces a pairwise distance algorithm for segmentation that handles overlapping molecules better than bounding boxes.
This resource paper details the third TREC Chemical IR campaign, introducing a novel Image-to-Structure task and analyzing 36 runs from 9 groups to benchmark chemical information retrieval.
This paper introduces MLOCSR, a system that pipelines low-level image vectorization with a high-level probabilistic Markov Logic Network to recognize chemical structures. It replaces brittle heuristics with weighted logic rules, significantly outperforming state-of-the-art systems like OSRA on degraded or low-resolution images.

Proposes a new OCSR workflow that improves recognition rates by separating adhesive chemical symbols and specifically handling virtual/real wedge bonds using vectorization, achieving 90% exact match versus 82.2% for the OSRA baseline.