Survey papers, competition reports, comparative analyses, and TREC/CLEF system descriptions for OCSR evaluation.
This group collects work that evaluates, compares, or surveys optical chemical structure recognition (OCSR) methods rather than proposing new ones. It includes the two major review papers (Rajan et al. 2020 on rule-based methods and Musazade et al. 2022 on the transition to deep learning), benchmark studies such as Krasnov et al.’s 2024 comparison of eight tools on patent images, and ablation work on output representations (Rajan et al. 2022 on SMILES vs. SELFIES vs. InChI). The shared evaluation campaigns, TREC-CHEM 2011 and CLEF-IP 2012, are represented by their overview papers and by individual system descriptions (OSRA, ChemReader, Imago, chemoCR, and MolRec), which together give a snapshot of the field at those points in time.
Imago: Open-Source Chemical Structure Recognition (2011)
Imago is an open-source, cross-platform C++ toolkit designed to recognize 2D chemical structure images from scientific papers and convert them into machine-readable molecule formats using a rule-based pipeline.
OSRA at TREC-CHEM 2011: Optical Structure Recognition
This paper details the algorithmic pipeline of OSRA, an open-source tool that converts raster images of chemical diagrams into connection tables (SMILES/SDF). It outlines specific heuristics for page segmentation, vectorization, and atom recognition used in the TREC-CHEM Image2Structure task.
A comprehensive categorization of OCSR methods, organizing techniques by their fundamental approach: deep learning, traditional ML, and rule-based systems.
MolRec: Chemical Structure Recognition at CLEF 2012
Evaluation of MolRec at the CLEF 2012 competition shows a large gap between the automatic evaluation set (94-96% accuracy) and the manual evaluation set of complex patent structures (46-59% accuracy). The paper systematically analyzes failure modes, including character grouping bugs, touching characters, and the vectorization of four-way junctions.
MolRec: Rule-Based OCSR System at TREC 2011 Benchmark
Details the MolRec system, which converts chemical diagram images into MOL files using vectorization, geometric rules, and graph construction. The system achieved 95% accuracy on 1,000 TREC 2011 benchmark images, and the paper includes a detailed analysis of its failures.
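
The accuracy figures quoted in these summaries can be read as exact-match rates over a benchmark set. The following is a minimal sketch, assuming each image is scored by comparing a predicted identifier string (for example an InChI) with the reference string. The function name and the sample data are hypothetical and are not taken from the evaluation code of any of the papers.

```python
def exact_match_accuracy(predicted, reference):
    """Return the fraction of images whose predicted identifier equals the reference.

    Assumption: both lists hold one identifier string per image, in the same
    order, and a failed recognition is recorded as None.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("benchmark set is empty")
    hits = sum(
        p is not None and p.strip() == r.strip()
        for p, r in zip(predicted, reference)
    )
    return hits / len(reference)


# Hypothetical four-image benchmark: methane, water, ethane, ethanol.
reference = [
    "InChI=1S/CH4/h1H4",
    "InChI=1S/H2O/h1H2",
    "InChI=1S/C2H6/c1-2/h1-2H3",
    "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",
]
predicted = [
    "InChI=1S/CH4/h1H4",          # correct
    "InChI=1S/H2O/h1H2",          # correct
    None,                         # tool produced no output
    "InChI=1S/C2H6/c1-2/h1-2H3",  # wrong structure (ethane instead of ethanol)
]

print(f"{exact_match_accuracy(predicted, reference):.0%}")
```

Under this reading, a missing output and a wrong structure both count as failures, so a reported 95% on 1,000 images corresponds to 950 exact matches.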