Benchmarks and Reviews

This group collects work that evaluates, compares, or surveys OCSR methods rather than proposing new ones. It includes the two major review papers (Rajan et al. 2020 covering rule-based methods, Musazade et al. 2022 covering the deep learning transition), benchmark studies like Krasnov et al.’s 2024 comparison of eight tools on patent images, and ablation work on output representations (Rajan et al. 2022 on SMILES vs. SELFIES vs. InChI). The shared evaluation campaigns, TREC-Chem 2011 and CLEF-IP 2012, are represented both by their overview papers and by individual system descriptions (OSRA, ChemReader, Imago, chemoCR, and MolRec entries), providing a snapshot of the field’s state at those points in time.

Computational Chemistry

Uni-Parser pipeline diagram showing document pre-processing, layout detection, semantic parsing, content gathering, and format conversion stages

Uni-Parser: Industrial-Grade Multi-Modal PDF Parsing (2025)

Technical report on Uni-Parser, an industrial-grade document parsing engine that uses a modular multi-expert architecture to parse scientific PDFs into structured representations. Integrates MolParser 1.5 for OCSR, achieving 88.6% accuracy on chemical structures while processing up to 20 pages per second.

Comparative analysis of image-to-sequence OCSR methods

Image-to-Sequence OCSR: A Comparative Analysis

Deep dive into 24 image-to-sequence OCSR methods (2019-2025), comparing encoder-decoder architectures, molecular string representations, training scale, and hardware requirements.

Precision and recall comparison of 8 OCSR tools on patent images

Benchmarking Eight OCSR Tools on Patent Images (2024)

Comprehensive evaluation of 8 optical chemical structure recognition tools using a newly curated dataset of 2,702 patent images. Proposes ChemIC, a ResNet-50 classifier to route images to specialized tools based on content type, demonstrating that no single tool excels at all tasks.
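
The routing idea can be sketched in a few lines. This is a minimal illustration, not the ChemIC implementation: the class labels, the `route_image` helper, and the stand-in classifier and tools are all hypothetical, and a real setup would replace them with a trained image classifier and actual OCSR tools.

```python
from typing import Callable, Dict, Optional, Tuple

Classifier = Callable[[str], str]   # image path -> predicted content class
Recognizer = Callable[[str], str]   # image path -> recognized structure string


def route_image(
    image_path: str,
    classify: Classifier,
    tools: Dict[str, Recognizer],
) -> Tuple[str, Optional[str]]:
    """Classify an image, then hand it to the tool registered for that class."""
    label = classify(image_path)
    tool = tools.get(label)
    # Classes with no registered tool (e.g. images without structures) are skipped.
    result = tool(image_path) if tool is not None else None
    return label, result


# Hypothetical stand-ins so the sketch runs without any model or OCSR tool.
def fake_classifier(path: str) -> str:
    return "reaction" if "rxn" in path else "single_structure"


tools: Dict[str, Recognizer] = {
    "single_structure": lambda p: f"tool_a({p})",
    "reaction": lambda p: f"tool_b({p})",
}

print(route_image("rxn_01.png", fake_classifier, tools))
print(route_image("mol_01.png", fake_classifier, tools))
```

Keeping the classifier and the tool table as plain arguments makes it easy to swap either one, which matters if no single tool is best for every content type.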

Review of OCSR Techniques and Models (Musazade 2022)

This systematization paper traces the history of OCSR, comparing early rule-based systems like OSRA with modern deep learning approaches like DECIMER. It highlights the shift from image classification to image captioning and identifies critical gaps in dataset standardization and evaluation metrics.

String Representations for Chemical Image Recognition

This empirical study isolates the impact of chemical string representations on image-to-text translation models. It finds that SMILES offers the highest overall accuracy, whereas SELFIES guarantees structural validity, so the choice of representation for OCSR is a trade-off between accuracy and validity.
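
To make the validity issue concrete, the sketch below shows the kind of syntactic failure a generated SMILES string can have: unbalanced branch parentheses or a ring-closure digit that is never closed. It is a deliberately simplified check written for illustration (the function name is hypothetical). It ignores two-digit `%nn` ring closures and does not check valence or aromaticity, so passing it does not mean a string is a valid molecule; a cheminformatics toolkit is needed for that.

```python
def smiles_syntax_ok(smiles: str) -> bool:
    """Simplified check: balanced branches and paired single-digit ring closures."""
    depth = 0            # current nesting depth of branch parentheses
    open_rings = set()   # ring-closure digits that have been opened but not closed
    in_bracket = False   # inside a bracket atom such as [13CH4] or [NH4+]

    for ch in smiles:
        if in_bracket:
            # Digits inside brackets are isotopes, H counts, or charges, not rings.
            if ch == "]":
                in_bracket = False
            continue
        if ch == "[":
            in_bracket = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:          # closing a branch that was never opened
                return False
        elif ch.isdigit():
            # First occurrence opens a ring bond, second occurrence closes it.
            if ch in open_rings:
                open_rings.remove(ch)
            else:
                open_rings.add(ch)

    return depth == 0 and not open_rings and not in_bracket


for s in ["c1ccccc1", "c1ccccc", "CC(=O)O", "CC(=O", "[13CH4]"]:
    print(s, smiles_syntax_ok(s))
```

A decoder emitting SMILES character by character can produce strings like `c1ccccc` or `CC(=O`, which is the failure mode that a representation with guaranteed validity avoids.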

A Review of Optical Chemical Structure Recognition Tools

This paper reviews three decades of OCSR development, covering the transition from rule-based heuristics to early deep learning approaches. It includes a benchmark study comparing the performance of three open-source tools (OSRA, Imago, MolVec) on four diverse datasets.

Diagram of the chemoCR pipeline converting a bitmap chemical structure into a connection table

Chemical Structure Reconstruction with chemoCR (2011)

Describes chemoCR, a system that converts bitmap chemical diagrams into connection tables using a pipeline of texture-based vectorization, OCR, and a rule-based expert system, achieving 65.6% perfect recall on the TREC 2011 task.

Pipeline diagram of ChemReader chemical structure recognition from image to connection table

ChemReader Image-to-Structure OCR at TREC 2011 Chemical IR

ChemReader achieved 93% accuracy on the TREC 2011 Image-to-Structure task, with detailed error analysis revealing the need for improved chemical intelligence in bond recognition and node merging algorithms.

Overview of CLEF-IP 2012 tasks including patent passage retrieval, flowchart recognition, and chemical structure extraction

CLEF-IP 2012: Patent and Chemical Structure Benchmark

This resource paper details the CLEF-IP 2012 benchmarking lab. It introduces IR tasks for patent processing, including passage retrieval, flowchart recognition, and chemical structure extraction, along with ground-truth datasets.

MolRec at CLEF 2012: Rule-Based Structure Recognition

Describes the MolRec system’s performance in the CLEF 2012 Chemical Structure Recognition task, detailing its rule-based vectorization engine and analyzing failure modes like touching characters and complex bond types.

OSRA at CLEF-IP 2012: Native TIFF Processing for Patents

Benchmarks OSRA on CLEF-IP 2012 patent data, showing that native image processing raises precision to 0.708, up from 0.433 when external splitting tools are used. Describes OSRA’s pairwise distance algorithm for segmentation, which handles overlapping molecules better than bounding boxes.
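
Why distance-based grouping can beat bounding boxes is easiest to see in a toy example. The sketch below is not OSRA's algorithm; it is a generic illustration that groups pixel fragments whenever their closest points lie within a threshold, using a small union-find. The function names, the threshold, and the sample coordinates are all assumptions made for the example.

```python
from math import dist
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def min_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Smallest Euclidean distance between any point of a and any point of b."""
    return min(dist(p, q) for p in a for q in b)


def group_fragments(fragments: Sequence[Sequence[Point]], threshold: float) -> List[List[int]]:
    """Merge fragments whose closest points are within `threshold` (single linkage)."""
    parent = list(range(len(fragments)))

    def find(i: int) -> int:
        # Follow parent links to the group representative, halving paths on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(fragments)):
        for j in range(i + 1, len(fragments)):
            if min_distance(fragments[i], fragments[j]) <= threshold:
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(len(fragments)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Fragment 0 is a diagonal stroke; fragment 1 lies inside its bounding box but
# far from its pixels; fragment 2 continues the diagonal just past its end.
diagonal = [(float(i), float(i)) for i in range(10)]
corner = [(9.0, 0.0)]
continuation = [(10.0, 10.0), (11.0, 11.0)]

print(group_fragments([diagonal, corner, continuation], threshold=3.0))
# prints [[0, 2], [1]]
```

A bounding-box test would merge fragments 0 and 1 because the corner pixel falls inside the diagonal's box, while the distance test keeps them apart and still joins the continuation to the stroke it belongs to.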

Overview of the TREC 2011 Chemical IR Track Benchmark

This resource paper details the third TREC Chemical IR campaign, introducing a novel Image-to-Structure task and analyzing 36 runs from 9 groups to benchmark chemical information retrieval.