Optical Chemical Structure Recognition
4-chlorofluorobenzene molecular structure diagram for SwinOCSR

SwinOCSR: End-to-End Chemical OCR with Swin Transformers

Proposes an end-to-end architecture that replaces the standard CNN backbone with a Swin Transformer to capture global image context. Introduces a Multi-label Focal Loss to handle severe token imbalance in chemical datasets.
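
To illustrate why a focal loss helps with token imbalance, the sketch below applies the standard focal loss independently to each label of a multi-label prediction. This is an assumption-laden illustration, not the paper's formulation: the function name `multilabel_focal_loss` and the defaults `gamma=2.0` and `alpha=0.25` are hypothetical choices, and the summary above does not specify how SwinOCSR defines its loss.

```python
import math


def multilabel_focal_loss(probs, targets, gamma=2.0, alpha=0.25, eps=1e-12):
    """Mean per-label focal loss for a single sample.

    probs   -- predicted probability for each label, in [0, 1]
    targets -- 0/1 ground-truth indicator for each label
    gamma   -- focusing parameter (assumed default); 0 gives weighted BCE
    alpha   -- weight for positive labels (assumed default)
    """
    if len(probs) != len(targets):
        raise ValueError("probs and targets must have the same length")

    total = 0.0
    for p, y in zip(probs, targets):
        # p_t is the probability assigned to the correct outcome for this label.
        p_t = p if y == 1 else 1.0 - p
        alpha_t = alpha if y == 1 else 1.0 - alpha
        # (1 - p_t) ** gamma shrinks the loss of well-classified labels,
        # so rare, poorly predicted tokens contribute relatively more.
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))
    return total / len(probs)
```

With `gamma=0` the modulating factor disappears and the function reduces to an alpha-weighted binary cross-entropy, which makes it easy to compare against an unfocused baseline.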

A Review of Optical Chemical Structure Recognition Tools

This paper reviews three decades of OCSR development, tracing the transition from rule-based heuristics to early deep learning approaches. It includes a benchmark study comparing three open-source tools (OSRA, Imago, MolVec) on four diverse datasets.

Diagram of the chemoCR pipeline converting a bitmap chemical structure into a connection table

Chemical Structure Reconstruction with chemoCR (2011)

Describes chemoCR, a system that converts bitmap chemical diagrams into connection tables using a pipeline of texture-based vectorization, OCR, and a rule-based expert system, achieving 65.6% perfect recall on the TREC 2011 task.

Pipeline diagram of ChemReader chemical structure recognition from image to connection table

ChemReader Image-to-Structure OCR at TREC 2011 Chemical IR

ChemReader achieved 93% accuracy on the TREC 2011 Image-to-Structure task. A detailed error analysis shows that its bond recognition and node merging algorithms need more chemical intelligence.

Overview of CLEF-IP 2012 tasks including patent passage retrieval, flowchart recognition, and chemical structure extraction

CLEF-IP 2012: Patent and Chemical Structure Benchmark

A resource paper detailing the CLEF-IP 2012 benchmarking lab. It introduces IR tasks for patent processing, including passage retrieval, flowchart recognition, and chemical structure extraction, along with ground-truth datasets.

MolRec at CLEF 2012: Rule-Based Structure Recognition

Describes the MolRec system’s performance in the CLEF 2012 Chemical Structure Recognition task, detailing its rule-based vectorization engine and analyzing failure modes such as touching characters and complex bond types.

OSRA at CLEF-IP 2012: Native TIFF Processing for Patents

Benchmarks OSRA on CLEF-IP 2012 patent data, showing that native TIFF processing raises precision to 0.708, up from 0.433 with external splitting tools. Describes OSRA’s pairwise-distance segmentation algorithm, which handles overlapping molecules better than bounding-box approaches.
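
The idea of segmenting by pairwise distance rather than by bounding boxes can be sketched as follows. This is a generic single-linkage grouping written for illustration, not OSRA's actual implementation: the names `min_pixel_distance` and `group_components` and the `max_gap` threshold are hypothetical, and each connected component is assumed to be given as a list of `(x, y)` pixel coordinates.

```python
import math
from itertools import combinations


def min_pixel_distance(a, b):
    """Smallest Euclidean distance between any pixel of a and any pixel of b."""
    return min(math.dist(p, q) for p in a for q in b)


def group_components(components, max_gap):
    """Group connected components whose closest pixels lie within max_gap.

    Returns a sorted list of groups, each a list of component indices.
    Uses union-find so that chains of nearby components end up together.
    """
    parent = list(range(len(components)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(components)), 2):
        if min_pixel_distance(components[i], components[j]) <= max_gap:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(components)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Because the test is on actual pixel distance, two molecules whose bounding rectangles overlap (for example, interleaved L-shaped drawings) stay separate as long as their strokes are farther apart than `max_gap`. The brute-force distance computation is quadratic in the number of pixels, so a real system would likely need a faster nearest-neighbor search.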

Overview of the TREC 2011 Chemical IR Track Benchmark

This resource paper details the third TREC Chemical IR campaign, introducing a novel Image-to-Structure task and analyzing 36 runs from 9 groups to benchmark chemical information retrieval.

Diagram of the ChemInk sketch recognition system converting freehand chemical drawings into structured molecular data

ChemInk: Real-Time Recognition for Chemical Drawings

ChemInk introduces a sketch recognition system for chemical diagrams that combines multi-level visual features in a joint Conditional Random Field (CRF), achieving 97.4% accuracy and letting users draw faster than with CAD tools.

Diagram of the CLiDE Pro system for segmenting document images and reconstructing chemical connection tables

CLiDE Pro: Optical Chemical Structure Recognition Tool

This paper introduces CLiDE Pro, an advanced OCSR system that segments document images and reconstructs chemical connection tables. It features novel handling of crossing bonds and generic structures, and validates its performance on a publicly released benchmark of 454 scanned images.

OSRA at TREC-CHEM 2011: Optical Structure Recognition

This paper details the algorithmic pipeline of OSRA, an open-source tool that converts raster images of chemical diagrams into connection tables (SMILES/SDF). It outlines specific heuristics for page segmentation, vectorization, and atom recognition used in the TREC-CHEM Image2Structure task.

Automatic chemical image recognition pipeline from raster image to structured file

Automatic Recognition of Chemical Images

This methodological paper presents a system for digitizing chemical images into SDF files. It uses a custom vectorization algorithm and chemical rule validation, achieving 94% accuracy on benchmark datasets, compared with 50% for commercial tools.