Computational Chemistry
Chemical Structure Reconstruction with chemoCR

Describes chemoCR, a system that converts bitmap chemical diagrams into connection tables using a pipeline of texture-based vectorization, OCR, and a rule-based expert system, achieving 65.6% perfect recall on the TREC 2011 task.
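
Here "perfect recall" is read as the share of images whose recognized structure matches the reference structure exactly. The sketch below computes that share. It assumes each structure has already been reduced to a canonical string (a standard InChI is assumed here). It is not the official TREC 2011 evaluation script, and the three-image data set is invented for illustration.

```python
def perfect_recall(predicted, gold):
    """Fraction of gold structures whose prediction matches exactly.

    Both arguments map an image ID to a canonical structure string
    (assumed to be a standard InChI). A missing prediction counts as a miss.
    """
    if not gold:
        raise ValueError("gold set is empty")
    hits = sum(1 for image_id, inchi in gold.items()
               if predicted.get(image_id) == inchi)
    return hits / len(gold)


# Toy example: methane, water, and ethane as the reference structures.
gold = {
    "img1": "InChI=1S/CH4/h1H4",
    "img2": "InChI=1S/H2O/h1H2",
    "img3": "InChI=1S/C2H6/c1-2/h1-2H3",
}
predicted = {
    "img1": "InChI=1S/CH4/h1H4",
    "img2": "InChI=1S/H2O/h1H2",
    "img3": "InChI=1S/C2H4/c1-2/h1-2H2",  # single bond misread as double
}
print(perfect_recall(predicted, gold))  # 2 of 3 images match
```

An exact match on canonical identifiers is strict by design. A single misread bond order counts as a complete miss, even when the rest of the structure is correct.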

ChemReader at TREC 2011 Chemical IR Track

ChemReader achieved 93% accuracy on the TREC 2011 Image-to-Structure task, with detailed error analysis revealing the need for improved chemical intelligence in bond recognition and node merging algorithms.
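
Node merging is the step that decides which bond endpoints meet at the same atom. The sketch below shows a minimal greedy version. It assumes endpoints arrive as 2-D coordinates and uses a hypothetical pixel tolerance `tol`. It only illustrates the general idea and is not ChemReader's algorithm, which the summary above does not describe.

```python
import math


def merge_nodes(endpoints, tol):
    """Greedily merge 2-D points lying within `tol` of a node centroid.

    Returns (centroids, labels), where labels[i] is the node index that
    endpoints[i] was assigned to. The result depends on input order.
    """
    centroids = []  # running centroid of each merged node
    members = []    # points assigned to each node
    labels = []
    for x, y in endpoints:
        best, best_d = None, tol
        for k, (cx, cy) in enumerate(centroids):
            d = math.hypot(x - cx, y - cy)
            if d <= best_d:  # keep the closest node within tolerance
                best, best_d = k, d
        if best is None:
            centroids.append((x, y))
            members.append([(x, y)])
            labels.append(len(centroids) - 1)
        else:
            members[best].append((x, y))
            n = len(members[best])
            centroids[best] = (sum(p[0] for p in members[best]) / n,
                               sum(p[1] for p in members[best]) / n)
            labels.append(best)
    return centroids, labels


# Two bond segments, (0,0)-(10,0) and (10.5,0.4)-(20,5), that should share a node.
centroids, labels = merge_nodes([(0, 0), (10, 0), (10.5, 0.4), (20, 5)], tol=2)
print(labels)  # [0, 1, 1, 2]
```

A purely geometric tolerance like this is the kind of rule that error analysis tends to expose. Whether two nearby endpoints should merge often depends on chemistry (valence, expected bond angles) and not only on distance.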

CLEF-IP 2012 Benchmark Overview

A resource paper detailing the CLEF-IP 2012 benchmarking lab. It introduces three specific IR tasks (claims-based passage retrieval, flowchart recognition, and chemical structure recognition) along with the construction of their respective ground-truth datasets and evaluation metrics.

MolRec at CLEF 2012

Describes the MolRec system’s performance in the CLEF 2012 Chemical Structure Recognition task, detailing its rule-based vectorization engine and analyzing failure modes like touching characters and complex bond types.

OSRA at CLEF-IP 2012

Benchmarks OSRA on CLEF-IP 2012 patent data, demonstrating that native image processing significantly outperforms external splitting tools. Introduces a pairwise distance algorithm for segmentation that handles overlapping molecules better than bounding boxes.
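
The difference between pairwise-distance grouping and bounding-box grouping can be shown on a small example. The sketch below treats each connected component as a list of pixel coordinates. It joins two components when their closest pixels lie within a hypothetical `gap` (single-linkage grouping with union-find). This is an assumed illustration and not OSRA's actual implementation. The brute-force pixel comparison is also far too slow for full patent pages.

```python
import math
from itertools import combinations


def min_distance(a, b):
    """Smallest Euclidean distance between any pixel of a and any pixel of b."""
    return min(math.dist(p, q) for p in a for q in b)


def group_components(components, gap):
    """Group components whose closest pixels are within `gap` (single linkage)."""
    parent = list(range(len(components)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(components)), 2):
        if min_distance(components[i], components[j]) <= gap:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(components)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# A is L-shaped; B sits inside A's bounding box but far from A's pixels;
# C is a label just past the end of A's horizontal stroke.
A = [(x, 0) for x in range(11)] + [(0, y) for y in range(1, 11)]
B = [(8, 8), (9, 8)]
C = [(12, 0)]
print(group_components([A, B, C], gap=3))  # [[0, 2], [1]]
```

A bounding-box overlap test would merge B into A's molecule, because B lies inside A's box. The pixel-distance test keeps B separate and still attaches C.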

Overview of TREC 2011 Chemical IR Track

This resource paper details the third TREC Chemical IR campaign, introducing a novel Image-to-Structure task and analyzing 36 runs from 9 groups to benchmark chemical information retrieval.

ChemInk: Real-Time Recognition for Chemical Drawings

ChemInk introduces a sketch recognition system for chemical diagrams that combines multi-level visual features via a joint Conditional Random Field (CRF), achieving 97.4% accuracy and letting users draw diagrams faster than with CAD tools.

CLiDE Pro: Optical Chemical Structure Recognition Tool

This paper introduces CLiDE Pro, an advanced OCSR system that segments document images and reconstructs chemical connection tables. It features novel handling for crossing bonds and generic structures, validating performance on a publicly released benchmark of 454 scanned images.

OSRA: Optical Structure Recognition Application

This paper details the algorithmic pipeline of OSRA, an open-source tool that converts raster images of chemical diagrams into connection tables (SMILES/SDF). It outlines specific heuristics for page segmentation, vectorization, and atom recognition used in the TREC-CHEM Image2Structure task.

Automatic Recognition of Chemical Images

This methodological paper presents a system for digitizing chemical images into SDF files. It utilizes a custom vectorization algorithm and chemical rule validation, achieving 94% accuracy on benchmark datasets compared to 50% for commercial tools.
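
Chemical rule validation can be illustrated with the simplest such rule, a valence check over a connection table. The sketch below flags atoms whose summed bond orders exceed an assumed maximum valence. The valence table is a small illustrative subset. The check ignores charges, radicals, and aromatic bond types. The paper's actual rule set is not described in the summary above.

```python
# Maximum neutral valence for a few common elements (illustrative subset, assumed).
MAX_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2, "F": 1, "Cl": 1, "S": 6}


def valence_violations(atoms, bonds):
    """Return indices of atoms whose total bond order exceeds MAX_VALENCE.

    atoms: list of element symbols (hydrogens may be implicit).
    bonds: list of (i, j, order) tuples with order in {1, 2, 3}.
    Elements missing from MAX_VALENCE are skipped, not flagged.
    """
    used = [0] * len(atoms)
    for i, j, order in bonds:
        used[i] += order
        used[j] += order
    return [k for k, sym in enumerate(atoms)
            if sym in MAX_VALENCE and used[k] > MAX_VALENCE[sym]]


# Acetic acid heavy-atom skeleton: C-C(=O)-O
atoms = ["C", "C", "O", "O"]
good_bonds = [(0, 1, 1), (1, 2, 2), (1, 3, 1)]
# A recognition error that reads the C-O single bond as a double bond
bad_bonds = [(0, 1, 1), (1, 2, 2), (1, 3, 2)]
print(valence_violations(atoms, good_bonds))  # []
print(valence_violations(atoms, bad_bonds))   # [1]
```

A flagged atom does not say which bond is wrong. A validation stage would typically use it to trigger re-examination of the bonds around that atom.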

ChemReader: Automated Structure Extraction

This paper presents ChemReader, a fully automated optical structure recognition tool that converts raster images of chemical diagrams into machine-readable formats. It introduces a modified Hough transform for bond detection and a chemical spell checker that improves OCR accuracy from 66% to 87%.
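
A chemical spell checker works by snapping raw OCR strings to labels that are chemically plausible. The sketch below shows one way to do that. It assumes nearest-neighbour lookup by Levenshtein edit distance against a small, hypothetical dictionary of atom and group labels, with a hypothetical `max_dist` cut-off. The summary does not describe how ChemReader's checker actually scores candidates.

```python
def edit_distance(a, b):
    """Levenshtein distance via the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


# Hypothetical dictionary of labels a recognizer might expect in a diagram.
VALID_LABELS = ["C", "N", "O", "S", "Cl", "Br", "OH", "NH2", "OCH3", "COOH", "CF3"]


def correct_label(raw, max_dist=1):
    """Snap an OCR string to the closest valid label, or keep it if none is close."""
    best = min(VALID_LABELS, key=lambda lab: edit_distance(raw, lab))
    return best if edit_distance(raw, best) <= max_dist else raw


print(correct_label("0H"))   # zero misread for the letter O -> "OH"
print(correct_label("NH7"))  # -> "NH2"
print(correct_label("Xyz"))  # nothing within distance 1, kept as "Xyz"
```

Ties are a known weakness of plain edit distance. For example, "C1" is one edit from both "C" and "Cl". A practical checker would need extra context, such as bond count at that position, to choose between them.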

Hand Drawn Chemical Diagram Recognition

An early method paper (AAAI ’07) proposing a multi-stage sketch recognition pipeline. It introduces a domain verification step that uses chemical rules to refine ink parsing, achieving a 27% error reduction over geometric-only baselines.