Computational Chemistry

Chemical Structure Reconstruction with chemoCR (2011)

Describes chemoCR, a system that converts bitmap chemical diagrams into connection tables using a pipeline of texture-based vectorization, OCR, and a rule-based expert system, achieving 65.6% perfect recall on the TREC 2011 task.
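
A connection table lists a molecule's atoms and the bonds between them, and a "perfect" result means the recognized table describes exactly the same molecular graph as the ground truth. The sketch below is a toy, hypothetical representation (`bond_set` and `exact_match` are illustrative names, not chemoCR's format or the TREC evaluation code). It checks equality by brute force over atom renumberings. Real evaluations typically compare canonical forms such as InChI instead.

```python
from itertools import permutations
from typing import FrozenSet, List, Optional, Set, Tuple

# Hypothetical toy connection table: atom symbols plus (i, j, bond order) triples.
Bond = Tuple[int, int, int]


def bond_set(bonds: List[Bond], perm: Optional[Tuple[int, ...]] = None) -> Set[Tuple[FrozenSet[int], int]]:
    """Bonds as unordered atom pairs with their order, optionally renumbered by perm."""
    result = set()
    for i, j, order in bonds:
        if perm is not None:
            i, j = perm[i], perm[j]
        result.add((frozenset((i, j)), order))
    return result


def exact_match(a_atoms: List[str], a_bonds: List[Bond],
                b_atoms: List[str], b_bonds: List[Bond]) -> bool:
    """True if both tables describe the same molecular graph under some atom renumbering.

    Brute force over all permutations, so this only suits very small molecules.
    """
    if len(a_atoms) != len(b_atoms) or len(a_bonds) != len(b_bonds):
        return False
    target = bond_set(b_bonds)
    for perm in permutations(range(len(a_atoms))):
        # perm[k] is the index in table B assigned to atom k of table A
        if all(a_atoms[k] == b_atoms[perm[k]] for k in range(len(a_atoms))):
            if bond_set(a_bonds, perm) == target:
                return True
    return False


# Ethanol written with two different atom orderings.
print(exact_match(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)],
                  ["O", "C", "C"], [(0, 1, 1), (1, 2, 1)]))  # True
```

Dimethyl ether (`["C", "O", "C"]` with the same bond list) would not match ethanol. Its oxygen has two carbon neighbours, so no renumbering reproduces the bond set.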

Computational Chemistry

ChemReader Image-to-Structure OCR at TREC 2011 Chemical IR

ChemReader achieved 93% accuracy on the TREC 2011 Image-to-Structure task, with detailed error analysis revealing the need for improved chemical intelligence in bond recognition and node merging algorithms.
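
Node merging collapses line endpoints that should meet at a single atom, since vectorization rarely makes them coincide exactly. ChemReader's own algorithm is not described in this summary. The following is a generic greedy sketch, and the function `merge_nodes` and its pixel tolerance `tol` are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def merge_nodes(endpoints: List[Point], tol: float) -> Tuple[List[Point], List[int]]:
    """Greedily merge bond endpoints into atom nodes.

    Each endpoint joins the nearest existing node whose running centroid lies
    within tol; otherwise it starts a new node. Returns node centroids and,
    for every endpoint, the index of the node it was assigned to.
    """
    sums: List[List[float]] = []  # per node: [sum_x, sum_y, count]
    assignment: List[int] = []
    for x, y in endpoints:
        best, best_d = -1, tol
        for k, (sx, sy, n) in enumerate(sums):
            d = math.hypot(x - sx / n, y - sy / n)
            if d <= best_d:
                best, best_d = k, d
        if best == -1:
            sums.append([x, y, 1])
            best = len(sums) - 1
        else:
            sums[best][0] += x
            sums[best][1] += y
            sums[best][2] += 1
        assignment.append(best)
    centroids = [(sx / n, sy / n) for sx, sy, n in sums]
    return centroids, assignment


# Two bonds that should share an atom near (10, 0) but were drawn with a small gap.
nodes, owner = merge_nodes([(0, 0), (9.8, 0.1), (10.1, -0.1), (15, 8)], tol=1.0)
print(owner)  # [0, 1, 1, 2]
```

A greedy pass depends on input order. The tolerance trades missed merges (too small) against wrongly fused atoms in dense drawings (too large), which is one place where chemical knowledge about valid valences could help.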

Computational Chemistry

CLEF-IP 2012: Patent and Chemical Structure Benchmark

A resource paper describing the CLEF-IP 2012 benchmarking lab. It introduces patent-processing tasks (patent passage retrieval, flowchart recognition, and chemical structure extraction) together with their ground-truth datasets.

Computational Chemistry

MolRec at CLEF 2012: Rule-Based Structure Recognition

Describes the MolRec system’s performance in the CLEF 2012 Chemical Structure Recognition task, detailing its rule-based vectorization engine and analyzing failure modes like touching characters and complex bond types.

Computational Chemistry

OSRA at CLEF-IP 2012: Native TIFF Processing for Patents

Benchmarks OSRA on CLEF-IP 2012 patent data, showing that processing the native images directly raises precision to 0.708, compared with 0.433 when external image-splitting tools are used. Also describes OSRA’s pairwise-distance segmentation algorithm, which separates overlapping molecules better than bounding-box segmentation.
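
Why pixel distances beat bounding boxes can be shown with a small single-linkage grouping: connected components join the same molecule when their closest pixels are within a threshold. This is a minimal sketch under stated assumptions. The component lists, the fixed `threshold`, and the names `min_distance` and `group_components` are hypothetical, and OSRA's exact criterion is not given here.

```python
import math
from itertools import combinations
from typing import List, Tuple

Pixel = Tuple[int, int]


def min_distance(a: List[Pixel], b: List[Pixel]) -> float:
    """Smallest Euclidean distance between any pixel of a and any pixel of b."""
    return min(math.dist(p, q) for p in a for q in b)


def group_components(components: List[List[Pixel]], threshold: float) -> List[List[int]]:
    """Union-find grouping: components closer than threshold end up in the same group."""
    parent = list(range(len(components)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(components)), 2):
        if min_distance(components[i], components[j]) < threshold:
            parent[find(i)] = find(j)

    groups: dict = {}
    for i in range(len(components)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


# An L-shaped "frame", a dot inside the frame's bounding box, and a tick near the frame.
frame = [(x, 0) for x in range(21)] + [(0, y) for y in range(1, 21)]
dot = [(15, 15), (16, 15)]
tick = [(3, 3)]
print(group_components([frame, dot, tick], threshold=5.0))  # [[0, 2], [1]]
```

The dot lies inside the frame's bounding box, so a bounding-box overlap test would merge them. The pixel-distance test keeps them apart because the nearest frame pixel is 15 units away.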

Computational Chemistry

Overview of the TREC 2011 Chemical IR Track Benchmark

This resource paper details the third TREC Chemical IR campaign, introducing a novel Image-to-Structure task and analyzing 36 runs from 9 groups to benchmark chemical information retrieval.

Computational Chemistry

Probabilistic OCSR with Markov Logic Networks

This paper introduces MLOCSR, a system that feeds the output of low-level image vectorization into a high-level probabilistic Markov Logic Network to recognize chemical structures. It replaces brittle heuristics with weighted logic rules and significantly outperforms state-of-the-art systems such as OSRA on degraded or low-resolution images.
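
The effect of "weighted logic rules" can be read from the standard Markov Logic Network definition. Each first-order formula $F_i$ carries a weight $w_i$, and $n_i(x)$ counts its true groundings in a candidate interpretation $x$ of the image. The specific formulas and weights MLOCSR uses are not given in this summary.

$$
P(X = x) = \frac{1}{Z} \exp\Big( \sum_i w_i \, n_i(x) \Big),
\qquad
Z = \sum_{x'} \exp\Big( \sum_i w_i \, n_i(x') \Big)
$$

Suppose two interpretations differ only in that $x$ satisfies one more grounding of $F_k$ than $x'$. Then $P(x)/P(x') = \exp\big(w_k\,(n_k(x) - n_k(x'))\big) = e^{w_k}$. Violating a rule therefore lowers an interpretation's probability by a factor set by that rule's weight, rather than excluding it outright as a hard heuristic would.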

Computational Chemistry

Research on Chemical Expression Images Recognition

Proposes a new OCSR workflow that improves recognition rates by separating adhesive (touching) chemical symbols and by using vectorization to handle virtual (hashed) and real (solid) wedge bonds. It achieves a 90% exact-match rate, compared with 82.2% for the OSRA baseline.
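
Separating adhesive (touching) symbols is often done with a vertical projection profile: count the ink in each column and cut where the count is lowest. The summary does not say which splitting method the paper uses. This sketch only illustrates the projection idea, and `split_touching` and its `margin` parameter are hypothetical.

```python
from typing import List

Image = List[List[int]]  # binary raster, 1 = ink


def split_touching(img: Image, margin: int = 2) -> List[Image]:
    """Split a blob of two touching glyphs at the interior column with the least ink.

    Columns closer than `margin` to either edge are not considered, so the cut
    falls between the glyphs rather than at the blob's outer boundary.
    """
    width = len(img[0])
    profile = [sum(row[c] for row in img) for c in range(width)]  # vertical projection
    candidates = range(margin, width - margin)
    if not candidates:
        return [img]
    cut = min(candidates, key=lambda c: profile[c])
    return [[row[:cut] for row in img], [row[cut:] for row in img]]


# Hypothetical 3x7 blob: an "O"-like glyph joined to an "H"-like glyph at column 3.
blob = [[1, 1, 1, 0, 1, 0, 1],
        [1, 0, 1, 1, 1, 1, 1],
        [1, 1, 1, 0, 1, 0, 1]]
left, right = split_touching(blob)
print(left)  # [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
```

Each piece can then be passed to the character recognizer separately. The ink in the joining column stays with the right-hand piece in this simple version.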

Computational Chemistry

Chemical Structure Recognition (Rule-Based)

This paper introduces MolRec, a rule-based system for Optical Chemical Structure Recognition (OCSR). It defines a set of 18 geometric rewrite rules to disambiguate bonds and atoms in vectorised diagram images, demonstrating higher accuracy than the contemporary state-of-the-art (OSRA).
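
A geometric rewrite rule matches a pattern of vectorized primitives and replaces it with a chemical interpretation. The rule below is hypothetical and is not one of MolRec's 18 rules. It rewrites two near-parallel line segments of similar length with close midpoints as a single double bond. All thresholds and function names are illustrative assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def _length(s: Segment) -> float:
    (x1, y1), (x2, y2) = s
    return math.hypot(x2 - x1, y2 - y1)


def _midpoint(s: Segment) -> Point:
    (x1, y1), (x2, y2) = s
    return ((x1 + x2) / 2, (y1 + y2) / 2)


def _direction(s: Segment) -> float:
    (x1, y1), (x2, y2) = s
    return math.atan2(y2 - y1, x2 - x1) % math.pi  # undirected angle in [0, pi)


def is_double_bond(a: Segment, b: Segment, max_angle: float = math.radians(5),
                   max_gap: float = 5.0, min_len_ratio: float = 0.6) -> bool:
    """Hypothetical rule: near-parallel segments of similar length with close midpoints."""
    la, lb = _length(a), _length(b)
    if max(la, lb) == 0:
        return False
    diff = abs(_direction(a) - _direction(b))
    diff = min(diff, math.pi - diff)  # 1 degree and 179 degrees are nearly parallel
    gap = math.dist(_midpoint(a), _midpoint(b))
    return diff <= max_angle and min(la, lb) / max(la, lb) >= min_len_ratio and gap <= max_gap


def rewrite_double_bonds(segments: List[Segment]) -> List[Tuple[Segment, int]]:
    """Apply the rule greedily: each matched pair becomes one bond of order 2."""
    used = set()
    bonds = []
    for i, a in enumerate(segments):
        if i in used:
            continue
        for j in range(i + 1, len(segments)):
            if j not in used and is_double_bond(a, segments[j]):
                used.add(j)
                bonds.append((a, 2))  # keep the first segment as the bond geometry
                break
        else:
            bonds.append((a, 1))  # no partner found: plain single bond
    return bonds


segs = [((0, 0), (20, 0)), ((0, 3), (20, 3)), ((20, 0), (30, 17))]
print([order for _, order in rewrite_double_bonds(segs)])  # [2, 1]
```

A rule-based system typically orders such rules carefully. For example, a triple-bond rule would need to fire before this one, or three parallel lines would be read as a double bond plus a single bond.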

Computational Chemistry

ChemInk: Real-Time Recognition for Chemical Drawings

ChemInk introduces a sketch recognition system for chemical diagrams that combines multi-level visual features in a joint Conditional Random Field (CRF). It achieves 97.4% accuracy and lets users draw structures faster than with CAD tools.
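
A joint CRF scores a whole labeling at once: per-node (unary) evidence plus compatibility between neighbouring labels. Context can therefore override weak local evidence. ChemInk's actual model combines features at several levels. This toy uses a single level of hypothetical stroke nodes with made-up potentials and finds the best labeling by exhaustive search.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple


def crf_score(labels: Sequence[str], unary: List[Dict[str, float]],
              edges: List[Tuple[int, int]],
              pairwise: Dict[Tuple[str, str], float]) -> float:
    """Unnormalized log-score: unary evidence per node plus pairwise compatibility per edge."""
    score = sum(unary[i][lab] for i, lab in enumerate(labels))
    score += sum(pairwise[(labels[i], labels[j])] for i, j in edges)
    return score


def map_labeling(label_set: Sequence[str], unary: List[Dict[str, float]],
                 edges: List[Tuple[int, int]],
                 pairwise: Dict[Tuple[str, str], float]) -> Tuple[str, ...]:
    """Exact MAP labeling by enumerating every assignment (feasible only for tiny graphs).

    The partition function is the same for every labeling, so MAP can ignore it.
    """
    return max(product(label_set, repeat=len(unary)),
               key=lambda labs: crf_score(labs, unary, edges, pairwise))


# Hypothetical example: three neighbouring strokes. Stroke 1 alone looks
# slightly more like text, but its neighbours pull it toward "bond".
unary = [{"bond": 2.0, "text": 0.0},
         {"bond": 0.4, "text": 0.5},
         {"bond": 1.5, "text": 0.0}]
edges = [(0, 1), (1, 2)]
pairwise = {("bond", "bond"): 1.0, ("text", "text"): 1.0,
            ("bond", "text"): 0.0, ("text", "bond"): 0.0}
print(map_labeling(["bond", "text"], unary, edges, pairwise))  # ('bond', 'bond', 'bond')
```

Enumeration grows exponentially with the number of nodes. Practical CRFs use inference such as belief propagation instead, but the scoring function being maximized has the same shape.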

Computational Chemistry

CLiDE Pro: Optical Chemical Structure Recognition Tool

This paper introduces CLiDE Pro, an advanced OCSR system that segments document images and reconstructs chemical connection tables. It features novel handling for crossing bonds and generic structures, validating performance on a publicly released benchmark of 454 scanned images.
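
Handling crossing bonds requires telling two bonds that pass over each other (no atom at the intersection) from bonds that meet at a shared atom. The summary does not describe how CLiDE Pro does this. A generic geometric building block is the orientation-based segment test below, where `properly_cross` and `orient` are illustrative names.

```python
from typing import Tuple

Point = Tuple[float, float]


def orient(a: Point, b: Point, c: Point) -> float:
    """Twice the signed area of triangle abc: > 0 left turn, < 0 right turn, 0 collinear."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def properly_cross(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    """True if segments p1-p2 and q1-q2 intersect at a point interior to both.

    Segments that only share an endpoint (an atom junction) or touch at an
    endpoint give a zero orientation and return False.
    """
    d1 = orient(q1, q2, p1)
    d2 = orient(q1, q2, p2)
    d3 = orient(p1, p2, q1)
    d4 = orient(p1, p2, q2)
    return d1 * d2 < 0 and d3 * d4 < 0


print(properly_cross((0, 0), (10, 10), (0, 10), (10, 0)))  # True: an X-shaped crossing
print(properly_cross((0, 0), (10, 0), (10, 0), (15, 8)))   # False: bonds meet at an atom
```

A recognizer could then avoid creating an atom node at a proper crossing and keep the two bonds independent.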

Computational Chemistry

Imago: Open-Source Chemical Structure Recognition (2011)

Imago is an open-source, cross-platform C++ toolkit designed to recognize 2D chemical structure images from scientific papers and convert them into machine-readable molecule formats using a rule-based pipeline.