Computational Chemistry

Review of OCSR Tools (2020)

This paper reviews three decades of OCSR development, tracing the transition from rule-based heuristics to early deep learning approaches. It includes a benchmark study comparing three open-source tools (OSRA, Imago, MolVec) on four diverse datasets.

Computational Chemistry

SVM-HMM Online Classifier for Chemical Symbols

This paper proposes a two-stage architecture that uses an SVM for coarse classification and an HMM for fine recognition. It features a novel Point Sequence Reordering (PSR) algorithm that significantly improves accuracy on organic ring structures.

Computational Chemistry

Unified Framework for Handwritten Chemical Expressions

Proposes a unified statistical framework for recognizing both inorganic and organic handwritten chemical expressions. Introduces the Chemical Expression Structure Graph (CESG) and uses a weighted direction graph search for structural analysis, achieving 83.1% top-5 accuracy on a large proprietary dataset.

Computational Chemistry

Chemical Structure Reconstruction with chemoCR

Describes chemoCR, a system that converts bitmap chemical diagrams into connection tables using a pipeline of texture-based vectorization, OCR, and a rule-based expert system, achieving 65.6% perfect recall on the TREC 2011 task.

Computational Chemistry

ChemReader at TREC 2011 Chemical IR Track

ChemReader achieved 93% accuracy on the TREC 2011 Image-to-Structure task, with detailed error analysis revealing the need for improved chemical intelligence in bond recognition and node merging algorithms.

Computational Chemistry

CLEF-IP 2012 Benchmark Overview

A resource paper detailing the CLEF-IP 2012 benchmarking lab. It introduces three specific IR tasks (claims-based passage retrieval, flowchart recognition, and chemical structure recognition) along with the construction of their respective ground-truth datasets and evaluation metrics.

Computational Chemistry

MolRec at CLEF 2012

Describes the MolRec system’s performance in the CLEF 2012 Chemical Structure Recognition task, detailing its rule-based vectorization engine and analyzing failure modes like touching characters and complex bond types.

Computational Chemistry

OSRA at CLEF-IP 2012

Benchmarks OSRA on CLEF-IP 2012 patent data, demonstrating that native image processing significantly outperforms external splitting tools. Introduces a pairwise distance algorithm for segmentation that handles overlapping molecules better than bounding boxes.
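
The idea of grouping by pairwise distance instead of bounding boxes can be illustrated with a minimal sketch. This is not OSRA's implementation: the function names, the pixel-set representation, and the threshold are assumptions chosen for illustration. Components are merged when their closest pixels lie within a threshold, so a fragment that merely sits inside another molecule's bounding box stays separate.

```python
from itertools import combinations
from math import hypot


def min_pairwise_distance(a, b):
    """Smallest Euclidean distance between any pixel in a and any pixel in b."""
    return min(hypot(ax - bx, ay - by) for ax, ay in a for bx, by in b)


def group_components(components, threshold):
    """Group components whose closest pixels are within `threshold`.

    `components` is a list of pixel-coordinate lists (hypothetical input
    format). Returns sorted lists of component indices, one per group.
    """
    parent = list(range(len(components)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(components)), 2):
        if min_pairwise_distance(components[i], components[j]) <= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(components)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# A diagonal stroke, a stray pixel inside the stroke's bounding box,
# and a pixel that directly continues the stroke.
stroke = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
stray = [(4, 0)]
continuation = [(5, 5)]
groups = group_components([stroke, stray, continuation], threshold=1.5)
```

Here the stray pixel falls inside the stroke's bounding box but is about 2.8 pixels from its nearest stroke pixel, so a bounding-box test would merge it while the distance test does not. The all-pairs comparison is quadratic in pixel count and is only meant to show the principle.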

Computational Chemistry

Overview of TREC 2011 Chemical IR Track

This resource paper details the third TREC Chemical IR campaign, introducing a novel Image-to-Structure task and analyzing 36 runs from 9 groups to benchmark chemical information retrieval.

Computational Chemistry

Probabilistic OCSR with Markov Logic Networks

This paper introduces MLOCSR, a system that pipelines low-level image vectorization with a high-level probabilistic Markov Logic Network to recognize chemical structures. It replaces brittle heuristics with weighted logic rules, significantly outperforming state-of-the-art systems like OSRA on degraded or low-resolution images.
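
How weighted rules differ from hard heuristics can be shown with a toy log-linear model in the spirit of a Markov Logic Network, where each candidate interpretation is scored by the exponentiated sum of the weights of the rules it satisfies. The rules, weights, and feature names below are hypothetical and are not taken from MLOCSR; a real MLN grounds first-order formulas over all objects in the image instead of scoring a single pair of lines.

```python
from math import exp

# Hypothetical weighted rules: (weight, predicate over a candidate world).
# A violated rule lowers a world's score instead of ruling it out.
RULES = [
    (2.0, lambda w: w["parallel"] and w["label"] == "double_bond"),
    (1.0, lambda w: w["close"] and w["label"] == "double_bond"),
    (0.5, lambda w: w["label"] == "two_single_bonds"),
]


def world_probabilities(evidence, labels):
    """Return P(label | evidence), proportional to exp(sum of satisfied weights)."""
    scores = {}
    for label in labels:
        world = dict(evidence, label=label)
        scores[label] = exp(sum(weight for weight, rule in RULES if rule(world)))
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}


labels = ["double_bond", "two_single_bonds"]
clean = world_probabilities({"parallel": True, "close": True}, labels)
noisy = world_probabilities({"parallel": False, "close": False}, labels)
```

With both geometric cues present, the double-bond reading dominates; with neither, the weak prior toward two single bonds wins. Because every rule contributes a soft score, one unreliable cue in a degraded image shifts the probabilities without flipping the decision outright.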

Computational Chemistry

Research on Chemical Expression Images Recognition

Proposes a new OCSR workflow that improves recognition rates by separating touching ("adhesive") chemical symbols and by handling dashed (virtual) and solid (real) wedge bonds explicitly during vectorization, achieving 90% exact match versus 82.2% for the OSRA baseline.

Computational Chemistry

Chemical Structure Recognition (Rule-Based)

This paper introduces MolRec, a rule-based system for Optical Chemical Structure Recognition (OCSR). It defines a set of 18 geometric rewrite rules to disambiguate bonds and atoms in vectorised diagram images, demonstrating higher accuracy than the contemporary state-of-the-art (OSRA).