Computational Chemistry

HMM-based Online Recognition of Chemical Symbols

HMM-based method for recognizing online handwritten chemical symbols using 11-dimensional local features including derivatives, curvature, and linearity. Achieves 89.5% top-1 accuracy and 98.7% top-3 accuracy on a custom dataset of 64 chemical symbols.
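As a rough illustration of what per-point local features of this kind can look like, the sketch below computes a direction (first-derivative) term, a turning-angle curvature term, and a linearity term for one point of a pen stroke. The function name `local_features`, the `window` parameter, and the exact feature definitions are assumptions made for this example; the summary does not spell out the paper's 11 features, and this is not the paper's formulation.

```python
import math

def local_features(points, i, window=2):
    """Return [cos, sin, turning angle, linearity] for point i of a stroke.

    `points` is a list of (x, y) tuples. All definitions here are
    illustrative assumptions, not the feature set used in the paper.
    """
    lo = max(0, i - window)
    hi = min(len(points) - 1, i + window)
    (x0, y0), (x1, y1) = points[lo], points[hi]

    # Direction: unit vector of the chord spanning the window.
    dx, dy = x1 - x0, y1 - y0
    chord = math.hypot(dx, dy)
    cos_t, sin_t = (dx / chord, dy / chord) if chord > 0 else (0.0, 0.0)

    # Curvature: signed angle between the incoming and outgoing segments.
    # At a stroke end one segment is degenerate and its angle defaults to 0.
    xi, yi = points[i]
    a_in = math.atan2(yi - y0, xi - x0)
    a_out = math.atan2(y1 - yi, x1 - xi)
    turn = math.atan2(math.sin(a_out - a_in), math.cos(a_out - a_in))

    # Linearity: chord length over path length (1.0 means a straight line).
    path = sum(math.dist(points[k], points[k + 1]) for k in range(lo, hi))
    linearity = chord / path if path > 0 else 1.0

    return [cos_t, sin_t, turn, linearity]

straight = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
corner = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
straight_features = local_features(straight, 2)
corner_features = local_features(corner, 2)
```

A sequence of such vectors, one per sampled point, is the kind of observation sequence an HMM would then be trained on.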

Img2Mol: Accurate SMILES Recognition from Depictions

A 2021 deep learning system that performs optical chemical structure recognition (OCSR) in two stages: it first encodes a molecule image into a continuous CDDD embedding, then decodes that embedding to SMILES. Extensive data augmentation covering rotations, distortions, and rendering variations makes recognition fast and robust.

On-line Handwritten Chemical Expression Recognition

Yang et al. propose a two-level recognition system for handwritten chemical formulas, combining global structural analysis to identify substances with local character recognition using ANNs, achieving ~96% accuracy on a dataset of 1197 expressions.

Online Handwritten Chemical Formula Structure Analysis

A three-level grammatical framework (formula, molecule, text) for parsing online handwritten chemical formulas, generating semantic graphs that capture both connectivity and layout using context-free grammars and HMMs.

Recognition of On-line Handwritten Chemical Expressions

Proposes a novel two-level algorithm for on-line handwritten chemical expression recognition, combining substance-level matching with character-level segmentation to achieve 96% accuracy.

SVM-HMM Online Classifier for Chemical Symbols

Proposes a two-stage architecture that uses an SVM for coarse classification and an HMM for fine recognition. Its novel Point Sequence Reordering (PSR) algorithm significantly improves accuracy on organic ring structures.

Unified Framework for Handwritten Chemical Expressions

Proposes a unified statistical framework for recognizing both inorganic and organic handwritten chemical expressions. Introduces the Chemical Expression Structure Graph (CESG) and uses a weighted direction graph search for structural analysis, achieving 83.1% top-5 accuracy on a large proprietary dataset.
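For readers unfamiliar with the top-k metrics quoted in these summaries, the following minimal sketch shows how such a score is typically computed: a sample counts as correct if the true label appears among the first k ranked candidates. The function name `top_k_accuracy` and the toy labels are hypothetical and are not taken from any of the papers.

```python
def top_k_accuracy(ranked_predictions, labels, k):
    """Fraction of samples whose true label is among the first k candidates.

    `ranked_predictions[i]` is a list of candidate labels for sample i,
    ordered from most to least likely.
    """
    if not labels:
        raise ValueError("labels must not be empty")
    hits = sum(
        1
        for ranked, label in zip(ranked_predictions, labels)
        if label in ranked[:k]
    )
    return hits / len(labels)

# Toy data: four samples, each with three ranked candidates.
ranked = [
    ["H2O", "H2O2", "HO"],
    ["CO", "CO2", "C2O"],
    ["NaCl", "NaC1", "NaCI"],
    ["CH3", "CH4", "C2H4"],
]
truth = ["H2O", "CO2", "NaCI", "C2H6"]

top1 = top_k_accuracy(ranked, truth, 1)
top3 = top_k_accuracy(ranked, truth, 3)
```

Top-k accuracy can only stay equal or rise as k grows, so a top-5 figure is not directly comparable with a top-1 figure from another system.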

Chemical Structure Reconstruction with chemoCR (2011)

Describes chemoCR, a system that converts bitmap chemical diagrams into connection tables using a pipeline of texture-based vectorization, OCR, and a rule-based expert system, achieving 65.6% perfect recall on the TREC 2011 task.

ChemReader Image-to-Structure OCR at TREC 2011 Chemical IR

ChemReader achieved 93% accuracy on the TREC 2011 Image-to-Structure task, with detailed error analysis revealing the need for improved chemical intelligence in bond recognition and node merging algorithms.

CLEF-IP 2012: Patent and Chemical Structure Benchmark

A resource paper detailing the CLEF-IP 2012 benchmarking lab. It introduces specific IR tasks for patent processing along with ground-truth datasets.

MolRec at CLEF 2012: Rule-Based Structure Recognition

Describes the MolRec system’s performance in the CLEF 2012 Chemical Structure Recognition task, detailing its rule-based vectorization engine and analyzing failure modes like touching characters and complex bond types.

OSRA at CLEF-IP 2012: Native TIFF Processing for Patents

Benchmarks OSRA on CLEF-IP 2012 patent data, showing native image processing improves precision from 0.433 to 0.708 over external splitting tools. Describes OSRA’s pairwise distance algorithm for segmentation that handles overlapping molecules better than bounding boxes.
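The general idea of segmenting by pairwise distance, as opposed to by bounding boxes, can be sketched as follows: connected components are merged into one group whenever their closest points lie within a threshold. This is a generic illustration under assumed inputs (components given as lists of pixel coordinates, a single fixed `threshold`); the names `min_distance` and `group_components` are hypothetical, and OSRA's actual algorithm may differ.

```python
import math
from itertools import combinations

def min_distance(a, b):
    """Smallest Euclidean distance between any point of a and any point of b."""
    return min(math.dist(p, q) for p in a for q in b)

def group_components(components, threshold):
    """Group component indices whose closest points are within `threshold`.

    Uses a small union-find so that chains of nearby components end up in
    the same group. Returns a sorted list of index lists.
    """
    parent = list(range(len(components)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(components)), 2):
        if min_distance(components[i], components[j]) <= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(components)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# An L-shaped component whose bounding box contains a separate small
# component: the boxes overlap, but the nearest pixels are far apart.
l_shape = [(0, y) for y in range(11)] + [(x, 0) for x in range(11)]
inner = [(8, 8), (9, 8)]
nearby = [(11, 0), (12, 0)]

groups = group_components([l_shape, inner, nearby], threshold=1.5)
```

Here `inner` stays in its own group even though it lies inside the bounding box of `l_shape`, while `nearby` is merged with `l_shape` because their closest pixels are one unit apart. The brute-force distance check is quadratic in the number of pixels and is meant only to show the criterion.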