Computational Chemistry

Chemical Structure Recognition (Rule-Based)

This paper introduces MolRec, a rule-based system for Optical Chemical Structure Recognition (OCSR). It defines a set of 18 geometric rewrite rules to disambiguate bonds and atoms in vectorised diagram images, demonstrating higher accuracy than OSRA, the state-of-the-art system at the time.
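
To illustrate what a geometric rewrite rule looks like, the sketch below tests whether two vectorised line segments should be rewritten as a single double bond. This is a hypothetical rule, not one of MolRec's 18 rules: the function `is_double_bond` and its thresholds (`max_angle_deg`, `min_gap`, `max_gap`, in pixels) are assumptions chosen for illustration.

```python
import math
from typing import Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def _unit_direction(seg: Segment) -> Tuple[float, float]:
    """Unit vector pointing from the first to the second endpoint."""
    (x1, y1), (x2, y2) = seg
    length = math.hypot(x2 - x1, y2 - y1)
    return ((x2 - x1) / length, (y2 - y1) / length)


def _distance_to_line(p: Point, seg: Segment) -> float:
    """Perpendicular distance from p to the infinite line through seg."""
    (x1, y1), (x2, y2) = seg
    cross = (x2 - x1) * (y1 - p[1]) - (x1 - p[0]) * (y2 - y1)
    return abs(cross) / math.hypot(x2 - x1, y2 - y1)


def is_double_bond(a: Segment, b: Segment,
                   max_angle_deg: float = 5.0,
                   min_gap: float = 1.0,
                   max_gap: float = 6.0) -> bool:
    """Hypothetical rule: two nearly parallel, overlapping, closely spaced
    segments are rewritten as one double bond."""
    da, db = _unit_direction(a), _unit_direction(b)
    # abs() makes the test independent of the direction each segment was drawn in.
    if abs(da[0] * db[0] + da[1] * db[1]) < math.cos(math.radians(max_angle_deg)):
        return False
    # Both endpoints of b must sit at a plausible double-bond spacing from a.
    gaps = [_distance_to_line(p, a) for p in b]
    if min(gaps) < min_gap or max(gaps) > max_gap:
        return False

    # Project all endpoints onto a's direction; the two intervals must overlap.
    ox, oy = a[0]

    def project(p: Point) -> float:
        return (p[0] - ox) * da[0] + (p[1] - oy) * da[1]

    a_lo, a_hi = sorted(project(p) for p in a)
    b_lo, b_hi = sorted(project(p) for p in b)
    return min(a_hi, b_hi) > max(a_lo, b_lo)
```

For example, `is_double_bond(((0, 0), (20, 0)), ((2, 4), (18, 4)))` accepts the pair, while a perpendicular segment or a parallel one that does not overlap is rejected. A full rule-based system would apply many such rules in a fixed order until no rule fires.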

Computational Chemistry

ChemInk: Real-Time Recognition for Chemical Drawings

ChemInk is a sketch recognition system for chemical diagrams that combines visual features at multiple levels in a joint Conditional Random Field (CRF). It achieves 97.4% recognition accuracy, and users draw structures faster with it than with CAD tools.
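
The benefit of a joint CRF is that neighbouring interpretations influence each other. The sketch below shows this on a toy pairwise CRF over three strokes, finding the maximum a posteriori (MAP) labelling by exhaustive search. The labels, scores, and the function `crf_map` are hypothetical; ChemInk's actual model, features, and inference method are not reproduced here, and brute force only works for a handful of nodes.

```python
import itertools
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[int, int]


def crf_map(labels: Sequence[str],
            unary: List[Dict[str, float]],
            edges: List[Edge],
            pairwise: Dict[Tuple[str, str], float]) -> Tuple[Tuple[str, ...], float]:
    """Exact MAP labelling of a tiny pairwise CRF by exhaustive search.

    The score of an assignment (sum of unary scores plus pairwise scores over
    the edges) is the log of its unnormalised CRF probability, so the
    highest-scoring assignment is the MAP labelling.
    """
    best: Tuple[str, ...] = ()
    best_score = float("-inf")
    for assignment in itertools.product(labels, repeat=len(unary)):
        score = sum(unary[i][y] for i, y in enumerate(assignment))
        score += sum(pairwise[(assignment[i], assignment[j])] for i, j in edges)
        if score > best_score:
            best, best_score = assignment, score
    return best, best_score


# Hypothetical example: three neighbouring strokes, the middle one ambiguous.
labels = ("bond", "letter")
unary = [
    {"bond": 0.1, "letter": 2.0},
    {"bond": 1.0, "letter": 0.9},  # on its own, stroke 1 slightly prefers "bond"
    {"bond": 0.2, "letter": 1.5},
]
edges = [(0, 1), (1, 2)]
# Neighbouring strokes are rewarded for agreeing and penalised for disagreeing.
pairwise = {("bond", "bond"): 0.5, ("letter", "letter"): 0.5,
            ("bond", "letter"): -0.5, ("letter", "bond"): -0.5}

assignment, score = crf_map(labels, unary, edges, pairwise)
print(assignment, round(score, 2))  # ('letter', 'letter', 'letter') 5.4
```

Taken in isolation, stroke 1 would be labelled a bond; the pairwise terms let its confidently labelled neighbours pull it to "letter".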

Computational Chemistry

CLiDE Pro: Optical Chemical Structure Recognition Tool

This paper introduces CLiDE Pro, an advanced OCSR system that segments document images and reconstructs chemical connection tables. It features novel handling for crossing bonds and generic structures, validating performance on a publicly released benchmark of 454 scanned images.
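
One ingredient of crossing-bond handling is telling a true crossing (two bonds pass over each other, with no atom at the intersection) apart from two bonds that meet at a shared atom. The sketch below uses the standard orientation test for proper segment intersection; the function `bonds_cross` is a hypothetical helper, not CLiDE Pro's published procedure.

```python
from typing import Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def _orientation(p: Point, q: Point, r: Point) -> int:
    """Sign of the cross product (q - p) x (r - p): +1 left turn, -1 right turn, 0 collinear."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)


def bonds_cross(a: Segment, b: Segment) -> bool:
    """True only if the two segments cross at a point interior to both.

    Segments that share an endpoint (two bonds meeting at an atom), T-junctions,
    and collinear overlaps are not reported as crossings.
    """
    p1, p2 = a
    q1, q2 = b
    return (_orientation(p1, p2, q1) * _orientation(p1, p2, q2) < 0 and
            _orientation(q1, q2, p1) * _orientation(q1, q2, p2) < 0)
```

When `bonds_cross` is true, a recogniser would keep both bonds intact instead of splitting them at the intersection and inventing a carbon atom there.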

Computational Chemistry

Imago: Structure Recognition at TREC-CHEM 2011

Imago is an open-source, cross-platform C++ toolkit designed to recognize 2D chemical structure images from scientific papers and convert them into machine-readable molecule formats using a rule-based pipeline.

Computational Chemistry

Kekulé-1 System for Chemical Structure Recognition

This paper introduces Kekulé-1, one of the first successful Optical Chemical Structure Recognition (OCSR) systems. It details a hybrid approach using neural networks for character recognition and heuristic vectorization for bond detection, achieving 98.9% accuracy on a test set of 524 structures.

Computational Chemistry

OSRA: Optical Structure Recognition Application

This paper details the algorithmic pipeline of OSRA, an open-source tool that converts raster images of chemical diagrams into connection tables (SMILES/SDF). It outlines specific heuristics for page segmentation, vectorization, and atom recognition used in the TREC-CHEM Image2Structure task.
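
A connection table is simply a list of atoms plus a list of bonds with their orders. As a small illustration of what can be derived from one, the sketch below adds implicit hydrogens from default valences and prints a Hill-order molecular formula. The valence table, the neutral-atom assumption, and the function `molecular_formula` are assumptions for illustration; this is not how OSRA generates SMILES or SDF output.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Assumed default valences for neutral atoms (no charges, radicals or hypervalence).
DEFAULT_VALENCE: Dict[str, int] = {"C": 4, "N": 3, "O": 2, "S": 2,
                                   "F": 1, "Cl": 1, "Br": 1, "I": 1}


def molecular_formula(atoms: List[str], bonds: List[Tuple[int, int, int]]) -> str:
    """Hill-order molecular formula from a connection table with implicit hydrogens.

    atoms: element symbol per atom index; bonds: (atom_i, atom_j, bond_order).
    """
    used = [0] * len(atoms)
    for i, j, order in bonds:
        used[i] += order
        used[j] += order

    counts = Counter(atoms)
    # Fill each atom's remaining valence with implicit hydrogens.
    counts["H"] += sum(max(DEFAULT_VALENCE[el] - used[k], 0) for k, el in enumerate(atoms))

    present = [el for el, n in counts.items() if n > 0]
    if "C" in present:
        # Hill system: C first, then H, then the remaining elements alphabetically.
        rest = sorted(el for el in present if el not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in present else []) + rest
    else:
        order = sorted(present)
    return "".join(el + (str(counts[el]) if counts[el] > 1 else "") for el in order)


# Ethanol: C-C-O with single bonds.
print(molecular_formula(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)]))  # C2H6O
```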

Computational Chemistry

Structural Analysis of Handwritten Chemical Formulas

This paper proposes a strategy for interpreting handwritten chemical formulas by converting bitmap images into a dynamic structural graph of quadrilaterals. Using recursive ‘specialists’ to identify chemical bonds and rings, it achieves a recognition rate of about 97% on graphical elements.
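
Once bonds have been identified, a simple graph-theoretic check can confirm how many rings a recognised structure should contain: the cyclomatic number, bonds − atoms + connected components. The sketch below is a swapped-in technique for illustration, not the paper's specialist-based ring detection; `ring_count` is a hypothetical helper using union-find to count components.

```python
from typing import List, Tuple


def ring_count(n_atoms: int, bonds: List[Tuple[int, int]]) -> int:
    """Number of independent rings (cyclomatic number) of a molecular graph."""
    parent = list(range(n_atoms))

    def find(x: int) -> int:
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in bonds:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
    components = len({find(i) for i in range(n_atoms)})
    return len(bonds) - n_atoms + components


# Naphthalene skeleton: two fused six-membered rings sharing the 4-5 bond.
naphthalene = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),
               (4, 6), (6, 7), (7, 8), (8, 9), (9, 5)]
print(ring_count(10, naphthalene))  # 2
```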

Computational Chemistry

Automatic Recognition of Chemical Images

This methodological paper presents a system for digitizing chemical images into SDF files. It utilizes a custom vectorization algorithm and chemical rule validation, achieving 94% accuracy on benchmark datasets compared to 50% for commercial tools.
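
A typical chemical rule check rejects structures in which an atom carries more bonds than its element allows. The sketch below flags such atoms in a connection table; the maximum-valence table and the function `valence_violations` are assumptions for illustration, and the paper's actual rule set is not reproduced.

```python
from typing import Dict, List, Tuple

# Assumed maximum valences for neutral atoms; labels missing from the table are flagged.
MAX_VALENCE: Dict[str, int] = {"H": 1, "C": 4, "N": 3, "O": 2, "S": 6,
                               "F": 1, "Cl": 1, "Br": 1, "I": 1}


def valence_violations(atoms: List[str], bonds: List[Tuple[int, int, int]]) -> List[int]:
    """Return indices of atoms whose total bond order exceeds the allowed maximum."""
    used = [0] * len(atoms)
    for i, j, order in bonds:
        used[i] += order
        used[j] += order
    # Unknown labels default to -1, so any atom with an unrecognised label is flagged.
    return [k for k, el in enumerate(atoms) if used[k] > MAX_VALENCE.get(el, -1)]


# A misread drawing that gives the central carbon five single bonds.
atoms = ["C", "C", "C", "C", "C", "C"]
bonds = [(0, k, 1) for k in range(1, 6)]
print(valence_violations(atoms, bonds))  # [0]
```

A recogniser could use such flags to trigger re-interpretation of the offending region (for example, reconsidering a spurious bond) rather than emitting an invalid structure.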

Computational Chemistry

Chemical Literature Data Extraction: The CLiDE Project

The CLiDE project (Chemical Literature Data Extraction) presents a foundational architecture for Optical Chemical Structure Recognition (OCSR). It details a three-phase pipeline (primitive recognition, text grouping, and interpretation) that converts bitmapped journal pages into chemically significant connection tables, handling complex features such as stereochemistry and crossing bonds.
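
The text-grouping phase joins individually recognised characters into atom labels. A minimal sketch, assuming characters arrive as bounding boxes and that labels are written left to right on a common baseline, merges boxes that are horizontally adjacent and vertically aligned. The `CharBox` type, `group_characters`, and both thresholds are hypothetical; CLiDE's own grouping rules are not reproduced.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CharBox:
    """A recognised character and its bounding box (x, y = top-left corner)."""
    char: str
    x: float
    y: float
    w: float
    h: float


def group_characters(boxes: List[CharBox], max_gap: float = 3.0,
                     max_dy: float = 2.0) -> List[str]:
    """Greedily merge characters into left-to-right labels such as "OH" or "Cl"."""
    groups: List[List[CharBox]] = []
    # Scanning left to right means each character only needs to attach to a group's last box.
    for box in sorted(boxes, key=lambda b: b.x):
        for group in groups:
            last = group[-1]
            gap = box.x - (last.x + last.w)
            if abs(box.y - last.y) <= max_dy and 0 <= gap <= max_gap:
                group.append(box)
                break
        else:
            groups.append([box])
    return ["".join(b.char for b in group) for group in groups]


boxes = [CharBox("O", 10, 50, 5, 8), CharBox("H", 16, 50, 5, 8), CharBox("N", 100, 20, 5, 8)]
print(group_characters(boxes))  # ['OH', 'N']
```

Subscripts (as in "CH3") sit lower than the base line and would need a looser vertical tolerance or a dedicated rule.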

Computational Chemistry

Chemical Machine Vision

This 2003 paper introduces a machine vision approach for extracting chemical metadata from raster images. By using Gabor wavelets for feature extraction and Kohonen networks for classification, it distinguishes between chemical and non-chemical images, as well as ring and non-ring systems, without requiring high-resolution inputs.
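
Gabor filters respond strongly to image texture at a particular orientation and spatial frequency, so averaging their responses over an image gives a compact, resolution-tolerant feature vector. The sketch below builds the real part of a 2-D Gabor kernel and computes one mean-response feature per orientation. Kernel size, `sigma`, and `wavelength` are illustrative assumptions rather than the paper's settings, and the Kohonen network that would classify these vectors is not shown.

```python
import math
from typing import List

Matrix = List[List[float]]


def gabor_kernel(size: int, sigma: float, theta: float, wavelength: float,
                 gamma: float = 1.0, psi: float = 0.0) -> Matrix:
    """Real part of a 2-D Gabor filter: a Gaussian envelope times a cosine
    carrier oriented at angle theta (radians)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the carrier varies along direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength + psi))
        kernel.append(row)
    return kernel


def mean_abs_response(image: Matrix, kernel: Matrix) -> float:
    """Mean |filter response| over all positions where the kernel fits inside the image."""
    k = len(kernel)
    rows, cols = len(image), len(image[0])
    total, count = 0.0, 0
    for r in range(rows - k + 1):
        for c in range(cols - k + 1):
            s = sum(kernel[i][j] * image[r + i][c + j] for i in range(k) for j in range(k))
            total += abs(s)
            count += 1
    return total / count


def gabor_features(image: Matrix, n_orientations: int = 4) -> List[float]:
    """One texture feature per orientation (illustrative parameters)."""
    return [mean_abs_response(image, gabor_kernel(7, 2.0, math.pi * t / n_orientations, 4.0))
            for t in range(n_orientations)]


# Vertical stripes with a period of 4 pixels respond most strongly at theta = 0.
stripes = [[1.0 if (c // 2) % 2 == 0 else 0.0 for c in range(16)] for _ in range(16)]
features = gabor_features(stripes)
```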

Computational Chemistry

ChemReader: Automated Structure Extraction

This paper presents ChemReader, a fully automated optical structure recognition tool that converts raster images of chemical diagrams into machine-readable formats. It introduces a modified Hough transform for bond detection and a chemical spell checker that improves OCR accuracy from 66% to 87%.
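
For reference, the standard (unmodified) Hough transform detects straight lines by letting every foreground point vote for all lines x·cos θ + y·sin θ = ρ that pass through it; bonds appear as accumulator peaks. The sketch below implements that baseline only. ChemReader's modifications are not reproduced, and the function names and bin sizes (`n_theta`, `rho_step`) are assumptions.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def hough_votes(points: Iterable[Tuple[float, float]], n_theta: int = 180,
                rho_step: float = 1.0) -> Counter:
    """Standard (theta, rho) Hough accumulator.

    Each point (x, y) votes for every line x*cos(theta) + y*sin(theta) = rho
    passing through it; keys are (theta_index, rho_bin).
    """
    votes: Counter = Counter()
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes[(t, round(rho / rho_step))] += 1
    return votes


def strongest_line(votes: Counter, n_theta: int = 180,
                   rho_step: float = 1.0) -> Tuple[float, float]:
    """(theta in radians, rho) of the accumulator cell with the most votes."""
    (t, r), _ = votes.most_common(1)[0]
    return math.pi * t / n_theta, r * rho_step


# Fifty pixels on the vertical line x = 5.
votes = hough_votes([(5, y) for y in range(50)])
print(strongest_line(votes))  # (0.0, 5.0)
```

Line segments (rather than infinite lines) are then recovered by checking which pixels along each peak line are actually set.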

Computational Chemistry

Dynamical Corrections to TST for Surface Diffusion

This paper bridges Molecular Dynamics and Transition State Theory by applying a dynamical corrections formalism to surface diffusion, and identifies a low-temperature bounce-back mechanism that causes non-Arrhenius behavior.
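
In the dynamical-corrections picture, the true rate is k = κ·k_TST, where the transmission coefficient κ is estimated from short MD trajectories started on the TST dividing surface; recrossings such as a bounce-back push κ below 1. The sketch below is the textbook reactive-flux estimator κ = ⟨v θ(product)⟩ / ⟨v θ(v)⟩ for a one-dimensional, two-state simplification. The function name and input format are assumptions, and the multi-site treatment for surface diffusion used in the paper is not reproduced.

```python
from typing import Iterable, Tuple


def transmission_coefficient(samples: Iterable[Tuple[float, bool]]) -> float:
    """Reactive-flux estimate of kappa = k / k_TST.

    Each sample is (v0, in_product): the initial velocity across the dividing
    surface (positive = towards the product site) of a trajectory started on
    that surface, and whether it is on the product side at the end of the run.
    Velocities are assumed to come from equilibrium (Maxwell-Boltzmann) sampling.
    """
    samples = list(samples)
    # Trajectories ending in the product count with their signed initial velocity,
    # so a trajectory launched backwards that recrosses to the product subtracts flux.
    numerator = sum(v for v, in_product in samples if in_product)
    # TST counts every forward-moving trajectory as reactive.
    denominator = sum(v for v, _ in samples if v > 0)
    return numerator / denominator


# The fast forward trajectory bounces back, so only 1/3 of the TST flux is reactive.
print(transmission_coefficient([(1.0, True), (2.0, False), (-1.0, False)]))
```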