Computational Chemistry

String Representations for Chemical Image Recognition

This methodological study isolates the impact of chemical string representations on image-to-text translation models. It finds that while SMILES offers the highest overall accuracy, SELFIES provides a guarantee of structural validity, offering a trade-off for OCSR tasks.

Computational Chemistry

SwinOCSR: Vision Transformers for Chemical OCR

Proposes an end-to-end architecture replacing standard CNN backbones with Swin Transformer to capture global image context. Introduces Multi-label Focal Loss to handle severe token imbalance in chemical datasets.

Computational Chemistry
ChemGrapher pipeline overview showing segmentation and classification stages

ChemGrapher: Deep Learning for Chemical OCR

ChemGrapher replaces rule-based chemical OCR with a deep learning pipeline using semantic segmentation to identify atom and bond candidates, followed by specialized classification networks to resolve stereochemistry and bond multiplicity, significantly outperforming OSRA.

Computational Chemistry
DECIMER: Deep Learning for Chemical Image Recognition

DECIMER: Deep Learning for Chemical Image Recognition

DECIMER adapts the “Show, Attend and Tell” image captioning architecture to translate chemical structure images into SMILES strings. By leveraging massive synthetic datasets generated from PubChem, it demonstrates that deep learning can perform optical chemical recognition without complex, hand-engineered rule systems.

Computational Chemistry

Deep Learning for Molecular Structure Extraction

This paper presents a two-stage deep learning pipeline to extract chemical structures from documents and convert them to SMILES strings. By training on large-scale synthetic data, the method overcomes the brittleness of rule-based systems and demonstrates high accuracy even on low-resolution and noisy input images.

Computational Chemistry
Handwritten chemical ring recognition neural network architecture

Handwritten Chemical Ring Recognition with NNs

Proposes a specialized Classifier-Recognizer architecture that first categorizes rings by heteroatom (S, N, O) and then identifies the specific ring using optimized grid inputs.

Computational Chemistry

Handwritten Chemical Symbol Recognition Using SVMs

A 2013 paper introducing a hybrid recognition system for handwritten chemical symbols on touch devices. Combines Support Vector Machines (SVM) for classification with elastic matching for geometric verification, achieving 89.7% top-1 accuracy on pen-based input for chemical structure drawing applications.

Computational Chemistry

HMM-based Online Recognition of Chemical Symbols

HMM-based method for recognizing online handwritten chemical symbols using 11-dimensional local features including derivatives, curvature, and linearity. Achieves 89.5% top-1 accuracy and 98.7% top-3 accuracy on a custom dataset of 64 chemical symbols.

Computational Chemistry
Optical chemical structure recognition example

Img2Mol: Accurate SMILES from Molecular Depictions

A 2021 deep learning system using a two-stage approach for OCSR - first encoding images into continuous CDDD embeddings, then decoding to SMILES - with extensive data augmentation handling rotations, distortions, and rendering variations to achieve fast and robust molecular structure recognition.

Computational Chemistry

On-line Handwritten Chemical Expression Recognition

Yang et al. propose a two-level recognition system for handwritten chemical formulas, combining global structural analysis to identify substances with local character recognition using ANNs, achieving ~96% accuracy on a dataset of 1197 expressions.

Computational Chemistry

Online Handwritten Chemical Formula Structure Analysis

A three-level grammatical framework (formula, molecule, text) for parsing online handwritten chemical formulas, generating semantic graphs that capture both connectivity and layout using context-free grammars and HMMs.

Computational Chemistry

Recognition of On-line Handwritten Chemical Expressions

Proposes a novel two-level algorithm for on-line handwritten chemical expression recognition, combining substance-level matching with character-level segmentation to achieve 96% accuracy.