Computational Biology
InvMSAFold generates diverse protein sequences from structure using a Potts model

InvMSAFold: Generative Inverse Folding with Potts Models

InvMSAFold replaces autoregressive decoding with a Potts model parameter generator, enabling diverse protein sequence sampling orders of magnitude faster than ESM-IF1.
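
To illustrate why sampling from a Potts model is cheap once its parameters are known, here is a minimal sketch of a generic Potts energy and one Gibbs sampling sweep. It is not the InvMSAFold implementation: the names `potts_energy`, `gibbs_sweep`, and `coupling`, the dictionary layout of `J`, and the toy parameters are all assumptions made for illustration. In the paper's setting, the fields and couplings would come from the structure-conditioned parameter generator.

```python
import math
import random

def coupling(J, i, a, j, b):
    """Look up the coupling between state a at site i and state b at site j.
    J is assumed to store only pairs (i, j) with i < j."""
    return J[(i, j)][a][b] if i < j else J[(j, i)][b][a]

def potts_energy(seq, h, J):
    """E(s) = -sum_i h[i][s_i] - sum_{i<j} J_ij(s_i, s_j); P(s) is proportional to exp(-E)."""
    n = len(seq)
    energy = -sum(h[i][seq[i]] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            energy -= coupling(J, i, seq[i], j, seq[j])
    return energy

def gibbs_sweep(seq, h, J, rng):
    """Resample each site once from its conditional distribution given the rest."""
    n, q = len(seq), len(h[0])
    for i in range(n):
        logits = []
        for a in range(q):
            score = h[i][a]
            for j in range(n):
                if j != i:
                    score += coupling(J, i, a, j, seq[j])
            logits.append(score)
        top = max(logits)  # subtract the max for numerical stability
        weights = [math.exp(x - top) for x in logits]
        seq[i] = rng.choices(range(q), weights=weights)[0]
    return seq

# Toy parameters (hypothetical): 3 sites, 2 states per site.
h = [[0.0, 1.0], [0.5, 0.0], [0.0, 0.0]]
J = {
    (0, 1): [[1.0, 0.0], [0.0, 1.0]],
    (0, 2): [[0.0, 0.0], [0.0, 0.0]],
    (1, 2): [[0.0, 2.0], [0.0, 0.0]],
}

print(potts_energy([1, 0, 1], h, J))  # -3.5
print(gibbs_sweep([0, 0, 0], h, J, random.Random(0)))
```

Each sweep only needs table lookups, with no neural network forward pass per sampled residue, which is the intuition behind the speed advantage over autoregressive decoding.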

Computational Chemistry
MERMaid pipeline diagram showing PDF processing through VisualHeist segmentation, DataRaider VLM mining, and KGWizard graph construction to produce chemical knowledge graphs

MERMaid: Multimodal Chemical Reaction Mining from PDFs

MERMaid leverages fine-tuned vision models and VLM reasoning to mine chemical reaction data directly from PDF figures and tables. By handling context inference and coreference resolution, it builds high-fidelity knowledge graphs with 87% end-to-end accuracy.

Optical Chemical Structure Recognition
Overview of the OCSAug pipeline showing DDPM training, masked RePaint augmentation, and OCSR fine-tuning phases.

OCSAug: Diffusion-Based Augmentation for Hand-Drawn OCSR

OCSAug uses Denoising Diffusion Probabilistic Models (DDPM) and the RePaint algorithm with custom masking to generate synthetic hand-drawn chemical structure images, improving OCSR performance by a factor of 1.918 to 3.820 on the DECIMER benchmark.

Molecular Representations
Diagram showing molecular structure passing through a neural network to produce IUPAC chemical nomenclature document

STOUT V2.0: Transformer-Based SMILES to IUPAC Translation

STOUT V2.0 uses Transformers trained on ~1 billion SMILES-IUPAC pairs to accurately translate chemical structures into systematic names (and vice versa), outperforming its RNN predecessor.

Optical Chemical Structure Recognition
Handwritten chemical ring recognition neural network architecture

Handwritten Chemical Ring Recognition with Neural Networks

Proposes a specialized Classifier-Recognizer architecture that first categorizes rings by heteroatom (S, N, O) and then identifies the specific ring using optimized grid inputs.

Optical Chemical Structure Recognition

Handwritten Chemical Symbol Recognition Using SVMs

A 2013 paper introducing a hybrid recognition system for handwritten chemical symbols on touch devices. Combines Support Vector Machines (SVM) for classification with elastic matching for geometric verification, achieving 89.7% top-1 accuracy on pen-based input for chemical structure drawing applications.
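
Elastic matching compares two pen trajectories while allowing them to stretch or compress along the stroke. One common way to realize this idea is dynamic time warping (DTW). The sketch below assumes DTW over 2-D points with Euclidean point cost; the summary above does not specify the paper's exact formulation, so treat `dtw_distance` and the toy strokes as hypothetical stand-ins.

```python
import math

def dtw_distance(a, b):
    """Dynamic time warping distance between two point sequences.
    cost[i][j] is the cheapest alignment of a[:i] with b[:j]."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean point-to-point cost
            cost[i][j] = d + min(cost[i - 1][j],      # repeat b[j-1]
                                 cost[i][j - 1],      # repeat a[i-1]
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]

# Toy strokes (hypothetical): the second repeats a point, as slow pen movement would.
stroke = [(0, 0), (1, 0), (2, 0)]
slow_stroke = [(0, 0), (1, 0), (1, 0), (2, 0)]
shifted = [(0, 1), (1, 1)]

print(dtw_distance(stroke, slow_stroke))       # 0.0
print(dtw_distance([(0, 0), (1, 0)], shifted))  # 2.0
```

A verifier of this kind could accept a classifier's candidate only if its distance to a stored template stays under a threshold, though how the paper combines the two stages is not detailed here.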

Optical Chemical Structure Recognition

HMM-based Online Recognition of Chemical Symbols

HMM-based method for recognizing online handwritten chemical symbols using 11-dimensional local features including derivatives, curvature, and linearity. Achieves 89.5% top-1 accuracy and 98.7% top-3 accuracy on a custom dataset of 64 chemical symbols.
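
Top-1 and top-3 accuracy differ only in how many of the recognizer's ranked candidates are checked against the true label. A minimal sketch, assuming each prediction is a list of symbols ordered from most to least likely; `top_k_accuracy` and the toy data are illustrative and not taken from the paper.

```python
def top_k_accuracy(ranked_predictions, labels, k):
    """Fraction of samples whose true label is among the first k candidates."""
    hits = sum(1 for ranked, label in zip(ranked_predictions, labels)
               if label in ranked[:k])
    return hits / len(labels)

# Toy data (hypothetical): four samples, three ranked candidates each.
ranked = [
    ["C", "O", "N"],
    ["N", "H", "O"],
    ["S", "O", "C"],
    ["H", "C", "N"],
]
truth = ["C", "O", "O", "Cl"]

print(top_k_accuracy(ranked, truth, 1))  # 0.25
print(top_k_accuracy(ranked, truth, 3))  # 0.75
```

Top-3 accuracy is always at least as high as top-1, which is why the two reported figures (89.5% and 98.7%) can differ so widely when the correct symbol is often ranked second or third.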

Optical Chemical Structure Recognition

On-line Handwritten Chemical Expression Recognition

Yang et al. propose a two-level recognition system for handwritten chemical formulas, combining global structural analysis to identify substances with local character recognition using ANNs, achieving ~96% accuracy on a dataset of 1197 expressions.

Optical Chemical Structure Recognition

Online Handwritten Chemical Formula Structure Analysis

A three-level grammatical framework (formula, molecule, text) for parsing online handwritten chemical formulas, generating semantic graphs that capture both connectivity and layout using context-free grammars and HMMs.

Optical Chemical Structure Recognition

Recognition of On-line Handwritten Chemical Expressions

Proposes a novel two-level algorithm for on-line handwritten chemical expression recognition, combining substance-level matching with character-level segmentation to achieve 96% accuracy.

Optical Chemical Structure Recognition

SVM-HMM Online Classifier for Chemical Symbols

This paper proposes a double-stage architecture using SVM for rough classification and HMM for fine recognition. It features a novel Point Sequence Reordering (PSR) algorithm that significantly improves accuracy on organic ring structures.

Optical Chemical Structure Recognition
Unified framework converts handwritten chemical expressions to structured graph representations

Unified Framework for Handwritten Chemical Expressions

Proposes a unified statistical framework for recognizing both inorganic and organic handwritten chemical expressions. Introduces the Chemical Expression Structure Graph (CESG) and uses a weighted direction graph search for structural analysis, achieving 83.1% top-5 accuracy on a large proprietary dataset.