Molecular Sets (MOSES): A Generative Modeling Benchmark
The Reliability Trap: The Limits of 99% Accuracy
The Evolution of Page Stream Segmentation: Rules to LLMs
GutenOCR: A Grounded Vision-Language Front-End for Documents
PubMed-OCR: PMC Open Access OCR Annotations
ChemBERTa-3: Open Source Training Framework
ChemDFM-R: Chemical Reasoner LLM
ChemBERTa-2: Scaling Molecular Transformers to 77M
GP-MoLFormer: Molecular Generation via Transformers
ChemBERTa: Molecular Property Prediction via Transformers
Chemformer: Pre-trained Transformer for Comp Chem
A Convexity Principle for Interacting Gases: Theory
Building Normalizing Flows with Stochastic Interpolants
Flow Matching for Generative Modeling: Scalable CNFs
Neural ODEs: Continuous-Depth Deep Learning
Rectified Flow: Learning to Generate and Transfer Data
Score Matching and Denoising Autoencoders
Score-Based Generative Modeling with SDEs
ChemDFM-X: Large Multimodal Model for Chemistry
DynamicFlow: Integrating Protein Dynamics into Drug Design
Image-to-Sequence OCSR: A Comparative Analysis
InstructMol: Multi-Modal Molecular Assistant
InvMSAFold: Generative Inverse Folding with Potts Models
MERMaid: Multimodal Reaction Mining
MOFFlow: Flow Matching for MOF Structure Prediction
Multimodal Search in Chemical Documents
OCSAug: Diffusion-Based Augmentation for Hand-Drawn OCSR
STOUT V2.0: SMILES to IUPAC Name Conversion
STOUT: SMILES to IUPAC names using NMT
Struct2IUPAC: Transformers for SMILES to IUPAC
Translating InChI to IUPAC Names with Transformers
AtomLenz: Atom-Level OCSR with Limited Supervision
ChemReco: Hand-Drawn Chemical Structure Recognition
ChemVLM: Multimodal LLM for Chemistry
Comparing OCSR Tools (Krasnov et al. 2024)
DECIMER.ai: Optical Chemical Structure Recognition
Dual-Path Global Awareness Transformer (DGAT)
Enhanced DECIMER for Hand-Drawn Structure Recognition
Image2InChI: SwinTransformer for Molecular Recognition
MarkushGrapher: Multi-modal Markush Structure Recognition
MMSSC-Net: Multi-Stage Sequence Cognitive Networks
MolGrapher: Graph-based Chemical Recognition
MolMole: Unified Vision Pipeline for Molecule Mining
MolScribe: Image-to-Graph Molecular Recognition
MolSight: OCSR with RL and Multi-Granularity Learning
OCSU: Optical Chemical Structure Understanding
RFL: Simplifying Chemical Structure Recognition
ABC-Net: Divide-and-Conquer SMILES Recognition
ChemPix: Hand-Drawn Hydrocarbon Recognition
DECIMER 1.0: Transformers for Chemical Image Recognition
End-to-End Transformer for Molecular Image Captioning
Handwritten Chemical Structure Recognition with RCGD
ICMDT: Automated Chemical Image Recognition
Image-to-Graph Transformers
Image2SMILES: Transformer OCSR with Synthetic Data Pipeline
MICER: Molecular Image Captioning with Transfer Learning
MolMiner: Deep Learning OCSR with YOLOv5 Detection
One Strike, You’re Out: Detecting Markush Structures
Review of OCSR Techniques (2022)
String Representations for Chemical Image Recognition
SwinOCSR: Vision Transformers for Chemical OCR
ChemGrapher: Deep Learning for Chemical OCR
DECIMER: Deep Learning for Chemical Image Recognition
Deep Learning for Molecular Structure Extraction
Handwritten Chemical Ring Recognition with NNs
Handwritten Chemical Symbol Recognition Using SVMs
HMM-based Online Recognition of Chemical Symbols
Img2Mol: Accurate SMILES from Molecular Depictions
On-line Handwritten Chemical Expression Recognition
Online Handwritten Chemical Formula Structure Analysis
Recognition of On-line Handwritten Chemical Expressions
Review of OCSR Tools (2020)
SVM-HMM Online Classifier for Chemical Symbols
Unified Framework for Handwritten Chemical Expressions
Chemical Structure Reconstruction with chemoCR
ChemReader at TREC 2011 Chemical IR Track
CLEF-IP 2012 Benchmark Overview
Overview of TREC 2011 Chemical IR Track
Probabilistic OCSR with Markov Logic Networks
Research on Chemical Expression Images Recognition
Chemical Structure Recognition (Rule-Based)
ChemInk: Real-Time Recognition for Chemical Drawings
CLiDE Pro: Optical Chemical Structure Recognition Tool
Imago: Structure Recognition at TREC-CHEM 2011
Kekulé-1 System for Chemical Structure Recognition
OSRA: Optical Structure Recognition Application
Structural Analysis of Handwritten Chemical Formulas
A Spatial Model for Legislative Roll Call Analysis
Automatic Recognition of Chemical Images
Chaotic Evolution of the Solar System (1992)
Chemical Literature Data Extraction: The CLiDE Project
ChemReader: Automated Structure Extraction
Distributed Representations: A Foundational Theory
Dynamical Corrections to TST for Surface Diffusion
EAM User Guide: Voter’s Handbook Chapter
Embedded-Atom Method: Theory and Applications Review
Funnels, Pathways, and Energy Landscapes of Protein Folding
Graph Perception for Chemical Structure OCR
Hand Drawn Chemical Diagram Recognition
IMG2SMI: Translating Molecular Structure Images to SMILES
Kekulé: OCR-Optical Chemical Recognition
Kinetic Oscillations on Pt(100): Theory
MD Study of Self-Diffusion on Metal Surfaces
Mixture Density Networks: Modeling Multimodal Distributions
OCSR Methods: A Taxonomy of Approaches
Optical Recognition of Chemical Graphics
Oscillatory CO Oxidation on Pt(110)
OSRA: Open Source Optical Structure Recognition
Oxidation/Reduction Oscillations on Pt/SiO2
Party Matters: Enhancing Legislative Embeddings
Reconstruction of Chemical Molecules from Images
Second-Order Langevin Equation for Field Simulations
Stillinger-Weber Potential for Silicon
The Drive to Life on Wet and Icy Worlds
Thermal Conductivity of the Lennard-Jones Fluid
Three Domains of Life: Woese’s Phylogenetic Revolution
AI & Physical Sciences Taxonomy: A Six-Vector Framework
Correlations in Motion of Atoms in Liquid Argon
Diffusion of Adatom Dimers on (111) Surfaces
Terraforming Venus: The Cloud Continent Proposal
Venus Evolution Through Time
Life on Venus? Astrobiology and Habitability Limits
Molecular String Renderer: Robust Visualization Tool
Auto-Encoding Variational Bayes: VAE Paper Summary
Importance Weighted Autoencoders: Beyond the Standard VAE
IWAE: Importance Weighted Autoencoders
InChI and Tautomerism: Toward Comprehensive Treatment
InChI: The Worldwide Chemical Structure Identifier Standard
Making InChI FAIR and Sustainable for Inorganic Chemistry
Mixfile & MInChI: Machine-Readable Mixture Formats
NInChI: Toward a Chemical Identifier for Nanomaterials
Recent Advances in the SELFIES Library (2023)
RInChI: Reaction International Chemical Identifier
SELFIES: The Original Paper (Krenn et al. 2020)
SMILES: The Original Paper (Weininger 1988)
GTR-CoT: Graph Traversal Chain-of-Thought for Molecules
MolRec: Chemical Structure Recognition at CLEF 2012
MolRec: Rule-Based OCSR System
SubGrapher: Visual Fingerprinting of Chemical Structures
What is Optical Chemical Structure Recognition (OCSR)?
αExtractor: Chemical Info from Biomedical Literature
ChemInfty: Chemical Structure Recognition in Patent Images
MolNexTR: Dual-Stream Molecular Image Recognition
MolParser-7M & WildMol: Large-Scale OCSR Datasets
MolParser: End-to-End Molecular Structure Recognition
ZINC-22: A Multi-Billion Scale Database for Ligand Discovery
Converting SMILES and SELFIES to 2D Molecular Images
SELFIES (Self-Referencing Embedded Strings)
Communication in the Presence of Noise: Shannon’s 1949 Paper
How to Fold Graciously: The Levinthal Paradox
MARCEL: Molecular Representation & Conformers
SMILES: Compact Notation for Chemical Structures
The Number of Isomeric Hydrocarbons of the Methane Series
The Surface of Venus: Stratigraphy and Resurfacing
GEOM: Energy-Annotated Molecular Conformations
Exponential Random Numbers: Two Classic Algorithms
GDB-11: Chemical Universe Database (26.4M Molecules)
Implementing the Müller-Brown Potential in PyTorch
Müller-Brown Potential: A PyTorch ML Testbed
DenoiseVAE: Adaptive Noise for Molecular Pre-training
Beyond Atoms: 3D Space Modeling for Molecular Pretraining
Dark Side of Forces: Non-Conservative ML Force Models
Efficient DFT Hamiltonian Prediction via Adaptive Sparsity
Learning Smooth Interatomic Potentials with eSEN
Modernizing Rahman’s 1964 Argon Simulation
Embedded-Atom Method: Impurities and Defects in Metals
Umbrella Sampling: Monte Carlo Free-Energy Estimation
Adsorption and Diffusion on Surfaces
Contrastive Learning for Variational Autoencoder Priors
GDB-13: Chemical Universe Database (970M Molecules)
GDB-17: Chemical Universe Database (166.4B Molecules)
High-Performance Word2Vec in Pure PyTorch
GEOM Dataset: 3D Molecular Conformer Generation
3D Steerable CNNs: Rotationally Equivariant Features
LLMs for Insurance Document Automation
Optimizing Sequence Models for Dynamical Systems
LLMs for Page Stream Segmentation
The Nature of LUCA and Early Earth System
Invalid SMILES Benefit Chemical Language Models: A Study
Synthetic Isomer Data Generation Pipeline
Modern PyTorch VAEs: A Detailed Implementation Guide
Sarcasm Detection with Transformers: A Cautionary Tale
Hearing Molecular Shape via Coulomb Matrix Eigenvalues
Classifying Congressional Bills with Machine Learning
Coulomb Matrices for Molecular Machine Learning
How Does Congress Actually Work? Data from 15K Bills
Kabsch Algorithm: NumPy, PyTorch, TensorFlow, and JAX
LAMMPS Tutorial: Copper and Platinum Adatom Diffusion
Automated Adatom Diffusion Workflow
Generating Mini-Protein Trajectories with GROMACS
Mini-Protein Trajectory Generation
Congressional Knowledge Graph & Policy Classification
SELFIES and the Future of Molecular String Representations
IQCRNN: Certified Stability for Neural Networks
Analytical Solution to Word2Vec Softmax & Bias Probing
EigenNoise: Data-Free Word Vector Initialization
Look, Don’t Tweet: Unified Data Models for Social NLP
PyConversations: Social Media Conversational Analysis
GPT-2 Susceptibility to Universal Adversarial Triggers
5 Axes of Multi-Arm Bandit Problems: A Practical Guide
NewsTweet Dataset: Social Media in Digital Journalism
Coordinated Social Targeting on Twitter
Data-Driven WordNet Construction from Wiktionary
A Guide to Neuroevolution: NEAT and HyperNEAT
Breaking Down Machine Learning for the Average Person
Foundations of AI: Knowledge-Based Agents and Logic
Cartesian Genetic Programming in Julia
QuAC: Question Answering in Context Dataset
CoQA Dataset: Advancing Conversational Question Answering
Understanding GANs: From Fundamentals to Objective Functions
Word Embeddings in NLP: An Introduction
Rubik’s Cube Sonification