Block-Recurrent Transformers for Long Sequences
Ewald Message Passing for Molecular Graphs
Lagrangian Neural Networks for Physics
Liquid-S4: Input-Dependent State-Space Models
RWKV: Linear-Cost RNN with Transformer Training
InChI: The International Chemical Identifier
MarkushGrapher-2: End-to-End Markush Recognition
Materials Representations for ML Review
NaViT: Native Resolution Vision Transformer
BioT5: Cross-Modal Integration of Biology and Chemistry
ChatDrug: Conversational Drug Editing with ChatGPT
ChemCrow: Augmenting LLMs with 18 Chemistry Tools
ChemGE: Molecule Generation via Grammatical Evolution
ChemLLM: A Chemical Large Language Model Framework
Coscientist: Autonomous Chemistry with LLM Agents
Data Transfer Approaches for Seq-to-Seq Retrosynthesis
DrugAssist: Interactive LLM Molecule Optimization
DrugChat: Conversational QA on Drug Molecule Graphs
DrugEx v2: Pareto Multi-Objective RL for Drug Design
Fine-Tuning GPT-3 for Predictive Chemistry Tasks
Galactica: A Curated Scientific LLM from Meta AI
Grammar VAE: Generating Valid Molecules via CFGs
LatentGAN: Latent-Space GAN for Molecular Generation
LlaSMol: Instruction-Tuned LLMs for Chemistry Tasks
LSTM Neural Network for Drug-Like Molecule Generation
Memory-Assisted RL for Diverse De Novo Mol. Design
MolecularRNN: Graph-Based Molecular Generation and RL
MolFM: Trimodal Molecular Foundation Pre-training
MoMu: Bridging Molecular Graphs and Natural Language
Neural Machine Translation for Reaction Prediction
ORGAN: Objective-Reinforced GANs for Molecule Design
PharmaGPT: Domain-Specific LLMs for Pharma and Chem
PharMolixFM: Multi-Modal All-Atom Molecular Models
ReactionT5: Pre-trained T5 for Reaction Prediction
REINVENT: Reinforcement Learning for Mol. Design
Transformers and LLMs for Chemistry Drug Discovery
DMP: Dual-View Molecule Pre-training (SMILES+GNN)
MAT: Graph-Augmented Transformer for Molecules (2020)
Maxsmi: SMILES Augmentation for Property Prediction
MG-BERT: Graph BERT for Molecular Property Prediction
Mol2vec: Unsupervised ML with Chemical Intuition
MTL-BERT: Multitask BERT for Property Prediction
AlphaDrug: MCTS-Guided Target-Specific Drug Design
Atom-in-SMILES: Better Tokens for Chemical Models
Augmented Hill-Climb for RL-Based Molecule Design
Avoiding Failure Modes in Goal-Directed Generation
BindGPT: GPT for 3D Molecular Design and Docking
CDDD: Learning Descriptors by Translating SMILES
Chemical Language Models for De Novo Drug Design Review
ChemLLMBench: Benchmarking LLMs on Chemistry Tasks
CogMol: Controlled Molecule Generation for COVID-19
Curriculum Learning for De Novo Drug Design (REINVENT)
DeepSMILES: Adapting SMILES Syntax for Machine Learning
DrugEx v3: Scaffold-Constrained Graph Transformer
Evolutionary Molecular Design via Deep Learning + GA
Fine-Tuning GPT-3 for Molecular Property Prediction
Foundation Models in Chemistry: A 2025 Perspective
Generative AI Survey for De Novo Molecule and Protein Design
Group SELFIES: Fragment-Based Molecular Strings
Inverse Molecular Design with ML Generative Models
Lingo3DMol: Language Model for 3D Molecule Design
Link-INVENT: RL-Driven Molecular Linker Generation
LLM-Prop: Predicting Crystal Properties from Text
LLM4Mol: ChatGPT Captions as Molecular Representations
LMs Generate 3D Molecules from XYZ, CIF, PDB Files
MaCBench: Multimodal Chemistry and Materials Benchmark
MolBERT: Auxiliary Tasks for Molecular BERT Models
MolPMoFiT: Inductive Transfer Learning for QSAR
nach0: A Multimodal Chemical and NLP Foundation Model
Neural Machine Translation of Chemical Nomenclature
NLP Models That Automate Programming for Chemistry
PASITHEA: Gradient-Based Molecular Design via Dreaming
PrefixMol: Prefix Embeddings for Drug Molecule Design
Protein-to-Drug Molecule Translation via Transformer
Randomized SMILES Improve Molecular Generative Models
Re-evaluating Sample Efficiency in Molecule Generation
REINVENT 4: Open-Source Generative Molecule Design
Review: Deep Learning for Molecular Design (2019)
RNNs vs Transformers for Molecular Generation Tasks
S4 Structured State Space Models for De Novo Drug Design
Seq2seq Fingerprint: Unsupervised Molecular Embedding
SMI-TED: Encoder-Decoder Foundation Models for Chemistry
SMI+AIS: Hybridizing SMILES with Environment Tokens
SMILES Transformer: Low-Data Molecular Fingerprints
SMILES vs SELFIES Tokenization for Chemical LMs
SMILES-BERT: BERT-Style Pre-Training for Molecules
SMILES2Vec: Interpretable Chemical Property Prediction
Smirk: Complete Tokenization for Molecular Models
SPE: Data-Driven SMILES Substructure Tokenization
SPMM: A Bidirectional Molecular Foundation Model
Survey of Scientific LLMs in Bio and Chem Domains
Survey of Transformer Architectures in Molecular Science
Systematic Review of Deep Learning CLMs (2020-2024)
t-SMILES: Tree-Based Fragment Molecular Encoding
Transformer CLMs for SMILES: Literature Review 2024
Transformer Name-to-SMILES with Atom Count Losses
Transformer-CNN: SMILES Embeddings for QSAR Modeling
Transformers for Molecular Property Prediction Review
VAE for Automatic Chemical Design (2018 Seminal)
X-MOL: Pre-training on 1.1B Molecules for SMILES
AMORE: Testing ChemLLM Robustness to SMILES Variants
Back Translation for Semi-Supervised Molecule Generation
Benchmarking Chemistry Knowledge in Code-Gen LLMs
Benchmarking LLMs for Molecular Property Prediction
Benchmarking Molecular Property Prediction at Scale
ChemBench: Evaluating LLM Chemistry Against Experts
ChemEval: Fine-Grained LLM Evaluation for Chemistry
ChemSafetyBench: Benchmarking LLM Safety in Chemistry
DOCKSTRING: Docking-Based Benchmarks for Drug Design
Failure Modes in Molecule Generation & Optimization
Frechet ChemNet Distance for Molecular Generation
Graph-Based GA and MCTS Generative Model for Molecules
GuacaMol: Benchmarking Models for De Novo Molecular Design
MoleculeNet: Benchmarking Molecular Machine Learning
MolGenBench: Benchmarking Molecular Generative Models
MolScore: Scoring and Benchmarking for Drug Design
Perplexity for Molecule Ranking and CLM Bias Detection
PMO: Benchmarking Sample-Efficient Molecular Design
Review of Molecular Representation Learning Models
SPECTRA: Evaluating Generalizability of Molecular AI
STONED: Training-Free Molecular Design with SELFIES
TamGen: GPT-Based Target-Aware Drug Design and Generation
Neural Scaling of Deep Chemical Models
ROGI-XD: Roughness of Pretrained Molecular Representations
Genetic Algorithms as Baselines for Molecule Generation
MolGenSurvey: Systematic Survey of ML for Molecule Design
SMINA Docking Benchmark for De Novo Drug Design Models
Tartarus: Realistic Inverse Molecular Design Benchmarks
Tied Two-Way Transformers for Diverse Retrosynthesis
BARTSmiles: BART Pre-Training for Molecular SMILES
Language Models Learn Complex Molecular Distributions
LIMO: Latent Inceptionism for Targeted Molecule Generation
Regression Transformer: Prediction Meets Generation
RetMol: Retrieval-Based Controllable Molecule Generation
UnCorrupt SMILES: Post Hoc Correction for De Novo Design
Kabsch-Horn Cookbook: Differentiable Alignment
MolGen: Molecular Generation with Chemical Feedback
Molecular Transformer: Calibrated Reaction Prediction
Arun et al.: SVD-Based Least-Squares Fitting of 3D Points
Exposing Limitations of Molecular ML with Activity Cliffs
Horn et al.: Absolute Orientation Using Orthonormal Matrices
MoLFormer: Large-Scale Chemical Language Representations
SELFormer: A SELFIES-Based Molecular Language Model
Umeyama’s Method: Corrected SVD for Point Alignment
AdaptMol: Domain Adaptation for Molecular OCSR (2026)
Consistency Models: Fast One-Step Diffusion Generation
D3PM: Discrete Denoising Diffusion Probabilistic Models
GraphReco: Probabilistic Structure Recognition (2026)
GraSP: Graph Recognition via Subgraph Prediction (2026)
Horn’s Method: Absolute Orientation via Unit Quaternions
Kabsch Algorithm: Optimal Rotation for Point Set Alignment
Latent Diffusion Models for High-Res Image Synthesis
Uni-Parser: Industrial-Grade Multi-Modal PDF Parsing (2025)
Can Recurrent Neural Networks Warp Time? (ICLR 2018)
GTR-CoT: Graph Traversal Chain-of-Thought for Molecules
OCSU: Optical Chemical Structure Understanding (2025)
Relational Inductive Biases in Deep Learning (2018)
Scaling Laws vs Model Architectures: Inductive Bias
SE(3)-Transformers: Equivariant Attention for 3D Data
Spherical CNNs: Rotation-Equivariant Networks on the Sphere
The Quarks of Attention: Building Blocks of Attention
The Nature of LUCA and Its Impact on the Early Earth System
Molecular Sets (MOSES): A Generative Modeling Benchmark
The Reliability Trap: The Limits of 99% Accuracy
The Evolution of Page Stream Segmentation: Rules to LLMs
GutenOCR: A Grounded Vision-Language Front-End for Documents
PubMed-OCR: PMC Open Access OCR Annotations
ChemBERTa-3: Open Source Chemical Foundation Models
ChemDFM-R: Chemical Reasoning LLM with Atomized Knowledge
ChemBERTa-2: Scaling Molecular Transformers to 77M
GP-MoLFormer: Molecular Generation via Transformers
ChemBERTa: Molecular Property Prediction via Transformers
Chemformer: A Pre-trained Transformer for Comp Chem
A Convexity Principle for Interacting Gases (McCann 1997)
Building Normalizing Flows with Stochastic Interpolants
Flow Matching for Generative Modeling: Scalable CNFs
Neural ODEs: Continuous-Depth Deep Learning Models
Rectified Flow: Learning to Generate and Transfer Data
Score Matching and Denoising Autoencoders: A Connection
Score-Based Generative Modeling with SDEs (Song 2021)
ChemDFM-X: Multimodal Foundation Model for Chemistry
DynamicFlow: Integrating Protein Dynamics into Drug Design
Image-to-Sequence OCSR: A Comparative Analysis
InstructMol: Multi-Modal Molecular LLM for Drug Discovery
InvMSAFold: Generative Inverse Folding with Potts Models
MERMaid: Multimodal Chemical Reaction Mining from PDFs
MOFFlow: Flow Matching for MOF Structure Prediction
Multimodal Search in Chemical Documents and Reactions
OCSAug: Diffusion-Based Augmentation for Hand-Drawn OCSR
STOUT V2.0: Transformer-Based SMILES to IUPAC Translation
STOUT: SMILES to IUPAC Names via Neural Machine Translation
Struct2IUPAC: Translating SMILES to IUPAC via Transformers
Translating InChI to IUPAC Names with Transformers
AtomLenz: Atom-Level OCSR with Limited Supervision
Benchmarking Eight OCSR Tools on Patent Images (2024)
ChemReco: Hand-Drawn Chemical Structure Recognition
ChemVLM: A Multimodal Large Language Model for Chemistry
DECIMER.ai: Optical Chemical Structure Recognition
Dual-Path Global Awareness Transformer (DGAT) for OCSR
Enhanced DECIMER for Hand-Drawn Structure Recognition
Image2InChI: SwinTransformer for Molecular Recognition
MarkushGrapher: Multi-modal Markush Structure Recognition
MMSSC-Net: Multi-Stage Sequence Cognitive Networks
MolGrapher: Graph-based Chemical Structure Recognition
MolMole: Unified Vision Pipeline for Molecule Mining
MolScribe: Robust Image-to-Graph Molecular Recognition
MolSight: OCSR with RL and Multi-Granularity Learning
ABC-Net: Keypoint-Based Molecular Image Recognition
ChemPix: Hand-Drawn Hydrocarbon Structure Recognition
DECIMER 1.0: Transformers for Chemical Image Recognition
End-to-End Transformer for Molecular Image Captioning
Handwritten Chemical Structure Recognition with RCGD
ICMDT: Automated Chemical Structure Image Recognition
Image-to-Graph Transformers for Chemical Structures
Image2SMILES: Transformer OCSR with Synthetic Data Pipeline
MICER: Molecular Image Captioning with Transfer Learning
MolMiner: Deep Learning OCSR with YOLOv5 Detection
One Strike, You’re Out: Detecting Markush Structures
Review of OCSR Techniques and Models (Musazade 2022)
String Representations for Chemical Image Recognition
SwinOCSR: End-to-End Chemical OCR with Swin Transformers
A Review of Optical Chemical Structure Recognition Tools
ChemGrapher: Deep Learning for Chemical Graph OCSR
DECIMER: Deep Learning for Chemical Image Recognition
Deep Learning for Molecular Structure Extraction (2019)
Handwritten Chemical Ring Recognition with Neural Networks
Handwritten Chemical Symbol Recognition Using SVMs
HMM-based Online Recognition of Chemical Symbols
Img2Mol: Accurate SMILES Recognition from Depictions
On-line Handwritten Chemical Expression Recognition
Online Handwritten Chemical Formula Structure Analysis
Recognition of On-line Handwritten Chemical Expressions
SVM-HMM Online Classifier for Chemical Symbols
Unified Framework for Handwritten Chemical Expressions
Chemical Structure Reconstruction with chemoCR (2011)
ChemReader Image-to-Structure OCR at TREC 2011 Chemical IR
CLEF-IP 2012: Patent and Chemical Structure Benchmark
MolRec at CLEF 2012: Rule-Based Structure Recognition
OSRA at CLEF-IP 2012: Native TIFF Processing for Patents
Overview of the TREC 2011 Chemical IR Track Benchmark
Probabilistic OCSR with Markov Logic Networks
Research on Chemical Expression Images Recognition
Chemical Structure Recognition (Rule-Based)
ChemInk: Real-Time Recognition for Chemical Drawings
CLiDE Pro: Optical Chemical Structure Recognition Tool
Imago: Open-Source Chemical Structure Recognition (2011)
Kekulé-1 System for Chemical Structure Recognition
OSRA at TREC-CHEM 2011: Optical Structure Recognition
Structural Analysis of Handwritten Chemical Formulas
A Spatial Model for Legislative Roll Call Analysis
Automatic Recognition of Chemical Images
Chaotic Evolution of the Solar System (Sussman 1992)
Chemical Literature Data Extraction: The CLiDE Project
ChemReader: Automated Structure Extraction
Distributed Representations: A Foundational Theory
Drive to Life on Wet and Icy Worlds: Alkaline Vent Theory
Dynamical Corrections to TST for Surface Diffusion
Embedded-Atom Method User Guide: Voter’s 1994 Chapter
Embedded-Atom Method: Theory and Applications Review
Evans 1986: Thermal Conductivity of Lennard-Jones Fluid
Funnels, Pathways, and Energy Landscapes of Protein Folding
Graph Perception for Chemical Structure OCR
Hand-Drawn Chemical Diagram Recognition (AAAI 2007)
IMG2SMI: Translating Molecular Structure Images to SMILES
In Situ XRD of Oxidation-Reduction Oscillations on Pt/SiO2
Kekulé: OCR-Optical Chemical Recognition
Kinetic Oscillations in CO Oxidation on Pt(100): Theory
MD Simulation of Self-Diffusion on Metal Surfaces (1994)
Mixture Density Networks: Modeling Multimodal Distributions
OCSR Methods: A Taxonomy of Approaches
Optical Recognition of Chemical Graphics
Oscillatory CO Oxidation on Pt(110): Temporal Modeling
OSRA: Open Source Optical Structure Recognition
Party Matters: Enhancing Legislative Vote Embeddings
Reconstruction of Chemical Molecules from Images
Second-Order Langevin Equation for Field Simulations
Stillinger-Weber Potential for Silicon Simulation
Tea Party in the House: Legislative Ideology via HIPTM
Three Domains of Life: Woese’s Phylogenetic Revolution
Adatom Dimer Diffusion on fcc(111) Crystal Surfaces
AI & Physical Sciences Taxonomy: A Seven-Vector Framework
Correlations in the Motion of Atoms in Liquid Argon
Terraforming Venus With the Cloud Continent Proposal
Venus Evolution Through Time: Key Questions and Missions
Life on Venus? Astrobiology and the Habitability Limits
Invalid SMILES Benefit Chemical Language Models: A Study
SELFIES and the Future of Molecular String Representations
Molecular String Renderer: Robust Visualization Tool
Auto-Encoding Variational Bayes: VAE Paper Summary
Importance Weighted Autoencoders (IWAE) for Tighter Bounds
Importance Weighted Autoencoders: Beyond the Standard VAE
InChI and Tautomerism: Toward Comprehensive Treatment
InChI: The Worldwide Chemical Structure Identifier Standard
Making InChI FAIR and Sustainable for Inorganic Chemistry
Mixfile & MInChI: Machine-Readable Mixture Formats
NInChI: Toward a Chemical Identifier for Nanomaterials
Recent Advances in the SELFIES Library: 2023 Update
RInChI: The Reaction International Chemical Identifier
SELFIES: The Original Paper on Robust Molecular Strings
SMILES Notation: The Original Paper by Weininger (1988)
MolRec: Chemical Structure Recognition at CLEF 2012
MolRec: Rule-Based OCSR System at TREC 2011 Benchmark
What is Optical Chemical Structure Recognition (OCSR)?
αExtractor: Chemical Info from Biomedical Literature
ChemInfty: Chemical Structure Recognition in Patent Images
MolNexTR: A Dual-Stream Molecular Image Recognition
MolParser-7M & WildMol: Large-Scale OCSR Datasets
MolParser: End-to-End Molecular Structure Recognition
ZINC-22: A Multi-Billion Scale Database for Ligand Discovery
Converting SMILES and SELFIES to 2D Molecular Images
SELFIES: A Robust Molecular String Representation
Communication in the Presence of Noise: Shannon’s 1949 Paper
How to Fold Graciously: Levinthal’s Paradox (1969)
MARCEL: Molecular Conformer Ensemble Learning Benchmark
SMILES: A Compact Notation for Chemical Structures
The Müller-Brown Potential: A 2D Benchmark Surface
The Number of Isomeric Hydrocarbons of the Methane Series
The Surface of Venus: Stratigraphy and Resurfacing History
GEOM: Energy-Annotated Molecular Conformations Dataset
Exponential Random Numbers: Two Classic Algorithms
GDB-11: Chemical Universe Database (26.4M Molecules)
Implementing the Müller-Brown Potential in PyTorch
Müller-Brown Potential: A PyTorch ML Testbed
DenoiseVAE: Adaptive Noise for Molecular Pre-training
Beyond Atoms: 3D Space Modeling for Molecular Pretraining
Dark Side of Forces: Non-Conservative ML Force Models
Efficient DFT Hamiltonian Prediction via Adaptive Sparsity
eSEN: Smooth Interatomic Potentials (ICML Spotlight)
Modernizing Rahman’s 1964 Argon Simulation
Embedded-Atom Method: Impurities and Defects in Metals
Umbrella Sampling: Monte Carlo Free-Energy Estimation
Contrastive Learning for Variational Autoencoder Priors
Lennard-Jones on Adsorption and Diffusion on Surfaces
GDB-13: Chemical Universe Database (970M Molecules)
GDB-17: Chemical Universe Database (166.4B Molecules)
High-Performance Word2Vec in Pure PyTorch
GEOM Dataset: 3D Molecular Conformer Generation
SubGrapher: Visual Fingerprinting of Chemical Structures
3D Steerable CNNs: Rotationally Equivariant Features
LLMs for Insurance Document Automation
RFL: Simplifying Chemical Structure Recognition (AAAI 2025)
Optimizing Sequence Models for Dynamical Systems
LLMs for Page Stream Segmentation
Synthetic Isomer Data Generation Pipeline
Modern PyTorch VAEs: A Detailed Implementation Guide
Sarcasm Detection with Transformers: A Cautionary Tale
Hearing Molecular Shape via Coulomb Matrix Eigenvalues
Classifying Congressional Bills with Machine Learning
Coulomb Matrices for Molecular Machine Learning
How Does Congress Actually Work? Data from 15K Bills
Kabsch Algorithm: NumPy, PyTorch, TensorFlow, and JAX
LAMMPS Tutorial: Copper and Platinum Adatom Diffusion
Automated Adatom Diffusion Workflow
Generating Mini-Protein Trajectories with GROMACS
Mini-Protein Trajectory Generation
Congressional Knowledge Graph & Policy Classification
IQCRNN: Certified Stability for Neural Networks
Analytical Solution to Word2Vec Softmax & Bias Probing
EigenNoise: Data-Free Word Vector Initialization
Look, Don’t Tweet: Unified Data Models for Social NLP
PyConversations: Social Media Conversational Analysis
GPT-2 Susceptibility to Universal Adversarial Triggers
5 Axes of Multi-Arm Bandit Problems: A Practical Guide
NewsTweet Dataset: Social Media in Digital Journalism
Coordinated Social Targeting on Twitter
Data-Driven WordNet Construction from Wiktionary
A Guide to Neuroevolution: NEAT and HyperNEAT
Breaking Down Machine Learning for the Average Person
Foundations of AI: Knowledge-Based Agents and Logic
Cartesian Genetic Programming in Julia
QuAC: Question Answering in Context Dataset
CoQA Dataset: Advancing Conversational Question Answering
Understanding GANs: From Fundamentals to Objective Functions
Word Embeddings in NLP: An Introduction
Rubik’s Cube Sonification