Computational Chemistry
Benzene in SELFIES notation

Recent Advances in the SELFIES Library: 2023 Update

A 2023 software update paper documenting improvements to the SELFIES Python library (v2.1.1), including a streamlined context-free grammar, expanded support for aromatic systems and stereochemistry, customizable semantic constraints, ML utility functions, and performance benchmarks on 300K+ molecules.

Computational Chemistry
SELFIES molecular representation overview

SELFIES: The Original Paper on Robust Molecular Strings

The 2020 paper that introduced SELFIES: Mario Krenn and colleagues created a molecular representation that solves SMILES validity problems. It guarantees every generated string corresponds to a valid chemical structure.

Computational Chemistry
The transformation from a 2D chemical structure image to a SMILES representation

What is Optical Chemical Structure Recognition (OCSR)?

Discover how OCSR technology bridges the gap between molecular images and machine-readable data, evolving from rule-based systems to modern deep learning models for chemical knowledge extraction.

Computational Chemistry
αExtractor extracts structured chemical information from biomedical literature

αExtractor: Chemical Info from Biomedical Literature

A 2024 deep learning system for optical chemical structure recognition designed specifically for biomedical literature mining, using a ResNet-Transformer architecture to handle challenging conditions including low-resolution images, noise, distortions, and even hand-drawn molecular diagrams from scientific documents.

Computational Chemistry
A colored molecule with annotations, representing the diverse drawing styles found in scientific papers that OCSR models must handle.

MolParser-7M & WildMol: Large-Scale OCSR Datasets

The MolParser project introduces two key datasets: MolParser-7M, the largest training dataset for Optical Chemical Structure Recognition (OCSR) with 7.7M pairs of images and E-SMILES strings, and WildMol, a new 20k-sample benchmark for evaluating models on challenging real-world data. The training data combines millions of diverse synthetic molecules with 400,000 manually annotated in-the-wild samples.

Computational Chemistry
SELFIES representation of 2-Fluoroethenimine molecule

SELFIES: A Robust Molecular String Representation

SELFIES is a molecular string representation where every possible string decodes to a valid molecule, solving the invalid-output problem that limits SMILES in generative machine learning.

Machine Learning Fundamentals
Sphere packing illustration showing Shannon's geometric interpretation of channel capacity

Communication in the Presence of Noise: Shannon's 1949 Paper

Shannon's 1949 paper, a foundational work of modern information theory that defines channel capacity as the fundamental limit for reliable communication over noisy channels and presents the (Nyquist-Shannon) sampling theorem that underpins digital signal processing.

Computational Chemistry
Müller-Brown Potential Energy Surface showing the three minima and two saddle points

Implementing the Müller-Brown Potential in PyTorch

Step-by-step implementation of the classic Müller-Brown potential in PyTorch, with performance comparisons between analytical and automatic differentiation approaches for molecular dynamics and machine learning applications.

Computational Chemistry
Potential energy surface showing molecular conformation space with equilibrium and low energy conformations

DenoiseVAE: Adaptive Noise for Molecular Pre-training

ICLR 2025 paper introducing DenoiseVAE, which learns adaptive, atom-specific noise distributions through a VAE framework to improve denoising-based pre-training for molecular force field prediction, outperforming fixed Gaussian noise approaches on quantum chemistry benchmarks.

Computational Chemistry
Adaptive grid merging visualization for benzene molecule showing multi-resolution spatial discretization

Beyond Atoms: 3D Space Modeling for Molecular Pretraining

ICML 2025 paper introducing SpaceFormer, a Transformer architecture that challenges the atom-centric paradigm by modeling the continuous 3D space surrounding molecules using adaptive multi-resolution grids, ranking first in 10 of 15 molecular property prediction tasks.

Computational Chemistry
A mathematical representation of a potential energy surface (PES)

Dark Side of Forces: Non-Conservative ML Force Models

ICML 2025 analysis rigorously quantifying when non-conservative force models (which predict forces directly) fail in molecular dynamics, demonstrating simulation instabilities and proposing hybrid architectures that capture speed benefits without sacrificing physical correctness.

Computational Chemistry
Spherical harmonics visualization

Efficient DFT Hamiltonian Prediction via Adaptive Sparsity

ICML 2025 methodological paper introducing SPHNet, which uses adaptive network sparsification to overcome the computational bottleneck of tensor products in SE(3)-equivariant networks, achieving up to 7x speedup and 75% memory reduction on DFT Hamiltonian prediction tasks.