Method Papers: New Algorithms, Architectures, and Mechanisms

Visualization of the Stillinger-Weber potential showing the two-body radial term and three-body angular penalty

Stillinger-Weber Potential for Silicon Simulation

Stillinger and Weber propose an interaction potential with two-body and three-body terms that stabilizes the diamond crystal structure of silicon and reproduces properties of the liquid in molecular dynamics simulations, addressing the inability of pair potentials alone to model tetrahedral semiconductors.
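
The three-body angular penalty can be illustrated with a short sketch. It assumes the commonly quoted Stillinger-Weber functional form in reduced units (energy in ε, length in σ) and the silicon parameter values λ = 21.0, γ = 1.20, a = 1.80; the function and constant names are illustrative and do not come from the paper.

```python
import math

# Assumed silicon parameters in reduced units (epsilon = sigma = 1).
LAMBDA = 21.0   # strength of the three-body term
GAMMA = 1.20    # controls how fast the term decays toward the cutoff
A_CUT = 1.80    # cutoff distance in units of sigma


def three_body_term(r_ij, r_ik, cos_theta):
    """Energy of one triplet centred on atom i, with bond lengths r_ij and
    r_ik and cos_theta the cosine of the angle between the two bonds."""
    # The term is defined to be zero once either bond reaches the cutoff.
    if r_ij >= A_CUT or r_ik >= A_CUT:
        return 0.0
    radial = math.exp(GAMMA / (r_ij - A_CUT) + GAMMA / (r_ik - A_CUT))
    # Angular penalty: zero when cos_theta = -1/3 (the tetrahedral angle).
    return LAMBDA * radial * (cos_theta + 1.0 / 3.0) ** 2
```

The factor (cos θ + 1/3)² vanishes at the tetrahedral angle, so triplets in an ideal diamond lattice incur no penalty, while any other bond angle raises the energy.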

Computational Social Science

Hierarchical Ideal Point Topic Model visualization showing political polarization

Tea Party in the House: Legislative Ideology via HIPTM

This paper introduces the Hierarchical Ideal Point Topic Model (HIPTM) to analyze the 112th U.S. Congress. By jointly modeling votes and text, it uncovers how Tea Party Republicans and establishment Republicans differ both in their voting records and in how they frame specific policy issues.

Molecular Simulation

Delayed convolution approximation for distinct Van Hove function showing comparison between simulated data and theoretical model

Correlations in the Motion of Atoms in Liquid Argon

This work validated classical molecular dynamics for simulating liquids, revealing the ‘cage effect’ in the velocity autocorrelation function and establishing predictor-corrector integration algorithms for N-body problems.
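
The cage effect shows up as a negative region in the normalized velocity autocorrelation function, C(t) = ⟨v(0)·v(t)⟩ / ⟨v(0)·v(0)⟩. The following is a minimal sketch of how that quantity could be estimated from stored velocities; the data layout and function name are assumptions made for illustration, not the paper's procedure.

```python
def velocity_autocorrelation(velocities, max_lag):
    """Normalized velocity autocorrelation for lags 0..max_lag.

    velocities[t][i] is the (vx, vy, vz) tuple of particle i at frame t
    (hypothetical layout). Requires max_lag < number of frames.
    """
    n_frames = len(velocities)
    n_particles = len(velocities[0])
    raw = []
    for lag in range(max_lag + 1):
        total = 0.0
        count = 0
        # Average over every available time origin and every particle.
        for t0 in range(n_frames - lag):
            for i in range(n_particles):
                a = velocities[t0][i]
                b = velocities[t0 + lag][i]
                total += a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
                count += 1
        raw.append(total / count)
    # Normalize so that the lag-0 value equals 1.
    return [value / raw[0] for value in raw]
```

A negative value at some lag means that, on average, a particle has reversed its direction of motion by that time, which is the signature of rebounding off a cage of neighbours.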

Generative Modeling

Diagram comparing standard stochastic sampling (gradient blocked) vs the reparameterization trick (gradient flows)

Auto-Encoding Variational Bayes: VAE Paper Summary

Kingma and Welling’s 2013 paper introduces Variational Autoencoders and the reparameterization trick. The trick enables end-to-end gradient-based training of generative models with continuous latent variables by moving the stochasticity outside the computational graph: each latent sample is written as a deterministic function of the model parameters and an independent noise variable, so gradients can flow through a deterministic path.
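
A minimal sketch of the reparameterization idea for a single Gaussian latent variable, using a toy loss f(z) = z² in place of a real decoder (the loss, function name, and sample count are assumptions for illustration). Writing z = μ + σ·ε with ε ~ N(0, 1) makes the derivatives dz/dμ = 1 and dz/dσ = ε available by the chain rule.

```python
import random


def reparam_gradients(mu, sigma, n_samples=20_000, seed=0):
    """Monte Carlo estimate of the gradients of E[z^2] with respect to
    mu and sigma, where z ~ N(mu, sigma^2)."""
    rng = random.Random(seed)
    grad_mu = 0.0
    grad_sigma = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)      # noise drawn independently of the parameters
        z = mu + sigma * eps           # deterministic function of (mu, sigma, eps)
        dloss_dz = 2.0 * z             # derivative of the toy loss f(z) = z^2
        grad_mu += dloss_dz * 1.0      # chain rule with dz/dmu = 1
        grad_sigma += dloss_dz * eps   # chain rule with dz/dsigma = eps
    return grad_mu / n_samples, grad_sigma / n_samples
```

For this toy loss the exact answer is known, since E[z²] = μ² + σ², so the estimates should approach 2μ and 2σ as the number of samples grows.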

Generative Modeling

Flowchart comparing VAE and IWAE computation showing the key difference in where averaging occurs relative to the log operation

Importance Weighted Autoencoders (IWAE) for Tighter Bounds

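Burda et al.’s ICLR 2016 paper introduces Importance Weighted Autoencoders, which use importance sampling over multiple posterior samples to derive a log-likelihood lower bound that is at least as tight as the standard VAE bound and tightens as the number of samples grows. This mitigates underuse of the latent space (posterior collapse) and improves generative quality. The model architecture remains the same as in a VAE; only the training objective changes.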
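
The difference between the two bounds can be sketched on a toy model where the true log-likelihood is known. The model below (prior N(0, 1), likelihood N(z, 1), and a deliberately mismatched proposal N(x/2, 1)) is an assumption chosen for illustration and is not taken from the paper. With k = 1 the estimate is the standard VAE bound; with k > 1 the weights are averaged inside the log.

```python
import math
import random


def log_normal(x, mean, var):
    """Log density of a univariate normal distribution."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def iwae_bound(x, k, n_outer=5000, seed=0):
    """Monte Carlo estimate of the k-sample importance weighted bound
    E[log((1/k) * sum_i w_i)] with w_i = p(x, z_i) / q(z_i | x)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_outer):
        log_ws = []
        for _ in range(k):
            z = rng.gauss(0.5 * x, 1.0)  # sample from the toy proposal q(z | x)
            log_joint = log_normal(z, 0.0, 1.0) + log_normal(x, z, 1.0)
            log_ws.append(log_joint - log_normal(z, 0.5 * x, 1.0))
        # Numerically stable log of the average weight (log-sum-exp).
        m = max(log_ws)
        total += m + math.log(sum(math.exp(lw - m) for lw in log_ws) / k)
    return total / n_outer
```

In this toy model the marginal is x ~ N(0, 2), so `log_normal(x, 0.0, 2.0)` gives the exact log-likelihood against which `iwae_bound(x, 1)` and `iwae_bound(x, 20)` can be compared.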

Molecular Representations

SELFIES molecular representation overview

SELFIES: The Original Paper on Robust Molecular Strings

The 2020 paper that introduced SELFIES: Mario Krenn and colleagues created a molecular representation that solves SMILES validity problems. It guarantees every generated string corresponds to a valid chemical structure.
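
To make the SMILES validity problem concrete, the sketch below checks two purely syntactic conditions that a generated SMILES string can violate: unbalanced branch parentheses and unpaired single-digit ring-closure labels. It is a simplified illustration under stated assumptions (it ignores two-digit `%nn` ring closures, valence rules, and aromaticity), it is not a SMILES parser, and it does not show how SELFIES works internally; the function name is hypothetical.

```python
def has_balanced_smiles_syntax(smiles):
    """Return True if branches and single-digit ring closures are paired."""
    depth = 0             # current nesting depth of branch parentheses
    open_rings = set()    # ring-closure digits seen an odd number of times
    in_bracket = False    # inside a bracket atom such as [13CH4] or [NH4+]
    for ch in smiles:
        if ch == "[":
            in_bracket = True
        elif ch == "]":
            in_bracket = False
        elif in_bracket:
            continue      # isotope, charge, and H-count digits are not ring closures
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False  # a branch was closed before it was opened
        elif ch.isdigit():
            open_rings ^= {ch}  # first occurrence opens the ring, second closes it
    return depth == 0 and not open_rings and not in_bracket
```

For example, `c1ccccc1` (benzene) passes, while `C1CCCCC` fails because ring label 1 is never closed. A string-generating model can easily emit the second kind of output, which is the failure mode a representation with guaranteed validity is designed to rule out.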

Molecular Representations

SMILES Notation: The Original Paper by Weininger (1988)

David Weininger introduced SMILES notation in 1988, establishing encoding rules for representing chemical structures as compact, human-readable strings.

Optical Chemical Structure Recognition

MolRec: Rule-Based OCSR System at TREC 2011 Benchmark

This paper details the MolRec system, which converts chemical diagram images into MOL files using vectorization, geometric rules, and graph construction. The system achieved 95% accuracy on 1,000 TREC 2011 benchmark images, and the paper includes a comprehensive analysis of its failure cases.

Optical Chemical Structure Recognition

αExtractor extracts structured chemical information from biomedical literature

αExtractor: Chemical Info from Biomedical Literature

A 2024 deep learning system for optical chemical structure recognition designed specifically for biomedical literature mining, using a ResNet-Transformer architecture to handle challenging conditions including low-resolution images, noise, distortions, and even hand-drawn molecular diagrams from scientific documents.

Optical Chemical Structure Recognition

Segment-based chemical structure recognition pipeline for low-quality patent images with touching characters and broken lines

ChemInfty: Chemical Structure Recognition in Patent Images

A 2011 rule-based OCSR system designed specifically for the challenging low-quality images in Japanese patent applications, using segment-based methods to handle pervasive problems like touching characters, merged atom labels with bonds, and broken lines.

Optical Chemical Structure Recognition

Diagram showing MolNexTR's dual-stream architecture: a molecular image feeds into parallel ConvNext and Vision Transformer encoders, producing a SMILES string.

MolNexTR: A Dual-Stream Molecular Image Recognition

MolNexTR proposes a dual-stream architecture combining ConvNext and Vision Transformers to improve molecular image recognition (OCSR). Using simultaneous local and global feature extraction alongside specialized image contamination augmentations, it achieves 81-97% accuracy across diverse benchmarks.

Optical Chemical Structure Recognition

MolParser: End-to-End Molecular Structure Recognition

A 2025 end-to-end OCSR system addressing both technical and data challenges, introducing MolParser-7M (7M+ image-text pairs) and MolDet (YOLO-based detector) for extracting and recognizing molecular structures from real-world documents with diverse quality and styles.