Document Processing
Chart showing the trade-off between accuracy and throughput in document automation

The Reliability Trap: The Limits of 99% Accuracy

We explore the ‘Silent Failure’ mode of LLMs in production: the limits of 99% accuracy for reliability, how confidence decays in long documents, and why standard calibration techniques struggle to fix it.

Document Processing
Conceptual diagram of page stream segmentation sorting pages into documents

The Evolution of Page Stream Segmentation: Rules to LLMs

We trace the history of Page Stream Segmentation (PSS) through three eras (Heuristic, Encoder, and Decoder) and explain how privacy-preserving, localized LLMs enable true semantic processing.

Generative Modeling
MNIST digit samples generated from a Variational Autoencoder latent space

Importance Weighted Autoencoders: Beyond the Standard VAE

Discover how Importance Weighted Autoencoders (IWAEs) use the same architecture as VAEs with a fundamentally more powerful objective to leverage multiple samples effectively.

Computational Chemistry
The transformation from a 2D chemical structure image to a SMILES representation

What is Optical Chemical Structure Recognition (OCSR)?

Discover how OCSR technology bridges the gap between molecular images and machine-readable data, evolving from rule-based systems to modern deep learning models for chemical knowledge extraction.

Computational Chemistry
Aspirin molecular structure generated from SMILES string

Converting SMILES and SELFIES to 2D Molecular Images

Build a robust Python CLI tool that converts both SMILES and SELFIES notation into publication-quality 2D molecular images, complete with formulas and legends.

Scientific Computing
Comparison of exponential sampling methods showing histograms from both inverse transform and von Neumann methods overlaid with the theoretical exponential distribution

Exponential Random Numbers: Two Classic Algorithms

Explore two fundamental approaches to generating exponentially distributed random numbers: the modern inverse transform method using logarithms and von Neumann’s ingenious 1951 comparison-based algorithm that avoids transcendental functions entirely.

Computational Chemistry
Müller-Brown Potential Energy Surface showing the three minima and two saddle points

Implementing the Müller-Brown Potential in PyTorch

Step-by-step implementation of the classic Müller-Brown potential in PyTorch, with performance comparisons between analytical and automatic differentiation approaches for molecular dynamics and machine learning applications.

Scientific Computing
Velocity Autocorrelation Function showing the signature negative region characteristic of liquid dynamics

Modernizing Rahman's 1964 Argon Simulation

I replicated Rahman’s landmark 1964 liquid argon molecular dynamics simulation using modern tools, building a production-grade Python analysis pipeline with intelligent caching, vectorization, and type safety to bridge vintage science with contemporary software engineering.

Computational Chemistry
Comparison of 2D molecular graph versus 3D conformer ensemble showing latanoprost molecule in multiple conformations

GEOM Dataset: 3D Molecular Conformer Generation

Get a practical overview of the GEOM dataset and learn how it’s advancing 3D molecular machine learning by bridging static graphs and dynamic reality.

Generative Modeling
Variational Autoencoder architecture diagram showing encoder, latent space, and decoder

Modern PyTorch VAEs: A Detailed Implementation Guide

A complete guide to implementing modern Variational Autoencoders in PyTorch. Includes a copy-pasteable implementation, explanation of KL annealing to fix posterior collapse, and a deep dive into stable standard deviation parameterizations.

Natural Language Processing
Word vector illustration showing text classification and NLP concepts

Sarcasm Detection with Transformers: A Cautionary Tale

What happens when you achieve 99.8% accuracy on sarcasm detection? You might have accidentally built a domain classifier. A cautionary ML tale about dataset bias.

Computational Chemistry
3D ball-and-stick model of butane molecule showing linear carbon chain structure

Hearing Molecular Shape via Coulomb Matrix Eigenvalues

Can mathematical signatures capture molecular shape? We test whether Coulomb matrix eigenvalues can distinguish alkane constitutional isomers, from unsupervised clustering failures to supervised learning successes.