Generative-Models

Grid of complex molecular structures rendered from SELFIES and SMILES strings

Molecular String Renderer: Robust Visualization Tool

A fault-tolerant RDKit wrapper that treats molecular visualization as a software engineering problem: it uses the strategy pattern for SVG generation with automatic raster fallback, supports SELFIES natively for generative AI workflows, and enforces strict type safety for reliable batch processing of millions of molecules in training pipelines.
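The strategy-plus-fallback idea can be sketched in plain Python. This is a minimal sketch, not the tool's actual code: the class names `SvgRenderer` and `RasterRenderer`, the `RenderError` exception, and the toy failure condition are all hypothetical, and the byte strings stand in for real RDKit drawing calls.

```python
from typing import Protocol, Sequence, Tuple


class RenderError(Exception):
    """Raised when a rendering strategy cannot draw a molecule."""


class Renderer(Protocol):
    name: str

    def render(self, smiles: str) -> bytes: ...


class SvgRenderer:
    """Hypothetical vector strategy; a real one would call RDKit's SVG drawer."""

    name = "svg"

    def render(self, smiles: str) -> bytes:
        # Toy failure condition, for illustration only.
        if "*" in smiles:
            raise RenderError("vector drawing failed")
        return f"<svg><!-- {smiles} --></svg>".encode()


class RasterRenderer:
    """Hypothetical raster strategy used as the fallback."""

    name = "png"

    def render(self, smiles: str) -> bytes:
        return b"PNG:" + smiles.encode()


def render_with_fallback(smiles: str, strategies: Sequence[Renderer]) -> Tuple[str, bytes]:
    """Try each strategy in order; return (strategy name, image bytes) from the first success."""
    errors = []
    for strategy in strategies:
        try:
            return strategy.name, strategy.render(smiles)
        except RenderError as exc:
            errors.append(f"{strategy.name}: {exc}")
    raise RenderError("all strategies failed: " + "; ".join(errors))


if __name__ == "__main__":
    chain = [SvgRenderer(), RasterRenderer()]
    print(render_with_fallback("CCO", chain)[0])  # svg
    print(render_with_fallback("C*", chain)[0])   # png
```

Because every backend exposes the same `render` signature, new backends can be added to the chain without touching the fallback loop.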

Generative Modeling

Diagram comparing standard stochastic sampling (gradient blocked) vs the reparameterization trick (gradient flows)

Auto-Encoding Variational Bayes: VAE Paper Summary

Kingma and Welling’s foundational 2013 paper introducing Variational Autoencoders and the reparameterization trick, which enables end-to-end gradient-based training of generative models with continuous latent variables. The trick makes sampling differentiable by writing the latent sample as a deterministic function of the encoder outputs and independent noise, for example z = μ + σ ⊙ ε with ε ~ N(0, I).
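A dependency-free sketch of the trick, assuming a one-dimensional Gaussian and the toy objective f(z) = z²; the function names are illustrative, and in a real VAE an autodiff framework such as PyTorch computes these derivatives automatically.

```python
import random
from typing import Tuple


def reparameterized_sample(mu: float, sigma: float, rng: random.Random) -> Tuple[float, float]:
    """Return z ~ N(mu, sigma^2) written as z = mu + sigma * eps, together with eps ~ N(0, 1)."""
    eps = rng.gauss(0.0, 1.0)
    return mu + sigma * eps, eps


def pathwise_gradients(mu: float, sigma: float, n: int = 100_000, seed: int = 0) -> Tuple[float, float]:
    """Monte Carlo estimate of the gradient of E[z**2] with respect to (mu, sigma).

    Once eps is drawn, z is a deterministic function of (mu, sigma), so the
    chain rule applies sample by sample: dz/dmu = 1 and dz/dsigma = eps.
    """
    rng = random.Random(seed)
    grad_mu = grad_sigma = 0.0
    for _ in range(n):
        z, eps = reparameterized_sample(mu, sigma, rng)
        df_dz = 2.0 * z  # derivative of f(z) = z**2
        grad_mu += df_dz  # times dz/dmu = 1
        grad_sigma += df_dz * eps  # times dz/dsigma = eps
    return grad_mu / n, grad_sigma / n


if __name__ == "__main__":
    # Exact answer: E[z**2] = mu**2 + sigma**2, so the gradient is (2 * mu, 2 * sigma).
    print(pathwise_gradients(1.5, 0.5))
```

With μ = 1.5 and σ = 0.5 the estimate should land near the exact gradient (3.0, 1.0); without the reparameterization, the sampling step would offer no path for these derivatives.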

Generative Modeling

MNIST digit samples generated from a Variational Autoencoder latent space

Importance Weighted Autoencoders: Beyond the Standard VAE

Discover how Importance Weighted Autoencoders (IWAEs) keep the VAE architecture but train it with a tighter objective that makes effective use of multiple posterior samples.

Generative Modeling

Flowchart comparing VAE and IWAE computation showing the key difference in where averaging occurs relative to the log operation

IWAE: Importance Weighted Autoencoders

Burda et al.’s ICLR 2016 paper introducing Importance Weighted Autoencoders, which use importance sampling to derive a log-likelihood lower bound that is never looser than the standard VAE bound and approaches the true log-likelihood as the number of samples grows. The model architecture is unchanged; the tighter bound leads to more active latent units (mitigating posterior collapse) and improved generative quality.
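To see why averaging inside the log helps, here is a small numerical sketch on an assumed toy model (z ~ N(0, 1), x | z ~ N(z, 1), with the prior used as the proposal), chosen because log p(x) is known in closed form. It is illustrative only; the model and function names do not come from the paper.

```python
import math
import random
from typing import List


def log_normal(x: float, mean: float, var: float) -> float:
    """Log density of N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def log_weight(x: float, z: float) -> float:
    """log p(x, z) - log q(z | x) for the toy model z ~ N(0, 1), x | z ~ N(z, 1).

    The proposal q is taken to be the prior, so the prior terms cancel.
    """
    return log_normal(x, z, 1.0)


def log_mean_exp(values: List[float]) -> float:
    """Numerically stable log((1/k) * sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def iwae_bound(x: float, k: int, n_outer: int = 4000, seed: int = 0) -> float:
    """Monte Carlo estimate of the k-sample IWAE bound; k = 1 gives the standard ELBO."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_outer):
        zs = [rng.gauss(0.0, 1.0) for _ in range(k)]
        # IWAE averages the importance weights inside the log;
        # a multi-sample VAE estimate would average the log-weights instead.
        total += log_mean_exp([log_weight(x, z) for z in zs])
    return total / n_outer


if __name__ == "__main__":
    x = 1.0
    exact = log_normal(x, 0.0, 2.0)  # marginal p(x) = N(0, 2) for this toy model
    for k in (1, 5, 50):
        print(k, round(iwae_bound(x, k), 3), "vs log p(x) =", round(exact, 3))
```

For this model the estimates should increase with k and approach log p(x), which is the behavior the bound guarantees in expectation.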

Computational Chemistry

Recent Advances in the SELFIES Library (2023)

A 2023 software update paper documenting major improvements to the SELFIES Python library: an architectural redesign around directed molecular graphs for faster performance, expanded support for chemical features, semantic constraints that guarantee validity, and user-friendly customization APIs. Together, these changes turn SELFIES from a proof of concept into a production-ready tool.

Computational Chemistry

SELFIES molecular representation overview

SELFIES: The Original Paper (Krenn et al. 2020)

The 2020 paper that introduced SELFIES: how Mario Krenn and colleagues created a molecular representation that guarantees every generated string corresponds to a valid chemical structure.

Computational Chemistry

Aspirin molecular structure generated from SMILES string

Converting SMILES and SELFIES to 2D Molecular Images

Build a robust Python CLI tool that converts both SMILES and SELFIES notation into publication-quality 2D molecular images, complete with formulas and legends.

Computational Chemistry

SELFIES representation of 2-Fluoroethenimine molecule

SELFIES (Self-Referencing Embedded Strings)

An in-depth overview of SELFIES, the 100% robust molecular string representation designed to overcome the limitations of SMILES in machine learning. Every possible string, even a random one, decodes to a valid molecule thanks to local decoding operations, customizable valence rules, and graph-based internal representations.
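The "every string is valid" property comes from decoding with a running valence budget. The toy decoder below illustrates that idea for linear chains only; it is a simplified, hypothetical sketch (four elements, no branches or rings, and an invented rule that decoding stops once an atom is saturated), not the `selfies` library's algorithm.

```python
from typing import List, Tuple

# Toy valence table and bond symbols; real SELFIES supports far more.
VALENCE = {"C": 4, "N": 3, "O": 2, "F": 1}
BOND_ORDER = {"": 1, "=": 2, "#": 3}


def decode_chain(tokens: List[str]) -> List[Tuple[str, int]]:
    """Decode tokens like "[C]", "[=O]", "[#N]" into (element, bond order to previous atom).

    Requested bond orders are clipped to what both atoms can still accept,
    so any token sequence yields a valence-respecting chain.
    """
    atoms: List[Tuple[str, int]] = []
    remaining = None  # unused valence of the previous atom; None before the first atom
    for token in tokens:
        inner = token.strip("[]")
        if not inner:
            continue
        bond, element = (inner[0], inner[1:]) if inner[0] in "=#" else ("", inner)
        if element not in VALENCE:
            continue  # unknown symbols are skipped rather than raising an error
        if remaining is None:
            atoms.append((element, 0))  # first atom: no incoming bond
            remaining = VALENCE[element]
            continue
        if remaining == 0:
            break  # toy rule: previous atom is saturated, so decoding ends
        order = min(BOND_ORDER[bond], remaining, VALENCE[element])
        atoms.append((element, order))
        remaining = VALENCE[element] - order
    return atoms


if __name__ == "__main__":
    print(decode_chain(["[C]", "[#O]", "[F]"]))  # [('C', 0), ('O', 2)]
```

Here the requested triple bond to oxygen is clipped to a double bond, which saturates the oxygen, so the trailing fluorine is dropped instead of producing an invalid structure.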

Computational Chemistry

Potential energy surface showing molecular conformation space with equilibrium and low energy conformations

DenoiseVAE: Adaptive Noise for Molecular Pre-training

ICLR 2025 paper introducing DenoiseVAE, which learns adaptive, atom-specific noise distributions through a VAE framework to improve denoising-based pre-training for molecular force field prediction, outperforming fixed Gaussian noise approaches on quantum chemistry benchmarks.
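The difference between fixed and atom-specific noise can be shown in a few lines. This sketch only perturbs coordinates given per-atom standard deviations; in DenoiseVAE those values are produced by a learned network, whereas here they are hard-coded, and the function name `perturb` is hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def perturb(coords: Sequence[Vec3], sigmas: Sequence[float], rng: random.Random) -> Tuple[List[Vec3], List[Vec3]]:
    """Add isotropic Gaussian noise with a per-atom standard deviation to 3D coordinates.

    Returns the noisy coordinates and the noise itself, which a typical
    coordinate-denoising pre-training objective asks the model to predict.
    """
    noisy: List[Vec3] = []
    noise: List[Vec3] = []
    for (x, y, z), s in zip(coords, sigmas):
        e = (rng.gauss(0.0, s), rng.gauss(0.0, s), rng.gauss(0.0, s))
        noise.append(e)
        noisy.append((x + e[0], y + e[1], z + e[2]))
    return noisy, noise


if __name__ == "__main__":
    rng = random.Random(0)
    water = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]  # rough O, H, H positions in angstroms
    fixed, _ = perturb(water, [0.04] * 3, rng)  # one shared noise scale for every atom
    adaptive, _ = perturb(water, [0.01, 0.06, 0.06], rng)  # hypothetical per-atom scales
    print(fixed)
    print(adaptive)
```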

Generative Modeling

Visualization of the VAE prior hole problem showing a ring-shaped aggregate posterior with an empty center where the Gaussian prior has highest density

Contrastive Learning for Variational Autoencoder Priors

A NeurIPS 2021 method paper introducing Noise Contrastive Priors to address the VAE ‘prior hole’ problem, in which a standard Gaussian prior assigns high density to regions of latent space that don’t correspond to realistic data. The method reweights the prior with an energy-based model, trained with contrastive learning, so that it matches the aggregate posterior.
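Sampling from a reweighted prior of this kind can be sketched with sampling-importance-resampling (SIR). Here the prior is p(z) ∝ r(z)·N(z; 0, I), where r(z) = D(z)/(1 − D(z)), i.e. the exponential of the logit of a binary classifier D that separates aggregate-posterior samples from prior samples. The classifier below is a hand-written, hypothetical stand-in that favors a ring of radius 2; a real implementation would train it, and the function names are illustrative.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def classifier_logit(z: Point) -> float:
    """Hypothetical stand-in for a trained classifier's logit, log(D / (1 - D)).

    It is large near a ring of radius 2 and very negative at the origin,
    mimicking an aggregate posterior with a hole in the middle.
    """
    return 2.0 - 4.0 * abs(math.hypot(*z) - 2.0)


def sir_sample(n_proposals: int, rng: random.Random) -> Point:
    """Draw one approximate sample from p(z) proportional to r(z) * N(z; 0, I)."""
    proposals = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n_proposals)]
    # Importance weights are r(z) = exp(logit); subtract the max for numerical stability.
    logits = [classifier_logit(z) for z in proposals]
    top = max(logits)
    weights = [math.exp(lg - top) for lg in logits]
    return rng.choices(proposals, weights=weights, k=1)[0]


def mean_radius(points: List[Point]) -> float:
    """Average distance of the points from the origin."""
    return sum(math.hypot(*p) for p in points) / len(points)


if __name__ == "__main__":
    rng = random.Random(0)
    base = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(500)]
    reweighted = [sir_sample(200, rng) for _ in range(500)]
    print(round(mean_radius(base), 2), round(mean_radius(reweighted), 2))
```

The reweighted samples move away from the origin, where the plain Gaussian prior is densest but the toy aggregate posterior has its hole.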

Computational Chemistry

Invalid SMILES Benefit Chemical Language Models: A Study

A provocative 2024 Nature Machine Intelligence paper that challenges the assumption that invalid SMILES are failures. It shows empirically that the ability to generate invalid outputs actually improves chemical language model performance, because discarding invalid strings acts as a quality filter and invalid outputs provide richer training signals.

Generative Modeling

Variational Autoencoder architecture diagram showing encoder, latent space, and decoder

Modern PyTorch VAEs: A Detailed Implementation Guide

A complete guide to implementing modern Variational Autoencoders in PyTorch, including a copy-pasteable implementation, an explanation of how KL annealing mitigates posterior collapse, and a deep dive into numerically stable parameterizations of the standard deviation.
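The two ingredients named above can be sketched without a framework. The linear warm-up schedule, the softplus-plus-floor parameterization, and the `warmup_steps` and `min_std` defaults are assumptions chosen for illustration, not necessarily what the guide uses; in PyTorch the same pieces map onto `torch.nn.functional.softplus` and ordinary tensor arithmetic.

```python
import math
from typing import Sequence


def kl_weight(step: int, warmup_steps: int = 10_000) -> float:
    """Linear KL annealing: beta rises from 0 to 1 over warmup_steps, then stays at 1."""
    return min(1.0, step / warmup_steps)


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x)) that avoids overflow for large x."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def std_from_raw(raw: float, min_std: float = 1e-4) -> float:
    """Map an unconstrained network output to a strictly positive standard deviation."""
    return softplus(raw) + min_std


def gaussian_kl(mu: Sequence[float], std: Sequence[float]) -> float:
    """KL(N(mu, diag(std^2)) || N(0, I)), summed over latent dimensions."""
    return sum(0.5 * (m * m + s * s - 1.0) - math.log(s) for m, s in zip(mu, std))


def annealed_loss(recon_nll: float, mu: Sequence[float], raw_std: Sequence[float], step: int) -> float:
    """Negative ELBO with the KL term scaled by the annealing weight."""
    std = [std_from_raw(r) for r in raw_std]
    return recon_nll + kl_weight(step) * gaussian_kl(mu, std)
```

The small floor keeps `log(std)` finite even when the raw output is very negative, where softplus alone underflows to exactly zero; the warm-up lets the decoder start using the latent code before the KL penalty reaches full strength.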