
Optimizing Sequence Models for Dynamical Systems
Ablation study deconstructing sequence models. Attention-augmented Recurrent Highway Networks outperform Transformers on …

Hinton's 1984 technical report establishing the theoretical efficiency of distributed representations over local …

Seminal 1994 paper introducing Mixture Density Networks (MDNs) to model arbitrary conditional probability distributions using neural networks.

A method for improving legislative vote prediction across sessions by augmenting bill text embeddings with sponsor …

A hierarchical probabilistic model combining roll call votes, bill text, and legislative speeches to analyze political …

Summary of Kingma & Welling's foundational VAE paper introducing the reparameterization trick and variational …

The key difference between multi-sample VAEs and IWAEs: how log-of-averages creates a tighter bound on log-likelihood.

Summary of Burda, Grosse & Salakhutdinov's ICLR 2016 paper introducing Importance Weighted Autoencoders for tighter …

GTR-CoT uses graph traversal chain-of-thought reasoning to improve optical chemical structure recognition accuracy.

Novel OCSR method creating molecular fingerprints from images through functional group segmentation for database …

αExtractor uses ResNet-Transformer to extract chemical structures from literature images, including noisy and hand-drawn …

Two-stage CNN approach for converting molecular images to SMILES using CDDD embeddings and extensive data augmentation.