Time Series Forecasting
Forecasting comparison of different neural architectures on the Multiscale Lorenz-96 system

Optimizing Sequence Models for Dynamical Systems

We systematically ablate core mechanisms of Transformers and RNNs, finding that attention-augmented Recurrent Highway Networks outperform standard Transformers on forecasting high-dimensional chaotic systems.

Computational Chemistry
Density plot showing training vs generated physicochemical property distribution

Molecular Sets (MOSES): A Generative Modeling Benchmark

MOSES introduces a comprehensive benchmarking platform for molecular generative models, offering standardized datasets, evaluation metrics, and baselines. By providing a unified measuring stick, it aims to resolve reproducibility challenges in chemical distribution learning.

Document Processing
Chart showing the trade-off between accuracy and throughput in document automation

The Reliability Trap: The Limits of 99% Accuracy

We explore the ‘Silent Failure’ mode of LLMs in production: the limits of 99% accuracy for reliability, how confidence decays in long documents, and why standard calibration techniques struggle to fix it.

Computational Chemistry

String Representations for Chemical Image Recognition

This methodological study isolates the impact of chemical string representations on image-to-text translation models. It finds that while SMILES offers the highest overall accuracy, SELFIES provides a guarantee of structural validity, offering a trade-off for OCSR tasks.

Computational Chemistry

Imago: Structure Recognition at TREC-CHEM 2011

Imago is an open-source, cross-platform C++ toolkit designed to recognize 2D chemical structure images from scientific papers and convert them into machine-readable molecule formats using a rule-based pipeline.

Planetary Science
Conceptual cross-section of the Cloud Continent proposal showing three layers: the CO2 atmosphere below, the nitrogen-filled honeycomb structure at 50 km altitude, and the habitable atmosphere above

Terraforming Venus: The Cloud Continent Proposal

A speculative 2022 engineering proposal for terraforming Venus by constructing a nitrogen-filled honeycomb structure floating at 50 km altitude, where temperature and pressure are Earth-like. The approach avoids the need to remove Venus’s massive atmosphere and uses photosynthesis to convert CO2 into breathable air and structural materials.

Computational Chemistry
Optical chemical structure recognition example

MolRec: Chemical Structure Recognition at CLEF 2012

Performance evaluation of MolRec at the CLEF 2012 competition reveals a stark gap between simple structures (95%+ accuracy) and complex ones (46-59% accuracy), and provides a systematic analysis of rule-based OCSR limitations, including touching characters, stereochemistry recognition, and four-way junction failures.

Computational Chemistry
Optical chemical structure recognition example

MolRec: Rule-Based OCSR System

Details the MolRec system for converting chemical diagram images into MOL files using vectorization, geometric rules, and graph construction. The system achieved 95% accuracy on 1,000 TREC 2011 benchmark images and is accompanied by a comprehensive failure analysis.

Scientific Computing
Velocity Autocorrelation Function showing the signature negative region characteristic of liquid dynamics and the cage effect discovered by Rahman

Modernizing Rahman’s 1964 Argon Simulation

A digital restoration of Rahman’s seminal 1964 molecular dynamics paper using LAMMPS and a production-grade Python analysis pipeline. The pipeline features decorator-based caching, fully vectorized NumPy computations for O(N^2) operations, and modern tooling (uv, type hints, Makefile automation), turning academic scripts into a reproducible research toolkit.

Natural Language Processing
Word vector illustration showing text classification and NLP concepts

Sarcasm Detection with Transformers: A Cautionary Tale

What happens when you achieve 99.8% accuracy on sarcasm detection? You might have accidentally built a domain classifier. A cautionary ML tale about dataset bias.

Computational Chemistry
3D ball-and-stick model of butane molecule showing linear carbon chain structure

Hearing Molecular Shape via Coulomb Matrix Eigenvalues

Can mathematical signatures capture molecular shape? We test whether Coulomb matrix eigenvalues can distinguish alkane constitutional isomers, from unsupervised clustering failures to supervised learning successes.

Computational Social Science
Top features for Armed Forces and National Security policy classification showing veterans, defense, military keywords

Classifying Congressional Bills with Machine Learning

We test three ML models on 48K congressional bills to see how well they can predict policy areas from bill text. Results show that logistic regression achieves an 89% F1 score.