
GutenOCR: A Grounded Vision-Language Front-End for Documents
GutenOCR is a family of vision-language models designed to serve as a ‘grounded OCR front-end’, providing high-quality text transcription and explicit geometric grounding.

Optimizing Sequence Models for Dynamical Systems
We systematically ablate core mechanisms of Transformers and RNNs, finding that attention-augmented Recurrent Highway Networks outperform standard Transformers on forecasting high-dimensional chaotic systems.

Ewald Message Passing for Molecular Graphs
Proposes Ewald message passing, a Fourier-space scheme inspired by Ewald summation that captures long-range interactions in molecular graphs. The method is architecture-agnostic and improves energy MAEs by 10% on OC20 and 16% on OE62 across four baseline GNN models.

RWKV: Linear-Cost RNN with Transformer Training
RWKV is a novel sequence model that achieves transformer-level performance while maintaining linear time and constant memory complexity during inference, scaled up to 14 billion parameters.

InChI: The International Chemical Identifier
InChI (International Chemical Identifier) is an open standard from IUPAC that represents molecular structures as hierarchical, layered strings optimized for database interoperability, unique identification, and web search via its hashed InChIKey.

MarkushGrapher-2: End-to-End Markush Recognition
An 831M-parameter encoder-decoder model that jointly encodes image, OCR text, and layout information through a two-stage training strategy, achieving state-of-the-art multimodal Markush structure recognition while remaining competitive on standard molecular structure recognition.

Materials Representations for ML Review
A comprehensive review of how solid-state materials can be numerically represented for machine learning, spanning structural features, graph neural networks, compositional descriptors, transfer learning, and generative models for inverse design.

NaViT: Native Resolution Vision Transformer
NaViT applies sequence packing (Patch n’ Pack) to Vision Transformers, enabling training on images of arbitrary resolution and aspect ratio while improving training efficiency by up to 4x over standard ViT.

BioT5: Cross-Modal Integration of Biology and Chemistry
BioT5 uses SELFIES representations and separate tokenization to pre-train a unified T5 model across molecules, proteins, and text, achieving state-of-the-art results on 10 of 15 downstream tasks.

ChatDrug: Conversational Drug Editing with ChatGPT
ChatDrug is a parameter-free framework that combines ChatGPT with retrieval-augmented domain feedback and iterative conversation to edit drugs across small molecules, peptides, and proteins.

ChemCrow: Augmenting LLMs with 18 Chemistry Tools
ChemCrow augments GPT-4 with 18 chemistry tools to autonomously plan and execute syntheses, discover novel chromophores, and solve diverse chemical reasoning tasks.

ChemGE: Molecule Generation via Grammatical Evolution
ChemGE uses grammatical evolution over SMILES context-free grammars to generate diverse drug-like molecules in parallel, outperforming deep learning baselines in throughput and molecular diversity.