
GutenOCR: A Grounded Vision-Language Front-End for Documents
GutenOCR is a family of vision-language models designed to serve as a ‘grounded OCR front-end’, providing high-quality text transcription and explicit geometric grounding.

Optimizing Sequence Models for Dynamical Systems
We systematically ablate core mechanisms of Transformers and RNNs, finding that attention-augmented Recurrent Highway Networks outperform standard Transformers on forecasting high-dimensional chaotic systems.

Umeyama's Method: Corrected SVD for Point Alignment
Corrects a flaw in prior SVD-based alignment methods (Arun et al., Horn et al.) that could produce reflections instead of rotations when the data are noisy, and provides a complete closed-form solution for similarity transformations in arbitrary dimensions.
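
As a rough illustration of the sign correction described above, here is a minimal NumPy sketch. The function name `umeyama` and its argument names are assumptions made for this example, not code from the paper; it assumes the cross-covariance matrix has full rank or is rank-deficient by one.

```python
import numpy as np

def umeyama(src, dst, with_scale=True):
    """Estimate c, R, t minimizing sum ||dst_i - (c * R @ src_i + t)||^2."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n, d = src.shape
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst

    # d x d cross-covariance between the centered target and source points.
    cov = dst_c.T @ src_c / n
    U, D, Vt = np.linalg.svd(cov)

    # Sign correction: if U @ Vt would be a reflection, flip the direction
    # belonging to the smallest singular value so that det(R) = +1.
    s = np.ones(d)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        s[-1] = -1.0

    R = U @ np.diag(s) @ Vt
    var_src = (src_c ** 2).sum() / n          # variance of the source points
    c = float((D * s).sum() / var_src) if with_scale else 1.0
    t = mu_dst - c * R @ mu_src
    return c, R, t
```

Calling `umeyama(src, dst)` on two `(n, d)` arrays returns a scale, a proper rotation, and a translation. Without the sign vector `s`, mirrored or very noisy inputs could yield a matrix with determinant -1.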

AdaptMol: Domain Adaptation for Molecular OCSR (2026)
AdaptMol combines an end-to-end graph reconstruction model with unsupervised domain adaptation via class-conditional MMD on bond features and SMILES-validated self-training. Achieves 82.6% accuracy on hand-drawn molecules (10.7 points above prior best) while maintaining state-of-the-art results on four literature benchmarks, using only 4,080 real hand-drawn images for adaptation.

Consistency Models: Fast One-Step Diffusion Generation
This paper introduces consistency models, a new family of generative models that map any point on a Probability Flow ODE trajectory to its origin. They support fast one-step generation by design, while allowing multi-step sampling for improved quality and zero-shot editing tasks like inpainting and colorization.
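
To make the "map any point to its origin" idea concrete, the sketch below shows one way to parameterize a consistency function so that it is the identity at the smallest time step. The constants `SIGMA_DATA` and `EPS`, the helper names, and the stand-in `toy_network` are assumptions for illustration; a real model would use a trained neural network.

```python
import math

SIGMA_DATA = 0.5   # assumed standard deviation of the data
EPS = 0.002        # assumed smallest time on the ODE trajectory

def c_skip(t):
    # Weight on the input; equals 1 at t = EPS.
    return SIGMA_DATA ** 2 / ((t - EPS) ** 2 + SIGMA_DATA ** 2)

def c_out(t):
    # Weight on the network output; equals 0 at t = EPS.
    return SIGMA_DATA * (t - EPS) / math.sqrt(SIGMA_DATA ** 2 + t ** 2)

def consistency_fn(x, t, network):
    """f(x, t) = c_skip(t) * x + c_out(t) * network(x, t), element-wise."""
    out = network(x, t)
    return [c_skip(t) * xi + c_out(t) * oi for xi, oi in zip(x, out)]

def toy_network(x, t):
    # Hypothetical stand-in for a trained network.
    return [-xi for xi in x]

def one_step_sample(noise, t_max, network):
    # One-step generation: evaluate the consistency function once at t_max.
    return consistency_fn(noise, t_max, network)
```

Because `c_skip(EPS)` is 1 and `c_out(EPS)` is 0, `consistency_fn(x, EPS, network)` returns `x` unchanged for any network, which is the boundary condition a consistency function must satisfy.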

D3PM: Discrete Denoising Diffusion Probabilistic Models
This paper introduces Discrete Denoising Diffusion Probabilistic Models (D3PMs), which generalize diffusion to discrete state-spaces using structured Markov transition matrices. D3PMs include uniform, absorbing-state, and discretized Gaussian corruption processes, drawing a connection between diffusion and masked language models.
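
The uniform and absorbing-state corruption processes can be sketched as row-stochastic transition matrices. The function names, the `beta` argument, and `mask_index` below are illustrative assumptions, not the paper's code.

```python
import numpy as np

def uniform_transition(num_states, beta):
    """Keep the state with weight 1 - beta, otherwise resample uniformly."""
    k = num_states
    return (1.0 - beta) * np.eye(k) + (beta / k) * np.ones((k, k))

def absorbing_transition(num_states, beta, mask_index):
    """With probability beta, jump to the absorbing (mask) state and stay there."""
    q = (1.0 - beta) * np.eye(num_states)
    q[:, mask_index] += beta
    return q

def corrupt(x_prev, q, rng):
    """Sample x_t from the categorical given by row x_{t-1} of q, per token."""
    return np.array([rng.choice(q.shape[0], p=q[token]) for token in x_prev])
```

With the absorbing matrix, a token that reaches `mask_index` never leaves it, which is where the connection to masked language models comes from: after enough steps every token is masked, and the reverse process learns to fill masks back in.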

GraphReco: Probabilistic Structure Recognition (2026)
GraphReco presents a rule-based OCSR system with two key innovations: a Fragment Merging line detection algorithm for precise bond identification and a Markov network for probabilistic resolution of atom/bond ambiguity during graph assembly. Achieves 94.2% accuracy on USPTO-10K, outperforming traditional rule-based systems and some ML-based methods.

GraSP: Graph Recognition via Subgraph Prediction (2026)
GraSP introduces a general framework for recognizing graphs in images by framing it as sequential subgraph prediction with a binary classifier. A GNN conditions a CNN via FiLM layers to predict whether a candidate graph is a subgraph of the target. Applied to OCSR on QM9, GraSP achieves 67.5% accuracy with no domain-specific modifications.
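
FiLM conditioning amounts to a per-channel scale and shift of the CNN feature map, with both computed from a conditioning vector. The sketch below uses a plain vector `cond` as a stand-in for the GNN's embedding of the candidate graph; the weight names and shapes are assumptions for illustration, not GraSP's actual architecture.

```python
import numpy as np

def film(features, cond, w_gamma, b_gamma, w_beta, b_beta):
    """Apply feature-wise linear modulation.

    features: (C, H, W) feature map from the CNN.
    cond:     (D,) conditioning vector (here standing in for a GNN embedding).
    w_gamma, w_beta: (C, D) projection weights; b_gamma, b_beta: (C,) biases.
    """
    gamma = w_gamma @ cond + b_gamma   # per-channel scale, shape (C,)
    beta = w_beta @ cond + b_beta      # per-channel shift, shape (C,)
    # Broadcast the per-channel values over the spatial dimensions.
    return gamma[:, None, None] * features + beta[:, None, None]
```

In a binary subgraph classifier of this kind, the modulated feature map would then be pooled and passed to a final layer that outputs the probability that the candidate is a subgraph of the target.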

Horn's Method: Absolute Orientation via Unit Quaternions
Derives the optimal rotation between two 3D point sets as the eigenvector of a 4x4 symmetric matrix built from cross-covariance sums, using unit quaternions to enforce the orthogonality constraint.
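
A minimal NumPy sketch of the quaternion construction follows. The function name `horn_rotation` and the variable names are assumptions for this example; it covers only the rotation, with both point sets centered on their centroids first.

```python
import numpy as np

def horn_rotation(left, right):
    """Rotation R (3x3) that best maps centered `left` points onto `right`."""
    a = np.asarray(left, dtype=float)
    b = np.asarray(right, dtype=float)
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)

    # Cross-covariance sums: m[i, j] = sum_k a_k[i] * b_k[j].
    m = a.T @ b
    sxx, sxy, sxz = m[0]
    syx, syy, syz = m[1]
    szx, szy, szz = m[2]

    # Symmetric 4x4 matrix whose top eigenvector is the optimal unit quaternion.
    n = np.array([
        [sxx + syy + szz, syz - szy,        szx - sxz,        sxy - syx],
        [syz - szy,       sxx - syy - szz,  sxy + syx,        szx + sxz],
        [szx - sxz,       sxy + syx,       -sxx + syy - szz,  syz + szy],
        [sxy - syx,       szx + sxz,        syz + szy,       -sxx - syy + szz],
    ])
    _, eigvecs = np.linalg.eigh(n)      # eigenvalues in ascending order
    w, x, y, z = eigvecs[:, -1]         # unit quaternion (w, x, y, z)

    # Convert the unit quaternion to a rotation matrix.
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])
```

Because every unit quaternion corresponds to a proper rotation, the result has determinant +1 by construction, with no separate orthogonality or reflection check.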

Kabsch Algorithm: Optimal Rotation for Point Set Alignment
A foundational 1976 short communication presenting a direct, non-iterative method for finding the best rotation matrix between two point sets via eigendecomposition of a symmetric matrix formed from the cross-covariance matrix.
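
The eigendecomposition route can be sketched in NumPy as below. The function name `kabsch_eig` is an assumption, the points are taken as unweighted, and completing the third axis with a cross product is a common safeguard against reflections that is added here for robustness rather than taken from the 1976 text. The sketch assumes 3D points that are not all collinear.

```python
import numpy as np

def kabsch_eig(x, y):
    """Rotation U (3x3) minimizing sum ||U @ x_i - y_i||^2 after centering."""
    xc = np.asarray(x, dtype=float)
    yc = np.asarray(y, dtype=float)
    xc = xc - xc.mean(axis=0)
    yc = yc - yc.mean(axis=0)

    # Cross-covariance: h[i, j] = sum_k y_k[i] * x_k[j].
    h = yc.T @ xc

    # Eigendecomposition of the symmetric matrix h^T h (ascending eigenvalues).
    mu, vecs = np.linalg.eigh(h.T @ h)
    a1, a2 = vecs[:, 2], vecs[:, 1]      # two dominant eigenvectors
    a3 = np.cross(a1, a2)                # completes a right-handed frame

    # Matching orthonormal frame on the target side.
    b1 = h @ a1 / np.sqrt(mu[2])
    b2 = h @ a2 / np.sqrt(mu[1])
    b3 = np.cross(b1, b2)

    # The rotation carries frame (a1, a2, a3) onto frame (b1, b2, b3).
    return np.outer(b1, a1) + np.outer(b2, a2) + np.outer(b3, a3)
```

Modern implementations usually replace the eigendecomposition with an SVD of the cross-covariance matrix, which gives the same rotation for well-conditioned inputs.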

Latent Diffusion Models for High-Res Image Synthesis
This paper introduces Latent Diffusion Models (LDMs), which apply denoising diffusion in the latent space of pretrained autoencoders. By separating perceptual compression from generative learning and adding cross-attention conditioning, LDMs achieve FID 1.50 on Places inpainting and FID 3.60 on ImageNet class-conditional synthesis, with competitive text-to-image generation, at a fraction of the compute cost of pixel-space diffusion.

Uni-Parser: Industrial-Grade Multi-Modal PDF Parsing (2025)
Technical report on Uni-Parser, an industrial-grade document parsing engine that uses a modular multi-expert architecture to parse scientific PDFs into structured representations. Integrates MolParser 1.5 for OCSR, achieving 88.6% accuracy on chemical structures while processing up to 20 pages per second.