Computer-Vision

GutenOCR: A Grounded Vision-Language Front-End for Documents

GutenOCR is a family of vision-language models designed to serve as a ‘grounded OCR front-end’, providing high-quality text transcription and explicit geometric grounding.

Molecular Representations

Two paired slope charts showing retrieval falling from synthetic to real depictions

Molecular Depiction Alignment: Contrastive vs Predictive

Two ways of aligning molecular depictions into a frozen chemistry model’s embedding space, contrastive and predictive, compared on a shared data and evaluation stack against a pre-registered prediction.

Molecular Representations

Two paired slope charts showing retrieval falling from synthetic to real depictions, with the ordering of the two arms inverting between the frozen and unfrozen regimes

What Surprised Me About Aligning Pictures of Molecules

I aligned a vision model into a frozen chemistry model’s embedding space two ways, contrastive and predictive, and held everything else identical. Four things surprised me, including a metric defect that produced exactly the result I had registered in advance.

Machine Learning

Three-panel diagram showing DGCNN point cloud processing: input space k-NN graph, EdgeConv operation, and semantic feature space clustering

DGCNN: Dynamic Graph CNN for Point Cloud Learning

DGCNN introduces the EdgeConv operator, which constructs k-nearest neighbor graphs dynamically in feature space at each network layer. This enables the model to capture both local geometry and long-range semantic relationships for point cloud classification and segmentation.
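As a rough illustration of the idea in this summary, the following NumPy sketch rebuilds a k-nearest-neighbor graph from whatever features it is given and aggregates per-edge features with a max. The function names, the single linear layer with ReLU, the `[x_i, x_j - x_i]` edge feature, and the choice to exclude each point from its own neighbor list are all assumptions made for this example, not details taken from the summary above.

```python
import numpy as np

def knn_indices(features, k):
    """Indices of the k nearest neighbors of each row, measured in feature space."""
    sq = np.sum(features ** 2, axis=1)
    # Squared Euclidean distance between every pair of rows.
    dist = sq[:, None] + sq[None, :] - 2.0 * features @ features.T
    np.fill_diagonal(dist, np.inf)  # assumption: a point is not its own neighbor
    return np.argsort(dist, axis=1)[:, :k]

def edge_conv(features, weight, k):
    """One EdgeConv-style layer (illustrative): features (n, d), weight (2d, d_out)."""
    idx = knn_indices(features, k)                 # (n, k), graph rebuilt from current features
    neighbors = features[idx]                      # (n, k, d)
    center = np.broadcast_to(features[:, None, :], neighbors.shape)
    # Edge feature: the center point concatenated with the offset to each neighbor.
    edge = np.concatenate([center, neighbors - center], axis=-1)  # (n, k, 2d)
    hidden = np.maximum(edge @ weight, 0.0)        # shared linear map + ReLU
    return hidden.max(axis=1)                      # max over neighbors -> (n, d_out)
```

Stacking such layers and calling `knn_indices` on each layer's output, instead of on the input coordinates, is what makes the graph dynamic: points that are far apart in space can become neighbors once their features are similar.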

Optical Chemical Structure Recognition

Dual-encoder architecture diagram for MarkushGrapher-2 showing vision and VTL encoding pipelines

MarkushGrapher-2: End-to-End Markush Recognition

An 831M-parameter encoder-decoder model that jointly encodes image, OCR text, and layout information through a two-stage training strategy, achieving state-of-the-art multimodal Markush structure recognition while remaining competitive on standard molecular structure recognition.

Machine Learning

Diagram showing NaViT packing variable-resolution image patches into a single sequence

NaViT: Native Resolution Vision Transformer

NaViT applies sequence packing (Patch n’ Pack) to Vision Transformers, enabling training on images of arbitrary resolution and aspect ratio while improving training efficiency by up to 4x over standard ViT.
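To make the packing idea concrete, here is a small sketch that places variable-length patch sequences into fixed-length packed sequences and builds a mask that keeps tokens from attending across image boundaries. The first-fit strategy and the helper names `pack_first_fit` and `block_mask` are assumptions chosen for illustration; the summary above does not specify how the packing is done.

```python
import numpy as np

def pack_first_fit(lengths, max_len):
    """Assign each example (by token count) to the first packed sequence with room."""
    bins, free = [], []          # example indices per packed sequence, remaining capacity
    for i, n in enumerate(lengths):
        if n > max_len:
            raise ValueError(f"example {i} has {n} tokens, more than max_len={max_len}")
        for b in range(len(bins)):
            if free[b] >= n:
                bins[b].append(i)
                free[b] -= n
                break
        else:
            # No existing packed sequence has room, so start a new one.
            bins.append([i])
            free.append(max_len - n)
    return bins

def block_mask(lengths_in_bin, max_len):
    """Boolean (max_len, max_len) mask: True where two tokens belong to the same image."""
    ids = np.full(max_len, -1)   # -1 marks padding positions
    pos = 0
    for j, n in enumerate(lengths_in_bin):
        ids[pos:pos + n] = j
        pos += n
    return (ids[:, None] == ids[None, :]) & (ids[:, None] >= 0)
```

For example, token counts `[6, 5, 4, 3, 2]` with `max_len=10` fit into two packed sequences instead of five padded ones, which is where the efficiency gain comes from.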

Computational Chemistry

Bar chart showing vision language model performance across chemistry tasks including equipment identification, molecule matching, spectroscopy, and laboratory safety

MaCBench: Multimodal Chemistry and Materials Benchmark

MaCBench evaluates frontier vision language models across 1,153 chemistry and materials science tasks spanning data extraction, experimental execution, and data interpretation, uncovering fundamental limitations in spatial reasoning and cross-modal integration.

Computational Biology

Three-panel diagram showing input point sets, SVD factorization of the cross-covariance matrix, and the aligned result

Arun et al.: SVD-Based Least-Squares Fitting of 3D Points

Presents a concise SVD-based algorithm for finding the optimal rotation and translation between two 3D point sets, with analysis of the degenerate reflection case that Umeyama later corrected.

Computational Biology

Diagram showing the polar decomposition of the cross-covariance matrix M into orthonormal factor U and positive semidefinite square root

Horn et al.: Absolute Orientation Using Orthonormal Matrices

The matrix-based companion to Horn’s 1987 quaternion method, deriving the optimal rotation as the orthonormal factor in the polar decomposition of the cross-covariance matrix via eigendecomposition of a 3x3 symmetric matrix.
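The polar-decomposition route described here can be sketched in a few lines of NumPy. This is a minimal version under stated assumptions: `M` is taken to be the sum of outer products of centered target and source points, it must be nonsingular (the points are not coplanar), and the function name `polar_rotation` is hypothetical.

```python
import numpy as np

def polar_rotation(P, Q):
    """Orthonormal factor of the cross-covariance matrix, mapping centered P onto centered Q."""
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    M = Qc.T @ Pc                                  # cross-covariance (unnormalized), 3x3 for 3D
    # Inverse square root of the symmetric matrix M^T M via its eigendecomposition.
    evals, evecs = np.linalg.eigh(M.T @ M)
    inv_sqrt = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T
    return M @ inv_sqrt                            # R = M (M^T M)^(-1/2)
```

Like the plain SVD solution, this sketch returns a reflection when the determinant of `M` is negative, so it is only safe on data where that cannot happen.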

Computational Biology

Side-by-side comparison showing naive SVD producing a reflected alignment versus Umeyama's corrected proper rotation

Umeyama's Method: Corrected SVD for Point Alignment

Corrects a flaw in prior SVD-based alignment methods (Arun et al., Horn et al.) that could produce reflections instead of rotations under noisy data, and provides a complete closed-form solution for similarity transformations in arbitrary dimensions.
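The correction can be illustrated with a short NumPy sketch of rotation-plus-translation alignment. It is a simplified version under stated assumptions: the scale factor of the full similarity transform is omitted, the function name `rigid_align` is hypothetical, and the sign flip is applied to the last singular direction.

```python
import numpy as np

def rigid_align(P, Q):
    """Find R, t minimizing sum ||R p_i + t - q_i||^2, with R constrained to det(R) = +1."""
    p_mean = P.mean(axis=0)
    q_mean = Q.mean(axis=0)
    H = (P - p_mean).T @ (Q - q_mean)              # cross-covariance (unnormalized)
    U, _, Vt = np.linalg.svd(H)
    # Uncorrected solution is V U^T; if its determinant is -1 it is a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.eye(P.shape[1])
    D[-1, -1] = d                                  # flip the least significant direction
    R = Vt.T @ D @ U.T
    t = q_mean - R @ p_mean
    return R, t
```

Without the `D` matrix the function reduces to the uncorrected SVD solution, which is optimal over all orthogonal matrices and can therefore return a reflection on noisy or near-planar data.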

Optical Chemical Structure Recognition

AdaptMol domain adaptation pipeline showing encoder-decoder with MMD alignment between labeled source and unlabeled target domain images

AdaptMol: Domain Adaptation for Molecular OCSR (2026)

AdaptMol combines an end-to-end graph reconstruction model with unsupervised domain adaptation via class-conditional MMD on bond features and SMILES-validated self-training. It achieves 82.6% accuracy on hand-drawn molecules (10.7 points above the prior best) while maintaining state-of-the-art results on four literature benchmarks, using only 4,080 real hand-drawn images for adaptation.

Optical Chemical Structure Recognition

GraphReco system architecture showing component extraction, atom and bond ambiguity resolution, and graph reconstruction stages

GraphReco: Probabilistic Structure Recognition (2026)

GraphReco is a rule-based OCSR system with two key innovations: a Fragment Merging line detection algorithm for precise bond identification and a Markov network for probabilistic resolution of atom/bond ambiguity during graph assembly. It achieves 94.2% accuracy on USPTO-10K, outperforming traditional rule-based methods and some ML-based ones.