Document Processing
Diagram showing page stream segmentation workflow: an input stream of pages is processed through binary classification of page pairs to predict document breaks, producing segmented output documents

LLMs for Page Stream Segmentation

Enhances the TabMe benchmark for page stream segmentation to create TabMe++, showing that fine-tuned decoder-based LLMs …

Computational Chemistry
ChemDFM-X architecture showing five modalities (2D graphs, 3D conformations, images, MS2 spectra, IR spectra) feeding through separate encoders into unified LLM decoder

ChemDFM-X: Large Multimodal Model for Chemistry

Multimodal chemical model integrating five modalities (2D graphs, 3D conformations, images, MS2/IR spectra) trained on 7.6M …

Computational Chemistry

Image-to-Sequence OCSR: A Comparative Analysis

Comparative analysis of image-to-sequence OCSR methods across architecture, output format, training data, and compute …

Computational Chemistry
Diagram showing text, molecular structures, and reactions feeding into a multimodal index and search system that outputs passages with context

Multimodal Search in Chemical Documents

A multimodal search engine that integrates text passages, molecular diagrams, and reaction data to enable passage-level …

Computational Chemistry

OCSAug: Diffusion-Based Augmentation for Hand-Drawn OCSR

A diffusion-based data augmentation pipeline (OCSAug) using DDPM and RePaint to improve optical chemical structure …

Computational Chemistry

AtomLenz: Atom-Level OCSR with Limited Supervision

Weakly supervised OCSR framework combining object detection and graph construction to recognize chemical structures from …

Computational Chemistry

ChemReco: Hand-Drawn Chemical Structure Recognition

A deep learning method using EfficientNet and a Transformer to convert hand-drawn chemical structures into SMILES strings, …

Computational Chemistry
ChemVLM architecture showing molecular structure and text inputs flowing through vision encoder and language model into multimodal LLM for chemical reasoning

ChemVLM: Multimodal LLM for Chemistry

A 26B parameter multimodal LLM for chemistry, combining InternViT-6B and ChemLLM-20B for molecular structure …

Computational Chemistry

Comparing OCSR Tools (Krasnov et al. 2024)

Benchmark of 8 open-access OCSR methods on 2702 manually curated patent images, with ChemIC classifier for hybrid …

Computational Chemistry

DECIMER.ai: Optical Chemical Structure Recognition

Open-source OCSR platform combining Mask R-CNN segmentation and Transformer recognition, trained on 450M+ synthetic …

Computational Chemistry

Dual-Path Global Awareness Transformer (DGAT)

A Transformer-based OCSR model introducing dual-path modules (CGFE and SDGLA) to improve global context awareness and …

Computational Chemistry

Enhanced DECIMER for Hand-Drawn Structure Recognition

An improved encoder-decoder model (EfficientNetV2 + Transformer) for converting hand-drawn chemical structures into …