
LLMs for Page Stream Segmentation
Enhanced TabMe benchmark for page stream segmentation, creating TabMe++, showing fine-tuned decoder-based LLMs …

Multimodal chemical model integrating 5 modalities (2D graphs, 3D conformations, images, MS2/IR spectra) trained on 7.6M …
Comparative analysis of image-to-sequence OCSR methods across architecture, output format, training data, and compute …

A multimodal search engine that integrates text passages, molecular diagrams, and reaction data to enable passage-level …
A diffusion-based data augmentation pipeline (OCSAug) using DDPM and RePaint to improve optical chemical structure …
Weakly supervised OCSR framework combining object detection and graph construction to recognize chemical structures from …
A deep learning method using EfficientNet and Transformer to convert hand-drawn chemical structures into SMILES codes, …

A 26B parameter multimodal LLM for chemistry, combining InternViT-6B and ChemLLM-20B for molecular structure …
Benchmark of 8 open-access OCSR methods on 2702 manually curated patent images, with ChemIC classifier for hybrid …
Open-source OCSR platform combining Mask R-CNN segmentation and Transformer recognition, trained on 450M+ synthetic …
A Transformer-based OCSR model introducing dual-path modules (CGFE and SDGLA) to improve global context awareness and …
An improved encoder-decoder model (EfficientNetV2 + Transformer) for converting hand-drawn chemical structures into …