Document Processing
Diagram showing page stream segmentation workflow: an input stream of pages is processed through binary classification of page pairs to predict document breaks, producing segmented output documents

LLMs for Page Stream Segmentation

Enhances the TabMe benchmark for page stream segmentation to create TabMe++, and shows that fine-tuned decoder-based LLMs …

Computational Chemistry
ChemBERTa-3 visualization showing muscular arms lifting a stack of building blocks representing molecular data with SMILES notation, symbolizing the power and scalability of the open-source training framework

ChemBERTa-3: Open Source Training Framework

An open-source framework integrating DeepChem and Ray for training and benchmarking chemical foundation models like …

Computational Chemistry
Chemical structures and molecular representations feeding into a neural network model that processes atomized chemical knowledge

ChemDFM-R: Chemical Reasoner LLM

A 14B-parameter chemical reasoning LLM enhanced with atomized functional group knowledge and mix-sourced distillation …

Computational Chemistry
ChemBERTa-2 visualization showing flowing SMILES strings in blue tones representing molecular data streams

ChemBERTa-2

Optimizing transformer pretraining for molecules by comparing masked language modeling (MLM) and multi-task regression (MTR) objectives, scaling to 77M compounds from PubChem for …

Computational Chemistry
ChemBERTa masked language modeling visualization showing SMILES string CC(=O)O with masked tokens

ChemBERTa: Molecular Property Prediction via Transformers

A systematic evaluation of RoBERTa transformers pretrained on subsets of a curated 77M-compound PubChem SMILES dataset for molecular property prediction …

Computational Chemistry
MERMaid pipeline diagram showing PDF processing through VisualHeist segmentation, DataRaider VLM mining, and KGWizard graph construction to produce chemical knowledge graphs

MERMaid: Multimodal Reaction Mining

Vision-language pipeline extracting chemical reaction data from PDF figures and tables into structured knowledge graphs …

Computational Chemistry

ChemReco: Hand-Drawn Chemical Structure Recognition

A deep learning method using EfficientNet and Transformer to convert hand-drawn chemical structures into SMILES codes, …

Computational Chemistry

Comparing OCSR Tools (Krasnov et al. 2024)

Benchmark of 8 open-access OCSR methods on 2702 manually curated patent images, with ChemIC classifier for hybrid …

Computational Chemistry

DECIMER.ai: Optical Chemical Structure Recognition

Open-source OCSR platform combining Mask R-CNN segmentation and Transformer recognition, trained on 450M+ synthetic …

Computational Chemistry

Dual-Path Global Awareness Transformer (DGAT)

A Transformer-based OCSR model introducing dual-path modules (CGFE and SDGLA) to improve global context awareness and …

Computational Chemistry

Enhanced DECIMER for Hand-Drawn Structure Recognition

An improved encoder-decoder model (EfficientNetV2 + Transformer) for converting hand-drawn chemical structures into …

Computational Chemistry

MMSSC-Net: Multi-Stage Sequence Cognitive Networks

A deep learning model for Optical Chemical Structure Recognition (OCSR) using SwinV2 and GPT-2 to convert molecular …