Computational Chemistry
ChemBERTa-3 visualization showing muscular arms lifting a stack of building blocks representing molecular data with SMILES notation, symbolizing the power and scalability of the open-source training framework

ChemBERTa-3: Open Source Training Framework

An open-source framework integrating DeepChem and Ray for training and benchmarking chemical foundation models like …

Computational Chemistry
Chemical structures and molecular representations feeding into a neural network model that processes atomized chemical knowledge

ChemDFM-R: Chemical Reasoner LLM

A 14B-parameter chemical reasoning LLM enhanced with atomized functional group knowledge and mix-sourced distillation …

Computational Chemistry
ChemBERTa-2 visualization showing flowing SMILES strings in blue tones representing molecular data streams

ChemBERTa-2

Optimizing transformer pretraining for molecules using MLM vs MTR objectives, scaling to 77M compounds from PubChem for …

Generative Modeling
GP-MoLFormer architecture showing large-scale SMILES input, linear-attention transformer decoder, and property optimization via pair-tuning soft prompts

GP-MoLFormer: Molecular Generation via Transformers

A 46.8M-parameter transformer for molecular generation trained on 1.1B SMILES, introducing pair-tuning for efficient …

Computational Chemistry
ChemBERTa masked language modeling visualization showing SMILES string CC(=O)O with masked tokens

ChemBERTa: Molecular Property Prediction via Transformers

A systematic evaluation of RoBERTa transformers pretrained on 77M PubChem SMILES for molecular property prediction …

Computational Chemistry
Chemformer pre-training on 100M SMILES strings flowing into a BART model, which then enables reaction prediction and property prediction tasks

Chemformer: Pre-trained Transformer for Computational Chemistry

BART-based Transformer pre-trained on 100M molecules using self-supervision to accelerate convergence on chemical …

Computational Chemistry
ChemDFM-X architecture showing five modalities (2D graphs, 3D conformations, images, MS2 spectra, IR spectra) feeding through separate encoders into unified LLM decoder

ChemDFM-X: Large Multimodal Model for Chemistry

Multimodal chemical model integrating 5 modalities (2D graphs, 3D conformations, images, MS2/IR spectra) trained on 7.6M …

Computational Chemistry
Comparative analysis of image-to-sequence OCSR methods

Image-to-Sequence OCSR: A Comparative Analysis

Comparative analysis of image-to-sequence OCSR methods across architecture, output format, training data, and compute …

Computational Chemistry
InstructMol architecture showing molecular graph and text inputs feeding through two-stage training to produce property predictions, descriptions, and reactions

InstructMol: Multi-Modal Molecular Assistant

A multi-modal LLM aligning 2D molecular graphs with text via two-stage instruction tuning for drug discovery tasks.

Computational Chemistry
MERMaid pipeline diagram showing PDF processing through VisualHeist segmentation, DataRaider VLM mining, and KGWizard graph construction to produce chemical knowledge graphs

MERMaid: Multimodal Reaction Mining

Vision-language pipeline extracting chemical reaction data from PDF figures and tables into structured knowledge graphs …

Computational Chemistry
MOFFlow assembles metal nodes and organic linkers into Metal-Organic Framework structures

MOFFlow: Flow Matching for MOF Structure Prediction

A Riemannian flow matching framework for generating Metal-Organic Framework structures by treating building blocks as …

Computational Chemistry
Diagram showing text, molecular structures, and reactions feeding into a multimodal index and search system that outputs passages with context

Multimodal Search in Chemical Documents

A multimodal search engine that integrates text passages, molecular diagrams, and reaction data to enable passage-level …