Computational Chemistry
ChemBERTa-3 visualization showing muscular arms lifting a stack of building blocks representing molecular data with SMILES notation, symbolizing the power and scalability of the open-source training framework

ChemBERTa-3: Open Source Training Framework

An open-source framework integrating DeepChem and Ray for training and benchmarking chemical foundation models like …

Computational Chemistry
ChemBERTa-2 visualization showing flowing SMILES strings in blue tones representing molecular data streams

ChemBERTa-2

Optimizing transformer pretraining for molecules using masked language modeling (MLM) vs. multi-task regression (MTR) objectives, scaling to 77M compounds from PubChem for …

Generative Modeling
GP-MoLFormer architecture showing large-scale SMILES input, linear-attention transformer decoder, and property optimization via pair-tuning soft prompts

GP-MoLFormer: Molecular Generation via Transformers

A 46.8M parameter transformer for molecular generation trained on 1.1B SMILES, introducing pair-tuning for efficient …

Computational Chemistry
Chemformer pre-training on 100M SMILES strings flowing into BART model, which then enables reaction prediction and property prediction tasks

Chemformer: Pre-trained Transformer for Comp Chem

BART-based Transformer pre-trained on 100M molecules using self-supervision to accelerate convergence on chemical …

Computational Chemistry
ChemDFM-X architecture showing five modalities (2D graphs, 3D conformations, images, MS2 spectra, IR spectra) feeding through separate encoders into unified LLM decoder

ChemDFM-X: Large Multimodal Model for Chemistry

Multimodal chemical model integrating 5 modalities (2D graphs, 3D conformations, images, MS2/IR spectra) trained on 7.6M …

Computational Chemistry
Diagram showing text, molecular structures, and reactions feeding into a multimodal index and search system that outputs passages with context

Multimodal Search in Chemical Documents

A multimodal search engine that integrates text passages, molecular diagrams, and reaction data to enable passage-level …

Computational Chemistry
Diagram showing molecular structure passing through a neural network to produce IUPAC chemical nomenclature document

STOUT V2.0: SMILES to IUPAC Name Conversion

A Transformer-based model for translating SMILES to IUPAC names, trained on ~1 billion molecules, achieving ~99% …

Computational Chemistry
Vintage wooden device labeled 'The Molecular Interpreter - Model 1974' with vacuum tubes, showing SMILES to IUPAC name translation

STOUT: SMILES to IUPAC names using NMT

A deep-learning neural machine translation approach to translate between SMILES strings and IUPAC names using the STOUT …

Computational Chemistry
Diagram showing Struct2IUPAC workflow: molecular structure (SMILES) passing through Transformer to generate IUPAC name, with round-trip verification loop

Struct2IUPAC: Transformers for SMILES to IUPAC

A Transformer-based model for translating between SMILES strings and IUPAC names, trained on 47M PubChem examples, …

Computational Chemistry
Transformer encoder-decoder architecture processing InChI string character-by-character to produce IUPAC chemical name

Translating InChI to IUPAC Names with Transformers

Sequence-to-sequence Transformer translating InChI identifiers to IUPAC names with 91% accuracy on organic compounds.

Computational Chemistry
ChemVLM architecture showing molecular structure and text inputs flowing through vision encoder and language model into multimodal LLM for chemical reasoning

ChemVLM: Multimodal LLM for Chemistry

A 26B parameter multimodal LLM for chemistry, combining InternViT-6B and ChemLLM-20B for molecular structure …

Computational Chemistry
Image2InChI: SwinTransformer for Molecular Recognition

Deep learning model using improved SwinTransformer encoder and attention-based feature fusion to convert molecular …