Computational Chemistry
Bar chart comparing MAT average ROC-AUC against D-MPNN, GCN, and Weave baselines

MAT: Graph-Augmented Transformer for Molecules (2020)

Molecule Attention Transformer (MAT) augments Transformer self-attention with inter-atomic distances and graph adjacency, achieving strong property prediction across diverse molecular tasks with minimal hyperparameter tuning after self-supervised pretraining.
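The augmentation described above can be sketched as a weighted sum of three matrices: the usual softmax attention, a distance-based term, and the graph adjacency. This is a minimal illustration in plain Python, not the authors' implementation. The function name, the mixing weights (`lam_attn`, `lam_dist`, `lam_adj`), and the choice of a row-wise softmax over negative distances are assumptions made for the example.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def molecule_attention_weights(scores, distances, adjacency,
                               lam_attn=0.4, lam_dist=0.3, lam_adj=0.3):
    """Blend attention scores with distance and adjacency information.

    scores:    n x n raw attention logits (already scaled)
    distances: n x n inter-atomic distances
    adjacency: n x n 0/1 bond matrix
    The lambda weights are illustrative values, not tuned settings.
    """
    n = len(scores)
    weights = []
    for i in range(n):
        attn_row = softmax(scores[i])
        # Closer atoms get larger weight: softmax over negative distances.
        dist_row = softmax([-d for d in distances[i]])
        weights.append([
            lam_attn * attn_row[j]
            + lam_dist * dist_row[j]
            + lam_adj * adjacency[i][j]
            for j in range(n)
        ])
    return weights


# Toy two-atom molecule with uniform attention logits.
toy_weights = molecule_attention_weights(
    scores=[[0.0, 0.0], [0.0, 0.0]],
    distances=[[0.0, 1.0], [1.0, 0.0]],
    adjacency=[[0, 1], [1, 0]],
)
```

The resulting rows would then multiply the value matrix exactly as ordinary attention weights do; only the weights themselves change.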

Bar chart showing RMSE improvement from SMILES augmentation across ESOL, FreeSolv, and lipophilicity datasets

Maxsmi: SMILES Augmentation for Property Prediction

A systematic study of SMILES augmentation strategies for molecular property prediction, showing that augmentation consistently improves CNN and RNN performance and that prediction variance across SMILES correlates with model uncertainty.
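One way to picture the link between augmentation and uncertainty is to predict on several SMILES strings of the same molecule, then report the mean as the prediction and the spread as an uncertainty signal. The sketch below assumes a hypothetical `predict` callable and uses made-up prediction values purely for illustration. It does not reproduce the study's models or results.

```python
from statistics import mean, stdev


def predict_with_uncertainty(smiles_variants, predict):
    """Aggregate per-SMILES predictions into a mean and a spread.

    smiles_variants: different valid SMILES strings for one molecule
    predict:         hypothetical callable mapping a SMILES string to a float
    """
    predictions = [predict(smiles) for smiles in smiles_variants]
    spread = stdev(predictions) if len(predictions) > 1 else 0.0
    return mean(predictions), spread


# Three equivalent SMILES strings for ethanol.
ethanol_variants = ["CCO", "OCC", "C(O)C"]

# Stand-in for a trained model: invented values, illustration only.
toy_predictions = {"CCO": -0.70, "OCC": -0.80, "C(O)C": -0.75}

estimate, uncertainty = predict_with_uncertainty(
    ethanol_variants, toy_predictions.__getitem__
)
```

A large spread means the model disagrees with itself across equivalent inputs, which is the quantity the summary above relates to model uncertainty.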

Bar chart comparing Mol2vec ESOL RMSE against ECFP4, MACCS, and Neural Fingerprint baselines

Mol2vec: Unsupervised ML with Chemical Intuition

Mol2vec treats molecular substructures as words and compounds as sentences, training Word2vec on 19.9M molecules to produce dense embeddings that capture chemical intuition and enable competitive property prediction.
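The "substructures as words" analogy can be made concrete: once each substructure identifier has an embedding, a compound vector can be formed by summing the embeddings of the substructures it contains. The sketch below assumes a tiny hand-written embedding table and an `"UNK"` fallback entry for unseen identifiers. Both are placeholders for illustration, not a trained Word2vec model.

```python
def compound_vector(substructure_ids, embeddings, unknown="UNK"):
    """Sum substructure embeddings into one fixed-length compound vector.

    substructure_ids: identifiers of the substructures found in a molecule
                      (the "words" of the molecular "sentence")
    embeddings:       dict mapping identifier -> list of floats
    unknown:          key used for identifiers missing from the table
    """
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    for identifier in substructure_ids:
        vector = embeddings.get(identifier, embeddings[unknown])
        total = [t + v for t, v in zip(total, vector)]
    return total


# Hypothetical 2-dimensional embedding table.
toy_embeddings = {
    "frag_a": [1.0, 0.0],
    "frag_b": [0.0, 2.0],
    "UNK": [0.5, 0.5],
}

toy_compound = compound_vector(["frag_a", "frag_b", "frag_unseen"], toy_embeddings)
```

The resulting vector has the same length for every molecule, so it can be fed directly to a downstream property predictor.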

Bar chart showing that MTL-BERT improves most when pretraining, multitask learning, and SMILES enumeration are combined

MTL-BERT: Multitask BERT for Property Prediction

MTL-BERT pretrains a BERT model on 1.7M unlabeled SMILES, then fine-tunes jointly on 60 ADMET and molecular property tasks using SMILES enumeration as data augmentation in all phases.

Bar chart comparing SMILES tokens vs Atom-in-SMILES across molecular generation, retrosynthesis, and reaction prediction

Atom-in-SMILES: Better Tokens for Chemical Models

Introduces Atom-in-SMILES (AIS), a tokenization scheme that encodes local chemical environments into SMILES tokens, improving prediction quality across canonicalization, retrosynthesis, and property prediction tasks.

Bar chart showing that BindGPT RL achieves the best Vina binding scores among the compared baselines

BindGPT: GPT for 3D Molecular Design and Docking

BindGPT formulates 3D molecular design as autoregressive text generation over combined SMILES and XYZ tokens, using large-scale pre-training and reinforcement learning to achieve competitive pocket-conditioned molecule generation.

Bar chart comparing CDDD virtual screening AUC against ECFP4, Mol2vec, Seq2seq FP, and VAE baselines

CDDD: Learning Descriptors by Translating SMILES

Winter et al. propose CDDD, a translation-based encoder-decoder that learns continuous molecular descriptors by translating between equivalent chemical representations like SMILES and InChI, pretrained on 72 million compounds.

Bar chart showing GPT-4 relative performance across eight chemistry tasks grouped by understanding, reasoning, and explaining capabilities

ChemLLMBench: Benchmarking LLMs on Chemistry Tasks

A comprehensive benchmark evaluating GPT-4, GPT-3.5, Davinci-003, Llama, and Galactica on eight practical chemistry tasks, revealing that LLMs are competitive on classification and text tasks but struggle with SMILES-dependent generation.

Bar chart comparing SMILES and DeepSMILES error types, showing DeepSMILES eliminates parenthesis errors

DeepSMILES: Adapting SMILES Syntax for Machine Learning

DeepSMILES modifies SMILES syntax in two ways: branches are written in postfix notation using close parentheses only, and each pair of ring closure digits is replaced by a single symbol giving the ring size. Both changes make it easier for generative models to produce syntactically valid molecular strings.

Bar chart showing peak absorption wavelength increasing across evolutionary generations

Evolutionary Molecular Design via Deep Learning + GA

An evolutionary molecular design framework that evolves ECFP fingerprint vectors using a genetic algorithm, reconstructs valid SMILES via an RNN decoder, and evaluates fitness with a DNN property predictor.
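The evolutionary loop over fingerprint vectors can be sketched with a plain genetic algorithm on bit lists. This is a simplified illustration under stated assumptions: the fitness function is a stand-in for the DNN property predictor, the RNN decoding and validity check are omitted entirely, and the population size, mutation rate, and single-point crossover are arbitrary choices rather than the paper's settings.

```python
import random


def evolve(fitness, n_bits=16, pop_size=20, generations=30,
           mutation_rate=0.05, seed=0):
    """Evolve bit vectors toward higher fitness.

    fitness: callable scoring a bit list (stand-in for a property predictor)
    Returns the best vector found and the best score at each generation.
    """
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    best_history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        best_history.append(fitness(population[0]))
        parents = population[: pop_size // 2]
        # Elitism: carry the current best vector over unchanged.
        children = [population[0][:]]
        while len(children) < pop_size:
            first, second = rng.sample(parents, 2)
            cut = rng.randrange(1, n_bits)  # single-point crossover
            child = first[:cut] + second[cut:]
            # Flip each bit with probability mutation_rate.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = children
    population.sort(key=fitness, reverse=True)
    best_history.append(fitness(population[0]))
    return population[0], best_history


# Toy fitness: count of set bits, purely for demonstration.
best_vector, score_history = evolve(fitness=sum)
```

In the framework summarized above, each candidate vector would additionally be decoded to SMILES and discarded if the result is not a valid molecule. That step is where the RNN decoder would plug into this loop.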

Bar chart comparing GPT-3 ada and GNN accuracy across molecular classification tasks

Fine-Tuning GPT-3 for Molecular Property Prediction

This paper fine-tunes GPT-3’s ada model on SMILES strings to classify electronic properties (HOMO, LUMO) of organic semiconductor molecules, finding accuracy competitive with graph neural networks and probing robustness through ablation studies.

Bar chart comparing Group SELFIES vs SELFIES on MOSES benchmark metrics

Group SELFIES: Fragment-Based Molecular Strings

Group SELFIES extends SELFIES with group tokens representing functional groups and substructures, maintaining chemical robustness while improving distribution learning and molecular generation quality.