Models for predicting molecular or crystal properties from chemical string representations, plus benchmark suites and evaluation studies for assessing prediction quality.

Prediction Methods

| Paper | Year | Approach | Key Idea |
| --- | --- | --- | --- |
| SMILES2Vec | 2017 | CNN-GRU | Interpretable property prediction from raw SMILES embeddings |
| Transformer-CNN | 2020 | Transformer + CNN | Transformer SMILES embeddings with CNN for interpretable QSAR |
| MolPMoFiT | 2020 | Transfer learning | ULMFiT-style inductive transfer for QSAR on small datasets |
| Maxsmi | 2021 | CNN/RNN | SMILES augmentation improves CNN and RNN property prediction |
| Perplexity Ranking | 2022 | LM scoring | Perplexity scores rank molecules and detect pretraining bias |
| LM Distributions | 2022 | RNN LM | RNN language models capture complex molecular distributions |
| MTL-BERT | 2022 | BERT | Multitask pretraining with SMILES enumeration augmentation |
| Regression Transformer | 2023 | Transformer | Unifies property prediction and conditional generation in one model |
| LLM-Prop | 2025 | T5 | Crystal property prediction from text descriptions |
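
As a rough illustration of the LM-scoring idea listed above, perplexity can be computed from the per-token log-probabilities a chemical language model assigns to a SMILES string. The sketch below assumes natural-log probabilities are already available; the function names and the example values are hypothetical and do not come from any of the listed papers.

```python
import math
from typing import Dict, List, Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity of one tokenized SMILES: exp of the mean negative log-likelihood.

    Assumes natural-log probabilities, one per token.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def rank_by_perplexity(scored: Dict[str, Sequence[float]]) -> List[str]:
    """Return SMILES strings ordered from lowest to highest perplexity."""
    return sorted(scored, key=lambda smiles: perplexity(scored[smiles]))


# Hypothetical log-probabilities, made up purely for illustration.
example = {
    "CCO": [math.log(0.9), math.log(0.8)],
    "C1CC1": [math.log(0.2), math.log(0.1)],
}
ranking = rank_by_perplexity(example)
```

Lower perplexity means the model finds the string more likely, so in this sketch the most model-favored molecules come first in the ranking.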

Benchmarks, Evaluation & Surveys

| Paper | Year | Key Idea |
| --- | --- | --- |
| MoleculeNet | 2018 | Benchmark suite across quantum mechanics, physical chemistry, biophysics, and physiology tasks |
| Activity Cliffs | 2022 | Exposes ML limitations where structurally similar molecules have very different activities |
| ROGI-XD | 2023 | Task-independent measure of representation quality via structure-activity landscape roughness |
| Benchmarking at Scale | 2023 | Large-scale systematic comparison of molecular property prediction approaches |
| Transformers for Property Prediction | 2024 | Review of transformer architectures applied to molecular property prediction |
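
To make the activity-cliff notion above concrete, the following minimal sketch flags pairs of molecules that are structurally similar yet differ strongly in activity. It assumes fingerprints are given as sets of on-bit indices and uses Tanimoto similarity; the thresholds, function names, and data are illustrative assumptions, not the protocol of any listed study.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def find_activity_cliffs(
    fingerprints: Dict[str, Set[int]],
    activities: Dict[str, float],
    min_similarity: float = 0.9,    # assumed threshold, for illustration only
    min_activity_gap: float = 1.0,  # assumed gap, e.g. in log units
) -> List[Tuple[str, str]]:
    """Return pairs that are similar in structure but far apart in activity."""
    cliffs = []
    for m1, m2 in combinations(sorted(fingerprints), 2):
        similar = tanimoto(fingerprints[m1], fingerprints[m2]) >= min_similarity
        distant = abs(activities[m1] - activities[m2]) >= min_activity_gap
        if similar and distant:
            cliffs.append((m1, m2))
    return cliffs


# Hypothetical molecules: A and B share 10 of 11 on-bits, C is unrelated.
fps = {"A": set(range(10)), "B": set(range(11)), "C": {100, 101}}
acts = {"A": 5.0, "B": 7.2, "C": 9.0}
cliff_pairs = find_activity_cliffs(fps, acts)
```

In practice the fingerprints would come from a cheminformatics toolkit, and the similarity and activity thresholds would follow whatever definition a given benchmark adopts.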