This group covers models designed to predict molecular or crystal properties from chemical string representations. It includes SMILES-based QSAR architectures, transfer learning approaches, multitask prediction, hybrid prediction-generation models, and text-based crystal property prediction. Also included are studies using language model perplexity as an intrinsic molecular scoring method and evaluations of how well language models capture complex molecular distributions.

| Paper | Year | Approach | Key Idea |
|---|---|---|---|
| SMILES2Vec | 2017 | CNN-GRU | Interpretable property prediction from raw SMILES embeddings |
| Transformer-CNN | 2020 | Transformer + CNN | Transformer SMILES embeddings with CNN for interpretable QSAR |
| MolPMoFiT | 2020 | Transfer learning | ULMFiT-style inductive transfer for QSAR on small datasets |
| Maxsmi | 2021 | CNN/RNN | SMILES augmentation improves CNN and RNN property prediction |
| Perplexity Ranking | 2022 | LM scoring | Perplexity scores rank molecules and detect pretraining bias |
| LM Distributions | 2022 | RNN LM | RNN language models capture complex molecular distributions |
| MTL-BERT | 2022 | BERT | Multitask pretraining with SMILES enumeration augmentation |
| Regression Transformer | 2023 | Transformer | Unifies property prediction and conditional generation in one model |
| LLM-Prop | 2025 | T5 | Crystal property prediction from text descriptions |
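To illustrate the perplexity-scoring idea from the table, here is a minimal, hypothetical sketch: a character-level bigram language model over SMILES strings, used to rank candidate molecules by perplexity (lower means more "in-distribution" for the training corpus). The actual perplexity-ranking work uses trained neural LMs; the bigram model, smoothing constants, and toy corpus below are illustrative assumptions only.

```python
import math
from collections import defaultdict

def train_bigram(smiles_corpus):
    # Count character bigrams, with ^ and $ as start/end markers.
    counts = defaultdict(lambda: defaultdict(int))
    for s in smiles_corpus:
        seq = "^" + s + "$"
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return counts

def perplexity(counts, smiles, alpha=1.0, vocab_size=40):
    # Add-alpha smoothing over an assumed SMILES character vocabulary size.
    seq = "^" + smiles + "$"
    log_prob = 0.0
    for a, b in zip(seq, seq[1:]):
        total = sum(counts[a].values())
        p = (counts[a][b] + alpha) / (total + alpha * vocab_size)
        log_prob += math.log(p)
    # Perplexity = exp of average negative log-likelihood per bigram.
    return math.exp(-log_prob / (len(seq) - 1))

# Toy corpus of small aliphatic molecules.
corpus = ["CCO", "CCC", "CCN", "CCCC", "CCOC"]
model = train_bigram(corpus)

# Rank candidates: the aromatic ring is out-of-distribution for this corpus,
# so it gets the highest perplexity and ranks last.
ranked = sorted(["CCO", "c1ccccc1", "CCCC"], key=lambda s: perplexity(model, s))
print(ranked)
```

The same ranking principle underlies pretraining-bias detection: molecules whose perplexity is systematically low under a pretrained LM reveal what the training distribution favors.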
