Computational Chemistry
Bar chart comparing RNN and GPT architectures with SMILES and Graph representations on desirability scores

DrugEx v3: Scaffold-Constrained Graph Transformer

DrugEx v3 extends scaffold-constrained drug design by introducing a Graph Transformer with adjacency-matrix-based positional encoding. It achieves 100% molecular validity and generates ligands with high predicted affinity for the adenosine A2A receptor.

Computational Chemistry
Bar chart showing peak absorption wavelength increasing across evolutionary generations

Evolutionary Molecular Design via Deep Learning + GA

An evolutionary molecular design framework that evolves ECFP fingerprint vectors using a genetic algorithm, reconstructs valid SMILES via an RNN decoder, and evaluates fitness with a DNN property predictor.
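The GA step can be illustrated with a minimal sketch. It assumes fingerprints are plain Python lists of 0/1 bits and uses single-point crossover, independent bit-flip mutation, and truncation selection. These operators, the rates, and the function names are illustrative assumptions, not the paper's exact settings. The `fitness` callable stands in for the DNN property predictor, and decoding evolved vectors back to SMILES with the RNN is omitted.

```python
import random

def crossover(parent_a, parent_b, rng):
    """Single-point crossover of two equal-length bit vectors."""
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]

def mutate(vector, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in vector]

def next_generation(population, fitness, n_keep, mutation_rate, rng):
    """Keep the n_keep fittest vectors (elitism) and refill with mutated offspring.

    `fitness` is a stand-in for a learned property predictor; n_keep must be >= 2.
    """
    ranked = sorted(population, key=fitness, reverse=True)
    parents = ranked[:n_keep]
    children = []
    while len(parents) + len(children) < len(population):
        a, b = rng.sample(parents, 2)
        children.append(mutate(crossover(a, b, rng), mutation_rate, rng))
    return parents + children

if __name__ == "__main__":
    rng = random.Random(0)
    population = [[rng.randint(0, 1) for _ in range(32)] for _ in range(20)]
    toy_fitness = sum  # hypothetical stand-in: number of set bits
    for _ in range(10):
        population = next_generation(population, toy_fitness, 5, 0.02, rng)
    print(max(toy_fitness(v) for v in population))
```

Because the fittest vectors are carried over unchanged, the best fitness in the population can never decrease from one generation to the next.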

Computational Chemistry
Bar chart comparing GPT-3 ada and GNN accuracy across molecular classification tasks

Fine-Tuning GPT-3 for Molecular Property Prediction

This paper fine-tunes GPT-3’s ada model on SMILES strings to classify electronic properties (HOMO and LUMO energies) of organic semiconductor molecules. It finds accuracy competitive with graph neural networks and probes robustness through ablation studies.
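The following is a hedged sketch of how such a classification dataset might be prepared. Continuous HOMO values are split into equal-count classes, and each SMILES string becomes a prompt/completion record in the JSONL layout used by OpenAI's legacy fine-tuning for base models such as ada. The `###` separator, the leading space in the completion, the number of classes, and the function names are assumptions, not necessarily the paper's settings.

```python
import json

def quantile_edges(values, n_classes):
    """Return n_classes - 1 cut points that split the sorted values into equal-count bins."""
    ordered = sorted(values)
    return [ordered[len(ordered) * k // n_classes] for k in range(1, n_classes)]

def to_class(value, edges):
    """Index of the bin that `value` falls into (0 .. len(edges))."""
    return sum(value >= edge for edge in edges)

def build_records(smiles_list, homo_values, n_classes=3, separator="###"):
    """Turn (SMILES, HOMO) pairs into prompt/completion records with class labels."""
    edges = quantile_edges(homo_values, n_classes)
    return [
        {"prompt": f"{smi}{separator}", "completion": f" {to_class(homo, edges)}"}
        for smi, homo in zip(smiles_list, homo_values)
    ]

def to_jsonl(records):
    """Serialize records as one JSON object per line."""
    return "\n".join(json.dumps(r) for r in records)

if __name__ == "__main__":
    # Hypothetical HOMO values (eV), invented purely for illustration.
    smiles = ["c1ccccc1", "c1ccc2ccccc2c1", "C=CC=C", "c1ccsc1"]
    homo = [-6.7, -6.1, -6.3, -6.5]
    print(to_jsonl(build_records(smiles, homo, n_classes=2)))
```

Equal-count (quantile) bins keep the classes balanced, which makes plain accuracy a meaningful comparison metric.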

Computational Chemistry
Bar chart comparing small and big foundation models surveyed across property prediction, MLIPs, inverse design, and multi-domain chemistry applications

Foundation Models in Chemistry: A 2025 Perspective

This perspective from Choi et al. reviews foundation models in chemistry, categorizing them as ‘small’ (domain-specific, e.g., property prediction, MLIPs, inverse design) and ‘big’ (multi-domain, e.g., multimodal and LLM-based). It surveys pretraining strategies, key architectures (GNNs and language models), and outlines future directions for scaling, efficiency, and interpretability.

Computational Chemistry
Taxonomy diagram showing four generative model families (VAE, GAN, Diffusion, Flow) connecting to small molecule generation and protein generation subtasks

Generative AI Survey for De Novo Molecule and Protein Design

This survey organizes generative AI for de novo drug design into two themes: small molecule generation (target-agnostic, target-aware, conformation) and protein generation (structure prediction, sequence generation, backbone design, antibody, peptide). It covers four generative model families (VAEs, GANs, diffusion, flow-based), catalogs key datasets and benchmarks, and provides 12 comparative benchmark tables across all subtasks.

Computational Chemistry
Bar chart comparing Group SELFIES vs SELFIES on MOSES benchmark metrics

Group SELFIES: Fragment-Based Molecular Strings

Group SELFIES extends SELFIES with group tokens representing functional groups and substructures, maintaining chemical robustness while improving distribution learning and molecular generation quality.

Computational Chemistry
Schematic of inverse molecular design paradigm mapping desired properties to molecular structures through generative models

Inverse Molecular Design with ML Generative Models

A foundational review surveying how deep generative models (VAEs, GANs, reinforcement learning) enable inverse molecular design, covering molecular representations, chemical space navigation, and applications from drug discovery to materials engineering.

Computational Chemistry
Bar chart showing Lingo3DMol achieves best Vina docking scores on DUD-E compared to five baselines

Lingo3DMol: Language Model for 3D Molecule Design

Lingo3DMol introduces FSMILES, a fragment-based SMILES representation with local and global coordinates, to generate drug-like 3D molecules in protein pockets via a transformer language model.
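Local coordinates of this kind are commonly expressed as internal coordinates: a bond length, bond angle, and dihedral angle measured against previously placed reference atoms. The sketch below assumes that convention and computes all three from Cartesian positions. How Lingo3DMol chooses its reference atoms is not reproduced, and `internal_coordinates` is a hypothetical helper.

```python
import math

def _sub(a, b):
    return [a[i] - b[i] for i in range(3)]

def _dot(a, b):
    return sum(a[i] * b[i] for i in range(3))

def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def _norm(a):
    return math.sqrt(_dot(a, a))

def internal_coordinates(atom, ref1, ref2, ref3):
    """Return (distance atom-ref1, angle atom-ref1-ref2 in degrees,
    dihedral atom-ref1-ref2-ref3 in degrees, range (-180, 180])."""
    v1 = _sub(atom, ref1)
    v2 = _sub(ref2, ref1)
    distance = _norm(v1)
    cos_angle = _dot(v1, v2) / (distance * _norm(v2))
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))

    # Dihedral: project both outer bonds onto the plane perpendicular to the
    # central bond ref1->ref2, then measure the signed angle between them.
    b0 = _sub(atom, ref1)
    b1 = _sub(ref2, ref1)
    b2 = _sub(ref3, ref2)
    length_b1 = _norm(b1)
    b1 = [c / length_b1 for c in b1]
    v = [b0[i] - _dot(b0, b1) * b1[i] for i in range(3)]
    w = [b2[i] - _dot(b2, b1) * b1[i] for i in range(3)]
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    dihedral = math.degrees(math.atan2(y, x))
    return distance, angle, dihedral

if __name__ == "__main__":
    print(internal_coordinates((1.5, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1)))
```

Internal coordinates are invariant to rotating or translating the whole molecule. This is one reason they pair naturally with absolute (global) coordinates in a generative model.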

Computational Chemistry
Schematic of Link-INVENT architecture showing encoder-decoder RNN with reinforcement learning scoring loop

Link-INVENT: RL-Driven Molecular Linker Generation

Link-INVENT is an RNN-based generative model for molecular linker design that uses reinforcement learning with a flexible scoring function, demonstrated on fragment linking, scaffold hopping, and PROTAC design.
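Link-INVENT builds on the REINVENT reinforcement-learning framework. The following minimal sketch covers two ingredients and assumes REINVENT-style conventions. Component scores in [0, 1] are combined with a weighted geometric mean (one common aggregation choice), and the agent is pulled toward an "augmented likelihood": the prior's log-likelihood plus σ times the score. The component names, the σ value in the usage line, and the function names are hypothetical.

```python
import math

def weighted_geometric_mean(scores, weights):
    """Aggregate component scores in [0, 1]; a zero in any weighted component zeros the total."""
    total_weight = sum(weights.values())
    if any(scores[name] == 0.0 for name in weights):
        return 0.0
    log_sum = sum(w * math.log(scores[name]) for name, w in weights.items())
    return math.exp(log_sum / total_weight)

def augmented_likelihood_loss(prior_logp, agent_logp, score, sigma):
    """Squared gap between the agent's log-likelihood and the score-augmented prior log-likelihood."""
    augmented = prior_logp + sigma * score
    return (augmented - agent_logp) ** 2

if __name__ == "__main__":
    # Hypothetical scoring components for a generated linker.
    components = {"linker_length": 0.8, "qed": 0.5}
    weights = {"linker_length": 1.0, "qed": 1.0}
    s = weighted_geometric_mean(components, weights)
    print(s, augmented_likelihood_loss(-30.0, -25.0, s, sigma=20.0))
```

The geometric mean means a linker that fails one objective badly cannot be rescued by excelling at another. The prior term in the loss keeps the agent from drifting too far from chemically plausible sequences.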

Computational Chemistry
Bar chart comparing LLM-Prop band gap MAE against CGCNN, SchNet, MEGNet, and ALIGNN

LLM-Prop: Predicting Crystal Properties from Text

LLM-Prop fine-tunes the encoder of T5 on Robocrystallographer text descriptions to predict crystal properties. It outperforms GNN baselines such as ALIGNN on band gap and unit cell volume prediction, and it outperforms the domain-specific MatBERT model despite having about three times fewer parameters.
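LLM-Prop reportedly simplifies its input text by replacing numerical values with special tokens, so the model does not have to read raw numbers. The minimal sketch below assumes bond lengths (numbers followed by Å) become `[NUM]` and angles (numbers followed by °) become `[ANG]`. The regular expressions are illustrative assumptions and will not cover every Robocrystallographer phrasing.

```python
import re

# Illustrative patterns: an angle is a number immediately followed by "°";
# a bond length is a number followed (optionally after one space) by "Å".
ANGLE_PATTERN = re.compile(r"\d+(?:\.\d+)?\s?°")
LENGTH_PATTERN = re.compile(r"\d+(?:\.\d+)?(?=\s?Å)")

def preprocess_description(text):
    """Replace angles with [ANG] and bond lengths with [NUM], leaving units and words intact."""
    text = ANGLE_PATTERN.sub("[ANG]", text)
    text = LENGTH_PATTERN.sub("[NUM]", text)
    return text

if __name__ == "__main__":
    sample = ("There are two shorter (1.95 Å) and four longer (1.98 Å) Ti–O bond lengths. "
              "The corner-sharing octahedral tilt angles are 0°.")
    print(preprocess_description(sample))
```

Angles are substituted first so that the length pattern never sees the digits of an angle.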

Computational Chemistry
Diagram showing the CaR pipeline from SMILES to ChatGPT-generated captions to fine-tuned RoBERTa predictions

LLM4Mol: ChatGPT Captions as Molecular Representations

LLM4Mol proposes Captions as Representations (CaR): ChatGPT generates textual explanations of SMILES strings, and these captions are then used to fine-tune small language models such as RoBERTa for molecular property prediction.

Computational Chemistry
Bar chart showing language model validity rates across XYZ, CIF, and PDB 3D chemical file formats

LMs Generate 3D Molecules from XYZ, CIF, PDB Files

This work demonstrates that standard transformer language models, trained with next-token prediction on sequences from XYZ, CIF, and PDB files, can generate valid 3D molecules, crystals, and protein binding sites that are competitive with domain-specific 3D generative models.
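Because the model sees these files as plain text, a sampled output is usable only if it parses. The following minimal sketch covers XYZ serialization and a format-level validity check, assuming coordinates are written with three decimal places. A real evaluation would also check chemistry (for example, bond perception), which this parser does not do. The function names are hypothetical.

```python
def to_xyz(symbols, coords, comment="", decimals=3):
    """Serialize atoms to XYZ text: atom count, comment line, then 'symbol x y z' lines."""
    lines = [str(len(symbols)), comment]
    for sym, (x, y, z) in zip(symbols, coords):
        lines.append(f"{sym} {x:.{decimals}f} {y:.{decimals}f} {z:.{decimals}f}")
    return "\n".join(lines)

def parse_xyz(text):
    """Parse an XYZ block into [(symbol, (x, y, z)), ...]; raise ValueError if malformed."""
    lines = text.strip("\n").splitlines()
    if len(lines) < 2:
        raise ValueError("missing header lines")
    n_atoms = int(lines[0])  # raises ValueError if the count is not an integer
    atom_lines = lines[2:]
    if len(atom_lines) != n_atoms:
        raise ValueError(f"expected {n_atoms} atoms, found {len(atom_lines)}")
    atoms = []
    for line in atom_lines:
        parts = line.split()
        if len(parts) != 4:
            raise ValueError(f"bad atom line: {line!r}")
        atoms.append((parts[0], tuple(float(v) for v in parts[1:])))
    return atoms

def validity_rate(samples):
    """Fraction of generated texts that parse as well-formed XYZ blocks."""
    if not samples:
        return 0.0
    valid = 0
    for text in samples:
        try:
            parse_xyz(text)
        except ValueError:
            continue
        valid += 1
    return valid / len(samples)

if __name__ == "__main__":
    water = to_xyz(["O", "H", "H"],
                   [(0.0, 0.0, 0.119), (0.0, 0.763, -0.477), (0.0, -0.763, -0.477)],
                   comment="water")
    print(water)
    print(validity_rate([water, "3\nbroken\nO 0.0 0.0"]))
```

Fixing the number of decimals keeps each coordinate a short, regular run of characters, which is easier for a next-token model to learn.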