Benchmark

Bar chart comparing nach0 vs T5-base across molecular captioning, Q/A, reaction prediction, retrosynthesis, and generation

nach0: A Multimodal Chemical and NLP Foundation Model

nach0 unifies natural language and SMILES-based chemical tasks in a single encoder-decoder model, achieving competitive results across molecular property prediction, reaction prediction, molecular generation, and biomedical NLP benchmarks.

Computational Chemistry

Bar chart showing randomized SMILES generate more of GDB-13 chemical space than canonical SMILES across training set sizes

Randomized SMILES Improve Molecular Generative Models

An extensive benchmark showing that training RNN generative models with randomized (non-canonical) SMILES strings yields more uniform, complete, and closed molecular output domains than canonical SMILES.

Computational Chemistry

Bar chart comparing PMO benchmark scores with and without chemical quality filters across five generative methods

Re-evaluating Sample Efficiency in Molecule Generation

A critical reassessment of the PMO benchmark for de novo molecule generation, showing that adding molecular weight, LogP, and diversity filters substantially re-ranks generative models, with Augmented Hill-Climb emerging as the top method.

Computational Chemistry

Bar chart comparing Atom Pair Encoding vs BPE tokenization on MoleculeNet classification tasks

SMILES vs SELFIES Tokenization for Chemical LMs

Introduces Atom Pair Encoding (APE), a chemistry-aware tokenizer for SMILES and SELFIES, and shows it consistently outperforms Byte Pair Encoding in RoBERTa-based molecular property classification on BBBP, HIV, and Tox21 benchmarks.

Computational Chemistry

Bar chart comparing SMILES2Vec and Graph Conv scores across five MoleculeNet tasks

SMILES2Vec: Interpretable Chemical Property Prediction

SMILES2Vec is a deep RNN that learns chemical features directly from SMILES strings using a Bayesian-optimized CNN-GRU architecture. It matches graph convolution baselines on toxicity and activity prediction, and its explanation mask identifies chemically meaningful functional groups with 88% accuracy.

Computational Chemistry

Visualization of tokenizer vocabulary coverage across chemical space

Smirk: Complete Tokenization for Molecular Models

Introduces Smirk and Smirk-GPE tokenizers that fully cover the OpenSMILES specification, proposes n-gram language models as low-cost proxies for evaluating tokenizer quality, and benchmarks 34 tokenizers across intrinsic and extrinsic metrics.
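
To make the idea of vocabulary coverage concrete, here is a minimal sketch. It does not use Smirk itself; it assumes a common atom-wise regex tokenizer that treats each bracket atom as a single token. The training strings, the `tokenize` and `unknown_fraction` helpers, and the test molecule are all illustrative assumptions, not taken from the paper.

```python
import re

# A widely used atom-wise SMILES pattern (an assumption for illustration,
# not the Smirk tokenizer). Bracket atoms such as [C@H] become one token each.
ATOMWISE = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+"
    r"|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)

def tokenize(smiles):
    """Split a SMILES string into atom-wise tokens."""
    return ATOMWISE.findall(smiles)

def unknown_fraction(smiles, vocab):
    """Fraction of tokens in `smiles` that are missing from `vocab`."""
    tokens = tokenize(smiles)
    return sum(t not in vocab for t in tokens) / len(tokens)

# Hypothetical training set: ethanol, benzene, and one alanine enantiomer.
train = ["CCO", "c1ccccc1", "C[C@H](N)C(=O)O"]
vocab = {t for s in train for t in tokenize(s)}

# The mirror-image stereocenter [C@@H] never appeared in training,
# so the whole bracket atom is out of vocabulary.
mirror = "C[C@@H](N)C(=O)O"
mirror_tokens = tokenize(mirror)
mirror_unknown = unknown_fraction(mirror, vocab)
```

In this toy setup one of the eleven tokens in `mirror` is unknown, even though every character in it was seen during training. Whole-bracket tokens are one way such coverage gaps can arise.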

Computational Chemistry

Bar chart showing scientific LLM taxonomy across five modalities: textual, molecular, protein, genomic, and multimodal

Survey of Scientific LLMs in Bio and Chem Domains

This survey systematically reviews scientific LLMs (Sci-LLMs) across five modalities (textual, molecular, protein, genomic, and multimodal), analyzing architectures, datasets, evaluation methods, and open challenges for AI-driven scientific discovery.

Computational Chemistry

Overview of 16 transformer models for molecular property prediction organized by architecture type

Transformers for Molecular Property Prediction Review

Sultan et al. review 16 sequence-based transformer models for molecular property prediction, systematically analyzing seven design decisions (database selection, chemical language, tokenization, positional encoding, model size, pre-training objectives, and fine-tuning strategy) and identifying a critical need for standardized evaluation practices.

Computational Chemistry

Bar chart showing retrieval accuracy of chemical language models across four SMILES augmentation types

AMORE: Testing ChemLLM Robustness to SMILES Variants

Introduces AMORE, an embedding-based retrieval framework that evaluates whether chemical language models can recognize the same molecule across different SMILES representations. Results show current models are not robust to identity-preserving augmentations.
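
An embedding-based retrieval check of this kind can be sketched as follows. This is a minimal illustration, not the AMORE implementation. It assumes cosine similarity and top-1 accuracy as the metric, and the function names and toy embeddings are made up for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top1_retrieval_accuracy(original, augmented):
    """Fraction of augmented embeddings whose nearest original is the same molecule.

    Assumes original[i] and augmented[i] embed the same molecule,
    written as two different SMILES strings.
    """
    hits = 0
    for i, query in enumerate(augmented):
        best = max(range(len(original)), key=lambda j: cosine(query, original[j]))
        hits += best == i
    return hits / len(augmented)

# Toy 2-D embeddings (hypothetical values, not model outputs).
original = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
augmented = [[0.9, 0.1], [0.1, 0.9], [0.2, 1.0]]

accuracy = top1_retrieval_accuracy(original, augmented)
```

Here the third augmented embedding drifts closer to a different molecule, so the retrieval is wrong for one of the three molecules. A robust model would map every identity-preserving SMILES variant next to its original.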

Computational Chemistry

Heatmap showing LLM accuracy across nine chemistry coding task categories for four models, with green indicating high accuracy and red indicating low accuracy

Benchmarking Chemistry Knowledge in Code-Gen LLMs

A benchmark of 84 chemistry coding tasks for evaluating code-generating LLMs such as Codex. Models reach 72% accuracy, with prompt engineering strategies improving performance by 30 percentage points.

Computational Chemistry

Bar chart comparing LLM, DeBERTa, GCN, and GIN performance on three OGB molecular classification benchmarks

Benchmarking LLMs for Molecular Property Prediction

Benchmarks large language models on six molecular property prediction datasets, finding that LLMs lag behind GNNs but can augment ML models when used collaboratively.

Computational Chemistry

Bar chart comparing fixed molecular representations (RF, SVM, XGBoost) against learned representations (MolBERT, GROVER) across six property prediction benchmarks under scaffold split

Benchmarking Molecular Property Prediction at Scale

This study trains over 62,000 models to systematically evaluate molecular representations and models for property prediction, finding that traditional ML on fixed descriptors often outperforms deep learning approaches.