Computational Chemistry
GDB-17 molecule structure showing complex polycyclic architecture

GDB-17: Chemical Universe Database (166B Molecules)

Dataset card for GDB-17, containing 166 billion small organic molecules representing the largest enumerated chemical …

Computational Chemistry
Comparison of 2D molecular graph versus 3D conformer ensemble showing latanoprost molecule in multiple conformations

GEOM Dataset: 3D Molecular Conformer Generation

Learn how GEOM transforms 2D molecular graphs into dynamic 3D conformer ensembles for molecular machine learning …

Computational Chemistry

Invalid SMILES are Beneficial Rather than Detrimental to Chemical Language Models

Skinnider (2024) shows that generating invalid SMILES actually improves chemical language model performance through …

Computational Chemistry
3D ball-and-stick model of butane molecule representing the structural isomer generation process

Synthetic Isomer Data Generation Pipeline

An end-to-end cheminformatics pipeline transforming 1D chemical formulas into 3D conformer datasets using graph …

Computational Chemistry
Comparison chart showing k-NN significantly outperforming logistic regression for molecular classification across different alkane sizes

Can You Hear the Shape of a Molecule? (Part Three)

Supervised learning reveals hidden eigenvalue patterns that clustering missed, testing k-NN and logistic regression on …

Computational Chemistry
Charts showing Dunn Index, distance metrics, and computation time analysis revealing clustering performance degradation with molecular size

Can You Hear the Shape of a Molecule? (Part Two)

Clustering analysis reveals why Coulomb matrix eigenvalues struggle with larger alkanes, using Dunn Index and silhouette …

Computational Chemistry
3D ball-and-stick model of butane molecule showing linear carbon chain structure

Can You Hear the Shape of a Molecule?

Explore molecular shape recognition using Coulomb matrix eigenvalues. Analysis of alkane isomers from data generation to …

Computational Chemistry
Coulomb matrix heatmap visualization showing molecular structure encoding on logarithmic scale

Understanding Coulomb Matrices for Molecular Machine Learning

Learn how Coulomb matrices encode 3D molecular structure for machine learning from basic theory to Python implementation …

Computational Chemistry

SELFIES and the Future of Molecular String Representations

Perspective on SELFIES as a 100% robust SMILES alternative, with 16 future research directions for molecular AI.