Computational Chemistry
GDB-13 molecule structure showing CCCC(O)(CO)CC1CC1CN

GDB-13: Chemical Universe Database (970M Molecules)

A dataset card for the Generated Database 13 (GDB-13), a database of nearly 1 billion small organic molecules for …

Computational Chemistry
GDB-17 molecule structure showing complex polycyclic architecture

GDB-17: Chemical Universe Database (166B Molecules)

Dataset card for GDB-17, containing 166 billion small organic molecules representing the largest enumerated chemical …

Computational Chemistry

Invalid SMILES are Beneficial Rather than Detrimental to Chemical Language Models

Skinnider's 2024 Nature Machine Intelligence paper demonstrates that the ability to generate invalid SMILES is actually …

Computational Chemistry
Charts showing Dunn Index, distance metrics, and computation time analysis revealing clustering performance degradation with molecular size

Can You Hear the Shape of a Molecule? (Part Two)

Clustering analysis reveals why Coulomb matrix eigenvalues struggle with larger alkanes, using Dunn Index and silhouette …

Computational Chemistry
3D ball-and-stick model of butane molecule showing linear carbon chain structure

Can You Hear the Shape of a Molecule?

Explore molecular shape recognition using Coulomb matrix eigenvalues. Analysis of alkane isomers from data generation to …

Computational Chemistry

SELFIES and the Future of Molecular String Representations

A comprehensive perspective on molecular string representations, focusing on SELFIES as a 100% robust alternative to …

Computational Chemistry

IMG2SMI: Translating Molecular Structure Images to SMILES

Campos & Ji's method for converting 2D molecular images to SMILES strings using Transformers and the SELFIES representation.