Key Contribution

MARCEL provides a benchmark for conformer ensemble learning. It demonstrates that explicitly modeling full conformer distributions improves property prediction across drug-like molecules and organometallic catalysts.

Overview

The Molecular Representation and Conformer Ensemble Learning (MARCEL) dataset provides 722K+ conformations across 76K+ molecules spanning four diverse chemical domains: drug-like molecules (Drugs-75K), organophosphorus ligands (Kraken), chiral catalysts (EE), and organometallic complexes (BDE). MARCEL evaluates conformer ensemble methods across both pharmaceutical and catalysis applications.

Dataset Examples

  • Example conformer from Drugs-75K (SMILES: COC(=O)[C@@]1(Cc2ccc(OC)cc2)[C@H]2c3cc(C(=O)N(C)C)n(Cc4ccc(OC(F)(F)F)cc4)c3C[C@H]2CN1C(=O)c1ccccc1; IUPAC: methyl (2R,3R,6R)-4-benzoyl-10-(dimethylcarbamoyl)-3-[(4-methoxyphenyl)methyl]-9-[[4-(trifluoromethoxy)phenyl]methyl]-4,9-diazatricyclo[6.3.0.02,6]undeca-1(8),10-diene-3-carboxylate)
  • 2D structure of the Drugs-75K conformer above
  • Example conformer from Kraken (ligand 10, conformer 0) in 2D
  • Example conformer from Kraken (ligand 10, conformer 0) in 3D
  • Example substrate from BDE in 3D (Pt_9.63)
  • 2D structure of the BDE substrate above

Dataset Subsets

| Subset | Count | Description |
| --- | --- | --- |
| Drugs-75K | 75,099 molecules | Drug-like molecules with at least 5 rotatable bonds |
| Kraken | 1,552 molecules | Monodentate organophosphorus(III) ligands |
| EE | 872 molecules | Rhodium (Rh)-bound atropisomeric catalysts derived from chiral bisphosphines |
| BDE | 5,195 molecules | Organometallic catalysts ML$_1$L$_2$ |

Benchmarks

Ionization Potential (Drugs-75K)

Predict ionization potential from molecular structure

Subset: Drugs-75K

| Rank | Model | Description | MAE (eV) |
| --- | --- | --- | --- |
| πŸ₯‡ 1 | Ensemble - GemNet | GemNet on full conformer ensemble | 0.4066 |
| πŸ₯ˆ 2 | 3D - GemNet | Geometry-enhanced message passing (single conformer) | 0.4069 |
| πŸ₯‰ 3 | Ensemble - DimeNet++ | DimeNet++ on full conformer ensemble | 0.4126 |
| 4 | Ensemble - LEFTNet | LEFTNet on full conformer ensemble | 0.4149 |
| 5 | 3D - LEFTNet | Local Environment Feature Transformer (single conformer) | 0.4174 |
| 6 | Ensemble - ClofNet | ClofNet on full conformer ensemble | 0.428 |
| 7 | 2D - GraphGPS | Graph Transformer with positional encodings | 0.4351 |
| 8 | 2D - GIN | Graph Isomorphism Network | 0.4354 |
| 9 | 2D - GIN+VN | GIN with Virtual Nodes | 0.4361 |
| 10 | 3D - ClofNet | SE(3)-equivariant network with complete local frames (single conformer) | 0.4393 |
| 11 | 3D - SchNet | Continuous-filter convolutional network (single conformer) | 0.4394 |
| 12 | 3D - DimeNet++ | Directional message passing network (single conformer) | 0.4441 |
| 13 | Ensemble - SchNet | SchNet on full conformer ensemble | 0.4452 |
| 14 | Ensemble - PaiNN | PaiNN on full conformer ensemble | 0.4466 |
| 15 | 3D - PaiNN | Polarizable Atom Interaction Network (single conformer) | 0.4505 |
| 16 | 2D - ChemProp | Message Passing Neural Network | 0.4595 |
| 17 | 1D - LSTM | LSTM on SMILES sequences | 0.4788 |
| 18 | 1D - Random forest | Random Forest on Morgan fingerprints | 0.4987 |
| 19 | 1D - Transformer | Transformer on SMILES sequences | 0.6617 |

Electron Affinity (Drugs-75K)

Predict electron affinity from molecular structure

Subset: Drugs-75K

| Rank | Model | Description | MAE (eV) |
| --- | --- | --- | --- |
| πŸ₯‡ 1 | Ensemble - GemNet | GemNet on full conformer ensemble | 0.391 |
| πŸ₯ˆ 2 | 3D - GemNet | Geometry-enhanced message passing (single conformer) | 0.3922 |
| πŸ₯‰ 3 | Ensemble - DimeNet++ | DimeNet++ on full conformer ensemble | 0.3944 |
| 4 | Ensemble - LEFTNet | LEFTNet on full conformer ensemble | 0.3953 |
| 5 | 3D - LEFTNet | Local Environment Feature Transformer (single conformer) | 0.3964 |
| 6 | Ensemble - ClofNet | ClofNet on full conformer ensemble | 0.4033 |
| 7 | 2D - GraphGPS | Graph Transformer with positional encodings | 0.4085 |
| 8 | 2D - GIN | Graph Isomorphism Network | 0.4169 |
| 9 | 2D - GIN+VN | GIN with Virtual Nodes | 0.4169 |
| 10 | 3D - SchNet | Continuous-filter convolutional network (single conformer) | 0.4207 |
| 11 | 3D - DimeNet++ | Directional message passing network (single conformer) | 0.4233 |
| 12 | Ensemble - SchNet | SchNet on full conformer ensemble | 0.4232 |
| 13 | 3D - ClofNet | SE(3)-equivariant network with complete local frames (single conformer) | 0.4251 |
| 14 | Ensemble - PaiNN | PaiNN on full conformer ensemble | 0.4269 |
| 15 | 2D - ChemProp | Message Passing Neural Network | 0.4417 |
| 16 | 3D - PaiNN | Polarizable Atom Interaction Network (single conformer) | 0.4495 |
| 17 | 1D - LSTM | LSTM on SMILES sequences | 0.4648 |
| 18 | 1D - Random forest | Random Forest on Morgan fingerprints | 0.4747 |
| 19 | 1D - Transformer | Transformer on SMILES sequences | 0.585 |

Electronegativity (Drugs-75K)

Predict electronegativity (Ο‡) from molecular structure

Subset: Drugs-75K

| Rank | Model | Description | MAE (eV) |
| --- | --- | --- | --- |
| πŸ₯‡ 1 | 3D - GemNet | Geometry-enhanced message passing (single conformer) | 0.197 |
| πŸ₯ˆ 2 | Ensemble - GemNet | GemNet on full conformer ensemble | 0.2027 |
| πŸ₯‰ 3 | Ensemble - LEFTNet | LEFTNet on full conformer ensemble | 0.2069 |
| 4 | 3D - LEFTNet | Local Environment Feature Transformer (single conformer) | 0.2083 |
| 5 | Ensemble - ClofNet | ClofNet on full conformer ensemble | 0.2199 |
| 6 | 2D - GraphGPS | Graph Transformer with positional encodings | 0.2212 |
| 7 | 3D - SchNet | Continuous-filter convolutional network (single conformer) | 0.2243 |
| 8 | Ensemble - SchNet | SchNet on full conformer ensemble | 0.2243 |
| 9 | 2D - GIN | Graph Isomorphism Network | 0.226 |
| 10 | 2D - GIN+VN | GIN with Virtual Nodes | 0.2267 |
| 11 | Ensemble - DimeNet++ | DimeNet++ on full conformer ensemble | 0.2267 |
| 12 | Ensemble - PaiNN | PaiNN on full conformer ensemble | 0.2294 |
| 13 | 3D - PaiNN | Polarizable Atom Interaction Network (single conformer) | 0.2324 |
| 14 | 3D - ClofNet | SE(3)-equivariant network with complete local frames (single conformer) | 0.2378 |
| 15 | 3D - DimeNet++ | Directional message passing network (single conformer) | 0.2436 |
| 16 | 2D - ChemProp | Message Passing Neural Network | 0.2441 |
| 17 | 1D - LSTM | LSTM on SMILES sequences | 0.2505 |
| 18 | 1D - Random forest | Random Forest on Morgan fingerprints | 0.2732 |
| 19 | 1D - Transformer | Transformer on SMILES sequences | 0.4073 |

Bβ‚… Sterimol Parameter (Kraken)

Predict the Bβ‚… Sterimol descriptor for organophosphorus ligands

Subset: Kraken

| Rank | Model | Description | MAE |
| --- | --- | --- | --- |
| πŸ₯‡ 1 | Ensemble - PaiNN | PaiNN on full conformer ensemble | 0.2225 |
| πŸ₯ˆ 2 | Ensemble - GemNet | GemNet on full conformer ensemble | 0.2313 |
| πŸ₯‰ 3 | Ensemble - DimeNet++ | DimeNet++ on full conformer ensemble | 0.263 |
| 4 | Ensemble - LEFTNet | LEFTNet on full conformer ensemble | 0.2644 |
| 5 | Ensemble - SchNet | SchNet on full conformer ensemble | 0.2704 |
| 6 | 3D - GemNet | Geometry-enhanced message passing (single conformer) | 0.2789 |
| 7 | 3D - LEFTNet | Local Environment Feature Transformer (single conformer) | 0.3072 |
| 8 | 2D - GIN | Graph Isomorphism Network | 0.3128 |
| 9 | Ensemble - ClofNet | ClofNet on full conformer ensemble | 0.3228 |
| 10 | 3D - SchNet | Continuous-filter convolutional network (single conformer) | 0.3293 |
| 11 | 3D - PaiNN | Polarizable Atom Interaction Network (single conformer) | 0.3443 |
| 12 | 2D - GraphGPS | Graph Transformer with positional encodings | 0.345 |
| 13 | 3D - DimeNet++ | Directional message passing network (single conformer) | 0.351 |
| 14 | 2D - GIN+VN | GIN with Virtual Nodes | 0.3567 |
| 15 | 1D - Random forest | Random Forest on Morgan fingerprints | 0.476 |
| 16 | 2D - ChemProp | Message Passing Neural Network | 0.485 |
| 17 | 3D - ClofNet | SE(3)-equivariant network with complete local frames (single conformer) | 0.4873 |
| 18 | 1D - LSTM | LSTM on SMILES sequences | 0.4879 |
| 19 | 1D - Transformer | Transformer on SMILES sequences | 0.9611 |

L Sterimol Parameter (Kraken)

Predict the L Sterimol descriptor for organophosphorus ligands

Subset: Kraken

| Rank | Model | Description | MAE |
| --- | --- | --- | --- |
| πŸ₯‡ 1 | Ensemble - GemNet | GemNet on full conformer ensemble | 0.3386 |
| πŸ₯ˆ 2 | Ensemble - DimeNet++ | DimeNet++ on full conformer ensemble | 0.3468 |
| πŸ₯‰ 3 | Ensemble - PaiNN | PaiNN on full conformer ensemble | 0.3619 |
| 4 | Ensemble - LEFTNet | LEFTNet on full conformer ensemble | 0.3643 |
| 5 | 3D - GemNet | Geometry-enhanced message passing (single conformer) | 0.3754 |
| 6 | 2D - GIN | Graph Isomorphism Network | 0.4003 |
| 7 | 3D - DimeNet++ | Directional message passing network (single conformer) | 0.4174 |
| 8 | 1D - Random forest | Random Forest on Morgan fingerprints | 0.4303 |
| 9 | Ensemble - SchNet | SchNet on full conformer ensemble | 0.4322 |
| 10 | 2D - GIN+VN | GIN with Virtual Nodes | 0.4344 |
| 11 | 2D - GraphGPS | Graph Transformer with positional encodings | 0.4363 |
| 12 | 3D - PaiNN | Polarizable Atom Interaction Network (single conformer) | 0.4471 |
| 13 | Ensemble - ClofNet | ClofNet on full conformer ensemble | 0.4485 |
| 14 | 3D - LEFTNet | Local Environment Feature Transformer (single conformer) | 0.4493 |
| 15 | 1D - LSTM | LSTM on SMILES sequences | 0.5142 |
| 16 | 2D - ChemProp | Message Passing Neural Network | 0.5452 |
| 17 | 3D - SchNet | Continuous-filter convolutional network (single conformer) | 0.5458 |
| 18 | 3D - ClofNet | SE(3)-equivariant network with complete local frames (single conformer) | 0.6417 |
| 19 | 1D - Transformer | Transformer on SMILES sequences | 0.8389 |

Buried Bβ‚… Parameter (Kraken)

Predict the buried Bβ‚… Sterimol descriptor for organophosphorus ligands

Subset: Kraken

| Rank | Model | Description | MAE |
| --- | --- | --- | --- |
| πŸ₯‡ 1 | Ensemble - GemNet | GemNet on full conformer ensemble | 0.1589 |
| πŸ₯ˆ 2 | Ensemble - PaiNN | PaiNN on full conformer ensemble | 0.1693 |
| πŸ₯‰ 3 | 2D - GIN | Graph Isomorphism Network | 0.1719 |
| 4 | 3D - GemNet | Geometry-enhanced message passing (single conformer) | 0.1782 |
| 5 | Ensemble - DimeNet++ | DimeNet++ on full conformer ensemble | 0.1783 |
| 6 | Ensemble - SchNet | SchNet on full conformer ensemble | 0.2024 |
| 7 | Ensemble - LEFTNet | LEFTNet on full conformer ensemble | 0.2017 |
| 8 | 2D - GraphGPS | Graph Transformer with positional encodings | 0.2066 |
| 9 | 3D - DimeNet++ | Directional message passing network (single conformer) | 0.2097 |
| 10 | Ensemble - ClofNet | ClofNet on full conformer ensemble | 0.2178 |
| 11 | 3D - LEFTNet | Local Environment Feature Transformer (single conformer) | 0.2176 |
| 12 | 3D - SchNet | Continuous-filter convolutional network (single conformer) | 0.2295 |
| 13 | 3D - PaiNN | Polarizable Atom Interaction Network (single conformer) | 0.2395 |
| 14 | 2D - GIN+VN | GIN with Virtual Nodes | 0.2422 |
| 15 | 1D - Random forest | Random Forest on Morgan fingerprints | 0.2758 |
| 16 | 1D - LSTM | LSTM on SMILES sequences | 0.2813 |
| 17 | 3D - ClofNet | SE(3)-equivariant network with complete local frames (single conformer) | 0.2884 |
| 18 | 2D - ChemProp | Message Passing Neural Network | 0.3002 |
| 19 | 1D - Transformer | Transformer on SMILES sequences | 0.4929 |

Buried L Parameter (Kraken)

Predict the buried L Sterimol descriptor for organophosphorus ligands

Subset: Kraken

| Rank | Model | Description | MAE |
| --- | --- | --- | --- |
| πŸ₯‡ 1 | Ensemble - GemNet | GemNet on full conformer ensemble | 0.0947 |
| πŸ₯ˆ 2 | Ensemble - DimeNet++ | DimeNet++ on full conformer ensemble | 0.1185 |
| πŸ₯‰ 3 | 2D - GIN | Graph Isomorphism Network | 0.12 |
| 4 | Ensemble - PaiNN | PaiNN on full conformer ensemble | 0.1324 |
| 5 | Ensemble - LEFTNet | LEFTNet on full conformer ensemble | 0.1386 |
| 6 | Ensemble - SchNet | SchNet on full conformer ensemble | 0.1443 |
| 7 | 3D - LEFTNet | Local Environment Feature Transformer (single conformer) | 0.1486 |
| 8 | 2D - GraphGPS | Graph Transformer with positional encodings | 0.15 |
| 9 | 1D - Random forest | Random Forest on Morgan fingerprints | 0.1521 |
| 10 | 3D - DimeNet++ | Directional message passing network (single conformer) | 0.1526 |
| 11 | Ensemble - ClofNet | ClofNet on full conformer ensemble | 0.1548 |
| 12 | 3D - GemNet | Geometry-enhanced message passing (single conformer) | 0.1635 |
| 13 | 3D - PaiNN | Polarizable Atom Interaction Network (single conformer) | 0.1673 |
| 14 | 2D - GIN+VN | GIN with Virtual Nodes | 0.1741 |
| 15 | 3D - SchNet | Continuous-filter convolutional network (single conformer) | 0.1861 |
| 16 | 1D - LSTM | LSTM on SMILES sequences | 0.1924 |
| 17 | 2D - ChemProp | Message Passing Neural Network | 0.1948 |
| 18 | 3D - ClofNet | SE(3)-equivariant network with complete local frames (single conformer) | 0.2529 |
| 19 | 1D - Transformer | Transformer on SMILES sequences | 0.2781 |

Enantioselectivity (EE)

Predict enantiomeric excess for Rh-catalyzed asymmetric reactions

Subset: EE

| Rank | Model | Description | MAE (%) |
| --- | --- | --- | --- |
| πŸ₯‡ 1 | Ensemble - GemNet | GemNet on full conformer ensemble | 11.61 |
| πŸ₯ˆ 2 | Ensemble - DimeNet++ | DimeNet++ on full conformer ensemble | 12.03 |
| πŸ₯‰ 3 | Ensemble - PaiNN | PaiNN on full conformer ensemble | 13.56 |
| 4 | Ensemble - ClofNet | ClofNet on full conformer ensemble | 13.96 |
| 5 | Ensemble - SchNet | SchNet on full conformer ensemble | 14.22 |
| 6 | 3D - DimeNet++ | Directional message passing network (single conformer) | 14.64 |
| 7 | 3D - SchNet | Continuous-filter convolutional network (single conformer) | 17.74 |
| 8 | 3D - GemNet | Geometry-enhanced message passing (single conformer) | 18.03 |
| 9 | Ensemble - LEFTNet | LEFTNet on full conformer ensemble | 18.42 |
| 10 | 3D - LEFTNet | Local Environment Feature Transformer (single conformer) | 19.8 |
| 11 | 3D - PaiNN | Polarizable Atom Interaction Network (single conformer) | 20.24 |
| 12 | 3D - ClofNet | SE(3)-equivariant network with complete local frames (single conformer) | 33.95 |
| 13 | 2D - ChemProp | Message Passing Neural Network | 61.03 |
| 14 | 1D - Random forest | Random Forest on Morgan fingerprints | 61.3 |
| 15 | 2D - GraphGPS | Graph Transformer with positional encodings | 61.63 |
| 16 | 1D - Transformer | Transformer on SMILES sequences | 62.08 |
| 17 | 2D - GIN | Graph Isomorphism Network | 62.31 |
| 18 | 2D - GIN+VN | GIN with Virtual Nodes | 62.38 |
| 19 | 1D - LSTM | LSTM on SMILES sequences | 64.01 |

Bond Dissociation Energy (BDE)

Predict metal-ligand bond dissociation energy for organometallic catalysts

Subset: BDE

| Rank | Model | Description | MAE (kcal/mol) |
| --- | --- | --- | --- |
| πŸ₯‡ 1 | 3D - DimeNet++ | Directional message passing network (single conformer) | 1.45 |
| πŸ₯ˆ 2 | Ensemble - DimeNet++ | DimeNet++ on full conformer ensemble | 1.47 |
| πŸ₯‰ 3 | 3D - LEFTNet | Local Environment Feature Transformer (single conformer) | 1.53 |
| 4 | Ensemble - LEFTNet | LEFTNet on full conformer ensemble | 1.53 |
| 5 | Ensemble - GemNet | GemNet on full conformer ensemble | 1.61 |
| 6 | 3D - GemNet | Geometry-enhanced message passing (single conformer) | 1.65 |
| 7 | Ensemble - PaiNN | PaiNN on full conformer ensemble | 1.87 |
| 8 | Ensemble - SchNet | SchNet on full conformer ensemble | 1.97 |
| 9 | Ensemble - ClofNet | ClofNet on full conformer ensemble | 2.01 |
| 10 | 3D - PaiNN | Polarizable Atom Interaction Network (single conformer) | 2.13 |
| 11 | 2D - GraphGPS | Graph Transformer with positional encodings | 2.48 |
| 12 | 3D - SchNet | Continuous-filter convolutional network (single conformer) | 2.55 |
| 13 | 3D - ClofNet | SE(3)-equivariant network with complete local frames (single conformer) | 2.61 |
| 14 | 2D - GIN | Graph Isomorphism Network | 2.64 |
| 15 | 2D - ChemProp | Message Passing Neural Network | 2.66 |
| 16 | 2D - GIN+VN | GIN with Virtual Nodes | 2.74 |
| 17 | 1D - LSTM | LSTM on SMILES sequences | 2.83 |
| 18 | 1D - Random forest | Random Forest on Morgan fingerprints | 3.03 |
| 19 | 1D - Transformer | Transformer on SMILES sequences | 10.08 |

Related Datasets

| Dataset | Relationship | Link |
| --- | --- | --- |
| GEOM | Source | Notes |

Strengths

  • Domain diversity: Beyond drug-like molecules, includes organometallics and catalysts rarely covered in existing benchmarks
  • Ensemble-based: Provides full conformer ensembles with statistical weights
  • DFT-quality energies: Drugs-75K features DFT-level conformers and energies (higher accuracy than GEOM-Drugs)
  • Realistic scenarios: BDE subset models the practical constraint of lacking DFT-computed conformers for large catalyst systems
  • Comprehensive baselines: Benchmarks 18 models across 1D (SMILES), 2D (graph), 3D (single conformer), and ensemble methods
  • Property diversity: Covers ionization potential, electron affinity, electronegativity, ligand descriptors, and catalytic properties

Limitations

  • Regression only: All tasks evaluate regression metrics exclusively
  • Chemical space coverage: The 76K molecules cover only a small fraction of the vast drug-like and catalyst chemical spaces
  • Compute requirements: Working with large conformer ensembles demands significant computational resources
  • Proprietary data: EE subset is proprietary (as of December 2025)
  • DFT bottleneck: As the BDE subset demonstrates, a single DFT geometry optimization can take 2-3 days, making conformer-level DFT infeasible for large organometallics
  • Uniform sampling baseline: The data augmentation strategy tested for handling ensembles samples conformers uniformly rather than by Boltzmann weight. Because this ignores the conformers' physical populations, the strategy can introduce noise and fails to help the more complex 3D architectures.
  • Drugs-75K properties: The large-scale benchmark (Drugs-75K) targets electronic properties (ionization potential, electron affinity, electronegativity). As the authors note in Section 5.2, these properties are generally less sensitive to conformational changes than steric or spatial interactions, which confounds any assessment of whether explicit conformer ensembles actually benefit large-scale regression tasks.
  • Unrealistic single-conformer baselines: The 3D single-conformer models are evaluated exclusively on the lowest-energy conformer. This setup is unrealistic in practice, since identifying the global minimum a priori requires exhaustively searching and computing energies for the entire conformer space.

Technical Notes

Data Generation Pipeline

Drugs-75K

Source: GEOM-Drugs subset

Filtering:

  • Minimum 5 rotatable bonds (focus on flexible molecules)
  • Allowed elements: H, C, N, O, F, Si, P, S, Cl

Conformer generation:

  • DFT-level calculations for both conformers and energies
  • Higher accuracy than original GEOM-Drugs (semi-empirical GFN2-xTB)

Properties: Ionization Potential (IP), Electron Affinity (EA), Electronegativity (Ο‡)

Kraken

Source: Original Kraken dataset (1,552 monodentate organophosphorus(III) ligands)

Properties: 4 of the many available ligand descriptors

  • $B_5$: Sterimol maximum width (steric bulk)
  • $L$: Sterimol length along the binding axis
  • $\text{Bur}B_5$: buried variant of $B_5$
  • $\text{Bur}L$: buried variant of $L$

EE (Enantiomeric Excess)

Generation method: Q2MM (Quantum-guided Molecular Mechanics)

Molecules: 872 Rhodium (Rh)-bound atropisomeric catalysts from chiral bisphosphine

Property: Enantiomeric excess (EE) for asymmetric catalysis

Availability: Proprietary-only (closed-source as of December 2025)

BDE (Bond Dissociation Energy)

Molecules: 5,195 organometallic catalysts (ML₁Lβ‚‚ structure)

Initial conformers: OpenBabel with geometric optimization

Energies: DFT calculations

Property: Electronic dissociation energy (difference between bound and unbound states)

Key constraint: DFT optimization for full conformer ensembles computationally infeasible (2-3 days per molecule)

Benchmark Setup

Task: Predict molecular properties from structure using different representation strategies (1D/2D/3D/Ensemble). The ground-truth regression targets are calculated as the Boltzmann-averaged value of the property across the conformer ensemble:

$$ \langle y \rangle_{k_B} = \sum_{\mathbf{C}_i \in \mathcal{C}} p_i y_i $$

Where $p_i$ is the conformer's Boltzmann probability under experimental conditions, derived from its energy $e_i$:

$$ p_i = \frac{\exp(-e_i / k_B T)}{\sum_j \exp(-e_j / k_B T)} $$
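
As a concrete illustration, the Boltzmann-averaged target above can be computed with a few lines of NumPy. This is a hypothetical helper sketching the target construction, not MARCEL's actual pipeline code; conformer energies are assumed to be in eV.

```python
import numpy as np

def boltzmann_average(energies_ev, properties, temperature=298.15):
    """Boltzmann-average a per-conformer property: <y> = sum_i p_i * y_i.

    energies_ev: conformer energies e_i in eV; properties: per-conformer y_i.
    """
    k_b = 8.617333262e-5  # Boltzmann constant in eV/K
    e = np.asarray(energies_ev, dtype=float)
    e = e - e.min()  # shift for numerical stability; p_i is shift-invariant
    w = np.exp(-e / (k_b * temperature))
    p = w / w.sum()  # normalized Boltzmann weights
    return float(np.dot(p, np.asarray(properties, dtype=float)))
```

Degenerate conformers contribute equally, while a conformer 1 eV above the minimum is suppressed by a factor of roughly $e^{-39}$ at room temperature.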

Data splits: Datasets are partitioned 70% train, 10% validation, and 20% test.

Model categories:

  1. 1D Models: SMILES-based (Random Forest on Morgan fingerprints, LSTM, Transformer).
  2. 2D Models: Graph-based (GIN, GIN+VN, ChemProp, GraphGPS).
  3. 3D Models: Single conformer (SchNet, DimeNet++, GemNet, PaiNN, ClofNet, LEFTNet). For evaluation, single 3D models exclusively ingest the lowest-energy conformer. This baseline setting often yields strong performance but is unrealistic in practice, as identifying the global minimum requires exhaustively searching the entire conformer space.
  4. Ensemble Models: Full conformer ensemble processing via explicit set encoders. For each conformer embedding $\mathbf{z}_i$, three aggregation strategies are evaluated:

Mean Pooling: $$ \mathbf{s}_{\text{MEAN}} = \frac{1}{|\mathcal{C}|} \sum_{i=1}^{|\mathcal{C}|} \mathbf{z}_i $$

DeepSets: $$ \mathbf{s}_{\text{DS}} = g\left(\sum_{i=1}^{|\mathcal{C}|} h(\mathbf{z}_i)\right) $$

Self-Attention: $$ \begin{aligned} \mathbf{s}_{\text{ATT}} &= \frac{1}{|\mathcal{C}|} \sum_{i=1}^{|\mathcal{C}|} \mathbf{c}_i, \quad \text{where} \quad \mathbf{c}_i = g\left( \sum_{j=1}^{|\mathcal{C}|} \alpha_{ij} h(\mathbf{z}_j) \right) \\ \alpha_{ij} &= \frac{\exp\left((\mathbf{W} h(\mathbf{z}_i))^\top (\mathbf{W} h(\mathbf{z}_j))\right)}{\sum_{k=1}^{|\mathcal{C}|} \exp\left((\mathbf{W} h(\mathbf{z}_i))^\top (\mathbf{W} h(\mathbf{z}_k))\right)} \end{aligned} $$
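
The three aggregation strategies can be sketched in NumPy as follows. Here `h` and `g` stand for the learned networks in the formulas above (passed in as plain callables) and `W` is the attention projection matrix; this is a minimal illustration, not the benchmark implementation.

```python
import numpy as np

def mean_pool(z):
    """s_MEAN: average over conformer embeddings z of shape (n, d)."""
    return z.mean(axis=0)

def deepsets(z, h, g):
    """s_DS = g(sum_i h(z_i)); h and g are learned networks in practice."""
    return g(h(z).sum(axis=0))

def self_attention_pool(z, h, g, W):
    """s_ATT: attention weights alpha_ij from logits (W h(z_i))^T (W h(z_j))."""
    hz = h(z)                                    # (n, d)
    q = hz @ W.T                                 # projected embeddings W h(z_i)
    logits = q @ q.T                             # pairwise dot products
    logits = logits - logits.max(axis=1, keepdims=True)  # softmax stability
    alpha = np.exp(logits)
    alpha /= alpha.sum(axis=1, keepdims=True)    # row-wise softmax over j
    c = g(alpha @ hz)                            # c_i = g(sum_j alpha_ij h(z_j))
    return c.mean(axis=0)                        # average the attended embeddings
```

All three are permutation-invariant in the conformer order, which is the defining requirement for a set encoder.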

Evaluation metric: Mean Absolute Error (MAE) for all tasks.
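
For completeness, the reported metric is plain MAE (an illustrative snippet, not the benchmark's evaluation script):

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean Absolute Error, the metric reported for every MARCEL task."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))
```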

Key Findings

Ensemble superiority (task-dependent): Across benchmarks, explicitly modeling the full conformer set using DeepSets often achieved top performance. However, these improvements are not uniform:

  • Small-Scale Success: Ensemble methods show large improvements on tasks like Kraken (Ensemble PaiNN achieves 0.2225 on $B_5$ vs 0.3443 single) and EE (Ensemble GemNet achieves 11.61% vs 18.03% single).
  • Large-Scale Plateau: The improvements did not transfer strongly to large subsets like Drugs-75K (Ensemble GemNet achieves 0.4066 eV on IP vs 0.4069 eV single). The authors posit that processing all conformer embeddings changes the learning dynamics and increases training difficulty.

Conformer sampling as augmentation: Randomly sampling one conformer from the ensemble at each training step improves performance and robustness when the underlying conformers are imprecise (e.g., the force-field-generated conformers in the BDE subset).
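
A sketch of this augmentation, with an optional Boltzmann-weighted variant alongside the uniform sampling that the benchmark actually tested. This is a hypothetical helper, not the benchmark's code.

```python
import numpy as np

def sample_conformer_index(energies_ev, rng, boltzmann=False, temperature=298.15):
    """Pick one conformer index per training step.

    boltzmann=False: uniform sampling, as in the augmentation tested by MARCEL.
    boltzmann=True: weight conformers by their Boltzmann populations instead,
    the physically motivated alternative noted in the limitations.
    """
    n = len(energies_ev)
    if not boltzmann:
        return int(rng.integers(n))  # uniform over the ensemble
    k_b = 8.617333262e-5  # Boltzmann constant in eV/K
    e = np.asarray(energies_ev, dtype=float)
    p = np.exp(-(e - e.min()) / (k_b * temperature))
    p /= p.sum()
    return int(rng.choice(n, p=p))
```

Usage: pass a seeded generator, e.g. `rng = np.random.default_rng(0)`, for reproducible sampling.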

3D vs 2D: 3D models generally outperform 2D graph models, especially for conformationally sensitive properties, though 1D and 2D methods remain competitive on low-resource datasets and less rotation-sensitive properties.

Model architecture: GemNet and PaiNN consistently rank among the top-performing architectures across tasks.

Reproducibility Details

  • Data: The Drugs-75K, Kraken, and BDE subsets are openly available via Google Drive links on the project’s GitHub repository. However, the EE dataset remains closed-source/proprietary (as of December 2025), making the EE portion of the benchmark currently irreproducible.
  • Code: The benchmark suite and PyTorch-Geometric dataset loaders are open-sourced at GitHub (SXKDZ/MARCEL) under the Apache-2.0 license.
  • Hardware: The authors trained these models using Nvidia A100 (40GB) GPUs. Memory-intensive models (e.g., GemNet, LEFTNet) required Nvidia H100 (80GB) GPUs. Total computation across all benchmark experiments was approximately 6,000 GPU hours.
  • Algorithms/Models: Hyperparameters for all 18 evaluated models are provided directly within the repository configuration files (benchmarks/params), ensuring clear algorithmic reproducibility. All baseline models use popular, publicly available frameworks (e.g., PyTorch Geometric, OGB, RDKit).
  • Evaluation: Detailed evaluation scripts are provided in the repository with consistent tracking of Mean Absolute Error (MAE) and proper configuration of benchmark splits.

Citation

@inproceedings{zhu2024learning,
title={Learning Over Molecular Conformer Ensembles: Datasets and Benchmarks},
author={Yanqiao Zhu and Jeehyun Hwang and Keir Adams and Zhen Liu and Bozhao Nan and Brock Stenfors and Yuanqi Du and Jatin Chauhan and Olaf Wiest and Olexandr Isayev and Connor W. Coley and Yizhou Sun and Wei Wang},
booktitle={The Twelfth International Conference on Learning Representations},
year={2024},
url={https://openreview.net/forum?id=NSDszJ2uIV}
}