Overview
This taxonomy provides a systematic framework for understanding and classifying research papers at the intersection of AI and the physical sciences. Rather than treating papers as belonging to a single category, it uses a superposition model where each paper is viewed as a linear combination of six fundamental contribution types (basis vectors).
The framework helps answer the question “What is this paper’s primary contribution?” by identifying the rhetorical patterns and structural elements that signal each research paradigm.
Core Principle: The Superposition Model
All papers in this domain can be viewed as a linear combination (superposition) of fundamental contribution vectors.
Concept: A single paper rarely falls purely into one category (e.g., 100% Method). It might instead decompose as:
$$\text{Paper} = 0.7 \Psi_{\text{Method}} + 0.2 \Psi_{\text{Theory}} + 0.1 \Psi_{\text{Resource}}$$
Goal: To “bin” a paper, determine which vector is dominant, i.e., which one has the largest coefficient.
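The binning rule above can be sketched in a few lines of Python. This is only an illustration: the `BASIS` tuple and the `dominant_vector` helper are hypothetical names, and breaking ties by the order of the basis vectors is an assumption, since the framework does not say how to handle ties.

```python
from typing import Dict

# The six basis vectors; the ordering here is an assumption used only for tie-breaking.
BASIS = ("Method", "Theory", "Resource", "Systematization", "Position", "Discovery")


def dominant_vector(coefficients: Dict[str, float]) -> str:
    """Return the basis vector with the largest coefficient.

    Vectors missing from `coefficients` are treated as 0.
    On a tie, the vector that appears first in BASIS wins (an assumption).
    """
    unknown = set(coefficients) - set(BASIS)
    if unknown:
        raise ValueError(f"unknown basis vectors: {sorted(unknown)}")
    return max(BASIS, key=lambda name: coefficients.get(name, 0.0))


# The example decomposition from the text: 0.7 Method + 0.2 Theory + 0.1 Resource.
paper = {"Method": 0.7, "Theory": 0.2, "Resource": 0.1}
print(dominant_vector(paper))  # Method
```

In practice the coefficients are judgment calls rather than measured quantities, so only their relative ordering matters for binning.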
The Six Independent Basis Vectors ($\Psi$)
| Basis Vector | Alias/Focus | Core Question | Primary Output |
|---|---|---|---|
| 1. $\Psi_{\text{Method}}$ | The Methodological Basis (Architecture/Algorithm) | How well does this work? | New algorithm, architecture, or approximation |
| 2. $\Psi_{\text{Theory}}$ | The Theoretical Basis (Formal Analysis) | Why does this work? | Formal proof, generalization bound, or physical derivation |
| 3. $\Psi_{\text{Resource}}$ | The Infrastructure Basis (Data/Software) | What resources are available? | Dataset, benchmark, or open-source software ecosystem |
| 4. $\Psi_{\text{Systematization}}$ | The Review Basis (Synthesis) | What do we know? | Comprehensive survey or new organizing taxonomy (SoK) |
| 5. $\Psi_{\text{Position}}$ | The Sociological Basis (Perspective) | Where should the field go? | Opinion piece, perspective, or critique of community practice |
| 6. $\Psi_{\text{Discovery}}$ | The Translational Basis (Application) | What new thing did we find? | Experimentally validated material, molecule, or physical law |
Binning Guide: Rhetorical Indicators
To identify the dominant basis vector, look for these specific rhetorical elements, structural features, and claims in the paper:
1. $\Psi_{\text{Method}}$: The Methodological Paper
Focuses on proposing a novel mechanism, architecture, or approximation (e.g., a new Transformer variant, a GNN with symmetry, a new DFT functional).
Rhetorical Indicators:
- Ablation Study: Authors systematically remove components of their system to prove their specific innovation drives the performance gain
- Baseline Comparison: A prominent table comparing the new method against the State-of-the-Art (SOTA)
- Pseudo-code: An explicit block detailing the algorithmic steps (e.g., for training, sampling, or inference)
2. $\Psi_{\text{Theory}}$: The Theoretical Paper
Focuses on mathematical guarantees, proofs, or derivations from first principles.
Rhetorical Indicators:
- Mathematical Proof Sections: Sections titled “Theorem 1,” “Proof of Equivariance,” or “Formal Bounds”
- Analysis of Limits/Capacity: Investigates expressivity (e.g., comparing a GNN to the Weisfeiler-Lehman test) or analyzes the geometry of the optimization landscape
- Generalization/OOD: Derives generalization bounds on test error or formally defines “chemical space coverage” for out-of-distribution (OOD) behavior
- Exact Constraints: Derives exact conditions that true physical functions (like the universal Density Functional) must satisfy
3. $\Psi_{\text{Resource}}$: The Infrastructure Paper
Focuses on creating and sharing foundational tools for the community.
Rhetorical Indicators:
- Curation Description: Detailed steps on how data was generated, filtered, or curated (e.g., describing millions of CPU-hours of DFT calculations for a dataset like QM9)
- “Datasheets” and “Data Cards”: Inclusion of formal documentation detailing provenance, copyright, and potential biases in the data
- Benchmark Definition: Argues that “Metric X on Dataset Y” is the correct proxy for progress in a specific scientific task
4. $\Psi_{\text{Systematization}}$: The Review Paper
Focuses on organizing and synthesizing existing literature.
Rhetorical Indicators:
- Survey Structure: Follows a linear, often chronological, progression or is grouped by architecture (e.g., VAEs, GANs, Diffusion)
- Systematization of Knowledge (SoK): A higher-order contribution that proposes a new taxonomy or a unified framework to connect disparate concepts
5. $\Psi_{\text{Position}}$: The Sociological Paper
Focuses on meta-science, arguing for a change in community norms, or critiquing systemic issues.
Rhetorical Indicators:
- Venue/Track: Often appears in a dedicated position-paper track, or is labeled a “Blue Sky” or “Forward Looking” paper
- Argumentative Tone: Uses qualitative or quantitative analysis (meta-analysis) to argue for a shift in how research is conducted or funded (e.g., a paper arguing that AI contracts the focus of science)
- Goal: To highlight a systemic issue, not to claim SOTA or provide a proof
6. $\Psi_{\text{Discovery}}$: The Translational Paper
Focuses on using AI/ML as a tool for finding a novel scientific artifact; the AI/ML itself is not the primary object of study.
Rhetorical Indicators:
- Structure: Follows a two-stage workflow: (1) computational screening, in which the AI selects candidates, followed by (2) experimental validation (wet-lab synthesis, physical characterization)
- Core Claim: The primary contribution is a new material, molecule, or measurement, with the AI/ML part serving as the necessary first step
- Key Question: Does the AI’s prediction hold true in reality (e.g., in a physical experiment)?
Usage Notes
When reading a paper:
- Identify which rhetorical indicators appear most prominently
- Estimate the “coefficient” for each basis vector (which aspects dominate)
- Classify by the dominant vector (highest coefficient)
- Note secondary contributions for a complete understanding
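One way to make these steps concrete is to tally the observed indicators per basis vector and normalize the tallies into coefficients. The sketch below assumes that every indicator carries equal weight, which the framework does not prescribe. The indicator keys, `INDICATOR_TO_VECTOR`, `estimate_coefficients`, and `classify` are hypothetical names introduced for illustration, and the mapping covers only a subset of the indicators listed above.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical mapping from an observed rhetorical indicator to its basis vector.
INDICATOR_TO_VECTOR = {
    "ablation_study": "Method",
    "baseline_comparison": "Method",
    "pseudo_code": "Method",
    "proof_section": "Theory",
    "generalization_bound": "Theory",
    "curation_description": "Resource",
    "datasheet": "Resource",
    "benchmark_definition": "Resource",
    "survey_structure": "Systematization",
    "new_taxonomy": "Systematization",
    "position_track": "Position",
    "argumentative_tone": "Position",
    "experimental_validation": "Discovery",
    "new_artifact_claim": "Discovery",
}


def estimate_coefficients(indicators: Iterable[str]) -> Dict[str, float]:
    """Turn observed indicators into coefficients that sum to 1.

    Assumes equal weight per indicator; an unknown indicator raises KeyError.
    """
    counts = Counter(INDICATOR_TO_VECTOR[name] for name in indicators)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no indicators observed")
    return {vector: n / total for vector, n in counts.items()}


def classify(indicators: Iterable[str]) -> Tuple[str, List[str]]:
    """Return (dominant vector, secondary vectors in descending order)."""
    coefficients = estimate_coefficients(indicators)
    ranked = sorted(coefficients.items(), key=lambda kv: kv[1], reverse=True)
    dominant = ranked[0][0]
    secondary = [name for name, _ in ranked[1:]]
    return dominant, secondary


observed = ["ablation_study", "baseline_comparison", "pseudo_code", "proof_section"]
print(estimate_coefficients(observed))  # {'Method': 0.75, 'Theory': 0.25}
print(classify(observed))               # ('Method', ['Theory'])
```

Equal weighting is the simplest choice; a reader who considers some indicators more decisive (for example, experimental validation for Discovery) could replace the counts with weighted sums.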
This framework is particularly useful for:
- Organizing literature reviews
- Understanding conference/journal acceptance criteria
- Identifying gaps in research portfolios
- Recognizing different types of scientific contributions
