Overview

This taxonomy provides a systematic framework for understanding and classifying research papers at the intersection of AI and the physical sciences. Rather than treating papers as belonging to a single category, it uses a superposition model where each paper is viewed as a linear combination of six fundamental contribution types (basis vectors).

The framework helps answer the question “What is this paper’s primary contribution?” by identifying the rhetorical patterns and structural elements that signal each research paradigm.

Core Principle: The Superposition Model

All papers in this domain can be viewed as a linear combination (superposition) of fundamental contribution vectors.

Concept: A single paper rarely belongs purely to one category (e.g., 100% Method). It might instead decompose as:

$$\text{Paper} = 0.7 \Psi_{\text{Method}} + 0.2 \Psi_{\text{Theory}} + 0.1 \Psi_{\text{Resource}}$$

Goal: To “bin” a paper, determine which vector is dominant, i.e., which one has the largest coefficient.
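
Written generally, the example above takes the form below. The symbols $c_i$ and $i^{*}$ are introduced here for illustration, and the non-negativity and normalization conditions are assumptions suggested by the example (whose coefficients sum to 1), not requirements stated by the framework.

$$\text{Paper} = \sum_{i=1}^{6} c_i \Psi_i, \qquad c_i \ge 0, \qquad \sum_{i=1}^{6} c_i = 1$$

$$i^{*} = \arg\max_{i} \, c_i$$

For the example, $c_{\text{Method}} = 0.7$ is the largest coefficient, so the paper would be binned as a Method paper.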

The Six Independent Basis Vectors ($\Psi$)

| Basis Vector | Alias / Focus | Core Question | Primary Output |
| --- | --- | --- | --- |
| 1. $\Psi_{\text{Method}}$ | The Methodological Basis (Architecture/Algorithm) | How well does this work? | New algorithm, architecture, or approximation |
| 2. $\Psi_{\text{Theory}}$ | The Theoretical Basis (Formal Analysis) | Why does this work? | Formal proof, generalization bound, or physical derivation |
| 3. $\Psi_{\text{Resource}}$ | The Infrastructure Basis (Data/Software) | What resources are available? | Dataset, benchmark, or open-source software ecosystem |
| 4. $\Psi_{\text{Systematization}}$ | The Review Basis (Synthesis) | What do we know? | Comprehensive survey or new organizing taxonomy (SoK) |
| 5. $\Psi_{\text{Position}}$ | The Sociological Basis (Perspective) | Where should the field go? | Opinion piece, perspective, or critique of community practice |
| 6. $\Psi_{\text{Discovery}}$ | The Translational Basis (Application) | What new thing did we find? | Experimentally validated material, molecule, or physical law |

Binning Guide: Rhetorical Indicators

To identify the dominant basis vector, look for these specific rhetorical elements, structural features, and claims in the paper:

1. $\Psi_{\text{Method}}$: The Methodological Paper

Focuses on proposing a novel mechanism, architecture, or approximation (e.g., a new Transformer variant, a GNN with symmetry, a new DFT functional).

Rhetorical Indicators:

  • Ablation Study: Authors systematically remove components of their system to prove their specific innovation drives the performance gain
  • Baseline Comparison: A prominent table comparing the new method against the State-of-the-Art (SOTA)
  • Pseudo-code: An explicit block detailing the algorithmic steps (e.g., for training, sampling, or inference)

2. $\Psi_{\text{Theory}}$: The Theoretical Paper

Focuses on mathematical guarantees, proofs, or derivations from first principles.

Rhetorical Indicators:

  • Mathematical Proof Sections: Sections titled “Theorem 1,” “Proof of Equivariance,” or “Formal Bounds”
  • Analysis of Limits/Capacity: Investigates the expressivity (e.g., comparing a GNN to the Weisfeiler-Lehman Test) or analyzes the geometry of the optimization landscape
  • Generalization/OOD: Derives generalization bounds on test error or formally defines “chemical space coverage” for out-of-distribution (OOD) behavior
  • Exact Constraints: Derives exact conditions that true physical functions (like the universal Density Functional) must satisfy

3. $\Psi_{\text{Resource}}$: The Infrastructure Paper

Focuses on creating and sharing foundational tools for the community.

Rhetorical Indicators:

  • Curation Description: Detailed steps on how data was generated, filtered, or curated (e.g., describing millions of CPU-hours of DFT calculations for a dataset like QM9)
  • “Datasheets” and “Data Cards”: Inclusion of formal documentation detailing provenance, copyright, and potential biases in the data
  • Benchmark Definition: Argues that “Metric X on Dataset Y” is the correct proxy for progress in a specific scientific task

4. $\Psi_{\text{Systematization}}$: The Review Paper

Focuses on organizing and synthesizing existing literature.

Rhetorical Indicators:

  • Survey Structure: Follows a linear, often chronological, progression or is grouped by architecture (e.g., VAEs, GANs, Diffusion)
  • Systematization of Knowledge (SoK): A higher-order contribution that proposes a new taxonomy or a unified framework to connect disparate concepts

5. $\Psi_{\text{Position}}$: The Sociological Paper

Focuses on meta-science, arguing for a change in community norms, or critiquing systemic issues.

Rhetorical Indicators:

  • Venue/Track: Often appears in a “Position Track,” or is labeled a “Blue Sky” or “Forward Looking” paper
  • Argumentative Tone: Uses qualitative or quantitative analysis (meta-analysis) to argue for a shift in how research is conducted or funded (e.g., a paper arguing that AI contracts the focus of science)
  • Goal: To highlight a systemic issue, not to claim SOTA or provide a proof

6. $\Psi_{\text{Discovery}}$: The Translational Paper

Focuses on using AI/ML as a tool to find a novel scientific artifact, not as the primary object of study.

Rhetorical Indicators:

  • Structure: Follows a two-stage workflow: (1) Computational Screening (AI selects candidates), then (2) Experimental Validation (wet-lab synthesis, physical characterization)
  • Core Claim: The primary contribution is a new material, molecule, or measurement, with the AI/ML part serving as the necessary first step
  • Key Question: Does the AI’s prediction hold true in reality (e.g., in a physical experiment)?
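
The indicators listed in this guide can be treated as a lookup table from observed features to basis vectors. The sketch below is one possible encoding, not part of the framework itself: the indicator labels (such as `ablation_study`) and the function `tally_indicators` are hypothetical names paraphrased from the bullets above, and counting each indicator once with equal weight is an assumption.

```python
from collections import Counter

# Hypothetical indicator labels, paraphrased from the binning guide.
# Each label maps to the basis vector it is taken to signal.
INDICATOR_TO_BASIS = {
    "ablation_study": "Method",
    "baseline_comparison": "Method",
    "pseudo_code": "Method",
    "proof_section": "Theory",
    "expressivity_analysis": "Theory",
    "generalization_bound": "Theory",
    "exact_constraints": "Theory",
    "curation_description": "Resource",
    "datasheet": "Resource",
    "benchmark_definition": "Resource",
    "survey_structure": "Systematization",
    "new_taxonomy": "Systematization",
    "position_track": "Position",
    "argumentative_tone": "Position",
    "screening_then_validation": "Discovery",
    "new_artifact_claim": "Discovery",
}


def tally_indicators(observed):
    """Count how many observed indicators point at each basis vector.

    Assumption: every indicator carries equal weight. Unknown labels
    raise KeyError rather than being silently ignored.
    """
    counts = Counter()
    for name in observed:
        if name not in INDICATOR_TO_BASIS:
            raise KeyError(f"unknown indicator: {name!r}")
        counts[INDICATOR_TO_BASIS[name]] += 1
    return dict(counts)


# A paper with an ablation study, a SOTA table, and one proof section.
example = tally_indicators(
    ["ablation_study", "baseline_comparison", "proof_section"]
)
print(example)
```

Equal weighting is the simplest choice; a reader who finds some indicators more diagnostic than others could replace the counts with weights.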

Usage Notes

When reading a paper:

  1. Identify which rhetorical indicators appear most prominently
  2. Estimate the “coefficient” for each basis vector (how strongly each contribution type is represented)
  3. Classify by the dominant vector (highest coefficient)
  4. Note secondary contributions for a complete understanding
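
The four steps above can be sketched as a small function. This is a minimal illustration under stated assumptions: the function name `classify`, the use of raw per-vector scores as input, and normalizing those scores so the coefficients sum to 1 are all choices made here, not prescribed by the framework.

```python
# The six basis vectors, in the order used by the taxonomy.
BASIS = (
    "Method",
    "Theory",
    "Resource",
    "Systematization",
    "Position",
    "Discovery",
)


def classify(scores):
    """Turn raw per-vector scores into coefficients and a classification.

    `scores` maps basis-vector names to non-negative numbers (for example,
    how many rhetorical indicators of each type were observed). Missing
    vectors count as 0. Returns (dominant, secondary, coefficients).
    """
    values = {b: float(scores.get(b, 0.0)) for b in BASIS}
    if any(v < 0 for v in values.values()):
        raise ValueError("scores must be non-negative")
    total = sum(values.values())
    if total == 0:
        raise ValueError("at least one score must be positive")

    # Step 2: normalize so the coefficients sum to 1 (an assumption).
    coefficients = {b: v / total for b, v in values.items()}

    # Step 3: the dominant vector has the largest coefficient.
    # sorted() is stable, so ties fall back to the order in BASIS.
    ranked = sorted(BASIS, key=lambda b: coefficients[b], reverse=True)
    dominant = ranked[0]

    # Step 4: secondary contributions are the remaining nonzero vectors.
    secondary = [b for b in ranked[1:] if coefficients[b] > 0]
    return dominant, secondary, coefficients


dominant, secondary, coefficients = classify(
    {"Method": 7, "Theory": 2, "Resource": 1}
)
print(dominant, secondary)
```

With the scores shown, the result matches the example decomposition earlier in the document: Method dominates, with Theory and Resource as secondary contributions.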

This framework is particularly useful for:

  • Organizing literature reviews
  • Understanding conference/journal acceptance criteria
  • Identifying gaps in research portfolios
  • Recognizing different types of scientific contributions