Overview

This taxonomy provides a systematic framework for understanding and classifying research papers at the intersection of AI and the physical sciences. It uses a superposition model where each paper is viewed as a linear combination of six fundamental contribution types (basis vectors).

The framework helps answer the question “What is this paper’s primary contribution?” by identifying the rhetorical patterns and structural elements that signal different research paradigms.

Core Principle: The Superposition Model

All papers in this domain can be viewed as a superposition of fundamental contribution vectors.

Concept: Most papers exhibit a mixed profile across the six basis vectors, blending multiple contribution types (e.g., Method + Theory). One vector usually provides the primary narrative thrust (the “Why”), while secondary vectors supply the necessary supporting evidence (the “How”).

Goal: Classify a paper by identifying its Primary Projection (the dominant contribution) and its Secondary Projections (supporting work).
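
The superposition model can be stated compactly under assumptions this guide does not spell out: the six basis vectors are taken to be orthonormal with respect to some inner product $\langle \cdot, \cdot \rangle$, the coefficients are nonnegative and normalized to sum to one, and $\tau$ is a hypothetical threshold separating "heavy" secondary work from negligible contributions. Here $k$ ranges over the six basis vectors defined in the table below.

```latex
\begin{aligned}
P &= \sum_{k=1}^{6} c_k \, \Psi_k, \qquad c_k = \langle P, \Psi_k \rangle \ge 0, \qquad \sum_{k=1}^{6} c_k = 1, \\
k^{*} &= \arg\max_{k} \, c_k \qquad \text{(Primary Projection)}, \\
S_{\tau} &= \{\, k \neq k^{*} : c_k \ge \tau \,\} \qquad \text{(Secondary Projections)}.
\end{aligned}
```

For example, coefficients of $0.6$ on $\Psi_{\text{Method}}$, $0.3$ on $\Psi_{\text{Resource}}$, and $0.1$ on $\Psi_{\text{Theory}}$ with $\tau = 0.2$ give Primary: Method and Secondary: Resource; the Theory component falls below the threshold.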

The Six Independent Basis Vectors ($\Psi$)

| Basis Vector | Alias / Focus | Core Question | Primary Output |
|---|---|---|---|
| 1. $\Psi_{\text{Method}}$ | The Methodological Basis (Architecture/Algorithm) | How well does this work? | New algorithm, architecture, or approximation |
| 2. $\Psi_{\text{Theory}}$ | The Theoretical Basis (Formal Analysis) | Why does this work? | Formal proof, generalization bound, or physical derivation |
| 3. $\Psi_{\text{Resource}}$ | The Infrastructure Basis (Data/Software) | What resources are available? | Dataset, benchmark, or open-source software ecosystem |
| 4. $\Psi_{\text{Systematization}}$ | The Review Basis (Synthesis) | What do we know? | Comprehensive survey or new organizing taxonomy (SoK) |
| 5. $\Psi_{\text{Position}}$ | The Sociological Basis (Perspective) | Where should the field go? | Opinion piece, perspective, or critique of community practice |
| 6. $\Psi_{\text{Discovery}}$ | The Translational Basis (Application) | What new thing did we find? | Experimentally validated material, molecule, or physical law |

Assessment Guide: Rhetorical Indicators

To identify the primary basis vector, look for these specific rhetorical elements, structural features, and claims in the paper:

1. $\Psi_{\text{Method}}$: The Methodological Paper

Focuses on proposing a novel mechanism, architecture, or approximation (e.g., a new Transformer variant, a symmetry-equivariant GNN, a new DFT functional).

Rhetorical Indicators:

  • Ablation Study: Authors systematically remove components of their system to show that their specific innovation drives the performance gain
  • Baseline Comparison: A prominent table comparing the new method against the State-of-the-Art (SOTA)
  • Pseudo-code: An explicit block detailing the algorithmic steps (e.g., for training, sampling, or inference)

2. $\Psi_{\text{Theory}}$: The Theoretical Paper

Focuses on mathematical guarantees, proofs, or derivations from first principles.

Rhetorical Indicators:

  • Mathematical Proof Sections: Sections titled “Theorem 1,” “Proof of Equivariance,” or “Formal Bounds”
  • Analysis of Limits/Capacity: Investigates expressivity (e.g., comparing a GNN's distinguishing power to the Weisfeiler-Lehman graph isomorphism test) or analyzes the geometry of the optimization landscape
  • Generalization/OOD: Derives generalization bounds on test error or formally defines “chemical space coverage” for out-of-distribution (OOD) behavior
  • Exact Constraints: Derives exact conditions that true physical functionals (such as the universal density functional) must satisfy

3. $\Psi_{\text{Resource}}$: The Infrastructure Paper

Focuses on creating and sharing foundational tools for the community.

Rhetorical Indicators:

  • Curation Description: Detailed steps on how data was generated, filtered, or curated (e.g., describing millions of CPU-hours of DFT calculations for a dataset like QM9)
  • “Datasheets” and “Data Cards”: Inclusion of formal documentation detailing provenance, copyright, and potential biases in the data
  • Benchmark Definition: Argues that “Metric X on Dataset Y” is the correct proxy for progress in a specific scientific task

4. $\Psi_{\text{Systematization}}$: The Review Paper

Focuses on organizing and synthesizing existing literature.

Rhetorical Indicators:

  • Survey Structure: Follows a linear, often chronological, progression or is grouped by architecture (e.g., VAEs, GANs, Diffusion)
  • Systematization of Knowledge (SoK): A higher-order contribution that proposes a new taxonomy or a unified framework to connect disparate concepts

5. $\Psi_{\text{Position}}$: The Sociological Paper

Focuses on meta-science, arguing for a change in community norms, or critiquing systemic issues.

Rhetorical Indicators:

  • Venue/Track: Often published in dedicated “Position” tracks, or labeled as “Blue Sky” or “Forward-Looking” papers
  • Argumentative Tone: Uses qualitative or quantitative analysis (e.g., meta-analysis) to argue for a shift in how research is conducted or funded (e.g., a paper arguing that AI narrows the focus of science)
  • Goal: To highlight a systemic issue

6. $\Psi_{\text{Discovery}}$: The Translational Paper

Focuses on the discovery of novel scientific artifacts using AI/ML tools.

Rhetorical Indicators:

  • Structure: Follows a two-stage workflow of (1) Computational Screening (AI selects candidates) and (2) Experimental Validation (wet-lab synthesis, physical characterization)
  • Core Claim: The primary contribution is a new material, molecule, or measurement, with the AI/ML part serving as the necessary first step
  • Key Question: Does the AI’s prediction hold true in reality (e.g., in a physical experiment)?
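
The indicators above can be turned into a rough first-pass screen. The Python sketch below assumes a hypothetical, hand-picked list of cue phrases per basis vector (the CUES table is illustrative, not a validated lexicon) and simply counts case-insensitive, word-bounded matches, then normalizes the counts into weights. It is a reading aid for a human assessor, not a classifier: keyword counts cannot tell a paper that runs an ablation study from one that merely mentions ablations.

```python
import re
from collections import Counter

# Hypothetical cue phrases per basis vector, loosely paraphrasing the
# rhetorical indicators in this guide. Illustrative only.
CUES: dict[str, list[str]] = {
    "Method": ["ablation", "baseline", "state-of-the-art", "pseudo-code", "algorithm"],
    "Theory": ["theorem", "proof", "bound", "expressivity", "weisfeiler"],
    "Resource": ["dataset", "benchmark", "datasheet", "data card", "curation"],
    "Systematization": ["survey", "review", "taxonomy", "systematization"],
    "Position": ["position", "we argue", "community", "norms", "blue sky"],
    "Discovery": ["synthesized", "experimental validation", "wet-lab",
                  "characterization", "screening"],
}


def cue_scores(text: str) -> dict[str, float]:
    """Count cue-phrase matches per basis and normalize them to sum to 1.

    Returns all zeros when no cue phrase is found.
    """
    lowered = text.lower()
    counts = Counter()
    for basis, phrases in CUES.items():
        for phrase in phrases:
            # Word boundaries plus an optional plural "s", so "bound" matches
            # "bound" or "bounds" but not "boundary".
            pattern = r"\b" + re.escape(phrase) + r"s?\b"
            counts[basis] += len(re.findall(pattern, lowered))
    total = sum(counts.values())
    return {basis: (counts[basis] / total if total else 0.0) for basis in CUES}


abstract = ("We propose a new equivariant architecture and report an ablation "
            "study against a strong baseline. We also release a benchmark dataset.")
print(cue_scores(abstract))
# prints {'Method': 0.5, 'Theory': 0.0, 'Resource': 0.5, 'Systematization': 0.0, 'Position': 0.0, 'Discovery': 0.0}
```

Such scores only suggest where to look; the final call on the primary vector still rests on reading the paper's main claim.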

Usage Notes

When assessing a paper:

  1. Look for the rhetorical indicators to identify the Primary basis vector (the main claim/narrative).
  2. Identify any Secondary basis vectors (heavy supporting work).
  3. Use this “fingerprint” (e.g., Primary: Method, Secondary: Resource) to accurately map the paper’s contribution to the broader field.
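
Steps 1 to 3 can be sketched as a small Python function. It assumes the assessor has already assigned nonnegative weights to the basis vectors (how those weights are obtained is left open) and uses a hypothetical cutoff, tau = 0.2, for what counts as "heavy" supporting work; both the cutoff and the tie-breaking rule are illustrative choices, not part of the taxonomy.

```python
from typing import Mapping

# The six basis vectors, in the order used by this guide.
BASES = ("Method", "Theory", "Resource", "Systematization", "Position", "Discovery")


def fingerprint(weights: Mapping[str, float], tau: float = 0.2) -> tuple[str, list[str]]:
    """Return (primary, secondaries) from nonnegative per-basis weights.

    Weights are normalized to sum to 1; missing bases count as 0. The primary
    basis has the largest normalized weight; secondaries are the other bases
    whose normalized weight is at least tau (a hypothetical cutoff).
    """
    unknown = set(weights) - set(BASES)
    if unknown:
        raise ValueError(f"unknown basis vectors: {sorted(unknown)}")
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be nonnegative")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    norm = {b: weights.get(b, 0.0) / total for b in BASES}
    # max() returns the first maximal item, so ties go to the earlier basis in BASES.
    primary = max(BASES, key=norm.__getitem__)
    # Rank secondaries by weight; sorted() is stable, so ties keep BASES order.
    secondaries = sorted(
        (b for b in BASES if b != primary and norm[b] >= tau),
        key=norm.__getitem__,
        reverse=True,
    )
    return primary, secondaries


primary, secondary = fingerprint({"Method": 6, "Resource": 3, "Theory": 1})
print(f"Primary: {primary}, Secondary: {', '.join(secondary) or 'none'}")
# prints "Primary: Method, Secondary: Resource"
```

Returning a ranked list rather than a single secondary matches the guide's allowance for several supporting vectors.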

This framework is particularly useful for:

  • Organizing literature reviews
  • Understanding conference/journal acceptance criteria
  • Identifying gaps in research portfolios
  • Recognizing different types of scientific contributions