Paper Information

Citation: Nguyen, V., Boyd-Graber, J., Resnik, P., & Miler, K. (2015). Tea Party in the House: A Hierarchical Ideal Point Topic Model and Its Application to Republican Legislators in the 112th Congress. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics, 1438-1448. https://doi.org/10.3115/v1/P15-1139

Publication: ACL 2015

What kind of paper is this?

Method.

This paper is primarily a Methodological contribution. It proposes a novel probabilistic architecture - the Hierarchical Ideal Point Topic Model (HIPTM) - designed to solve the specific limitations of existing political science models (which typically rely on either voting data or text data in isolation). The paper validates this method by demonstrating its superior performance in predicting “Tea Party” membership compared to text-only baselines and its ability to provide interpretable “framing” analysis.

What is the motivation?

The primary motivation is to better understand political polarization, specifically the “Tea Party” phenomenon within the Republican party during the 112th Congress.

Standard “Ideal Point” models (like DW-NOMINATE) typically project legislators onto a single liberal-conservative dimension using only binary voting data. This is insufficient for capturing complex, multi-dimensional intra-party conflicts where legislators might agree on votes but differ on policy “framing” or specific sub-issues. Furthermore, existing multi-dimensional models often produce dimensions that are difficult for humans to interpret.

What is the novelty here?

The core novelty is the Hierarchical Ideal Point Topic Model (HIPTM). It distinguishes itself from prior work through three main technical innovations:

  1. Joint Modeling of Three Data Sources: It integrates roll call votes, the text of bills, and the floor speeches of legislators into a single probabilistic framework.
  2. Hierarchical Topic Structure: It models “frames” as a second level of the topic hierarchy. “Issues” (level 1) are fixed and non-polarized, while “Frames” (level 2) are discovered dynamically and carry polarity (ideal point weights).
  3. Text-Based Ideal Point Prediction: Unlike models that require voting records for inference, HIPTM regresses ideal points on speech text, allowing it to predict the political alignment of legislators based solely on their writing or speeches.

What experiments were performed?

The authors validated the model using data from the 112th U.S. Congress (Republican legislators only).

  • Prediction Task: Classifying legislators as members of the “Tea Party Caucus”.
  • Baselines: The model was compared against Support Vector Machines (SVM) trained on:
    • TF-IDF vectors (Text only)
    • Normalized TF-IDF (Text only)
    • Binary Vote vectors (Vote only)
  • Metric: Area Under the Receiver Operating Characteristic Curve (AUC-ROC) via 5-fold cross-validation.
  • Qualitative Analysis: The authors examined the “span” of ideal points within specific topics (e.g., Macroeconomics, Health) to identify which issues were most polarized between Tea Party and Establishment Republicans.

What were the outcomes and conclusions drawn?

  • Quantitative Performance: HIPTM features combined with voting data achieved the highest classification performance. Significantly, when voting data was withheld (simulating a candidate without a record), HIPTM’s text-based features outperformed standard TF-IDF baselines in predicting Tea Party affiliation.
  • Political Insight: The model identified “Macroeconomics” and “Government Operations” as the most polarized topics.
  • Framing vs. Voting: The authors constructed a taxonomy of disagreement. They found issues where Republicans voted together but framed arguments differently (e.g., Health/Obamacare) versus issues where both votes and frames were polarized (e.g., Budget/Debt Ceiling).
  • Sub-group Identification: The model successfully identified “Green Tea” Republicans - legislators who are not in the caucus but align with its ideology, such as Justin Amash.

Reproducibility Details

Data

The study focuses on the 112th U.S. Congress (Jan 2011 - Jan 2013).

PurposeDatasetSizeNotes
SubjectsRepublican Legislators240 Reps60 are Tea Party Caucus members.
VotesRoll Call Votes13,856 votesBinary (Yes/No). Includes 60 “Key Votes” identified by Freedom Works.
TextFloor Speeches~5,349 word typesSourced from GovTrack. Vocabulary size after preprocessing.
PriorsCongressional Bills Project19 TopicsUsed to set informed priors $\phi^*_k$ for top-level issues.

Algorithms

The model uses a Stochastic EM approach for inference.

  • Generative Process:
    • Speeches: Modeled as a mixture of $K$ Hierarchical Dirichlet Processes (HDPs). A legislator chooses an issue $z$, then a frame $t$ from a Dirichlet Process, then a word $w$.
    • Bills: Modeled using Latent Dirichlet Allocation (LDA). Each bill is a mixture over $K$ issues.
    • Votes: Modeled via a probabilistic ideal point function (logistic/inverse-logit). The probability of a “Yes” vote depends on the bill’s polarity $x_b$, popularity $y_b$, and the legislator’s issue-specific ideal point $u_{a,k}$.
  • Optimization Steps:
    1. Sampling: Issue assignments $z$ and frame assignments $t$ are sampled for tokens in speeches and bills.
    2. Regression: Frame-specific regression weights $\eta_{k,j}$ are optimized using L-BFGS.
    3. Ideal Points: Legislator ideal points $u_{a,k}$ and bill parameters ($x_b, y_b$) are updated using Gradient Ascent.

Models

  • Ideal Point Definition: A legislator’s ideal point on issue $k$ ($u_{a,k}$) is defined as a linear combination of the ideal points of the _frames_ they use ($\eta_{k,j}$), weighted by their usage frequency ($\hat{\psi}_{a,k,j}$).
  • Topic Hierarchy:
    • Level 1 (Issues): Fixed at $K=19$ (based on Policy Agendas Project major headings). These nodes use informed Dirichlet priors.
    • Level 2 (Frames): Unbounded number of frames per issue, discovered non-parametrically via Dirichlet Process.
  • Prediction Features: For the classification task, the model averages feature values over 10 samples taken every 50 iterations after a 500-iteration burn-in.

Evaluation

  • Primary Metric: AUC-ROC (Area Under the Receiver Operating Characteristic Curve).
  • Classifier: $\text{SVM}^{\text{light}}$ with cost-factor adjustment for class imbalance.
  • Cross-Validation: 5-fold stratified sampling.

Citation

@inproceedings{nguyenTeaPartyHouse2015,
  title = {Tea {{Party}} in the {{House}}: {{A Hierarchical Ideal Point Topic Model}} and {{Its Application}} to {{Republican Legislators}} in the 112th {{Congress}}},
  shorttitle = {Tea {{Party}} in the {{House}}},
  booktitle = {Proceedings of the 53rd {{Annual Meeting}} of the {{Association}} for {{Computational Linguistics}} and the 7th {{International Joint Conference}} on {{Natural Language Processing}} ({{Volume}} 1: {{Long Papers}})},
  author = {Nguyen, Viet-An and {Boyd-Graber}, Jordan and Resnik, Philip and Miler, Kristina},
  year = {2015},
  pages = {1438--1448},
  publisher = {Association for Computational Linguistics},
  address = {Beijing, China},
  doi = {10.3115/v1/P15-1139},
  urldate = {2023-11-02},
  abstract = {We introduce the Hierarchical Ideal Point Topic Model, which provides a rich picture of policy issues, framing, and voting behavior using a joint model of votes, bill text, and the language that legislators use when debating bills. We use this model to look at the relationship between Tea Party Republicans and ``establishment'' Republicans in the U.S. House of Representatives during the 112th Congress.},
  langid = {english}
}