United States 117th Congress: A Preliminary Data Exploration

A preliminary data exploration of the 117th Congress of the United States

Introduction

For a long time, I’ve wanted to interact with legislation data. I’ve been curious about the process of how a bill becomes a law, and what kind of bills are introduced versus what kind of bills are passed. More than anything else, I’ve been curious about how one could take such data and find ways of exploring and presenting it to people such that they could understand the process better themselves. Add a little transparency to the process, if you will.

Well, I’ve finally gotten around to this project, and today I want to discuss the data I’ve collected and how I’ve gone about exploring it. In the future, I want to drill down into geographic analysis of the data at the federal level (e.g. which states are introducing which bills, which states are passing which bills). I also want to see if there are ways to observe the politicians themselves and develop a model for predicting how they will vote on a bill.

All of these goals are quite ambitious, so I’m going to start small. Today, I’ll be focusing on the data I’ve collected and some basic exploratory analysis.

Data - How It Was Obtained

My primary data source is Congress.gov. This site is maintained by the Library of Congress and is a great resource for anyone interested in the legislative process.

There’s way more data than I could possibly use, so for now I’m focusing on the 117th Congress. Furthermore, I’m only considering resolutions and joint resolutions, but in the future we may consider concurrent resolutions as well. We omit all amendments for the time being.

Here’s the data I collected:

| Bill Type | Introduced |
| --- | --- |
| House Resolution | 9,698 |
| House Joint Resolution | 106 |
| Senate Resolution | 5,357 |
| Senate Joint Resolution | 70 |
| Total | 15,231 |

We’ll dig a bit deeper into the data in the next section, but for now I want to discuss how I obtained it.

Funnily enough, building a crawler for this site was pretty easy. The site is well organized and presents its data in a consistent manner.

Because the site loads data dynamically, something like requests on its own wasn’t enough: I used Selenium to load each page and then BeautifulSoup to parse the resulting HTML. Furthermore, to respect the site’s bandwidth, I added a 5-second delay between page loads. As a result, the crawler took about 3 days to run.
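
The delay logic doesn't depend on the browser tooling, so a rough sketch can leave the page loader abstract. The names `crawl`, `fetch_page`, and the `sleep` parameter are hypothetical and not taken from the actual crawler; in practice `fetch_page` would presumably wrap a Selenium driver and return the rendered HTML.

```python
import time
from typing import Callable, Dict, Iterable


def crawl(
    urls: Iterable[str],
    fetch_page: Callable[[str], str],
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Fetch each URL in turn, pausing between consecutive page loads."""
    pages: Dict[str, str] = {}
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # wait before every load except the first
        pages[url] = fetch_page(url)
    return pages
```

With two pages per bill and 15,231 bills, the delays alone add up to roughly 15,231 × 2 × 5 s ≈ 42 hours, so a good part of the 3-day runtime is just waiting.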

The crawler is available on GitHub if you’re interested in checking it out. In the near future, I’ll also release the pre-processed data publicly so that others can use it and run their own analyses. For now, here’s a brief summary of how the crawl works.

For every bill, we need to query two pages:

  • The bill’s all info page, containing everything but text: f'https://www.congress.gov/bill/{congress}th-congress/{bill_type}/{bill_id}/all-info'
  • The bill’s text page, containing the current text of the bill: f'https://www.congress.gov/bill/{congress}th-congress/{bill_type}/{bill_id}/text{version}?format=txt'

You’ll notice three main variables in the URLs: congress, bill_type, and bill_id (the text URL also takes a version suffix). For this analysis, we only consider congress=117 (the 117th Congress, 2021-2023). We also only consider bill_type in the set:

  • house-resolution
  • house-joint-resolution
  • senate-resolution
  • senate-joint-resolution

Finally, we only consider bill_id in the set of all bill IDs for the given bill_type and congress. For this, I just manually checked the last bill ID for each bill_type and congress and then iterated from 1 to that number.
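
Put together, the URL scheme and the ID loop might look something like the sketch below. The function names are hypothetical, the last IDs passed in are placeholders, and the empty default for `version` is an assumption, since the values that suffix can take aren't covered here.

```python
from typing import Dict, Iterator, Tuple

BASE_URL = "https://www.congress.gov/bill"


def bill_urls(congress: int, bill_type: str, bill_id: int, version: str = "") -> Tuple[str, str]:
    """Return the (all-info, text) URL pair for a single bill."""
    prefix = f"{BASE_URL}/{congress}th-congress/{bill_type}/{bill_id}"
    return f"{prefix}/all-info", f"{prefix}/text{version}?format=txt"


def iter_bill_urls(congress: int, last_ids: Dict[str, int]) -> Iterator[Tuple[str, str]]:
    """Yield URL pairs for every bill ID from 1 up to the last known ID of each type."""
    for bill_type, last_id in last_ids.items():
        for bill_id in range(1, last_id + 1):
            yield bill_urls(congress, bill_type, bill_id)


# Placeholder last IDs; the real ones were checked by hand on the site.
for info_url, text_url in iter_bill_urls(117, {"house-joint-resolution": 2}):
    print(info_url)
    print(text_url)
```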

I implemented a basic caching mechanism to avoid re-querying the same page multiple times. That way, I could iterate on my post-processing code without having to hit the site again.
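
The caching scheme itself isn't spelled out here, so the following is only one simple way it could work: store each page on disk under a hash of its URL. `cached_fetch`, `fetch_page`, and the cache directory layout are all hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cached_fetch(url: str, fetch_page: Callable[[str], str], cache_dir: Path) -> str:
    """Return the page for `url`, calling `fetch_page` only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the URL so it can serve as a safe, fixed-length file name.
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    cache_file = cache_dir / f"{key}.html"
    if cache_file.exists():
        return cache_file.read_text(encoding="utf-8")
    html = fetch_page(url)
    cache_file.write_text(html, encoding="utf-8")
    return html
```

Because the cache stores raw HTML rather than parsed results, the parsing code can change freely without invalidating anything already downloaded.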

Actually parsing the HTML was pretty straightforward: just a matter of targeting key divs, tables, and spans. Here and there, I used a dash of regex for simple post-processing of text.

Data - What It Looks Like

In this post, I want to focus on the top-level features I’ve collected from the data. This means that, for now, I won’t be crossing features to see their joint distributions, and I’ll be ignoring the text of the bills. The rest of this article is partitioned into sections by feature.

Overview - Tracker Status

First, I want to focus on what people probably care most about. What bills are being introduced and what bills are being passed? What happens to a bill in a given congress?

For every bill that ends up on Congress.gov, there’s a tracker status that indicates where the bill is in the legislative process. Based on the statuses in this current data set, the tracker status is a string that can take on one of the following values:

| Tracker Status | Description |
| --- | --- |
| Introduced | The bill has been introduced to the chamber |
| Passed House | The bill has been passed by the House |
| Passed Senate | The bill has been passed by the Senate |
| Failed House | The bill has failed to pass the House |
| Failed Senate | The bill has failed to pass the Senate |
| Resolving Differences | The bill has been passed by both chambers but the chambers have not yet resolved differences |
| Became Law | The bill has been signed by the President and has become law |
| Became Private Law | The bill has been signed by the President and has become private law |

If we like, we could further collapse these categories into three coarse categories:

  • Introduced: Any bill that has been introduced to a chamber but has never seen a vote (i.e. Introduced)
  • Stalled: Any bill that has seen a vote but has not become law. Especially since the session is over, we can assume that any bill in this category will not become law (i.e. Passed House, Passed Senate, Failed House, Failed Senate, Resolving Differences)
  • Law: Any bill that has become law (i.e. Became Law, Became Private Law)

| Bill Type | Introduced | Stalled | Law |
| --- | --- | --- | --- |
| House Resolution | 8,977 | 523 | 198 |
| House Joint Resolution | 102 | 1 | 3 |
| Senate Resolution | 5,083 | 114 | 160 |
| Senate Joint Resolution | 57 | 9 | 4 |
| Total | 14,219 | 647 | 365 |
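
The collapse from tracker status to coarse category is just a lookup. A minimal sketch, with `COARSE_STATUS` and `coarse_counts` as hypothetical names:

```python
from collections import Counter
from typing import Iterable

# Tracker status -> coarse category, following the three groups listed above.
COARSE_STATUS = {
    "Introduced": "Introduced",
    "Passed House": "Stalled",
    "Passed Senate": "Stalled",
    "Failed House": "Stalled",
    "Failed Senate": "Stalled",
    "Resolving Differences": "Stalled",
    "Became Law": "Law",
    "Became Private Law": "Law",
}


def coarse_counts(tracker_statuses: Iterable[str]) -> Counter:
    """Count bills per coarse category; an unknown status raises KeyError."""
    return Counter(COARSE_STATUS[status] for status in tracker_statuses)
```

Raising on an unknown status, rather than silently bucketing it, makes it obvious if another Congress turns up a tracker value that isn't in this data set.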

Some quick observations here:

  • Fewer than 7% of bills introduced ever see a vote. They’re either introduced and then forgotten about or they’re introduced and then die in committee.
  • Of the bills that see a vote, around 36% become law. Compared to the total, approximately 2.4% of bills introduced become law.
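
These percentages come straight from the Total row, treating every bill in the Stalled or Law column as one that saw a vote:

$$ \frac{647 + 365}{15{,}231} = \frac{1{,}012}{15{,}231} \approx 6.6\%, \qquad \frac{365}{1{,}012} \approx 36.1\%, \qquad \frac{365}{15{,}231} \approx 2.4\% $$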

All of these figures apply only to the bills I’ve collected from the 117th Congress.

Overview - Sponsor

The sponsor of a bill is the primary member of Congress who introduced the bill. Sponsors have a party affiliation and a state, so we can look at the distribution of bills by party and state. This assumes those factors are relevant to the bill’s success, which may or may not be true.
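
If sponsors are stored as labels shaped like `Sen. Rubio (R-FL)` or `Rep. Connolly (D-VA-11)`, which is how they're written later in this post, a small regex can split out party, state, and district. This is only a sketch under that assumption; `parse_sponsor` and `Sponsor` are hypothetical names, and labels in any other shape are rejected rather than guessed at.

```python
import re
from typing import NamedTuple, Optional

# Title, name, then "(PARTY-STATE)" or "(PARTY-STATE-DISTRICT)".
SPONSOR_PATTERN = re.compile(
    r"^(?P<title>Sen|Rep|Del)\.\s+(?P<name>.+?)\s+"
    r"\((?P<party>[A-Z]+)-(?P<state>[A-Z]{2})(?:-(?P<district>\d+))?\)$"
)


class Sponsor(NamedTuple):
    title: str
    name: str
    party: str
    state: str
    district: Optional[int]


def parse_sponsor(label: str) -> Sponsor:
    """Split a sponsor label into its parts, or raise ValueError."""
    match = SPONSOR_PATTERN.match(label.strip())
    if match is None:
        raise ValueError(f"Unrecognized sponsor label: {label!r}")
    district = match.group("district")
    return Sponsor(
        title=match.group("title"),
        name=match.group("name"),
        party=match.group("party"),
        state=match.group("state"),
        district=int(district) if district is not None else None,
    )
```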

Party

| Party | Introduced | Stalled | Law |
| --- | --- | --- | --- |
| Democrat | 8,271 | 437 | 235 |
| Republican | 5,883 | 210 | 130 |
| Independent | 65 | 0 | 0 |

Some observations:

  • Democrats:
    • Around 7.5% of bills introduced by Democrats moved out of the Introduced stage.
    • Around 2.6% of bills introduced by Democrats became law.
    • Around 35% of bills that moved out of the Introduced stage became law.
  • Republicans:
    • Around 5.5% of bills introduced by Republicans moved out of the Introduced stage.
    • Around 2.1% of bills introduced by Republicans became law.
    • Around 38% of bills that moved out of the Introduced stage became law.

So, overall, Democrats introduced more bills, and more of their bills became law. However, the share of introduced bills that became law is fairly similar between the two parties. Republicans also had a slightly higher share of bills becoming law once they moved out of the Introduced stage (around 38% versus 35%), even though a smaller share of their bills got that far.
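
For anyone who wants to reproduce the party percentages, here is a small sketch using the counts from the party table. `party_rates` is a hypothetical helper, not code from the actual analysis.

```python
from typing import Dict, Optional

PARTY_COUNTS = {
    "Democrat": {"Introduced": 8271, "Stalled": 437, "Law": 235},
    "Republican": {"Introduced": 5883, "Stalled": 210, "Law": 130},
    "Independent": {"Introduced": 65, "Stalled": 0, "Law": 0},
}


def party_rates(counts: Dict[str, int]) -> Dict[str, Optional[float]]:
    """Return the three shares discussed above for one party."""
    total = sum(counts.values())
    advanced = counts["Stalled"] + counts["Law"]  # moved out of Introduced
    return {
        "advanced": advanced / total,
        "law": counts["Law"] / total,
        # Undefined when no bill ever left the Introduced stage.
        "law_given_advanced": counts["Law"] / advanced if advanced else None,
    }


for party, counts in PARTY_COUNTS.items():
    print(party, party_rates(counts))
```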

State

Here are the top 10 states by number of bills in each category, with each column ranked independently:

| Ranking | State: Introduced | State: Stalled | State: Law |
| --- | --- | --- | --- |
| 1 | CA: 1350 | CA: 93 | CA: 34 |
| 2 | TX: 879 | NY: 44 | MI: 30 |
| 3 | NY: 784 | TX: 43 | TX: 25 |
| 4 | FL: 766 | MI: 28 | NY: 24 |
| 5 | IL: 660 | NJ: 28 | MN: 17 |
| 6 | PA: 521 | IL: 27 | IL: 16 |
| 7 | NJ: 478 | VA: 26 | OH: 11 |
| 8 | MI: 380 | FL: 24 | VA: 11 |
| 9 | OH: 377 | PA: 22 | FL: 11 |
| 10 | MA: 361 | OH: 19 | GA: 9 |
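
A ranking like this can be built by counting states separately within each status. The sketch below assumes each bill has already been reduced to a `(state, status)` pair; the function name and input shape are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def top_states_by_status(
    bills: Iterable[Tuple[str, str]], n: int = 10
) -> Dict[str, List[Tuple[str, int]]]:
    """Rank states independently within each status, keeping the top n."""
    counters: Dict[str, Counter] = {}
    for state, status in bills:
        counters.setdefault(status, Counter())[state] += 1
    return {status: counter.most_common(n) for status, counter in counters.items()}


# Toy input, not real data.
toy_bills = [("CA", "Introduced"), ("CA", "Introduced"), ("TX", "Introduced"),
             ("CA", "Law"), ("MI", "Law"), ("MI", "Law")]
print(top_states_by_status(toy_bills, n=2))
```

Since every column is ranked on its own, a row of the table doesn't describe a single state; it only lines up the states that share a rank.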

We might also want to normalize by the number of representatives per state. This would give us a better idea of which states are introducing the most bills per representative.

Here are the top 10 states in each category by number of bills per representative:

| Ranking | State: Introduced | State: Stalled | State: Law |
| --- | --- | --- | --- |
| 1 | DC: 101.0 | DC: 7.0 | AK: 2.2 |
| 2 | NH: 47.5 | AK: 2.8 | NH: 2.0 |
| 3 | MT: 44.0 | IA: 2.3 | MT: 2.0 |
| 4 | OR: 41.0 | SD: 2.3 | MI: 1.9 |
| 5 | NV: 40.0 | NH: 2.2 | MN: 1.5 |
| 6 | DE: 38.7 | VA: 2.0 | HI: 1.5 |
| 7 | SD: 38.3 | NJ: 2.0 | CT: 1.3 |
| 8 | IA: 37.7 | PR: 2.0 | IA: 1.2 |
| 9 | RI: 36.5 | NV: 1.8 | OR: 1.1 |
| 10 | UT: 36.0 | MO: 1.8 | SD: 1.0 |
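
The normalization itself is a single division per state. In the sketch below, the bill counts for DC and CA come from the tables above, but the member counts are illustrative assumptions; exactly who counts as a representative for each state isn't specified here.

```python
from typing import Dict


def bills_per_member(counts: Dict[str, int], members: Dict[str, int]) -> Dict[str, float]:
    """Divide each state's bill count by its number of sponsoring members."""
    return {
        state: counts[state] / members[state]
        for state in counts
        if members.get(state)  # skip states with a missing or zero member count
    }


introduced = {"DC": 101, "CA": 1350}
members = {"DC": 1, "CA": 55}  # assumed member counts, for illustration only
for state, value in bills_per_member(introduced, members).items():
    print(f"{state}: {value:.1f}")
```

Small delegations dominate a ranking like this: DC tops the Introduced column with a single delegate, while California drops out of the top 10 entirely.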

Individual

Why not also look at the top 10 individual sponsors by number of bills introduced, etc.?

| Ranking | Individual: Introduced | Individual: Stalled | Individual: Law |
| --- | --- | --- | --- |
| 1 | Sen. Rubio (R-FL): 186 | Sen. Peters (D-MI): 11 | Sen. Peters (D-MI): 19 |
| 2 | Sen. Klobuchar (D-MN): 143 | Sen. Cornyn (R-TX): 8 | Sen. Cornyn (R-TX): 15 |
| 3 | Sen. Lee (R-UT): 125 | Rep. Connolly (D-VA-11): 8 | Sen. Klobuchar (D-MN): 7 |
| 4 | Sen. Markey (D-MA): 118 | Rep. Takano (D-CA-41): 8 | Sen. Tester (D-MT): 6 |
| 5 | Sen. Casey (D-PA): 116 | Sen. Grassley (R-IA): 7 | Sen. Rubio (R-FL): 6 |
| 6 | Sen. Cortez Masto (D-NV): 109 | Del. Norton (D-DC): 7 | Rep. DeLauro (D-CT-3): 6 |
| 7 | Sen. Booker (D-NJ): 106 | Rep. Johnson (D-TX-30): 7 | Sen. Grassley (R-IA): 5 |
| 8 | Sen. Durbin (D-IL): 102 | Rep. Katko (R-NY-24): 7 | Sen. Ossoff (D-GA): 4 |
| 9 | Del. Norton (D-DC): 101 | Rep. Dean (D-PA-4): 6 | Sen. Murkowski (R-AK): 4 |
| 10 | Sen. Menendez (D-NJ): 99 | Rep. Wagner (R-MO-2): 6 | Sen. Padilla (D-CA): 4 |

What about a score based on the ratio of bills that became law to all bills a sponsor put forward?

$$ \text{score} = \frac{\text{bills that became law}}{\text{bills introduced} + \text{bills stalled} + \text{bills that became law}} $$

| Ranking | Individual: Score |
| --- | --- |
| 1 | Rep. Pelosi (D-CA-12): 0.500 |
| 2 | Rep. Mrvan (D-IN-1): 0.444 |
| 3 | Rep. Yarmuth (D-KY-3): 0.333 |
| 4 | Rep. Stivers (R-OH-15): 0.250 |
| 5 | Rep. Graves (R-MO-6): 0.222 |
| 6 | Rep. Jeffries (D-NY-8): 0.200 |
| 7 | Rep. Neal (D-MA-1): 0.200 |
| 8 | Rep. Palazzo (R-MS-4): 0.200 |
| 9 | Sen. Peters (D-MI): 0.186 |
| 10 | Rep. Fischbach (R-MN-7): 0.176 |
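
As code, the score is a one-liner with a guard for sponsors who have no bills. `law_score` is a hypothetical name, and the first set of counts in the usage lines is made up.

```python
def law_score(introduced: int, stalled: int, law: int) -> float:
    """Fraction of a sponsor's bills that became law."""
    total = introduced + stalled + law
    if total == 0:
        raise ValueError("sponsor has no bills")
    return law / total


# Made-up counts: one law out of two bills already scores 0.500.
print(f"{law_score(introduced=1, stalled=0, law=1):.3f}")
# Sen. Peters: 19 laws and 11 stalled bills; a score of 0.186 implies
# 102 bills in total, i.e. 72 still in the Introduced stage.
print(f"{law_score(introduced=72, stalled=11, law=19):.3f}")
```

The score rewards small denominators, so a sponsor with only a handful of bills can outrank far more prolific ones. Requiring a minimum number of bills before ranking would be one way to temper that.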

Policy Areas

Every bill is assigned a primary policy area. Here, we’re just doing a quick look at the most common policy areas by bill status.

| Ranking | Policy Area: Introduced | Policy Area: Stalled | Policy Area: Law |
| --- | --- | --- | --- |
| 1 | Health: 1885 | Government Operations and Politics: 79 | Government Operations and Politics: 94 |
| 2 | Armed Forces and National Security: 1114 | Armed Forces and National Security: 60 | Armed Forces and National Security: 69 |
| 3 | Taxation: 1066 | International Affairs: 60 | Crime and Law Enforcement: 31 |
| 4 | Government Operations and Politics: 982 | Health: 56 | Health: 19 |
| 5 | International Affairs: 866 | Crime and Law Enforcement: 44 | Native Americans: 17 |
| 6 | Crime and Law Enforcement: 842 | Public Lands and Natural Resources: 44 | International Affairs: 14 |
| 7 | Education: 663 | Science, Technology, Communications: 44 | Economics and Public Finance: 13 |
| 8 | Transportation and Public Works: 663 | Commerce: 43 | Public Lands and Natural Resources: 13 |
| 9 | Public Lands and Natural Resources: 548 | Finance and Financial Sector: 34 | Commerce: 13 |
| 10 | Finance and Financial Sector: 547 | Emergency Management: 27 | Emergency Management: 11 |

Wrapping Up

This is just a quick look at the data, and there’s so much more to consider. We haven’t looked at committees or cosponsors. We haven’t looked at voting patterns or the text of the bills themselves. Nevertheless, we can see that there are some interesting patterns in the data, and we can use this to guide our future analysis. Hopefully these tidbits will inspire you to dig deeper into the data and find your own interesting patterns.

If you have any thoughts, questions, feedback, or general concerns, don’t hesitate to reach out!