Introduction

For a long time, I’ve wanted to work with legislative data. I’ve been curious about how a bill becomes a law, and about which kinds of bills are introduced versus which kinds are passed. Most of all, I’ve been curious about how one could explore such data and present it so that people can understand the process better for themselves. Add a little transparency to the process, if you will.

Well, I’ve finally gotten around to this project, and today I want to discuss the data I’ve collected and how I’ve gone about exploring it. In the future, I want to drill down into a geographic analysis of the federal data (e.g., which states’ members are introducing which bills, which of those bills pass, etc.). I also want to see whether there are ways to study the politicians themselves and develop a model for predicting how they will vote on a bill.

All of these goals are quite ambitious, so I’m going to start small. Today, I’ll be focusing on the data I’ve collected and some basic exploratory analysis.

Data - How It Was Obtained

My primary data source is Congress.gov. This site is maintained by the Library of Congress and is a great resource for anyone interested in the legislative process.

There’s way more data than I could possibly use, so for now I’m focusing on the 117th Congress. Furthermore, I’m only considering resolutions and joint resolutions, but in the future we may consider concurrent resolutions as well. We omit all amendments for the time being.

Here’s the data I collected:

Bill Type | Introduced
House Resolution | 9,698
House Joint Resolution | 106
Senate Resolution | 5,357
Senate Joint Resolution | 70
Total | 15,231

We’ll dig deeper into the data in the next section, but first I want to discuss how I obtained it.

Funnily enough, building a crawler for this site was pretty easy. The site is well organized and presents its data consistently.

Because the site loads data dynamically, a plain HTTP client like requests wasn’t enough: I used Selenium to load each page and then BeautifulSoup to parse the rendered HTML. To respect the site’s bandwidth, I also added a 5-second delay between page loads, which stretched the crawl to about 3 days.
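A polite, rate-limited fetch loop might look like the sketch below. It is not the actual crawler: `crawl_politely` and `minimum_runtime_hours` are hypothetical names, and `fetch` is an assumed callable standing in for whatever loads a page (for instance, a small wrapper that calls a Selenium driver’s `get` and returns its `page_source`). The only detail taken from the text is the 5-second delay; `sleep` is injectable so the loop can be tested without actually waiting.

```python
import time
from typing import Callable, Dict, Iterable


def crawl_politely(
    urls: Iterable[str],
    fetch: Callable[[str], str],  # assumed: returns the rendered HTML for a URL
    delay_seconds: float = 5.0,   # the 5-second delay described above
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Fetch each URL in order, pausing between consecutive page loads."""
    pages: Dict[str, str] = {}
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay_seconds)  # no wait before the very first load
        pages[url] = fetch(url)
    return pages


def minimum_runtime_hours(
    num_bills: int, pages_per_bill: int = 2, delay_seconds: float = 5.0
) -> float:
    """Time spent purely on delays, ignoring how long each page takes to load."""
    total_pages = num_bills * pages_per_bill
    # One delay between each pair of consecutive page loads.
    return max(total_pages - 1, 0) * delay_seconds / 3600
```

With two pages per bill, `minimum_runtime_hours(15231)` comes to roughly 42 hours of pure waiting, so presumably most of the remaining time in the roughly 3-day run went to loading and rendering the pages themselves.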

The crawler is available on GitHub if you’re interested in checking it out. In the near future, I’ll also release the pre-processed data publicly so that others can use it for their own analyses. Here, I’ll briefly summarize how the crawl works.

For every bill, we need to query two pages:

  • The bill’s all info page, containing everything but text: f'https://www.congress.gov/bill/{congress}th-congress/{bill_type}/{bill_id}/all-info'
  • The bill’s text page, containing the current text of the bill: f'https://www.congress.gov/bill/{congress}th-congress/{bill_type}/{bill_id}/text{version}?format=txt'

You’ll notice that both URLs share three variables: congress, bill_type, and bill_id (the text URL also has a version placeholder). For this analysis, we only consider congress=117 (the 117th Congress, 2021-2023). We also only consider bill_type in the set:

  • house-resolution
  • house-joint-resolution
  • senate-resolution
  • senate-joint-resolution

Finally, bill_id ranges over all bill IDs for the given bill_type and congress. To get that range, I manually looked up the last bill ID for each bill_type and then iterated from 1 to that number.
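The URL scheme and ID enumeration can be sketched as below, assuming the two URL templates shown earlier. The function names and the `last_ids` mapping (standing in for the manually checked last IDs) are hypothetical. One deliberate change: the templates hard-code a `th` suffix, which is right for 117 but would produce `121th-congress`; the `ordinal` helper generalizes it, assuming Congress.gov uses standard English ordinals in its URLs. The `version` placeholder is passed through verbatim, since its exact format isn’t specified here.

```python
from typing import Dict, Iterator, Tuple

BILL_TYPES = (
    "house-resolution",
    "house-joint-resolution",
    "senate-resolution",
    "senate-joint-resolution",
)


def ordinal(n: int) -> str:
    """117 -> '117th', 121 -> '121st', 112 -> '112th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th, ... are irregular
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def all_info_url(congress: int, bill_type: str, bill_id: int) -> str:
    return f"https://www.congress.gov/bill/{ordinal(congress)}-congress/{bill_type}/{bill_id}/all-info"


def text_url(congress: int, bill_type: str, bill_id: int, version: str) -> str:
    # `version` is inserted exactly as in the original template.
    return f"https://www.congress.gov/bill/{ordinal(congress)}-congress/{bill_type}/{bill_id}/text{version}?format=txt"


def iter_bills(last_ids: Dict[str, int]) -> Iterator[Tuple[str, int]]:
    """Yield (bill_type, bill_id) for IDs 1..last_id of each bill type."""
    for bill_type in BILL_TYPES:
        for bill_id in range(1, last_ids[bill_type] + 1):
            yield bill_type, bill_id
```

Keeping the enumeration separate from the URL builders means the same generator can feed both page types for each bill.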

I implemented a basic caching mechanism so that the same page is never queried twice. That way, I could iterate on my post-processing code without re-querying the site.
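One simple way to build such a cache is to store each fetched page on disk under a hash of its URL. This is a minimal sketch under that assumption, not necessarily how the original crawler caches pages; `cached_fetch` is a hypothetical name and `fetch` is any callable that returns a page’s HTML.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cached_fetch(url: str, fetch: Callable[[str], str], cache_dir: Path) -> str:
    """Return the page for `url`, calling `fetch` only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the URL so every URL maps to a short, filesystem-safe file name.
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    path = cache_dir / f"{key}.html"
    if path.exists():
        return path.read_text(encoding="utf-8")
    html = fetch(url)
    path.write_text(html, encoding="utf-8")
    return html
```

Hashing sidesteps characters like `?` in `?format=txt` that are awkward in file names. Because this cache never expires, deleting the directory is the way to force a fresh crawl.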

Parsing the HTML itself was pretty straightforward: it was mostly a matter of targeting key divs, tables, and spans, with a dash of regex here and there for simple post-processing of text.
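As an example of that kind of regex post-processing, here is a sketch that splits a sponsor string into title, name, party, state, and district. The input shape (for example `Rep. Connolly, Gerald E. [D-VA-11]`) is an assumption about how sponsor names appear on the page, and `parse_sponsor` and `Sponsor` are hypothetical names. Strings that don’t match the assumed pattern return `None` rather than a guess.

```python
import re
from typing import NamedTuple, Optional


class Sponsor(NamedTuple):
    title: str                # "Rep.", "Sen.", "Del.", ...
    name: str
    party: str                # "D", "R", "I", ...
    state: str                # two-letter code
    district: Optional[int]   # None when no numeric district is present


# Assumed input shape, e.g. "Rep. Connolly, Gerald E. [D-VA-11]" or "Sen. Peters, Gary C. [D-MI]".
SPONSOR_RE = re.compile(
    r"^(?P<title>\w+\.)\s+(?P<name>.+?)\s*"
    r"\[(?P<party>[A-Z]+)-(?P<state>[A-Z]{2})(?:-(?P<district>\d+))?\]$"
)


def parse_sponsor(raw: str) -> Optional[Sponsor]:
    # Collapse runs of whitespace (including newlines) left over from the HTML.
    text = re.sub(r"\s+", " ", raw).strip()
    m = SPONSOR_RE.match(text)
    if m is None:
        return None
    district = m.group("district")
    return Sponsor(
        title=m.group("title"),
        name=m.group("name"),
        party=m.group("party"),
        state=m.group("state"),
        district=int(district) if district else None,
    )
```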

Data - What It Looks Like

In this post, I want to focus on the top-level features I’ve collected. This means that, for now, I won’t cross features to look at their joint distributions beyond breaking each feature down by tracker status, and I’ll ignore the text of the bills. The rest of this article is partitioned into sections by feature.

Overview - Tracker Status

First, I want to focus on what people probably care most about. What bills are being introduced and what bills are being passed? What happens to a bill in a given congress?

For every bill on Congress.gov, there’s a tracker status that indicates where the bill is in the legislative process. In this data set, the tracker status takes one of the following values:

Tracker Status | Description
Introduced | The bill has been introduced in its chamber
Passed House | The bill has been passed by the House
Passed Senate | The bill has been passed by the Senate
Failed House | The bill has failed to pass the House
Failed Senate | The bill has failed to pass the Senate
Resolving Differences | The bill has been passed by both chambers, but the chambers have not yet resolved their differences
Became Law | The bill has been signed by the President and has become law
Became Private Law | The bill has been signed by the President and has become private law

If we like, we could further collapse these categories into three coarse categories:

  • Introduced: Any bill that has been introduced in a chamber but has never seen a vote (i.e. Introduced)
  • Stalled: Any bill that has seen a vote but has not become law. Since the 117th Congress has ended, we can assume that no bill in this category will become law (i.e. Passed House, Passed Senate, Failed House, Failed Senate, Resolving Differences)
  • Law: Any bill that has become law (i.e. Became Law, Became Private Law)
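Collapsing the statuses is just a lookup table from tracker status to coarse category. A minimal sketch using the statuses listed above (the names `COARSE_STATUS` and `coarse_counts` are hypothetical):

```python
from collections import Counter
from typing import Dict, Iterable

# Coarse category for each tracker status, following the grouping above.
COARSE_STATUS: Dict[str, str] = {
    "Introduced": "Introduced",
    "Passed House": "Stalled",
    "Passed Senate": "Stalled",
    "Failed House": "Stalled",
    "Failed Senate": "Stalled",
    "Resolving Differences": "Stalled",
    "Became Law": "Law",
    "Became Private Law": "Law",
}


def coarse_counts(statuses: Iterable[str]) -> Counter:
    """Count bills per coarse category; an unknown status raises KeyError."""
    return Counter(COARSE_STATUS[s] for s in statuses)
```

Raising `KeyError` on an unrecognized status is a deliberate choice: if a status outside this list ever shows up, the count fails loudly instead of silently dropping bills.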
Bill Type | Introduced | Stalled | Law
House Resolution | 8,977 | 523 | 198
House Joint Resolution | 102 | 1 | 3
Senate Resolution | 5,083 | 114 | 160
Senate Joint Resolution | 57 | 9 | 4
Total | 14,219 | 647 | 365

Some quick observations here:

  • Fewer than 7% of introduced bills ever see a vote. They’re either introduced and forgotten, or they die in committee.
  • Of the bills that see a vote, around 36% become law. Overall, only about 2.4% of introduced bills become law.

All of this is specific to the bills I’ve collected from the 117th Congress.
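For reference, here is the arithmetic behind those observations, using the totals row of the table above (share that saw a vote, share of those that became law, and share of all introduced bills that became law):

$$ \frac{647 + 365}{15{,}231} = \frac{1{,}012}{15{,}231} \approx 6.6\%, \qquad \frac{365}{1{,}012} \approx 36.1\%, \qquad \frac{365}{15{,}231} \approx 2.4\% $$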

Overview - Sponsor

The sponsor of a bill is the primary member of Congress who introduced the bill. Sponsors have a party affiliation and a state, so we can look at the distribution of bills by party and state. This assumes those factors are relevant to the bill’s success, which may or may not be true.

Party

Party | Introduced | Stalled | Law
Democrat | 8,271 | 437 | 235
Republican | 5,883 | 210 | 130
Independent | 65 | 0 | 0

Some observations:

  • Democrats:
    • Around 7.5% of bills introduced by Democrats moved out of the Introduced stage.
    • Around 2.6% of bills introduced by Democrats became law.
    • Around 35% of bills that moved out of the Introduced stage became law.
  • Republicans:
    • Around 5.5% of bills introduced by Republicans moved out of the Introduced stage.
    • Around 2.1% of bills introduced by Republicans became law.
    • Around 38% of bills that moved out of the Introduced stage became law.

So, overall, Democrats are introducing more bills, and more of their bills are becoming law (235 vs. 130). However, the share of introduced bills that become law is fairly similar between the two parties (about 2.6% vs. 2.1%). Republican bills are less likely to move out of the Introduced stage (about 5.5% vs. 7.5%), but those that do are slightly more likely to become law (about 38% vs. 35%).
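The per-party rates can be recomputed directly from the table. A small sketch (the names are hypothetical; the counts are copied from the table above):

```python
from typing import Dict, Optional, Tuple

# party -> (introduced, stalled, law), copied from the table above
PARTY_COUNTS: Dict[str, Tuple[int, int, int]] = {
    "Democrat": (8_271, 437, 235),
    "Republican": (5_883, 210, 130),
    "Independent": (65, 0, 0),
}


def party_rates(introduced: int, stalled: int, law: int) -> Tuple[float, float, Optional[float]]:
    """(share advanced past Introduced, share became law, law share among advanced)."""
    total = introduced + stalled + law
    advanced = stalled + law
    # Undefined when no bill advanced, so return None instead of dividing by zero.
    law_given_advanced = law / advanced if advanced else None
    return advanced / total, law / total, law_given_advanced


for party, counts in PARTY_COUNTS.items():
    advanced, became_law, conditional = party_rates(*counts)
    cond = "n/a" if conditional is None else f"{conditional:.1%}"
    print(f"{party:<12} advanced {advanced:.1%}  law {became_law:.1%}  law|advanced {cond}")
```

For Independents no bill advanced, so the conditional rate is reported as n/a rather than as a misleading 0%.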

State

Here are the top 10 states for each status by number of bills (each column is ranked independently):

Ranking | State: Introduced | State: Stalled | State: Law
1 | CA: 1350 | CA: 93 | CA: 34
2 | TX: 879 | NY: 44 | MI: 30
3 | NY: 784 | TX: 43 | TX: 25
4 | FL: 766 | MI: 28 | NY: 24
5 | IL: 660 | NJ: 28 | MN: 17
6 | PA: 521 | IL: 27 | IL: 16
7 | NJ: 478 | VA: 26 | OH: 11
8 | MI: 380 | FL: 24 | VA: 11
9 | OH: 377 | PA: 22 | FL: 11
10 | MA: 361 | OH: 19 | GA: 9
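Because each column is ranked independently, producing a table like this amounts to counting bills per group within each coarse status and taking the top k. A minimal sketch with `collections.Counter`, assuming a hypothetical input of one `(group, status)` pair per bill; the same helper would cover the sponsor and policy-area rankings below by using sponsor names or policy areas as the group:

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def top_k_by_status(
    rows: Iterable[Tuple[str, str]],  # one (group, coarse_status) pair per bill, e.g. ("CA", "Law")
    k: int = 10,
) -> Dict[str, List[Tuple[str, int]]]:
    """Rank groups separately within each status column."""
    counts: Dict[str, Counter] = {}
    for group, status in rows:
        counts.setdefault(status, Counter())[group] += 1
    return {status: c.most_common(k) for status, c in counts.items()}
```

`Counter.most_common` orders tied counts by the order in which they were first encountered, so ties appear in processing order.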

We might also want to normalize by the number of representatives per state. This would give us a better idea of which states are introducing the most bills per representative.
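Normalizing means dividing each state’s count by its number of members and re-sorting. A sketch, assuming a hypothetical `members` mapping from state code to member count; which members to count (senators, delegates, mid-term replacements) is a choice the post doesn’t spell out, so the inputs below are illustrative placeholders:

```python
from typing import Dict, List, Tuple


def per_member_ranking(
    counts: Dict[str, int],   # state -> number of bills (for one status)
    members: Dict[str, int],  # state -> number of members (hypothetical input)
    k: int = 10,
) -> List[Tuple[str, float]]:
    """Top-k states by bills per member; states with no known members are skipped."""
    rates = {
        state: n / members[state]
        for state, n in counts.items()
        if members.get(state, 0) > 0
    }
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Placeholder states "XA"/"XB" are made up; DC's 101 bills from one delegate come from the tables.
print(per_member_ranking({"DC": 101, "XA": 90, "XB": 10}, {"DC": 1, "XA": 2, "XB": 0}))
```

DC’s 101 introduced bills all come from its single delegate, which is why it tops the per-representative column. Small delegations also make these rates jumpy, since a handful of bills moves the average a lot.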

Here are the top 10 states for each status by number of bills per representative:

Ranking | State: Introduced | State: Stalled | State: Law
1 | DC: 101.0 | DC: 7.0 | AK: 2.2
2 | NH: 47.5 | AK: 2.8 | NH: 2.0
3 | MT: 44.0 | IA: 2.3 | MT: 2.0
4 | OR: 41.0 | SD: 2.3 | MI: 1.9
5 | NV: 40.0 | NH: 2.2 | MN: 1.5
6 | DE: 38.7 | VA: 2.0 | HI: 1.5
7 | SD: 38.3 | NJ: 2.0 | CT: 1.3
8 | IA: 37.7 | PR: 2.0 | IA: 1.2
9 | RI: 36.5 | NV: 1.8 | OR: 1.1
10 | UT: 36.0 | MO: 1.8 | SD: 1.0

Individual

Why not also look at the top 10 individual sponsors by number of bills introduced, etc.?

Ranking | Individual: Introduced | Individual: Stalled | Individual: Law
1 | Sen. Rubio (R-FL): 186 | Sen. Peters (D-MI): 11 | Sen. Peters (D-MI): 19
2 | Sen. Klobuchar (D-MN): 143 | Sen. Cornyn (R-TX): 8 | Sen. Cornyn (R-TX): 15
3 | Sen. Lee (R-UT): 125 | Rep. Connolly (D-VA-11): 8 | Sen. Klobuchar (D-MN): 7
4 | Sen. Markey (D-MA): 118 | Rep. Takano (D-CA-41): 8 | Sen. Tester (D-MT): 6
5 | Sen. Casey (D-PA): 116 | Sen. Grassley (R-IA): 7 | Sen. Rubio (R-FL): 6
6 | Sen. Cortez Masto (D-NV): 109 | Del. Norton (D-DC): 7 | Rep. DeLauro (D-CT-3): 6
7 | Sen. Booker (D-NJ): 106 | Rep. Johnson (D-TX-30): 7 | Sen. Grassley (R-IA): 5
8 | Sen. Durbin (D-IL): 102 | Rep. Katko (R-NY-24): 7 | Sen. Ossoff (D-GA): 4
9 | Del. Norton (D-DC): 101 | Rep. Dean (D-PA-4): 6 | Sen. Murkowski (R-AK): 4
10 | Sen. Menendez (D-NJ): 99 | Rep. Wagner (R-MO-2): 6 | Sen. Padilla (D-CA): 4

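What about a score based on the share of a sponsor’s bills that became law? Since the three statuses are mutually exclusive, the denominator is the total number of bills the sponsor introduced: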

$$ \text{score} = \frac{\text{bills that became law}}{\text{bills introduced} + \text{bills stalled} + \text{bills that became law}} $$

Ranking | Individual: Score
1 | Rep. Pelosi (D-CA-12): 0.500
2 | Rep. Mrvan (D-IN-1): 0.444
3 | Rep. Yarmuth (D-KY-3): 0.333
4 | Rep. Stivers (R-OH-15): 0.250
5 | Rep. Graves (R-MO-6): 0.222
6 | Rep. Jeffries (D-NY-8): 0.200
7 | Rep. Neal (D-MA-1): 0.200
8 | Rep. Palazzo (R-MS-4): 0.200
9 | Sen. Peters (D-MI): 0.186
10 | Rep. Fischbach (R-MN-7): 0.176
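The score is simply the fraction of a sponsor’s bills that became law. A sketch with hypothetical names; the `min_bills` parameter is a hypothetical addition, not part of the score as defined above, included because a raw ratio is volatile for sponsors with only a few bills (one law out of two bills already scores 0.5):

```python
from typing import Dict, List, Tuple


def law_score(introduced: int, stalled: int, law: int) -> float:
    """Fraction of a sponsor's bills that became law (the score formula above)."""
    total = introduced + stalled + law
    return law / total if total else 0.0


def rank_sponsors(
    counts: Dict[str, Tuple[int, int, int]],  # sponsor -> (introduced, stalled, law)
    k: int = 10,
    min_bills: int = 1,  # hypothetical knob: skip sponsors with fewer bills than this
) -> List[Tuple[str, float]]:
    scored = [
        (name, law_score(*c)) for name, c in counts.items() if sum(c) >= min_bills
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

With the default `min_bills=1`, this is the plain formula; raising it filters out sponsors whose score rests on only a handful of bills.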

Policy Areas

Every bill is assigned a primary policy area. Here, we just take a quick look at the most common policy areas for each status.

Ranking | Policy Area: Introduced | Policy Area: Stalled | Policy Area: Law
1 | Health: 1885 | Government Operations and Politics: 79 | Government Operations and Politics: 94
2 | Armed Forces and National Security: 1114 | Armed Forces and National Security: 60 | Armed Forces and National Security: 69
3 | Taxation: 1066 | International Affairs: 60 | Crime and Law Enforcement: 31
4 | Government Operations and Politics: 982 | Health: 56 | Health: 19
5 | International Affairs: 866 | Crime and Law Enforcement: 44 | Native Americans: 17
6 | Crime and Law Enforcement: 842 | Public Lands and Natural Resources: 44 | International Affairs: 14
7 | Education: 663 | Science, Technology, Communications: 44 | Economics and Public Finance: 13
8 | Transportation and Public Works: 663 | Commerce: 43 | Public Lands and Natural Resources: 13
9 | Public Lands and Natural Resources: 548 | Finance and Financial Sector: 34 | Commerce: 13
10 | Finance and Financial Sector: 547 | Emergency Management: 27 | Emergency Management: 11

Wrapping Up

This is just a quick look at the data, and there’s so much more to consider. We haven’t looked at committees or cosponsors. We haven’t looked at voting patterns or the text of the bills themselves. Nevertheless, we can already see some interesting patterns in the data, and they can guide our future analysis. Hopefully these tidbits will inspire you to dig deeper into the data and find your own interesting patterns.

If you have any thoughts, questions, feedback, or general concerns, don’t hesitate to reach out!