Introduction
For a long time, I’ve wanted to interact with legislation data. I’ve been curious about the process of how a bill becomes a law, and what kind of bills are introduced versus what kind of bills are passed. More than anything else, I’ve been curious about how one could take such data and find ways of exploring and presenting it to people such that they could understand the process better themselves. Add a little transparency to the process, if you will.
Well, I’ve finally gotten around to this project, and today I want to discuss the data I’ve collected and how I’ve gone about exploring it. In the future, I want to drill down into geographic analysis of the data (e.g. which states are introducing which bills, which states are passing which bills, etc.) at the federal level. I also want to see if there are ways one could observe the politicians themselves and develop a model for predicting how they will vote on a bill.
All of these goals are quite ambitious, so I’m going to start small. Today, I’ll be focusing on the data I’ve collected and some basic exploratory analysis.
Data - How It Was Obtained
My primary data source is Congress.gov. This site is maintained by the Library of Congress and is a great resource for anyone interested in the legislative process.
There’s way more data than I could possibly use, so for now I’m focusing on the 117th Congress. Furthermore, I’m only considering resolutions and joint resolutions, but in the future we may consider concurrent resolutions as well. We omit all amendments for the time being.
Here’s the data I collected:
Bill Type | Introduced |
---|---|
House Resolution | 9,698 |
House Joint Resolution | 106 |
Senate Resolution | 5,357 |
Senate Joint Resolution | 70 |
Total | 15,231 |
We’ll dig in a bit deeper into the data in the next section, but for now I want to discuss how I obtained the data.
Funny enough, building out a crawler for this site was pretty easy. The site is well organized and the data is presented in a consistent manner. I used the following technologies:
- Python
- BeautifulSoup
- Selenium
- A little bit of regex magic
Because the site dynamically loads data, I had to use Selenium to load the page and then use BeautifulSoup to parse the HTML instead of something like requests. Furthermore, to respect the site’s bandwidth, I added a 5-second delay between each page load. This resulted in the crawler taking about 3 days to run.
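To make the throttling concrete, here is a minimal sketch of a crawl loop that pauses between consecutive page loads. It is not the actual crawler: the names `crawl_politely` and `selenium_fetch` are made up for illustration, and the fetch function is passed in so the loop itself does not depend on Selenium.

```python
import time
from typing import Callable, Dict, Iterable


def crawl_politely(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Fetch each URL in turn, pausing between consecutive page loads."""
    pages: Dict[str, str] = {}
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay_seconds)  # wait between loads, not before the first one
        pages[url] = fetch(url)
    return pages


def selenium_fetch(driver, url: str) -> str:
    """Load a page in a real browser and return the rendered HTML."""
    driver.get(url)            # lets the page's JavaScript run
    return driver.page_source  # rendered HTML, ready for BeautifulSoup


# Possible usage (requires selenium, beautifulsoup4 and a browser driver):
#   from selenium import webdriver
#   from bs4 import BeautifulSoup
#   driver = webdriver.Chrome()
#   pages = crawl_politely(urls, lambda u: selenium_fetch(driver, u))
#   driver.quit()
#   soup = BeautifulSoup(pages[some_url], "html.parser")
```

With two pages per bill and 15,231 bills, the 5-second pause alone adds up to roughly 42 hours, so page rendering time plausibly accounts for much of the rest of the 3 days.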
The crawler is available on GitHub if you’re interested in checking it out. In the near future, I’ll also release the pre-processed data publicly so that others can use it and provide their own analysis. I’ll also briefly summarize the process of crawling the site here.
For every bill, we need to query two pages:
- The bill’s all info page, containing everything but text:
f'https://www.congress.gov/bill/{congress}th-congress/{bill_type}/{bill_id}/all-info'
- The bill’s text page, containing the current text of the bill:
f'https://www.congress.gov/bill/{congress}th-congress/{bill_type}/{bill_id}/text{version}?format=txt'
You’ll notice that there are three variables in the URLs: congress, bill_type, and bill_id.
For this analysis, we only consider congress=117 (the 117th Congress, 2021-2023).
We also only consider bill_type in the set:
- house-resolution
- house-joint-resolution
- senate-resolution
- senate-joint-resolution
Finally, we only consider bill_id in the set of all bill IDs for the given bill_type and congress.
For this, I just manually checked the last bill ID for each bill_type and congress and then iterated from 1 to that number.
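The URL scheme and the ID iteration described above can be sketched as follows. The function names and the `last_ids` mapping are hypothetical, the empty default for `version` is an assumption, and the last-ID values in the usage comment are placeholders rather than the real ones.

```python
from typing import Dict, Iterator, Tuple

BASE = "https://www.congress.gov/bill"


def bill_urls(congress: int, bill_type: str, bill_id: int, version: str = "") -> Tuple[str, str]:
    """Return the (all-info URL, text URL) pair for one bill."""
    # The "th" suffix is hard-coded as in the URL templates above; it is
    # correct for the 117th Congress but not for, say, the 101st.
    prefix = f"{BASE}/{congress}th-congress/{bill_type}/{bill_id}"
    return f"{prefix}/all-info", f"{prefix}/text{version}?format=txt"


def iter_bill_urls(congress: int, last_ids: Dict[str, int]) -> Iterator[Tuple[str, str]]:
    """Yield URL pairs for every bill ID from 1 up to the last known ID."""
    for bill_type, last_id in last_ids.items():
        for bill_id in range(1, last_id + 1):
            yield bill_urls(congress, bill_type, bill_id)


# Placeholder last IDs, for illustration only:
#   for info_url, text_url in iter_bill_urls(117, {"senate-joint-resolution": 3}):
#       print(info_url, text_url)
```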
I implemented a basic caching mechanism to avoid re-querying the same page multiple times. That way, I could iterate on my post-processing code without having to re-query the site.
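One simple way to get this behavior is a file-per-URL disk cache. The sketch below is an assumption about how such a cache could look, not the crawler's actual code; `cached_fetch` and the hashed file naming are illustrative choices.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cached_fetch(url: str, fetch: Callable[[str], str], cache_dir: Path) -> str:
    """Return the page for `url`, hitting the network only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the URL so it becomes a safe, fixed-length file name.
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    path = cache_dir / f"{key}.html"
    if path.exists():
        return path.read_text(encoding="utf-8")  # cache hit: no request made
    html = fetch(url)                            # cache miss: query the site once
    path.write_text(html, encoding="utf-8")
    return html
```

Because the raw HTML is stored rather than the parsed result, the parsing code can change freely without triggering new requests.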
Actually parsing the HTML was pretty straightforward. Just a matter of targeting key divs, tables, and spans. Here and there, I used a dash of regex for simple post-processing of text.
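As an example of the kind of regex post-processing meant here, the sketch below splits a sponsor label such as "Rep. Connolly (D-VA-11)" (the format used in the sponsor tables of this post) into its parts. The pattern and the `parse_sponsor` helper are illustrative assumptions, not the expressions the crawler actually uses, and they may not cover every label the site produces.

```python
import re
from typing import Dict, Optional

SPONSOR_RE = re.compile(
    r"^(?P<title>Sen|Rep|Del)\.\s+"   # chamber title
    r"(?P<name>.+?)\s+"               # surname, possibly several words
    r"\((?P<party>[A-Z])-(?P<state>[A-Z]{2})(?:-(?P<district>\d+))?\)$"
)


def parse_sponsor(label: str) -> Optional[Dict[str, Optional[str]]]:
    """Split a sponsor label into title, name, party, state and district."""
    match = SPONSOR_RE.match(label.strip())
    return match.groupdict() if match else None


# Senators and delegates have no district, so that field comes back as None.
print(parse_sponsor("Rep. Connolly (D-VA-11)"))
print(parse_sponsor("Sen. Cortez Masto (D-NV)"))
```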
Data - What It Looks Like
In this post, I want to focus on all the top-level features I’ve collected from the data. This means that, for now, I won’t be crossing features to see their joint distributions, and I’ll also be ignoring the text of the bills. The rest of this article will be partitioned into sections based on the feature.
Overview - Tracker Status
First, I want to focus on what people probably care most about. What bills are being introduced and what bills are being passed? What happens to a bill in a given congress?
For every bill that ends up on Congress.gov, there’s a tracker status that indicates where the bill is in the legislative process. Based on the statuses in this current data set, the tracker status is a string that can take on one of the following values:
Tracker Status | Description |
---|---|
Introduced | The bill has been introduced to the chamber |
Passed House | The bill has been passed by the House |
Passed Senate | The bill has been passed by the Senate |
Failed House | The bill has failed to pass the House |
Failed Senate | The bill has failed to pass the Senate |
Resolving Differences | The bill has been passed by both chambers but the chambers have not yet resolved differences |
Became Law | The bill has been signed by the President and has become law |
Became Private Law | The bill has been signed by the President and has become private law |
If we like, we could further collapse these categories into three coarse categories:
- Introduced: Any bill that has been introduced to a chamber but never seen a vote (i.e. Introduced)
- Stalled: Any bill that has seen a vote but has not become law. Especially since the session is over, we can assume that any bill in this category will not become law (i.e. Passed House, Passed Senate, Failed House, Failed Senate, Resolving Differences)
- Law: Any bill that has become law (i.e. Became Law, Became Private Law)
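The collapse into three coarse categories can be written as a lookup table. This is a minimal sketch; the names `COARSE_STATUS`, `coarse_status` and `summarize` are illustrative, and the example statuses are made up.

```python
from collections import Counter
from typing import Iterable

# Mapping from tracker status to the coarse category described above.
COARSE_STATUS = {
    "Introduced": "Introduced",
    "Passed House": "Stalled",
    "Passed Senate": "Stalled",
    "Failed House": "Stalled",
    "Failed Senate": "Stalled",
    "Resolving Differences": "Stalled",
    "Became Law": "Law",
    "Became Private Law": "Law",
}


def coarse_status(tracker_status: str) -> str:
    """Map a tracker status to Introduced, Stalled or Law."""
    try:
        return COARSE_STATUS[tracker_status]
    except KeyError:
        # Fail loudly so an unexpected status is noticed, not silently dropped.
        raise ValueError(f"Unknown tracker status: {tracker_status!r}")


def summarize(statuses: Iterable[str]) -> Counter:
    """Count bills per coarse category."""
    return Counter(coarse_status(s) for s in statuses)


# Made-up example input:
print(summarize(["Introduced", "Passed House", "Became Law", "Introduced"]))
```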
Bill Type | Introduced | Stalled | Law |
---|---|---|---|
House Resolution | 8,977 | 523 | 198 |
House Joint Resolution | 102 | 1 | 3 |
Senate Resolution | 5,083 | 114 | 160 |
Senate Joint Resolution | 57 | 9 | 4 |
Total | 14,219 | 647 | 365 |
Some quick observations here:
- Fewer than 7% of bills introduced ever see a vote. They’re either introduced and then forgotten about or they’re introduced and then die in committee.
- Of the bills that see a vote, around 36% become law. Compared to the total, approximately 2.4% of bills introduced become law.
All of this is conditional on the bills I’ve collected, and of the 117th Congress.
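The percentages above can be recomputed directly from the table. The sketch below copies the counts from the table; the function and key names are only illustrative.

```python
from typing import Dict, Tuple

# (Introduced, Stalled, Law) counts copied from the table above.
COUNTS: Dict[str, Tuple[int, int, int]] = {
    "House Resolution": (8977, 523, 198),
    "House Joint Resolution": (102, 1, 3),
    "Senate Resolution": (5083, 114, 160),
    "Senate Joint Resolution": (57, 9, 4),
}


def funnel_rates(introduced: int, stalled: int, law: int) -> Dict[str, float]:
    """Share of bills that saw a vote, and share that became law."""
    total = introduced + stalled + law
    voted = stalled + law  # everything that moved past the Introduced stage
    return {
        "saw_vote": voted / total if total else 0.0,
        "law_given_vote": law / voted if voted else 0.0,
        "law_overall": law / total if total else 0.0,
    }


# Column-wise totals across the four bill types.
totals = tuple(sum(column) for column in zip(*COUNTS.values()))
for name, rate in funnel_rates(*totals).items():
    print(f"{name}: {rate:.1%}")
```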
Overview - Sponsor
The sponsor of a bill is the primary member of Congress who introduced the bill. Sponsors have a party affiliation and a state, so we can look at the distribution of bills by party and state. This assumes those factors are relevant to the bill’s success, which may or may not be true.
Party
Party | Introduced | Stalled | Law |
---|---|---|---|
Democrat | 8,271 | 437 | 235 |
Republican | 5,883 | 210 | 130 |
Independent | 65 | 0 | 0 |
Some observations:
- Democrats:
  - Around 7.5% of bills introduced by Democrats moved out of the Introduced stage.
  - Around 2.6% of bills introduced by Democrats became law.
  - Around 35% of bills that moved out of the Introduced stage became law.
- Republicans:
  - Around 5.5% of bills introduced by Republicans moved out of the Introduced stage.
  - Around 2.1% of bills introduced by Republicans became law.
  - Around 38% of bills that moved out of the Introduced stage became law.
So, overall, Democrats introduced more bills and more of their bills became law. However, the share of sponsored bills that became law is fairly similar between the two parties (about 2.6% versus 2.1%). Additionally, although a smaller share of Republican bills moved out of the Introduced stage, a slightly higher share of those that did went on to become law (about 38% versus 35%).
State
Here’s a table of the top 10 states by number of bills introduced:
Ranking | State: Introduced | State: Stalled | State: Law |
---|---|---|---|
1 | CA: 1350 | CA: 93 | CA: 34 |
2 | TX: 879 | NY: 44 | MI: 30 |
3 | NY: 784 | TX: 43 | TX: 25 |
4 | FL: 766 | MI: 28 | NY: 24 |
5 | IL: 660 | NJ: 28 | MN: 17 |
6 | PA: 521 | IL: 27 | IL: 16 |
7 | NJ: 478 | VA: 26 | OH: 11 |
8 | MI: 380 | FL: 24 | VA: 11 |
9 | OH: 377 | PA: 22 | FL: 11 |
10 | MA: 361 | OH: 19 | GA: 9 |
We might also want to normalize by the number of representatives per state. This would give us a better idea of which states are introducing the most bills per representative.
Here’s a table of the top 10 states by number of bills introduced per representative:
Ranking | State: Introduced | State: Stalled | State: Law |
---|---|---|---|
1 | DC: 101.0 | DC: 7.0 | AK: 2.2 |
2 | NH: 47.5 | AK: 2.8 | NH: 2.0 |
3 | MT: 44.0 | IA: 2.3 | MT: 2.0 |
4 | OR: 41.0 | SD: 2.3 | MI: 1.9 |
5 | NV: 40.0 | NH: 2.2 | MN: 1.5 |
6 | DE: 38.7 | VA: 2.0 | HI: 1.5 |
7 | SD: 38.3 | NJ: 2.0 | CT: 1.3 |
8 | IA: 37.7 | PR: 2.0 | IA: 1.2 |
9 | RI: 36.5 | NV: 1.8 | OR: 1.1 |
10 | UT: 36.0 | MO: 1.8 | SD: 1.0 |
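The normalization itself is a simple division. In the sketch below, the Stalled counts come from the raw top-10 table above, while the member counts are an assumption (House seats plus two senators); the post does not state the exact denominator used. Under that assumption, VA and NJ both come out at 2.0, matching the table.

```python
from typing import Dict, List, Tuple


def per_member(counts: Dict[str, int], members: Dict[str, int]) -> Dict[str, float]:
    """Divide each state's bill count by its member count, to one decimal."""
    return {
        state: round(count / members[state], 1)
        for state, count in counts.items()
        if members.get(state, 0) > 0  # skip states with no known member count
    }


def top_n(values: Dict[str, float], n: int = 10) -> List[Tuple[str, float]]:
    """Return the n largest entries, highest first."""
    return sorted(values.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Stalled counts copied from the raw table above.
stalled = {"CA": 93, "VA": 26, "NJ": 28}
# ASSUMPTION: House seats plus two senators per state.
members = {"CA": 55, "VA": 13, "NJ": 14}

print(top_n(per_member(stalled, members)))
```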
Individual
Why not also look at the top 10 individual sponsors by number of bills introduced, etc.?
Ranking | Individual: Introduced | Individual: Stalled | Individual: Law |
---|---|---|---|
1 | Sen. Rubio (R-FL): 186 | Sen. Peters (D-MI): 11 | Sen. Peters (D-MI): 19 |
2 | Sen. Klobuchar (D-MN): 143 | Sen. Cornyn (R-TX): 8 | Sen. Cornyn (R-TX): 15 |
3 | Sen. Lee (R-UT): 125 | Rep. Connolly (D-VA-11): 8 | Sen. Klobuchar (D-MN): 7 |
4 | Sen. Markey (D-MA): 118 | Rep. Takano (D-CA-41): 8 | Sen. Tester (D-MT): 6 |
5 | Sen. Casey (D-PA): 116 | Sen. Grassley (R-IA): 7 | Sen. Rubio (R-FL): 6 |
6 | Sen. Cortez Masto (D-NV): 109 | Del. Norton (D-DC): 7 | Rep. DeLauro (D-CT-3): 6 |
7 | Sen. Booker (D-NJ): 106 | Rep. Johnson (D-TX-30): 7 | Sen. Grassley (R-IA): 5 |
8 | Sen. Durbin (D-IL): 102 | Rep. Katko (R-NY-24): 7 | Sen. Ossoff (D-GA): 4 |
9 | Del. Norton (D-DC): 101 | Rep. Dean (D-PA-4): 6 | Sen. Murkowski (R-AK): 4 |
10 | Sen. Menendez (D-NJ): 99 | Rep. Wagner (R-MO-2): 6 | Sen. Padilla (D-CA): 4 |
What about a score based on the ratio of bills that became law to all bills a member sponsored?
$$ \text{score} = \frac{\text{bills that became law}}{\text{bills introduced} + \text{bills stalled} + \text{bills that became law}} $$
Ranking | Individual: Score |
---|---|
1 | Rep. Pelosi (D-CA-12): 0.500 |
2 | Rep. Mrvan (D-IN-1): 0.444 |
3 | Rep. Yarmuth (D-KY-3): 0.333 |
4 | Rep. Stivers (R-OH-15): 0.250 |
5 | Rep. Graves (R-MO-6): 0.222 |
6 | Rep. Jeffries (D-NY-8): 0.200 |
7 | Rep. Neal (D-MA-1): 0.200 |
8 | Rep. Palazzo (R-MS-4): 0.200 |
9 | Sen. Peters (D-MI): 0.186 |
10 | Rep. Fischbach (R-MN-7): 0.176 |
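For reference, the score can be computed as in the sketch below. The sponsor names and counts are made up for illustration, and the `min_bills` threshold is a hypothetical addition, not something used for the table above.

```python
from typing import Dict, List, Tuple


def success_score(introduced: int, stalled: int, became_law: int) -> float:
    """Fraction of a sponsor's bills that became law."""
    total = introduced + stalled + became_law
    if total == 0:
        return 0.0  # no bills sponsored, so no meaningful score
    return became_law / total


def rank_by_score(
    records: Dict[str, Tuple[int, int, int]], min_bills: int = 1
) -> List[Tuple[str, float]]:
    """Rank sponsors by score, skipping those with fewer than min_bills bills."""
    scored = [
        (name, success_score(*counts))
        for name, counts in records.items()
        if sum(counts) >= min_bills
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up (Introduced, Stalled, Law) counts for two hypothetical sponsors.
records = {"Sponsor A": (1, 0, 1), "Sponsor B": (80, 10, 10)}
print(rank_by_score(records))
print(rank_by_score(records, min_bills=10))
```

A sponsor with very few bills can top this ranking with a single success, which is worth keeping in mind when reading the table; a minimum-bills threshold like the one sketched here is one possible way to dampen that effect.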
Policy Areas
Every bill is assigned a primary policy area. Here, we’re just doing a quick look at the most common policy areas by bill status.
Ranking | Policy Area: Introduced | Policy Area: Stalled | Policy Area: Law |
---|---|---|---|
1 | Health: 1885 | Government Operations and Politics: 79 | Government Operations and Politics: 94 |
2 | Armed Forces and National Security: 1114 | Armed Forces and National Security: 60 | Armed Forces and National Security: 69 |
3 | Taxation: 1066 | International Affairs: 60 | Crime and Law Enforcement: 31 |
4 | Government Operations and Politics: 982 | Health: 56 | Health: 19 |
5 | International Affairs: 866 | Crime and Law Enforcement: 44 | Native Americans: 17 |
6 | Crime and Law Enforcement: 842 | Public Lands and Natural Resources: 44 | International Affairs: 14 |
7 | Education: 663 | Science, Technology, Communications: 44 | Economics and Public Finance: 13 |
8 | Transportation and Public Works: 663 | Commerce: 43 | Public Lands and Natural Resources: 13 |
9 | Public Lands and Natural Resources: 548 | Finance and Financial Sector: 34 | Commerce: 13 |
10 | Finance and Financial Sector: 547 | Emergency Management: 27 | Emergency Management: 11 |
Wrapping Up
This is just a quick look at the data, and there’s so much more to consider. We haven’t looked at committees or cosponsors. We haven’t looked at voting patterns or the text of the bills themselves. Nevertheless, we can see that there are some interesting patterns in the data, and we can use this to guide our future analysis. Hopefully these tidbits will inspire you to dig deeper into the data and find your own interesting patterns.
If you have any thoughts, questions, feedback, or general concerns, don’t hesitate to reach out!