Title: | Data on the States and Counties of the United States |
---|---|
Description: | Demographic data on the United States at the county and state levels spanning multiple years. |
Authors: | Mine Çetinkaya-Rundel [aut, cre] , David Diez [aut], Leah Dorazio [aut] |
Maintainer: | Mine Çetinkaya-Rundel <[email protected]> |
License: | GPL-3 |
Version: | 0.3.1 |
Built: | 2024-10-30 02:59:56 UTC |
Source: | https://github.com/openintrostat/usdata |
Two utility functions. One converts state names to the state abbreviations, and the second does the opposite.
abbr2state(abbr)
abbr |
A vector of state abbreviations. |
Returns a vector of the same length with the corresponding state names or abbreviations.
David Diez
state2abbr, county, county_complete
abbr2state("MN")
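The kind of lookup these helpers perform can be sketched in base R with the built-in `state.abb` and `state.name` vectors. This is only an illustration under stated assumptions, not the package's implementation: the function names `abbr_to_name` and `name_to_abbr` are hypothetical, and the built-in vectors cover only the 50 states, so an input such as "DC" returns `NA` here.

```r
# Hypothetical helpers illustrating a vectorized lookup with match().
abbr_to_name <- function(abbr) {
  # match() gives the position of each abbreviation, or NA if not found
  state.name[match(toupper(abbr), state.abb)]
}

name_to_abbr <- function(name) {
  # Compare in lower case so that capitalization does not matter
  state.abb[match(tolower(name), tolower(state.name))]
}

abbr_to_name(c("MN", "ca", "XX"))
name_to_abbr(c("Minnesota", "north carolina"))
```

Both helpers return a vector of the same length as the input, with `NA` for entries that cannot be matched.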
Summary counts of airline flight delays per carrier per US city.
airline_delay
A data frame with 3351 rows and 21 variables.
Year data collected.
Numeric representation of the month.
Carrier.
Carrier Name.
Airport code.
Name of airport.
Number of flights arriving at airport.
Number of flights more than 15 minutes late.
Number of flights delayed due to air carrier (e.g. no crew).
Number of flights delayed due to weather.
Number of flights delayed due to National Aviation System (e.g. heavy air traffic).
Number of flights canceled due to a security breach.
Number of flights delayed as a result of another flight on the same aircraft being delayed.
Number of cancelled flights.
Number of flights that were diverted.
Total time (minutes) of delayed flights.
Total time (minutes) of delay due to air carrier.
Total time (minutes) of delay due to inclement weather.
Total time (minutes) of delay due to National Aviation System.
Total time (minutes) of delay as a result of a security issue.
Total time (minutes) of delay as a result of a previous flight on the same airplane being late.
Bureau of Transportation Statistics
library(ggplot2)

ggplot(airline_delay, aes(arr_flights, arr_del15, color = as.factor(year))) +
  geom_point(alpha = 0.3) +
  labs(
    x = "Total Number of inbound flights",
    y = "Number of flights delayed by more than 15 mins",
    title = "Inbound vs delayed flights by year",
    color = "Year"
  )
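The two counts plotted in the example can also be combined into a share of late arrivals. The sketch below uses a small made-up data frame that borrows the column names `year`, `arr_flights`, and `arr_del15` from the example; the values are invented for illustration and are not taken from `airline_delay`.

```r
# Toy data using the column names from the example (values are made up).
flights <- data.frame(
  year        = c(2019, 2019, 2020, 2020),
  arr_flights = c(100, 300, 200, 50),
  arr_del15   = c(20, 45, 30, 5)
)

# Total arrivals and total late arrivals per year
by_year <- aggregate(cbind(arr_flights, arr_del15) ~ year, data = flights, FUN = sum)

# Share of arriving flights that were more than 15 minutes late
by_year$prop_delayed <- by_year$arr_del15 / by_year$arr_flights
by_year
```

Summing the counts before dividing weights each airport by its traffic, which is usually preferable to averaging per-row proportions.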
Data for 3142 counties in the United States. See the county_complete data set for additional variables.
county
A data frame with 3142 observations on the following 15 variables.
County names.
State names.
Population in 2000.
Population in 2010.
Population in 2017.
Population change from 2010 to 2017.
Percent of population in poverty in 2017.
Home ownership rate, 2006-2010.
Percent of housing units in multi-unit structures, 2006-2010.
Unemployment rate in 2017.
Whether the county contains a metropolitan area.
Median education level (2013-2017).
Per capita (per person) income (2013-2017).
Median household income.
Describes the type of county-level smoking ban in place in 2010, taking one of the values "none", "partial", or "comprehensive".
These data were collected from Census Quick Facts (no longer available as of 2020) and its accompanying pages. Smoking ban data were from a variety of sources.
library(ggplot2)

ggplot(county, aes(x = median_edu, y = median_hh_income)) +
  geom_boxplot()
Data for 3142 counties in the United States with many variables from the 2019 American Community Survey.
county_2019
A data frame with 3142 observations on the following 95 variables.
State.
County name.
FIPS code.
Median individual income (2019).
Margin of error for median_individual_income.
2019 population.
Margin of error for pop.
Percent of population that is white alone (2015-2019).
Margin of error for white.
Percent of population that is black alone (2015-2019).
Margin of error for black.
Percent of population that is Native American alone (2015-2019).
Margin of error for native.
Percent of population that is Asian alone (2015-2019).
Margin of error for asian.
Percent of population that is Native Hawaiian or other Pacific Islander alone (2015-2019).
Margin of error for pac_isl.
Percent of population that is some other race alone (2015-2019).
Margin of error for other_single_race.
Percent of population that is two or more races (2015-2019).
Margin of error for two_plus_races.
Percent of population that identifies as Hispanic or Latino (2015-2019).
Margin of error for hispanic.
Percent of population that is white alone, not Hispanic or Latino (2015-2019).
Margin of error for white_not_hispanic.
Median age (2015-2019).
Margin of error for median_age.
Percent of population under 5 (2015-2019).
Margin of error for age_under_5.
Percent of population 85 and over (2015-2019).
Margin of error for age_over_85.
Percent of population 18 and over (2015-2019).
Margin of error for age_over_18.
Percent of population 65 and over (2015-2019).
Margin of error for age_over_65.
Mean travel time to work (2015-2019).
Margin of error for mean_work_travel.
Persons per household (2015-2019).
Margin of error for persons_per_household.
Average family size (2015-2019).
Margin of error for avg_family_size.
Percent of housing units in 1-unit structures (2015-2019).
Margin of error for housing_one_unit_structures.
Percent of housing units in multi-unit structures (2015-2019).
Margin of error for housing_two_unit_structures.
Percent of housing units in mobile homes and other types of units (2015-2019).
Margin of error for housing_mobile_homes.
Median individual income (2019 dollars, 2015-2019).
Margin of error for median_individual_income_age_25plus.
Percent of population 25 and older that is a high school graduate (2015-2019).
Margin of error for hs_grad.
Percent of population 25 and older that earned a Bachelor's degree or higher (2015-2019).
Margin of error for bachelors.
Total households (2015-2019).
Margin of error for households.
Percent of households speaking Spanish (2015-2019).
Margin of error for households_speak_spanish.
Percent of households speaking other Indo-European language (2015-2019).
Margin of error for households_speak_other_indo_euro_lang.
Percent of households speaking Asian and Pacific Island language (2015-2019).
Margin of error for households_speak_asian_or_pac_isl.
Percent of households speaking non European or Asian/Pacific Island language (2015-2019).
Margin of error for households_speak_other.
Percent of limited English-speaking households (2015-2019).
Margin of error for households_speak_limited_english.
Percent of population below the poverty level (2015-2019).
Margin of error for poverty.
Percent of population under 18 below the poverty level (2015-2019).
Margin of error for poverty_under_18.
Percent of population 65 and over below the poverty level (2015-2019).
Margin of error for poverty_65_and_over.
Mean household income (2019 dollars, 2015-2019).
Margin of error for mean_household_income.
Per capita money income in past 12 months (2019 dollars, 2015-2019).
Margin of error for per_capita_income.
Median household income (2015-2019).
Margin of error for median_household_income.
Percent among civilian population 18 and over that are veterans (2015-2019).
Margin of error for veterans.
Unemployment rate among those ages 20-64 (2015-2019).
Margin of error for unemployment_rate.
Percent of civilian noninstitutionalized population that is uninsured (2015-2019).
Margin of error for uninsured.
Percent of population under 6 years that is uninsured (2015-2019).
Margin of error for uninsured_under_6.
Percent of population under 19 that is uninsured (2015-2019).
Margin of error for uninsured_under_19.
Percent of population 65 and older that is uninsured (2015-2019).
Margin of error for uninsured_65_and_older.
Percent of households that have desktop or laptop computer (2015-2019).
Margin of error for household_has_computer.
Percent of households that have smartphone (2015-2019).
Margin of error for household_has_smartphone.
Percent of households that have broadband internet subscription (2015-2019).
Margin of error for household_has_broadband.
The data were downloaded via the tidycensus R package.
library(ggplot2)

ggplot(
  county_2019,
  aes(x = hs_grad, y = median_individual_income, size = sqrt(pop) / 1000)
) +
  geom_point(alpha = 0.5) +
  scale_color_discrete(na.translate = FALSE) +
  guides(size = "none") +
  labs(
    x = "Percentage of population graduated from high school",
    y = "Median individual income"
  )
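Each estimate in this data set is paired with a margin of error. A minimal sketch of how such a pair might be turned into an interval follows, assuming (as is standard for published American Community Survey estimates) that the margins of error correspond to a 90 percent confidence level. The data frame and its column names `estimate` and `moe` are placeholders with made-up values, since the names of the margin-of-error columns are not listed here.

```r
# Made-up estimates and margins of error; the column names are hypothetical.
acs <- data.frame(
  estimate = c(30000, 45000),
  moe      = c(1645, 3290)
)

# 90 percent interval: estimate plus or minus the margin of error
acs$lower <- acs$estimate - acs$moe
acs$upper <- acs$estimate + acs$moe

# Approximate standard error, dividing by the 90 percent critical value
acs$se <- acs$moe / 1.645
acs
```

Counties with small populations tend to have wide intervals, so it can be worth checking the margin of error before comparing two county-level estimates.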
Data for 3142 counties in the United States.
county_complete
A data frame with 3142 observations on the following 188 variables.
State.
County name.
FIPS code.
2000 population.
2010 population.
2011 population.
2012 population.
2013 population.
2014 population.
2015 population.
2016 population.
2017 population.
Percent of population under 5 (2010).
Percent of population under 5 (2017).
Percent of population under 18 (2010).
Percent of population over 65 (2010).
Percent of population over 65 (2017).
Median age (2017).
Percent of population that is female (2010).
Percent of population that is white (2010).
Percent of population that is black (2010).
Percent of population that is black (2017).
Percent of population that is Native American (2010).
Percent of population that is Native American (2017).
Percent of population that is Asian (2010).
Percent of population that is Asian (2017).
Percent of population that is Native Hawaiian or Pacific Islander (2010).
Percent of population that is Native Hawaiian or Pacific Islander (2017).
Percent of population that identifies as another single race (2017).
Percent of population that identifies as two or more races (2010).
Percent of population that identifies as two or more races (2017).
Percent of population that is Hispanic (2010).
Percent of population that is Hispanic (2017).
Percent of population that is white and not Hispanic (2010).
Percent of population that is white and not Hispanic (2017).
Percent of population that speaks English only (2017).
Percent of population that has not moved in at least one year (2006-2010).
Percent of population that is foreign-born (2006-2010).
Percent of population that speaks a foreign language at home (2006-2010).
Birth rate for women ages 16 to 50 (2017).
Percent of population that is a high school graduate (2006-2010).
Percent of population that is a high school graduate (2012-2016).
Percent of population that is a high school graduate (2017).
Percent of population with some college education (2012-2016).
Percent of population with some college education (2017).
Percent of population that earned a bachelor's degree (2006-2010).
Percent of population that earned a bachelor's degree (2012-2016).
Percent of population that earned a bachelor's degree (2017).
Percent of population that are veterans (2006-2010).
Percent of population that are veterans (2017).
Mean travel time to work (2006-2010).
Mean travel time to work (2017).
Percent of population who has access to broadband (2017).
Percent of population who has access to a computer (2017).
Number of housing units (2010).
Home ownership rate (2006-2010).
Housing units in multi-unit structures (2006-2010).
Median value of owner-occupied housing units (2006-2010).
Households (2006-2010).
Households (2017).
Persons per household (2006-2010).
Persons per household (2017).
Per capita money income in past 12 months (2010 dollars, 2006-2010)
Per capita money income in past 12 months (2017 dollars, 2017)
Whether the county contained a metropolitan area in 2013.
Median household income (2006-2010).
Median household income (2012-2016).
Median household income (2017).
Private nonfarm establishments (2009).
Private nonfarm employment (2009).
Private nonfarm employment, percent change from 2000 to 2009.
Nonemployer establishments (2009).
Total number of firms (2007).
Black-owned firms, percent (2007).
Native American-owned firms, percent (2007).
Asian-owned firms, percent (2007).
Native Hawaiian and other Pacific Islander-owned firms, percent (2007).
Hispanic-owned firms, percent (2007).
Women-owned firms, percent (2007).
Manufacturer shipments, 2007 ($1000).
Merchant wholesaler sales, 2007 ($1000).
Retail sales, 2007 ($1000).
Retail sales per capita, 2007.
Accommodation and food services sales, 2007 ($1000).
Building permits (2010).
Federal spending, in thousands of dollars (2009).
Land area in square miles (2010).
Persons per square mile (2010).
Describes the type of county-level smoking ban in place in 2010, taking one of the values "none", "partial", or "comprehensive".
Percent of population below poverty level (2006-2010).
Percent of population below poverty level (2012-2016).
Percent of population below poverty level (2017).
Percent of population under age 5 below poverty level (2017).
Percent of population under age 18 below poverty level (2017).
Civilian labor force in 2007.
Number of civilians employed in 2007.
Number of civilians unemployed in 2007.
Unemployment rate in 2007.
Civilian labor force in 2008.
Number of civilians employed in 2008.
Number of civilians unemployed in 2008.
Unemployment rate in 2008.
Civilian labor force in 2009.
Number of civilians employed in 2009.
Number of civilians unemployed in 2009.
Unemployment rate in 2009.
Civilian labor force in 2010.
Number of civilians employed in 2010.
Number of civilians unemployed in 2010.
Unemployment rate in 2010.
Civilian labor force in 2011.
Number of civilians employed in 2011.
Number of civilians unemployed in 2011.
Unemployment rate in 2011.
Civilian labor force in 2012.
Number of civilians employed in 2012.
Number of civilians unemployed in 2012.
Unemployment rate in 2012.
Civilian labor force in 2013.
Number of civilians employed in 2013.
Number of civilians unemployed in 2013.
Unemployment rate in 2013.
Civilian labor force in 2014.
Number of civilians employed in 2014.
Number of civilians unemployed in 2014.
Unemployment rate in 2014.
Civilian labor force in 2015.
Number of civilians employed in 2015.
Number of civilians unemployed in 2015.
Unemployment rate in 2015.
Civilian labor force in 2016.
Number of civilians employed in 2016.
Number of civilians unemployed in 2016.
Unemployment rate in 2016.
Percent of population who are uninsured (2017).
Percent of population under 6 who are uninsured (2017).
Percent of population under 19 who are uninsured (2017).
Percent of population under 74 who are uninsured (2017).
Civilian labor force in 2017.
Number of civilians employed in 2017.
Number of civilians unemployed in 2017.
Unemployment rate in 2017.
Median individual income (2019).
2019 population.
Percent of population that is white alone (2015-2019).
Percent of population that is black alone (2015-2019).
Percent of population that is Native American alone (2015-2019).
Percent of population that is Asian alone (2015-2019).
Percent of population that is Native Hawaiian or other Pacific Islander alone (2015-2019).
Percent of population that is some other race alone (2015-2019).
Percent of population that is two or more races (2015-2019).
Percent of population that identifies as Hispanic or Latino (2015-2019).
Percent of population that is white alone, not Hispanic or Latino (2015-2019).
Median age (2015-2019).
Percent of population under 5 (2015-2019).
Percent of population 85 and over (2015-2019).
Percent of population 18 and over (2015-2019).
Percent of population 65 and over (2015-2019).
Mean travel time to work (2015-2019).
Persons per household (2015-2019)
Average family size (2015-2019).
Percent of housing units in 1-unit structures (2015-2019).
Percent of housing units in multi-unit structures (2015-2019).
Percent of housing units in mobile homes and other types of units (2015-2019).
Median individual income (2019 dollars, 2015-2019).
Percent of population 25 and older that is a high school graduate (2015-2019).
Percent of population 25 and older that earned a Bachelor's degree or higher (2015-2019).
Total households (2015-2019).
Percent of households speaking Spanish (2015-2019).
Percent of households speaking other Indo-European language (2015-2019).
Percent of households speaking Asian and Pacific Island language (2015-2019).
Percent of households speaking non European or Asian/Pacific Island language (2015-2019).
Percent of limited English-speaking households (2015-2019).
Percent of population below the poverty level (2015-2019).
Percent of population under 18 below the poverty level (2015-2019).
Percent of population 65 and over below the poverty level (2015-2019).
Mean household income (2019 dollars, 2015-2019).
Per capita money income in past 12 months (2019 dollars, 2015-2019).
Median household income (2015-2019).
Percent among civilian population 18 and over that are veterans (2015-2019).
Unemployment rate among those ages 20-64 (2015-2019).
Percent of civilian noninstitutionalized population that is uninsured (2015-2019).
Percent of population under 6 years that is uninsured (2015-2019).
Percent of population under 19 that is uninsured (2015-2019).
Percent of population 65 and older that is uninsured (2015-2019).
Percent of households that have desktop or laptop computer (2015-2019).
Percent of households that have smartphone (2015-2019).
Percent of households that have broadband internet subscription (2015-2019).
The data prior to 2011 were from http://census.gov, though the exact page they came from is no longer available.
More recent data come from the following sources.
Downloaded via the tidycensus R package.
Download links for spreadsheets were found on https://www.ers.usda.gov/data-products/county-level-data-sets/download-data.
Unemployment - Bureau of Labor Statistics - LAUS data - https://www.bls.gov/lau/.
Median Household Income - Census Bureau - Small Area Income and Poverty Estimates (SAIPE) data.
The original data table was prepared by USDA, Economic Research Service.
Census Bureau.
2012-16 American Community Survey 5-yr average.
The original data table was prepared by USDA, Economic Research Service.
Tim Parker (tparker at ers.usda.gov) is the contact for much of the new data incorporated into this data set.
library(dplyr)
library(ggplot2)

# Population change and poverty
county_complete |>
  mutate(
    pop_change = 100 * ((pop2017 / pop2013) - 1),
    metro_area = if_else(metro_2013 == 1, TRUE, FALSE)
  ) |>
  ggplot(aes(
    x = poverty_2016,
    y = pop_change,
    color = metro_area,
    size = sqrt(pop2017) / 1e3
  )) +
  geom_point(alpha = 0.5) +
  scale_color_discrete(na.translate = FALSE) +
  guides(size = "none") +
  labs(
    x = "Percentage of population in poverty (2016)",
    y = "Percentage population change between 2013 and 2017",
    color = "Metropolitan area",
    title = "Population change and poverty"
  )

# Counties with high population change
county_complete |>
  mutate(pop_change = 100 * ((pop2017 / pop2013) - 1)) |>
  filter(pop_change < -10 | pop_change > 25) |>
  select(state, name, fips, pop_change)

# Population by metro area
county_complete |>
  mutate(metro_area = if_else(metro_2013 == 1, TRUE, FALSE)) |>
  filter(!is.na(metro_area)) |>
  ggplot(aes(x = metro_area, y = log(pop2017))) +
  geom_violin() +
  labs(
    x = "Metro area",
    y = "Log of population in 2017",
    title = "Population by metro area"
  )

# Poverty and median household income
county_complete |>
  mutate(metro_area = if_else(metro_2013 == 1, TRUE, FALSE)) |>
  ggplot(aes(
    x = poverty_2016,
    y = median_household_income_2016,
    color = metro_area,
    size = sqrt(pop2017) / 1e3
  )) +
  geom_point(alpha = 0.5) +
  scale_color_discrete(na.translate = FALSE) +
  guides(size = "none") +
  labs(
    x = "Percentage of population in poverty (2016)",
    y = "Median household income (2016)",
    color = "Metropolitan area",
    title = "Poverty and median household income"
  )

# Unemployment rate and poverty
county_complete |>
  mutate(metro_area = if_else(metro_2013 == 1, TRUE, FALSE)) |>
  ggplot(aes(
    x = unemployment_rate_2017,
    y = poverty_2016,
    color = metro_area,
    size = sqrt(pop2017) / 1e3
  )) +
  geom_point(alpha = 0.5) +
  scale_color_discrete(na.translate = FALSE) +
  guides(size = "none") +
  labs(
    x = "Unemployment rate (2017)",
    y = "Percentage of population in poverty (2016)",
    color = "Metropolitan area",
    title = "Unemployment rate and poverty"
  )
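The `pop_change` expression used in these examples is a plain percent change. As a small worked illustration with made-up populations, the helper below (the name `pct_change` is hypothetical and not part of the package) applies the same formula to two vectors:

```r
# Hypothetical helper: percent change from an old value to a new value,
# matching 100 * ((pop2017 / pop2013) - 1) in the example.
pct_change <- function(new, old) {
  100 * (new / old - 1)
}

# Made-up populations: one county grows from 1000 to 1100,
# another shrinks from 1000 to 900.
change <- pct_change(new = c(1100, 900), old = c(1000, 1000))
change

# The same thresholds as the "high population change" filter
change < -10 | change > 25
```

Neither toy county passes the filter, since the thresholds select only declines of more than 10 percent or growth of more than 25 percent.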
A subset of the Washington Post database. Contains records of every fatal police shooting by an on-duty officer since January 1, 2015.
fatal_police_shootings
A data frame with 6421 rows and 12 variables.
Date of fatal shooting.
Shot, or shot and Tasered.
Indicates if the victim was armed with some sort of implement that a police officer believed could inflict harm.
The age of the victim.
The gender of the victim. The Post identifies victims by the gender they identify with if reports indicate that it differs from their biological sex.
W: White, non-Hispanic; B: Black, non-Hispanic; A: Asian; N: Native American; H: Hispanic; O: Other; None: unknown.
The municipality where the fatal shooting took place. Note that in some cases this field may contain a county name if a more specific municipality is unavailable or unknown.
Two-letter postal code abbreviation.
If news reports have indicated the victim had a history of mental health issues, expressed suicidal intentions or was experiencing mental distress at the time of the shooting.
The general criterion for the attack label was the most direct and immediate threat to life, which includes incidents where officers or others were shot at, threatened with a gun, attacked with other weapons or physical force, etc. The attack category is meant to flag the highest level of threat; the other and undetermined categories represent all remaining cases. The other category includes many incidents where officers or others faced significant threats.
If news reports have indicated the victim was moving away from officers: Foot, Car, or Not fleeing.
If news reports have indicated an officer was wearing a body camera and it may have recorded some portion of the incident.
library(dplyr)

# List race frequency and percentage
fatal_police_shootings |>
  group_by(race) |>
  summarize(n = n()) |>
  mutate(freq = n / sum(n) * 100)

# List different weapons that victims were armed with
fatal_police_shootings |>
  distinct(armed)
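The frequency and percentage summary in the example can also be written in base R. The sketch below runs on a small made-up vector of race codes rather than on `fatal_police_shootings`, and keeps missing values as their own category:

```r
# Made-up race codes, including one missing value
race <- c("W", "B", "W", "H", NA, "W", "B", "A")

# Counts per category; useNA = "ifany" keeps NA as a separate group
counts <- table(race, useNA = "ifany")

# Percentages that sum to 100, rounded to one decimal place
pct <- round(100 * prop.table(counts), 1)

counts
pct
```

Whether to keep or drop the missing category changes the denominators, so it is worth stating which choice was made when reporting percentages.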
A dataset on gerrymandering and its influence on House elections. The data set was originally built by Jeff Whitmer.
gerrymander
A data frame with 435 rows and 12 variables:
Congressional district.
Last name of 2016 election winner.
First name of 2016 election winner.
Political party of 2016 election winner.
Percent of vote received by Clinton in 2016 Presidential Election.
Percent of vote received by Trump in 2016 Presidential Election.
Did a Democrat win the 2016 House election? Levels of 1 (yes) and 0 (no).
State the Representative is from.
Political Party of the 2018 election winner.
Did a Democrat win the 2018 House election? Levels of 1 (yes) and 0 (no).
Did a Democrat flip the seat in the 2018 election? Levels of 1 (yes) and 0 (no).
Categorical variable for prevalence of gerrymandering with levels of low, mid and high.
library(ggplot2)
library(dplyr)

ggplot(
  gerrymander |> filter(gerry != "mid"),
  aes(clinton16, dem16, color = gerry)
) +
  geom_jitter(height = 0.05, size = 3, shape = 1) +
  geom_smooth(method = "glm", method.args = list(family = "binomial"), se = FALSE) +
  scale_color_manual(values = c("purple", "orange")) +
  labs(
    title = "Logistic Regression of 2016 House Elections",
    subtitle = "by Congressional District",
    x = "Percent of Presidential Vote Won by Clinton",
    y = "Seat Won by Democrat Candidate",
    color = "Gerrymandering"
  )
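The smoothing layer in the example fits a logistic regression behind the scenes. A minimal sketch of fitting such a model directly with `glm()` is shown below; it uses a small made-up data frame that borrows the column names `clinton16` and `dem16` from the example, so the fitted values say nothing about the real `gerrymander` data.

```r
# Made-up districts: presidential vote share and whether a Democrat won (1/0)
toy <- data.frame(
  clinton16 = c(30, 35, 40, 45, 50, 55, 60, 65, 70, 75),
  dem16     = c(0, 0, 0, 1, 0, 1, 0, 1, 1, 1)
)

# Logistic regression of the binary outcome on vote share
fit <- glm(dem16 ~ clinton16, data = toy, family = binomial)
coef(fit)

# Predicted probability of a Democratic win at 40 and 60 percent vote share
pred <- predict(fit, newdata = data.frame(clinton16 = c(40, 60)), type = "response")
pred
```

Fitting the model separately within each level of a grouping variable, as the plot does with `color = gerry`, would amount to one such `glm()` call per group.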
Election results for 2010 Governor races in the U.S.
govrace10
A data frame with 37 observations on the following 23 variables.
Unique identifier for the race, which does not overlap with other 2010 races (see houserace10 and senaterace10).
State name
State name abbreviation
Name of the winning candidate
Percentage of vote for winning candidate (if more than one candidate)
Party of winning candidate
Number of votes for winning candidate
Name of candidate with second most votes
Percentage of vote for candidate who came in second
Party of candidate with second most votes
Number of votes for candidate who came in second
Name of candidate with third most votes
Percentage of vote for candidate who came in third
Party of candidate with third most votes
Number of votes for candidate who came in third
Name of candidate with fourth most votes
Percentage of vote for candidate who came in fourth
Party of candidate with fourth most votes
Number of votes for candidate who came in fourth
Name of candidate with fifth most votes
Percentage of vote for candidate who came in fifth
Party of candidate with fifth most votes
Number of votes for candidate who came in fifth
MSNBC.com, retrieved 2010-11-09.
table(govrace10$party1, govrace10$party2)
Election results for the 2010 U.S. House of Representatives races.
houserace10
A data frame with 435 observations on the following 24 variables.
Unique identifier for the race, which does not overlap with other 2010 races (see govrace10 and senaterace10).
State name
State name abbreviation
District number for the state
Name of the winning candidate
Percentage of vote for winning candidate (if more than one candidate)
Party of winning candidate
Number of votes for winning candidate
Name of candidate with second most votes
Percentage of vote for candidate who came in second
Party of candidate with second most votes
Number of votes for candidate who came in second
Name of candidate with third most votes
Percentage of vote for candidate who came in third
Party of candidate with third most votes
Number of votes for candidate who came in third
Name of candidate with fourth most votes
Percentage of vote for candidate who came in fourth
Party of candidate with fourth most votes
Number of votes for candidate who came in fourth
Name of candidate with fifth most votes
Percentage of vote for candidate who came in fifth
Party of candidate with fifth most votes
Number of votes for candidate who came in fifth
The analysis in the Examples section was inspired by, and is similar to, Nate Silver's district-level analysis on the FiveThirtyEight blog in the New York Times: https://fivethirtyeight.com/features/2010-an-aligning-election/
MSNBC.com, retrieved 2010-11-09.
# Count of House seats won by each party, by state
hr <- table(houserace10[, c("abbr", "party1")])
nr <- apply(hr, 1, sum)

# 2008 presidential results by state, excluding DC
pr <- prrace08[prrace08$state != "DC", c("state", "p_obama")]
hr <- hr[as.character(pr$state), ]
(fit <- glm(hr ~ pr$p_obama, family = binomial))

# The same model fit at the district level
x1 <- pr$p_obama[match(houserace10$abbr, pr$state)]
y1 <- (houserace10$party1 == "Democrat") + 0
g <- glm(y1 ~ x1, family = binomial)

# Share of seats won by Democrats against Obama's 2008 vote share
x <- pr$p_obama[pr$state != "DC"]
nr <- apply(hr, 1, sum)
plot(x, hr[, "Democrat"] / nr,
  pch = 19, cex = sqrt(nr), col = "#22558844",
  xlim = c(20, 80), ylim = c(0, 1),
  xlab = "Percent vote for Obama in 2008",
  ylab = "Probability of Democrat winning House seat"
)

# Logistic curve drawn from hard-coded coefficients
X <- seq(0, 100, 0.1)
lo <- -5.6079 + 0.1009 * X
p <- exp(lo) / (1 + exp(lo))
lines(X, p)
abline(h = 0:1, lty = 2, col = "#888888")
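The curve at the end of the example converts log-odds to probabilities by hand. The sketch below, which reuses only the two hard-coded coefficients from that example, shows that base R's `plogis()` computes the same inverse-logit transform, and finds the vote share at which the curve crosses a probability of one half. The names `p_manual`, `p_plogis`, and `x_half` are introduced here for illustration.

```r
# Log-odds on a grid of vote shares, using the coefficients from the example
X <- seq(0, 100, 0.1)
lo <- -5.6079 + 0.1009 * X

# Inverse logit written out by hand, and the built-in equivalent
p_manual <- exp(lo) / (1 + exp(lo))
p_plogis <- plogis(lo)
all.equal(p_manual, p_plogis)

# The probability is 0.5 where the log-odds are zero
x_half <- 5.6079 / 0.1009
x_half
plogis(-5.6079 + 0.1009 * x_half)
```

With these coefficients the crossing point falls between 55 and 56 percent, which is where the plotted curve passes through the middle of the vertical axis.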
Real estate sales for Pierce County, WA in 2020.
pierce_county_house_sales
A data frame with 16814 rows and 19 variables.
Date the legal document (deed) was executed.
Dollar amount recorded for the sale.
Sum of the square feet for the building.
Finished living area in the attic.
Total square footage of the basement.
Total square footage of the attached or built in garage(s).
Total detached garage(s) square footage.
Total count of single, double or PreFab stoves.
Text description associated with the predominant heating source for the built-as structure i.e. Forced Air, Electric Baseboard, Steam, etc. .
Predominant type of construction materials used for the exterior siding on Residential Buildings.
Predominant type of materials used on the interior walls. i.e. Sheetrock or Paneling.
Number of floors/building levels above grade. Stories do not include attic or basement areas.
Material used for the roof, e.g. Composition Shingles, Wood Shake, Concrete Tile.
Year the building was built, as stated by the building permit or a historical record.
Number of bedrooms listed for a residential property.
Number of baths listed for a residential property. The number is listed as a decimal, e.g. 2.75 = two full and one three-quarter baths. A tub/sink/toilet combination (plus any additional fixtures) is considered 1.0 bath. A shower/sink/toilet combination (plus any additional fixtures) is 0.75 bath. A sink/toilet combination is 0.5 bath.
Describes the type of waterfront the property adjoins or has legal access to.
Assigned to reflect the market appeal of the overall view available from the dwelling or property.
Identifies if sewer/septic is installed, available or not available or if the property does not support an on site sewage disposal system.
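The decimal bath encoding described in the variable list can be reproduced with simple arithmetic. The following is a minimal sketch only: `bath_value()` and its arguments are hypothetical names introduced for illustration and are not part of the package.

```r
# Hypothetical helper (not part of the package): convert counts of each
# bath type into the decimal value described in the variable list.
# full = tub/sink/toilet (1.0), three_quarter = shower/sink/toilet (0.75),
# half = sink/toilet (0.5).
bath_value <- function(full = 0, three_quarter = 0, half = 0) {
  full * 1.0 + three_quarter * 0.75 + half * 0.5
}

# Two full baths and one three-quarter bath
bath_value(full = 2, three_quarter = 1)
```

Read in reverse, a recorded value such as 2.75 does not always identify a unique combination of fixtures, so the sketch only goes from counts to the decimal value.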
library(dplyr)
library(lubridate)

# List house sales frequency and average price grouped by month
pierce_county_house_sales |>
  mutate(month_sale = month(sale_date)) |>
  group_by(month_sale) |>
  summarize(freq = n(), mean_price = mean(sale_price)) |>
  arrange(desc(freq))

# List house sales frequency and average price grouped by waterfront type
pierce_county_house_sales |>
  group_by(waterfront_type) |>
  summarize(freq = n(), mean_price = mean(sale_price)) |>
  arrange(desc(mean_price))
State level data on population by age.
pop_age_2019
A data frame with 2820 rows and 4 variables.
State as 2 letter abbreviation.
State name.
Age cohort for population.
Population of age cohort.
Total estimated state population in 2019.
Centers for Disease Control and Prevention
library(dplyr)

# List age population for each state with percent of total
pop_age_2019 |>
  group_by(state_name, age) |>
  mutate(percent = population / state_total_population * 100) |>
  select(state_name, age, population, percent)

pop_age_2019 |>
  select(state_name, state_total_population) |>
  distinct() |>
  arrange(desc(state_total_population))
State level data on population by race.
pop_race_2019
A data frame with 2820 rows and 4 variables.
State as 2 letter abbreviation.
State name.
Race cohort for population.
Indicates whether population is Hispanic or Latino.
Population of race cohort.
Total estimated state population in 2019.
Centers for Disease Control and Prevention
library(dplyr)

# List race population for each state with percent of total
pop_race_2019 |>
  group_by(state_name, race, hispanic) |>
  mutate(percent = population / state_total_population * 100) |>
  select(state_name, race, hispanic, population, percent)

pop_race_2019 |>
  select(state_name, state_total_population) |>
  distinct() |>
  arrange(desc(state_total_population))
Data from a Pew Research Center poll about Presidential power/control over gas prices.
prez_pwr
A data frame with 365 rows and 3 variables.
Sitting President at time of the poll.
Political party of the respondent with levels d(emocrat) and r(epublican).
Respondent answer to the question: "Is the price of gasoline something the president can do a lot about, or is that beyond the president's control?"
Pew Research Center, May 2006 & March 2012.
library(ggplot2)

ggplot(prez_pwr, aes(has_pwr, fill = party)) +
  geom_bar() +
  labs(
    title = "Is the price of gasoline something the president can do a lot about?",
    x = "",
    y = "Number of respondents",
    fill = "Respondent Party"
  ) +
  facet_wrap(~president)
Election results for the 2008 U.S. Presidential race
prrace08
A data frame with 51 observations on the following 7 variables.
State name abbreviation
Full state name
Number of votes for Barack Obama
Proportion of votes for Barack Obama
Number of votes for John McCain
Proportion of votes for John McCain
Number of electoral votes for a state
In Nebraska, 4 electoral votes went to McCain and 1 to Obama. In all other states, the electoral votes were winner-take-all.
Presidential Election of 2008, Electoral and Popular Vote Summary, retrieved 2011-04-21.
# ===> Obtain 2010 US House Election Data <===#
hr <- table(houserace10[, c("abbr", "party1")])
nr <- apply(hr, 1, sum)

# ===> Obtain 2008 President Election Data <===#
pr <- prrace08[prrace08$state != "DC", c("state", "p_obama")]
hr <- hr[as.character(pr$state), ]
(fit <- glm(hr ~ pr$p_obama, family = binomial))

# ===> Visualizing Binomial outcomes <===#
x <- pr$p_obama[pr$state != "DC"]
nr <- apply(hr, 1, sum)
plot(x, hr[, "Democrat"] / nr,
  pch = 19, cex = sqrt(nr), col = "#22558844",
  xlim = c(20, 80), ylim = c(0, 1),
  xlab = "Percent vote for Obama in 2008",
  ylab = "Probability of Democrat winning House seat"
)

# ===> Logistic Regression <===#
x1 <- pr$p_obama[match(houserace10$abbr, pr$state)]
y1 <- (houserace10$party1 == "Democrat") + 0
g <- glm(y1 ~ x1, family = binomial)
X <- seq(0, 100, 0.1)
lo <- -5.6079 + 0.1009 * X
p <- exp(lo) / (1 + exp(lo))
lines(X, p)
abline(h = 0:1, lty = 2, col = "#888888")
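The example for this dataset hard-codes the logistic regression coefficients and applies the inverse logit by hand. As a minimal sketch, base R's `plogis()` computes the same transformation. The coefficient values are copied from the example; the names `b0`, `b1`, `p_manual`, and `p_plogis` are introduced here for illustration only.

```r
# Coefficients copied from the example for this dataset
b0 <- -5.6079
b1 <- 0.1009

X <- seq(0, 100, 0.1)
lo <- b0 + b1 * X

# Inverse logit written out by hand, as in the example
p_manual <- exp(lo) / (1 + exp(lo))

# The same quantity via stats::plogis()
p_plogis <- plogis(lo)

# Check that both versions agree up to floating point tolerance
all.equal(p_manual, p_plogis)

# Obama vote percentage at which the fitted probability crosses 0.5
-b0 / b1
```

When a fitted model object is at hand, reading the coefficients with `coef()` instead of typing them in is likely to be less error-prone.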
Election results for the 2010 U.S. Senate races
senaterace10
A data frame with 38 observations on the following 23 variables.
Unique identifier for the race, which does not overlap with other 2010 races (see govrace10 and houserace10).
State name
State name abbreviation
Name of the winning candidate
Percentage of vote for winning candidate (if more than one candidate)
Party of winning candidate
Number of votes for winning candidate
Name of candidate with second most votes
Percentage of vote for candidate who came in second
Party of candidate with second most votes
Number of votes for candidate who came in second
Name of candidate with third most votes
Percentage of vote for candidate who came in third
Party of candidate with third most votes
Number of votes for candidate who came in third
Name of candidate with fourth most votes
Percentage of vote for candidate who came in fourth
Party of candidate with fourth most votes
Number of votes for candidate who came in fourth
Name of candidate with fifth most votes
Percentage of vote for candidate who came in fifth
Party of candidate with fifth most votes
Number of votes for candidate who came in fifth
MSNBC.com, retrieved 2010-11-09.
library(ggplot2)

ggplot(senaterace10, aes(x = perc1)) +
  geom_histogram(binwidth = 5) +
  labs(x = "Winning candidate vote percentage")
Information about each state collected from both the official US Census website and from various other sources.
state_stats
A data frame with 51 observations on the following 23 variables.
State name.
State abbreviation (e.g. "MN").
FIPS code.
Population in 2010.
Population in 2000.
Home ownership rate.
Percent of living units that are in multi-unit structures.
Average income per capita.
Median household income.
Poverty rate.
Federal spending per capita.
Land area.
Percent of population that smokes.
Murders per 100,000 people.
Robberies per 100,000.
Aggravated assaults per 100,000.
Larcenies per 100,000.
Vehicle theft per 100,000.
Percent of individuals collecting social security.
Percent of power coming from nuclear sources.
Percent of power coming from coal sources.
Traffic deaths per 100,000.
Traffic deaths per 100,000 where alcohol was not a factor.
Unemployment rate (February 2012, preliminary).
Census Quick Facts (no longer available as of 2020), InfoChimps (also no longer available as of 2020), National Highway Traffic Safety Administration (tr_deaths, tr_deaths_no_alc), Bureau of Labor Statistics (unempl).
library(ggplot2)
library(dplyr)
library(maps)

states_selected <- state_stats |>
  mutate(region = tolower(state)) |>
  select(region, unempl, murder, nuclear)

states_map <- map_data("state") |>
  inner_join(states_selected)

# Unemployment map
ggplot(states_map, aes(map_id = region)) +
  geom_map(aes(fill = unempl), map = states_map) +
  expand_limits(x = states_map$long, y = states_map$lat) +
  scale_fill_viridis_c() +
  labs(x = "", y = "", fill = "Unemployment\n(%)")

# Murder rate map
states_map |>
  filter(region != "district of columbia") |>
  ggplot(aes(map_id = region)) +
  geom_map(aes(fill = murder), map = states_map) +
  expand_limits(x = states_map$long, y = states_map$lat) +
  scale_fill_viridis_c() +
  labs(x = "", y = "", fill = "Murders\nper 100k")

# Nuclear energy map
ggplot(states_map, aes(map_id = region)) +
  geom_map(aes(fill = nuclear), map = states_map) +
  expand_limits(x = states_map$long, y = states_map$lat) +
  scale_fill_viridis_c() +
  labs(x = "", y = "", fill = "Nuclear energy\n(%)")
Two utility functions. One converts state names to the state abbreviations, and the second does the opposite.
state2abbr(state)
state |
A vector of state names; minor spelling and capitalization errors are tolerated through fuzzy matching. |
Returns a vector of the same length with the corresponding state names or abbreviations.
David Diez
abbr2state, county, county_complete
state2abbr("Minnesota")

# Some spelling/capitalization errors okay
state2abbr("mINnesta")
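The documentation says only that state2abbr tolerates "a little fuzzy matching" and does not describe the mechanism. The sketch below shows one way such matching could work, using edit distance via base R's `adist()` against the built-in `state.name` and `state.abb` vectors. It is an assumption for illustration, not the package's implementation, and `fuzzy_state2abbr()` is a hypothetical name.

```r
# Hypothetical helper, NOT the package's implementation: map each input
# to the built-in state name with the smallest edit distance.
fuzzy_state2abbr <- function(state) {
  # Case-insensitive edit distance between each input and each state name;
  # the result has one row per input and one column per state
  d <- adist(tolower(state), tolower(state.name))
  # Column index of the nearest state name for each input
  nearest <- apply(d, 1, which.min)
  state.abb[nearest]
}

fuzzy_state2abbr(c("Minnesota", "mINnesta"))
```

This sketch always returns the nearest state, even for input that resembles no state name, and it does not handle missing values or territories outside the 50 built-in states. A more careful version would reject matches beyond some distance threshold.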
Census data for the 50 states plus DC and Puerto Rico.
urban_owner
A data frame with 52 observations on the following 28 variables.
State
Total housing units available in 2000.
Total housing units available in 2010.
a numeric vector
Occupied.
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
US Census.
urban_owner
Census info for the 50 US states plus DC.
urban_rural_pop
A data frame with 51 observations on the following 5 variables.
US state.
a numeric vector
a numeric vector
a numeric vector
a numeric vector
US census.
urban_rural_pop
National data on the number of crimes committed in the US between 1960 and 2019.
us_crime_rates
A data frame with 60 rows and 12 variables.
Year data was collected.
Population of the United States the year data was collected.
Total number of violent and property crimes committed.
Total number of violent crimes committed.
Total number of property crimes committed.
Number of murders committed. Counted in violent total.
Number of forcible rapes committed. Counted in violent total.
Number of robberies committed. Counted in violent total.
Number of aggravated assaults committed. Counted in violent total.
Number of burglaries committed. Counted in property total.
Number of larceny thefts committed. Counted in property total.
Number of vehicle thefts committed. Counted in property total.
library(ggplot2)

ggplot(us_crime_rates, aes(x = population, y = total)) +
  geom_point() +
  labs(
    title = "Crimes V Population",
    x = "Population",
    y = "Total Number of Crimes"
  )

ggplot(us_crime_rates, aes(x = murder)) +
  geom_boxplot() +
  labs(
    title = "US Murders",
    subtitle = "1960 - 2019",
    x = "Number of Murders"
  ) +
  theme(axis.text.y = element_blank())
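The variables in this dataset are raw counts alongside the national population, so comparing years usually calls for converting counts to rates. The sketch below shows the arithmetic on a toy data frame. All values are made up for illustration and do not come from the dataset; the column names `population` and `murder` mirror those used in the example, while `year` and `murder_per_100k` are assumed names.

```r
# Toy data frame with made-up values (not taken from us_crime_rates)
toy <- data.frame(
  year = c(2000, 2001),
  population = c(1000000, 2000000),
  murder = c(50, 80)
)

# Rate per 100,000 residents: count divided by population, times 100,000
toy$murder_per_100k <- toy$murder / toy$population * 100000

toy
```

Applied to the real data, the same expression would give a rate that accounts for population growth between 1960 and 2019.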
A representative set of monitoring locations was taken from NOAA data covering both years of interest (1950 and 2022). The stations were chosen to spread the measurements across the continental United States. Daily high and low temperatures are given for each of 24 weather stations.
us_temp
A data frame with 17250 observations on the following 9 variables.
Station ID, measurements from 24 stations.
Name of the station.
Latitude of the station.
Longitude of the station.
Elevation of the station.
Date of observed temperature.
High temp for the observed day.
Low temp for the observed day.
Factor variable for year, levels: 1950 and 2022.
Please keep in mind that these are two annual snapshots from a few dozen arbitrarily selected weather stations. A complete analysis would consider more than two years of data and a more precise random sample uniformly distributed across the United States.
https://www.ncei.noaa.gov/cdo-web/, retrieved 2023-09-23.
library(ggplot2)
library(maps)
library(sf)
library(dplyr)

# Summarize temperature by station and year for plotting
summarized_temp <- us_temp |>
  group_by(station, year, latitude, longitude) |>
  summarize(tmax_med = median(tmax, na.rm = TRUE), .groups = "drop") |>
  mutate(plot_shift = ifelse(year == "1950", 0, 2))

# Make a map of the US as a baseline
usa <- st_as_sf(maps::map("state", fill = TRUE, plot = FALSE))

# Layer the US map with summarized temperatures
ggplot(data = usa) +
  geom_sf() +
  geom_point(
    data = summarized_temp,
    aes(x = longitude + plot_shift, y = latitude, fill = tmax_med, shape = year),
    color = "black", size = 3
  ) +
  scale_fill_gradient(high = "red", low = "yellow") +
  scale_shape_manual(values = c(21, 24)) +
  labs(
    title = "Median high temperature, 1950 and 2022",
    x = "Longitude",
    y = "Latitude",
    fill = "Median\nhigh temp",
    shape = "Year"
  )
Average Time Spent on Activities by Americans
us_time_survey
A data frame with 11 rows and 8 variables.
Year data collected
Average hours per day spent on household activities, including travel.
Average hours per day spent eating and drinking, including travel.
Average hours per day spent on leisure and sports, including travel.
Average hours spent sleeping.
Average hours spent per day caring for and helping children under 18 years of age.
Average hours spent working for those employed (15 years and older).
Average hours per day spent working on days worked (15 years and older).
library(ggplot2)

us_time_survey$year <- as.factor(us_time_survey$year)

ggplot(us_time_survey, aes(year, sleeping)) +
  geom_point(alpha = 0.3) +
  labs(
    x = "Year",
    y = "Average hours spent Sleeping",
    title = "US Average hours spent sleeping, 2009 - 2019"
  )
In 2013, the House of Representatives voted not to stop the National Security Agency's (NSA's) mass surveillance of phone behaviors. We look at two predictors for how a representative voted: their party and how much money they received from the private defense industry.
vote_nsa
A data frame with 434 observations on the following 5 variables.
Name of the Congressional representative.
The party of the representative: D for Democrat and R for Republican.
State for the representative.
Money received from the defense industry for their campaigns.
Voting to rein in the phone dragnet or continue allowing mass surveillance.
MapLight. Available at http://s3.documentcloud.org/documents/741074/amash-amendment-vote-maplight.pdf.
Kravets, D., 2020. Lawmakers Who Upheld NSA Phone Spying Received Double The Defense Industry Cash. WIRED. Available at https://www.wired.com/2013/07/money-nsa-vote/.
table(vote_nsa$party, vote_nsa$phone_spy_vote)

boxplot(vote_nsa$money / 1000 ~ vote_nsa$phone_spy_vote,
  ylab = "$1000s Received from Defense Industry"
)
State-level data on federal elections held in November between 1980 and 2014.
voter_count
A data frame with 936 rows and 7 variables.
Year election was held.
Specifies if data is state or national total.
Number of citizens eligible to vote; does not count felons.
Number of ballots cast.
Number of ballots that contained a vote for the highest office of that election.
Overall voter turnout percentage.
Highest office voter turnout percentage.
United States Election Project
library(ggplot2)

ggplot(voter_count, aes(x = percent_highest_office, y = percent_total_ballots_counted)) +
  geom_point() +
  labs(
    title = "Total Ballots V Highest Office",
    x = "Highest Office",
    y = "Total Ballots"
  )