Example Analyses with DownBallotR
Source: vignettes/articles/example-analyses.Rmd
DownBallotR returns standardized election results (candidates, parties,
votes, and winners) across federal, state, and
local contests. This vignette walks through several example
questions users commonly want to answer with the data, and
shows the corresponding tidyverse code.
The examples use Indiana (IN) as a running example
because Indiana publishes partisan results for every office on the
General Election ballot and exposes both a statewide and a county-level
frame. Most patterns transfer to other states, but some
details (exact office strings, county column name) vary by source — each
question below calls out what’s portable and what is not.
A few things you might use this data for
- Journalism — Find the closest races of the cycle, identify which offices flipped party, or answer “how did each party do this year?”
- Academic research — Quantify down-ballot drop-off, study ticket-splitting, or assemble panel data on partisan races across years.
- Local political strategy — Understand the partisan baseline in a county, identify uncontested races, or target candidate recruitment in winnable districts.
What varies across states (read this first)
A handful of details differ by source. They matter mostly when you start filtering or cross-state binding:
- Office strings are not normalized. office is taken from each source’s site, so what one source calls "Governor" another might call "Governor of Indiana" or "GOVERNOR". Always inspect with unique(df$office) before filtering, and prefer str_detect() with a regex over ==.
- Subnational column names differ. Indiana uses county_name, ElectionStats states (NH, MA, CO, …) use county_or_city, Clarity states (GA, UT) use county, Connecticut uses town, Louisiana uses parish. See the Data dictionary for the full schema per source.
- Some columns only exist for some sources. election_date, election_id, num_seats, url, candidate_id are source-specific. bind_rows() fills missing columns with NA, so cross-state binding still works; just don’t reference those columns in cross-state code.
- Party labels are normalized to canonical full names ("Democratic", "Republican", "Independent") across all sources, so cross-state party comparisons are generally safe. The exceptions (fusion labels and "Nonpartisan") are covered in “A note on party labels in cross-state queries”.
- office_level is always one of "Federal", "State", or "Local". Not every state populates all three.
Setup
library(DownBallotR)
library(tidyverse)
# Statewide + county results for Indiana, 2024 (widen year_from for more years).
# When level = "all", scrape_elections() also assigns each sub-frame
# directly into your environment (e.g. in_state, in_county).
res <- scrape_elections(state = "IN", year_from = 2024, year_to = 2024)
# Either of these works — they refer to the same data frames:
in_state <- res$state # statewide candidate totals
in_county <- res$county # county-level breakdown
A useful first step on any new scrape: see what office strings, office levels, and parties are actually in the data:
in_state %>% count(office_level, sort = TRUE)
in_state %>% distinct(office_level, office) %>% arrange(office_level, office)
in_state %>% count(party, sort = TRUE)
For reference, here’s what Indiana actually returns for 2024:
# office_level counts (state-level frame)
office_level n
Local 2316
State 306
Federal 51
# unique (office_level, office) — Federal & State
[Federal] Presidential Electors for US President & VP
[Federal] US Representative
[Federal] US Senator
[State] Attorney General
[State] Governor & Lt. Governor
[State] Judge, Circuit Court / Probate Court / Superior Court
[State] State Representative
[State] State Senator
# parties present
Democratic, Republican, Libertarian, Independent,
Nonpartisan (1,100 rows — mostly judges and school boards), and a few rare labels.
Note two things up front:
- Indiana writes "US Senator" and "US Representative" (no periods). New Hampshire writes "U.S. House" and "Governor". Always inspect your data before filtering by exact office string.
- Indiana labels many local races as "Nonpartisan" (judges, school boards). Filtering party == "Democratic" silently drops these from the denominator. For “partisan races only,” filter party %in% c("Democratic", "Republican", "Independent", "Libertarian") or exclude "Nonpartisan".
The remaining sections assume in_state and
in_county are loaded.
Question 1 — How many races did each party win?
Filter to winners and count by party:
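A minimal sketch of that count. The toy tibble is invented so the snippet runs on its own (it is not real election data); with a real scrape, replace toy with in_state. It assumes winner is a logical column, as the filter(..., winner) calls elsewhere in this article imply.

```r
library(dplyr)

# Invented rows for illustration only; not real election results.
toy <- tibble::tribble(
  ~office,             ~district,    ~candidate, ~party,       ~winner,
  "Governor",          "Statewide",  "A",        "Republican", TRUE,
  "Governor",          "Statewide",  "B",        "Democratic", FALSE,
  "US Representative", "District 1", "C",        "Democratic", TRUE,
  "US Representative", "District 1", "D",        "Republican", FALSE,
  "County Treasurer",  "X County",   "E",        "Republican", TRUE
)

# Keep winning rows only, then count winners per party.
wins_by_party <- toy %>%
  filter(winner) %>%
  count(party, sort = TRUE)

wins_by_party
```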
Often the more interesting view is by office level — federal, state, and local performance can diverge:
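One way to sketch the by-level view is to add office_level to the count. The rows below are invented for illustration; with a real scrape, replace toy with in_state.

```r
library(dplyr)

# Invented rows for illustration only; not real election results.
toy <- tibble::tribble(
  ~office_level, ~office,            ~party,       ~winner,
  "Federal",     "US Senator",       "Republican", TRUE,
  "Federal",     "US Senator",       "Democratic", FALSE,
  "State",       "Governor",         "Republican", TRUE,
  "State",       "Governor",         "Democratic", FALSE,
  "Local",       "County Treasurer", "Democratic", TRUE,
  "Local",       "County Coroner",   "Democratic", TRUE,
  "Local",       "County Coroner",   "Republican", FALSE
)

# Winners per (office_level, party), largest count first within each level.
wins_by_level <- toy %>%
  filter(winner) %>%
  count(office_level, party) %>%
  arrange(office_level, desc(n))

wins_by_level
```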
Portability: fully portable.
winner, party, and office_level
are present on every state-level scrape.
Question 2 — All races for a given office
The state-run General Election data covers every partisan office on the ballot. To answer “how many of these races did each party win?” you filter by office and count winners. Picking a single-word, universal office like Governor keeps the regex simple:
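A minimal sketch of the filter-then-count step, using invented rows that mimic two spellings of the same office. It also shows what an exact == match would miss. With a real scrape, replace toy with in_state.

```r
library(dplyr)
library(stringr)

# Invented rows for illustration only; office spellings vary on purpose.
toy <- tibble::tribble(
  ~office,            ~candidate, ~party,       ~winner,
  "Governor",         "A",        "Republican", TRUE,
  "Governor",         "B",        "Democratic", FALSE,
  "GOVERNOR",         "C",        "Democratic", TRUE,
  "GOVERNOR",         "D",        "Republican", FALSE,
  "Attorney General", "E",        "Republican", TRUE
)

# Case-insensitive regex catches both spellings.
governor_wins <- toy %>%
  filter(str_detect(office, regex("governor", ignore_case = TRUE)), winner) %>%
  count(party)

# Exact matching only finds the rows spelled exactly "Governor".
exact_rows <- toy %>% filter(office == "Governor", winner)
```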
str_detect() with a regex is more forgiving than
==, since office names vary slightly across sources and
years (e.g. "Governor" vs
"Governor of Indiana" vs "GOVERNOR").
Which party won each Governor’s race in our window?
in_state %>%
filter(str_detect(office, regex("governor", ignore_case = TRUE)),
winner) %>%
select(election_year, office, candidate, party, votes, vote_pct)
The same pattern works for any office: swap "governor" for "senat", "president", or "sheriff". (Use "senat" rather than "senate", because Indiana writes "US Senator".) A “mayoral races by party” question uses the same code with regex("mayor", ignore_case = TRUE), though Indiana’s General Election data does not include municipal mayoral races (those run in odd years through counties); Connecticut’s data does.
Portability: pattern is portable; the regex string
is not. Run unique(df$office) first to confirm how
your source spells the office. A few real examples:
- U.S. House: NH writes "U.S. House", IN writes "US Representative". Cross-state regex: regex("u\\.?s\\.?\\s*(house|representative)", ignore_case = TRUE)
- U.S. Senate: NH writes "U.S. Senate", IN writes "US Senator". Cross-state regex: regex("senat", ignore_case = TRUE) (matches both “Senate” and “Senator”; combine with office_level == "Federal" to exclude state senate).
- Governor: both NH and IN write something containing "Governor" as a substring (IN’s full string is "Governor & Lt. Governor"), so regex("governor", ignore_case = TRUE) works for both.
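Before filtering real data, it can help to test a cross-state regex against a plain character vector of office strings. The vector below uses the spellings quoted in this article; the check itself is only a sketch.

```r
library(stringr)

offices <- c("U.S. House", "US Representative", "U.S. Senate", "US Senator",
             "State Senator", "State Representative", "Governor & Lt. Governor")

house_rx  <- regex("u\\.?s\\.?\\s*(house|representative)", ignore_case = TRUE)
senate_rx <- regex("senat", ignore_case = TRUE)

# Which office strings does each pattern catch?
house_hits  <- offices[str_detect(offices, house_rx)]
senate_hits <- offices[str_detect(offices, senate_rx)]

house_hits
senate_hits
```

The Senate pattern also catches "State Senator", which is why the office_level == "Federal" filter is needed alongside it.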
Question 3 — Total votes by party
Sometimes the question is not “who won” but “how much support did each party receive overall?” Sum votes across all contests:
in_state %>%
group_by(party) %>%
summarise(total_votes = sum(votes, na.rm = TRUE)) %>%
arrange(desc(total_votes))
For a fairer comparison, restrict to a single office level; otherwise a few high-turnout federal contests dominate the total:
in_state %>%
filter(office_level == "State") %>%
group_by(election_year, party) %>%
summarise(total_votes = sum(votes, na.rm = TRUE), .groups = "drop") %>%
arrange(election_year, desc(total_votes))
Portability: fully portable.
Question 4 — What were the closest races?
A common journalistic question. Compute each contest’s winning margin in percentage points by sorting candidates within a contest:
in_state %>%
group_by(election_year, office, district) %>%
arrange(desc(vote_pct), .by_group = TRUE) %>%
summarise(
winner_party = first(party),
winner_pct = first(vote_pct),
runner_pct = nth(vote_pct, 2),
margin_pp = winner_pct - runner_pct,
.groups = "drop"
) %>%
filter(!is.na(margin_pp)) %>%
slice_min(margin_pp, n = 10)
nth(vote_pct, 2) returns NA for uncontested
races, which the filter(!is.na(...)) step drops.
slice_min() returns the 10 rows with the smallest margin in
ascending order (more than 10 if several contests tie at the cutoff, unless you pass with_ties = FALSE).
Portability: fully portable.
(election_year, office, district) uniquely identifies a
contest in every source schema.
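To see how the margin calculation treats contested and uncontested races, here is a small sketch on invented rows. It assumes vote_pct is on a 0–100 scale, which the source does not state explicitly.

```r
library(dplyr)

# Invented rows for illustration only; District 3 is uncontested.
toy <- tibble::tribble(
  ~election_year, ~office,                ~district,    ~party,        ~vote_pct,
  2024,           "State Representative", "District 1", "Republican",  51,
  2024,           "State Representative", "District 1", "Democratic",  49,
  2024,           "State Representative", "District 2", "Democratic",  60,
  2024,           "State Representative", "District 2", "Republican",  35,
  2024,           "State Representative", "District 2", "Libertarian", 5,
  2024,           "State Representative", "District 3", "Republican",  100
)

margins <- toy %>%
  group_by(election_year, office, district) %>%
  arrange(desc(vote_pct), .by_group = TRUE) %>%
  summarise(
    winner_party = first(party),
    margin_pp    = first(vote_pct) - nth(vote_pct, 2),  # NA when only one candidate
    .groups = "drop"
  ) %>%
  filter(!is.na(margin_pp)) %>%   # drops the uncontested District 3
  arrange(margin_pp)

margins
```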
Question 5 — Down-ballot roll-off (a careful one)
Do voters who cast a top-of-ticket ballot also vote in down-ballot races? This question is easy to ask but easy to get wrong.
Tempting (but wrong) first attempt: sum votes per office level.
in_state %>%
filter(election_year == 2024) %>%
group_by(office_level) %>%
summarise(total_votes = sum(votes, na.rm = TRUE)) %>%
mutate(pct_of_federal = total_votes /
total_votes[office_level == "Federal"])
For Indiana 2024 this returns:
office_level total_votes pct_of_federal
Federal 8,634,681 1.00
State 12,420,340 1.44
Local 21,278,852 2.46
Local shows 2.46× as many votes as Federal, which
can’t possibly mean Local turnout was higher than presidential turnout.
The reason: each voter casts ballots in several local races
(county treasurer + county council + school board + …) but only a
handful of federal ones (president + their House district + maybe
senator). So sum(votes) counts
voters × (races per voter), and inflates by the number of
races at that level.
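The inflation is easy to reproduce with a hypothetical electorate. In the sketch below, every one of 1,000 voters votes in every race, so there is zero roll-off, yet the summed totals still differ by level. The race counts per level are invented.

```r
library(dplyr)

# Hypothetical electorate: 1,000 voters, each voting in every race offered.
n_voters <- 1000
races_per_voter <- c(Federal = 3, State = 4, Local = 8)  # invented counts

summed <- tibble::tibble(
  office_level = names(races_per_voter),
  total_votes  = n_voters * unname(races_per_voter)
) %>%
  mutate(pct_of_federal = total_votes / total_votes[office_level == "Federal"])

summed
```

The ratio column is just races-per-voter at each level divided by the federal count, so it says nothing about turnout.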
The correct measure compares single races. Each voter contributes at most one vote to a given statewide race, so race-vs-race ratios are real turnout comparisons:
in_state %>%
filter(election_year == 2024,
office %in% c("Presidential Electors for US President & VP",
"Governor & Lt. Governor",
"US Senator",
"Attorney General")) %>%
group_by(office) %>%
summarise(total_votes = sum(votes, na.rm = TRUE), .groups = "drop") %>%
arrange(desc(total_votes)) %>%
mutate(pct_of_top = total_votes / max(total_votes))
Indiana 2024 actual output:
office total_votes pct_of_top
Presidential Electors for US President & VP 2,936,677 1.000
Governor & Lt. Governor 2,879,655 0.981
Attorney General 2,838,098 0.966
US Senator 2,829,897 0.964
A few percentage points of roll-off down the statewide ballot is typical — voters who skipped a race after voting for President.
Local roll-off needs a different approach. Each
local race is confined to a single county/district, so summing all 92
County Treasurer contests does equal roughly one vote per voter (every
voter has one County Treasurer race available), but only if every county actually
holds the race that year, which isn’t always true. The safest local
roll-off measure: pick one county, then compare votes in its
top-of-ticket race to a specific local race within
in_county.
in_county %>%
filter(election_year == 2024,
county_name == "Marion",
office %in% c("Presidential Electors for US President & VP",
"County Treasurer",
"School Board Member")) %>%
group_by(office) %>%
summarise(total_votes = sum(votes, na.rm = TRUE), .groups = "drop") %>%
mutate(pct_of_president = total_votes /
total_votes[office == "Presidential Electors for US President & VP"])
Portability: the pattern is portable; the exact
office strings are state-specific. Use
unique(df$office) to find the top federal race in your
state. NH would use "President" and
"Governor", for example, not Indiana’s longer category
labels.
Question 6 — Year-over-year trends
This question needs more than one election year, so first widen the Setup pull (for example, year_from = 2020).
Once you have a multi-year scrape, party performance over time is a
straightforward count():
in_state %>%
filter(office_level == "Federal", winner) %>%
count(election_year, party) %>%
arrange(election_year, desc(n))
Or vote share for a single party over time:
in_state %>%
filter(office_level == "Federal") %>%
group_by(election_year) %>%
summarise(
dem_share = sum(votes[party == "Democratic"], na.rm = TRUE) /
sum(votes, na.rm = TRUE)
)
Portability: fully portable.
Question 7 — County-level variation within a single race
Where did a candidate over- or under-perform their statewide result? Using the county-level frame, look at Democratic vote share for the 2024 U.S. Senate race across Indiana’s 92 counties.
Note the regex below uses "senat", not
"senate" — Indiana writes the office as
"US Senator", so regex("senate", ...) would
return zero rows. "senat" matches both
"Senate" (NH) and "Senator" (IN), and the
office_level == "Federal" filter excludes State Senate.
in_county %>%
filter(election_year == 2024,
office_level == "Federal",
str_detect(office, regex("senat", ignore_case = TRUE)),
party == "Democratic") %>%
arrange(desc(vote_pct)) %>%
select(county_name, candidate, votes, vote_pct)
To find the counties where the statewide nominee under-performed most, pull the statewide number and compute a swing for each county:
statewide_dem_senate <- in_state %>%
filter(election_year == 2024,
office_level == "Federal",
str_detect(office, regex("senat", ignore_case = TRUE)),
party == "Democratic") %>%
pull(vote_pct)
in_county %>%
filter(election_year == 2024,
office_level == "Federal",
str_detect(office, regex("senat", ignore_case = TRUE)),
party == "Democratic") %>%
mutate(swing_vs_state = vote_pct - statewide_dem_senate) %>%
select(county_name, vote_pct, swing_vs_state) %>%
slice_min(swing_vs_state, n = 10)
Portability: pattern is portable; the column name is not. The subnational unit column changes by source:
| Source | Subnational column |
|---|---|
| Indiana | county_name |
| ElectionStats (NH, MA, CO, …) | county_or_city |
| Clarity (GA, UT) | county |
| Connecticut | town |
| Louisiana | parish |
| North Carolina | county (county frame) or precinct |
Drilling into local elections
State-run General Election scrapers include a long tail of local offices alongside the federal and statewide races. Indiana 2024 alone returns 2,316 local-office rows across 14 office types (school boards, county treasurers, county commissioners, town councils, township boards, etc.). Several questions are only possible at this level.
First, a data-shape caveat. For local offices in
Indiana’s state-level frame, the district column means
different things depending on the office:
- Countywide offices (County Treasurer, County Recorder, County Auditor, etc.) have no sub-district, so the parser puts the county name into district (e.g. "Bartholomew County"). One (office, district) pair = one contest.
- Districted offices (County Council, County Commissioner) have an internal district. The parser puts the district label into district (e.g. "District 1"), and the county name is lost from the state frame. To analyze these by county, use the county-level frame (in_county), which has county_name as a separate column.
The examples below stick to offices where district
reliably encodes the contest.
What local offices are on the ballot, and which are partisan?
in_state %>%
filter(office_level == "Local") %>%
group_by(office) %>%
summarise(
n_candidates = n_distinct(candidate),
is_partisan = any(party %in% c("Democratic", "Republican")),
parties = paste(sort(unique(party)), collapse = ", "),
.groups = "drop"
) %>%
arrange(desc(n_candidates))
Real output for Indiana 2024 (abbreviated):
office n_candidates is_partisan
School Board Member 1086 FALSE
County Council Member 408 TRUE
County Commissioner 262 TRUE
Town Council Member 125 TRUE
County Treasurer 88 TRUE
County Coroner 86 TRUE
County Surveyor 71 TRUE
Township Board Member 57 TRUE
County Recorder 47 TRUE
Clerk of the Circuit Court 34 TRUE
County Auditor 28 TRUE
Town Clerk-Treasurer 21 TRUE
Useful as a first orientation: eleven of the twelve office types listed above are partisan, with school boards (the single largest local office by candidate count) the one nonpartisan exception.
How competitive are countywide partisan races?
County Treasurer is a textbook countywide partisan office — every
county that holds the election has exactly one contest, so
(office, district) cleanly identifies each race. The
natural question: how often is the race contested, and how often is it
one-party?
in_state %>%
filter(office == "County Treasurer",
party %in% c("Democratic", "Republican")) %>%
group_by(district, election_year) %>% # district = county name
summarise(
n_candidates = n_distinct(candidate),
parties = paste(sort(unique(party)), collapse = "/"),
.groups = "drop"
) %>%
count(n_candidates, parties)
The output is a small table. For Indiana 2024 you’ll typically see
many n_candidates == 1 rows (a single party fielding the only
candidate) alongside the Democratic/Republican
two-candidate races. The same pattern works for County Recorder, County
Auditor, and other countywide partisan offices.
For an explicitly partisan-baseline view, group by the winner’s party:
in_state %>%
filter(office == "County Treasurer",
party %in% c("Democratic", "Republican"),
winner) %>%
count(party)
This answers “how many county treasurer races did each party win?”, the local analogue of the mayoral question from Question 2, for an office that actually appears in this dataset.
Are school board races competitive?
School Board Member is by far the largest local office in Indiana’s data (~1,000 candidates per cycle across ~300 contests). The distribution of candidates per contest measures field size:
in_state %>%
filter(office == "School Board Member") %>%
group_by(district, election_year) %>%
summarise(n_candidates = n_distinct(candidate), .groups = "drop") %>%
count(n_candidates) %>%
arrange(n_candidates)
Most school board contests draw 1–4 candidates; a long right tail of large at-large boards (10+, sometimes 50+ candidates) reflects multi-seat elections rather than head-to-head races.
Portability: patterns are portable; office strings
are not. The set of local offices a source reports varies a lot
— Indiana exposes 14 distinct local office types; NH exposes 6
(Attorney, County Commissioner, Register of Deeds, Register of Probate,
Sheriff, Treasurer); Connecticut surfaces actual municipal mayors that
Indiana omits. Always inspect unique(df$office) for your
state first, and prefer the county/town/parish frame for
cross-jurisdictional analysis of districted offices.
Across multiple states (small N)
scrape_elections() returns one state at a time, but the
output schemas are consistent enough that you can bind frames together
and compare. The state column is always populated, so
cross-state queries work cleanly even when the column sets differ
slightly. bind_rows() fills NA where columns
aren’t shared.
Pulling several states
ind <- scrape_elections(state = "IN", year_from = 2024, year_to = 2024,
level = "state")
nh <- scrape_elections(state = "NH", year_from = 2024, year_to = 2024,
level = "state")
multi <- bind_rows(ind, nh)
Question 8 — Governor winners across states
Which party won the Governor’s race in each state? (Both IN and NH held gubernatorial elections in 2024.)
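A sketch of the cross-state Governor query. The tibble stands in for multi: the state codes "AA" and "BB", the candidates, and the vote counts are all invented, and only the two office spellings are taken from this article. With real data, replace multi_toy with multi.

```r
library(dplyr)
library(stringr)

# Invented rows standing in for `multi`; not real results.
multi_toy <- tibble::tribble(
  ~state, ~office,                   ~candidate, ~party,       ~votes, ~winner,
  "AA",   "Governor & Lt. Governor", "A",        "Republican", 600,    TRUE,
  "AA",   "Governor & Lt. Governor", "B",        "Democratic", 400,    FALSE,
  "BB",   "Governor",                "C",        "Democratic", 550,    TRUE,
  "BB",   "Governor",                "D",        "Republican", 450,    FALSE,
  "BB",   "U.S. Senate",             "E",        "Republican", 500,    TRUE
)

# One winning row per state, whatever the office spelling.
governor_winners <- multi_toy %>%
  filter(str_detect(office, regex("governor", ignore_case = TRUE)), winner) %>%
  select(state, office, candidate, party, votes)

governor_winners
```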
Question 9 — Democratic vote share across states
A single statewide number per state — useful for ranking how each state voted in 2024 at the federal level:
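A sketch of that per-state number, re-aggregated from raw votes rather than averaged from vote_pct. The rows are invented and the state codes are placeholders; with real data, replace multi_toy with multi.

```r
library(dplyr)

# Invented rows standing in for `multi`; not real results.
multi_toy <- tibble::tribble(
  ~state, ~office_level, ~office,       ~party,       ~votes,
  "AA",   "Federal",     "US Senator",  "Democratic", 400,
  "AA",   "Federal",     "US Senator",  "Republican", 600,
  "BB",   "Federal",     "U.S. Senate", "Democratic", 550,
  "BB",   "Federal",     "U.S. Senate", "Republican", 450,
  "BB",   "State",       "Governor",    "Democratic", 100
)

# Democratic share of all federal votes cast, one row per state.
dem_share_by_state <- multi_toy %>%
  filter(office_level == "Federal") %>%
  group_by(state) %>%
  summarise(
    dem_share = sum(votes[party == "Democratic"], na.rm = TRUE) /
      sum(votes, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(dem_share))

dem_share_by_state
```

Because this pools every federal contest in a state, states with more contested House districts contribute more rows. Restrict to one office if you want a like-for-like comparison.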
Question 10 — Down-ballot roll-off, state by state
Applying the correct Question 5 logic — single top race at each level — to multiple states puts them on a comparable scale. For each state, find the highest-vote race at the Federal and State levels, then compute the ratio:
multi %>%
filter(office_level %in% c("Federal", "State")) %>%
group_by(state, office_level, office) %>%
summarise(office_votes = sum(votes, na.rm = TRUE), .groups = "drop") %>%
group_by(state, office_level) %>%
slice_max(office_votes, n = 1, with_ties = FALSE) %>% # top race per level
group_by(state) %>%
mutate(pct_of_top_federal = office_votes /
office_votes[office_level == "Federal"]) %>%
select(state, office_level, office, office_votes, pct_of_top_federal) %>%
arrange(state, desc(office_votes))
The Federal row is 1.00 by construction; the State row’s
pct_of_top_federal reveals how much turnout dropped between
the top federal race and the top state race in each state. (For local
roll-off, see the note in Question 5 — use a single county’s data from
the county-level frame.)
A note on vote_pct
vote_pct is normalized within each contest, so it’s
not directly comparable across contests or states. For
cross-state comparisons, always work from raw votes and
re-aggregate (as in Questions 9 and 10 above).
A note on party labels in cross-state queries
Most sources use canonical "Democratic" /
"Republican" labels, but a few quirks show up when binding
many states:
- New Hampshire has fusion-party labels: candidates appear with "Republican/Democratic" or "Democratic/Republican" when nominated by both major parties. A filter party == "Democratic" silently drops them. Use str_detect(party, "Democratic") if you want to count fusion candidates as Democrats.
- Indiana labels many local races (judges, school boards) as "Nonpartisan". These have no Dem/Rep candidate at all. Include filter(party != "Nonpartisan") if your question is about partisan competition only.
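The difference between the filters is easiest to see on a few rows. The candidates and vote counts below are invented; the party labels are the ones described above.

```r
library(dplyr)
library(stringr)

# Invented rows for illustration only.
toy <- tibble::tribble(
  ~candidate, ~party,                  ~votes,
  "A",        "Democratic",            100,
  "B",        "Republican/Democratic", 80,
  "C",        "Democratic/Republican", 70,
  "D",        "Republican",            90,
  "E",        "Nonpartisan",           50
)

exact_dem <- toy %>% filter(party == "Democratic")             # misses fusion rows
any_dem   <- toy %>% filter(str_detect(party, "Democratic"))   # includes fusion rows
partisan  <- toy %>% filter(party != "Nonpartisan")            # drops nonpartisan rows

c(exact = nrow(exact_dem), any = nrow(any_dem), partisan = nrow(partisan))
```

Note that str_detect(party, "Republican") would match the same fusion rows, so counting both parties this way double-counts fusion candidates.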
Comparing across many states (large N)
Once you have data for a dozen or more states, the questions shift from “who won” to “how do states differ on this metric?” The pattern is always the same: scrape each state once, bind them, then compute one number per state. Below are questions that only become interesting at scale.
Building a multi-state panel
Scraping many states in one R session can take a while. A common pattern is to scrape each state once, save the result, and re-load later:
states <- c("IN", "NH", "VA", "MA", "CO", "VT", "ID", "NY", "NM", "SC")
# Pull each state's 2024 state-level frame, bind, and save
panel_2024 <- map_dfr(states, function(s) {
scrape_elections(state = s, year_from = 2024, year_to = 2024,
level = "state")
})
saveRDS(panel_2024, "panel_2024.rds")
# Next session — skip the rescrape
panel_2024 <- readRDS("panel_2024.rds")
map_dfr() (from purrr) calls
scrape_elections() once per state and row-binds the
results. If one state errors, wrap the call in possibly()
or safely() so a single failure doesn’t lose the rest.
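A minimal sketch of the possibly() pattern. fake_scrape() is a hypothetical stand-in for scrape_elections() that fails for one made-up state code, so the snippet runs without network access. With the real function, wrap scrape_elections() the same way.

```r
library(purrr)
library(dplyr)

# Hypothetical stand-in for scrape_elections(); fails for "ZZ" on purpose.
fake_scrape <- function(state) {
  if (state == "ZZ") stop("source unavailable for ", state)
  tibble::tibble(state = state, votes = 1L)
}

# possibly() returns `otherwise` (here NULL) instead of raising the error.
safe_scrape <- possibly(fake_scrape, otherwise = NULL)

states  <- c("AA", "ZZ", "BB")
results <- map(states, safe_scrape)

failed <- states[map_lgl(results, is.null)]  # states to retry later
panel  <- bind_rows(results)                 # NULL entries are dropped

failed
panel
```

Keeping the list of failed states makes it easy to re-run only those, rather than rescraping the whole panel.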
Question 11 — Closest federal race in each state
Which state had the nail-biter of the cycle? Combine Q4’s margin
calculation with a group_by(state):
panel_2024 %>%
filter(office_level == "Federal") %>%
group_by(state, office, district) %>%
arrange(desc(vote_pct), .by_group = TRUE) %>%
summarise(
winner_party = first(party),
margin_pp = first(vote_pct) - nth(vote_pct, 2),
.groups = "drop"
) %>%
filter(!is.na(margin_pp)) %>%
group_by(state) %>%
slice_min(margin_pp, n = 1) %>%
arrange(margin_pp)
This returns the single tightest federal race per state, ranked nationally: a useful one-glance summary of where the action was.
Question 12 — Two-party vote share, ranked across states
How does Democratic vote share for federal races compare across the panel?
panel_2024 %>%
filter(office_level == "Federal") %>%
group_by(state) %>%
summarise(
n_contests = n_distinct(office, district),
dem_share = sum(votes[party == "Democratic"], na.rm = TRUE) /
sum(votes, na.rm = TRUE),
rep_share = sum(votes[party == "Republican"], na.rm = TRUE) /
sum(votes, na.rm = TRUE)
) %>%
mutate(two_party_dem = dem_share / (dem_share + rep_share)) %>%
arrange(desc(two_party_dem))
two_party_dem puts states on a comparable scale even
when third-party share varies.
Question 13 — Uncontested races by state
Which states have the most one-candidate races? A useful structural indicator of party competition:
panel_2024 %>%
group_by(state, office_level, office, district, election_year) %>%
summarise(n_candidates = n_distinct(candidate), .groups = "drop") %>%
filter(n_candidates == 1) %>%
count(state, office_level, sort = TRUE)
Or as a rate (the share of contests that are uncontested):
panel_2024 %>%
group_by(state, office_level, office, district, election_year) %>%
summarise(n_candidates = n_distinct(candidate), .groups = "drop") %>%
group_by(state, office_level) %>%
summarise(
n_contests = n(),
n_uncontested = sum(n_candidates == 1),
pct_uncontested = n_uncontested / n_contests,
.groups = "drop"
) %>%
arrange(desc(pct_uncontested))
Question 14 — Down-ballot roll-off, ranked nationally
Q10 with more states — the answers become a national distribution rather than a two-state comparison. Using the same “single top race per level” approach to avoid the per-voter-multiplier bug in Question 5:
panel_2024 %>%
filter(office_level %in% c("Federal", "State")) %>%
group_by(state, office_level, office) %>%
summarise(office_votes = sum(votes, na.rm = TRUE), .groups = "drop") %>%
group_by(state, office_level) %>%
slice_max(office_votes, n = 1, with_ties = FALSE) %>%
group_by(state) %>%
summarise(
top_federal_votes = office_votes[office_level == "Federal"],
top_state_votes = office_votes[office_level == "State"],
state_to_federal = top_state_votes / top_federal_votes,
.groups = "drop"
) %>%
arrange(state_to_federal)
States at the top of the result lose the most voters between the top
of the ticket and the top statewide office (e.g., Governor or Attorney
General) — a useful indicator of how much voter attention extends past
President. For local-level roll-off across many states, loop over a
single county per state using in_county (the local
comparison only works when every voter shares the race, which doesn’t
hold across multiple counties).
Question 15 — Party dominance index
A composite of “share of statewide federal contests won by the dominant party,” useful for ranking states by partisan asymmetry:
panel_2024 %>%
filter(office_level == "Federal", winner) %>%
count(state, party) %>%
group_by(state) %>%
summarise(
total_wins = sum(n),
top_party = party[which.max(n)],
top_share = max(n) / sum(n),
.groups = "drop"
) %>%
arrange(desc(top_share))
Quick top-line: summarize_results()
Before writing targeted queries, summarize_results()
gives a one-call overview of what’s in any scraped data frame:
summarize_results(in_state)
Output (abbreviated):
Election Results Summary
========================
State : Indiana
Years covered : 2020–2024 (3 years)
Elections : ...
Unique candidates: ...
Elections by office level:
Federal : N elections, M offices
- ...
State : ...
Local : ...
It is a useful starting point for orienting yourself before drilling into a specific office, party, or year.