Datasheet for DownBallotR Election Results Data

This document follows the spirit of the Datasheets for Datasets framework (Gebru et al., 2021), adapted for a data-retrieval package rather than a static dataset. Because DownBallotR fetches data live from external websites, some properties, such as exact record counts and precise coverage dates, are inherently dynamic and cannot be stated with finality.

1. Motivation

Why was this package created?

Sub-national election data in the United States is fragmented across hundreds of state, county, and municipal reporting systems. Each jurisdiction publishes results in a different format, on a different website, and with a different update cadence. Existing aggregators cover federal and some statewide races well, but consistent machine-readable access to local, down-ballot races — school board, municipal, judicial, and county-level contests — remains a significant gap for researchers.

DownBallotR was created to reduce the friction of accessing this data for academic research, journalism, and civic technology. It wraps a set of state-specific scrapers into a single, consistent R interface so that a researcher can retrieve precinct-level school board results with the same call structure they use for statewide general election results.

Who created it?

DownBallotR was developed by Graham Chickering in collaboration with Chris Warshaw. It is intended primarily as a research tool and is not affiliated with any government entity, election authority, or political organization.

Is there funding or sponsorship?

There is no commercial sponsorship. The package is open-source under the Apache License 2.0.

2. Data Composition

What does the package retrieve?

DownBallotR retrieves election results (vote totals by candidate and contest) from official sources. The level argument selects the geographic unit at which votes are tabulated. Depending on the source, results may be available at the statewide, county/parish/town, or precinct level.

What sources are covered?

Source	States covered	Approx. start year	Sub-unit level
ElectionStats	CO, ID, MA, NH, NM, NY, SC, VA, VT	1789–2008 (varies)	County; precinct for CO, ID, MA, SC, NM, VA
NC State Board of Elections	North Carolina	2000	Precinct
Connecticut CTEMS	Connecticut	2016	Town
Georgia Secretary of State	Georgia	2000	Precinct
Utah elections portal	Utah	2023	Precinct
Indiana voters portal	Indiana	2019	County
Louisiana Secretary of State	Louisiana	1982	Parish

Start years are approximate. Each source has a confirmed range documented in db_available_years(). Data quality and completeness for the earliest years of each source have not been systematically validated.

What types of races are included?

Coverage depends on the source:

ElectionStats: general elections across a wide range of offices; exact offices vary by state and year.
State-portal scrapers (NC, CT, GA, UT, IN, LA): general elections as published by the respective state authority.

Primary elections, runoffs, special elections, and referenda are not systematically included across all sources.

What fields are returned?

The exact columns returned vary by source. Common fields include:

Contest or office name
Candidate name
Party affiliation (where available)
Vote total
Geographic unit (state, county, precinct, parish, or town)
Election year and/or date

There is no single guaranteed schema across all sources. Users should inspect the data frame returned for each source before combining results across sources.
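
As a minimal sketch of that inspection step, the base R snippet below compares the columns of two results data frames and pads missing columns with NA before stacking them. The data frames, their column names (office, candidate, votes, party, county), and the helper align_columns() are invented for illustration; they are not actual DownBallotR output or functions.

```r
# Toy stand-ins for results from two different sources.
# Column names are hypothetical, not the package's actual schema.
src_a <- data.frame(
  office = "School Board", candidate = "A. Smith",
  votes = 120L, county = "Wake", stringsAsFactors = FALSE
)
src_b <- data.frame(
  office = "Mayor", candidate = "B. Jones",
  party = "Democratic", votes = 300L, stringsAsFactors = FALSE
)

# Step 1: list the columns each source has that the other lacks.
only_in_a <- setdiff(names(src_a), names(src_b))
only_in_b <- setdiff(names(src_b), names(src_a))

# Step 2: pad each frame with NA columns so both share the same set,
# then stack them in a common column order.
align_columns <- function(df, all_cols) {
  for (col in setdiff(all_cols, names(df))) df[[col]] <- NA
  df[, all_cols, drop = FALSE]
}
all_cols <- union(names(src_a), names(src_b))
combined <- rbind(align_columns(src_a, all_cols),
                  align_columns(src_b, all_cols))
```

Padding with NA keeps the absence of a field explicit. It does not reconcile columns that carry the same information under different names; those would still need to be renamed by hand.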

Are there known gaps in coverage?

Yes. Coverage is limited to the states and years listed above. Many US states, territories, and local jurisdictions are not yet covered. Within covered states, individual contests may be missing, mislabeled, or structured differently across years if the underlying source changed its reporting format.

3. Data Collection Process

How is data collected?

The package uses two retrieval mechanisms:

HTTP requests (static HTML / JSON / CSV): Used for sources that serve data in a machine-readable form without JavaScript rendering. Requests are made from the user’s machine to the source website at scrape time.
Browser automation (Playwright via Python): Used for sources that require JavaScript rendering to populate results tables. A headless Chromium browser is launched locally on the user’s machine. This browser is managed by the Python playwright library, which is installed into an isolated virtual environment by downballot_install_python().

Both mechanisms retrieve data directly from the source websites at the time scrape_elections() is called. No data is cached or pre-fetched by the package maintainers. The data the user receives reflects the state of the source website at the moment of retrieval.

Is the data collected with the knowledge of the source websites?

The package makes standard HTTP/browser requests using publicly accessible URLs. No authentication is required for any source, and no terms of service are known to be violated. However, the package does not have formal data-sharing agreements with any source. Users are responsible for reviewing the terms of use of each source website before using the package in a production or high-volume context.

Are there rate limits or scraping constraints?

By default, max_workers is capped at 4 parallel browser instances to reduce load on source servers. Users should not attempt to override this limit in a way that could overwhelm public infrastructure. For sources with long historical ranges (e.g., Louisiana 1982–present), full historical scrapes may take significant time and generate substantial request volume.
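
For users who script multiple retrievals themselves, one way to keep request volume modest is to run them sequentially with a pause in between. The sketch below illustrates the pattern in base R. fetch_one() is a hypothetical placeholder that returns a dummy row so the example runs offline, and the pause length is an arbitrary illustrative value, not a documented limit of any source.

```r
# Hypothetical placeholder for a single retrieval call. It returns a
# one-row data frame so that the sketch runs without network access.
fetch_one <- function(year) {
  data.frame(year = year, rows_found = NA_integer_)
}

# Run requests one at a time, pausing between them rather than issuing
# them in a tight loop.
fetch_politely <- function(years, pause_seconds = 2) {
  out <- vector("list", length(years))
  for (i in seq_along(years)) {
    out[[i]] <- fetch_one(years[i])
    if (i < length(years)) Sys.sleep(pause_seconds)
  }
  do.call(rbind, out)
}

# A short pause is used here only so the example finishes quickly.
results <- fetch_politely(2020:2022, pause_seconds = 0.1)
```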

How are years and election cycles determined?

For most sources, year ranges are specified by the caller via year_from and year_to. The package validates requested years against the confirmed range documented in db_available_years(). Years beyond the confirmed end are attempted, but results are not guaranteed.
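
The idea of checking a requested range against a confirmed range can be sketched as follows. This is not the package's internal validation: classify_years() is a hypothetical helper, and the confirmed range is passed in by hand because the return format of db_available_years() is not assumed here.

```r
# Hypothetical helper: split a requested year range into years before,
# inside, and beyond a confirmed coverage range.
classify_years <- function(year_from, year_to, confirmed_start, confirmed_end) {
  stopifnot(year_from <= year_to, confirmed_start <= confirmed_end)
  requested <- seq(year_from, year_to)
  list(
    before    = requested[requested < confirmed_start],
    confirmed = requested[requested >= confirmed_start &
                            requested <= confirmed_end],
    beyond    = requested[requested > confirmed_end]
  )
}

# Example with made-up bounds: confirmed coverage 2000 to 2024.
res <- classify_years(2020, 2026, confirmed_start = 2000, confirmed_end = 2024)

# Years in res$beyond would be the ones to treat as experimental.
if (length(res$beyond) > 0) {
  message("Beyond confirmed range: ", paste(res$beyond, collapse = ", "))
}
```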

4. Preprocessing and Standardization

What preprocessing does the package apply?

The level of preprocessing varies substantially by source. In general:

Vote totals are extracted as integers and are not aggregated or otherwise modified.
Candidate names are returned as they appear in the source.
Office/contest names are returned as they appear in the source and are not normalized to a common taxonomy across sources.
Geographic units (county, precinct, town, parish) are returned as they appear in the source. No crosswalk to standardized place names is applied automatically.
Party labels may involve heuristic parsing when the source does not provide a distinct party field. Where heuristics are used, common abbreviations (e.g., “Dem”, “Rep”, “D”, “R”) are mapped to full party names, but edge cases may be mislabeled.
Write-in candidates are currently excluded; support for them could be added on request.
Election stages (primary, general, runoff) are not consistently distinguished across sources.
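
To make the party-label heuristic described above concrete, here is a minimal sketch of an abbreviation lookup in base R. The lookup table and the normalize_party() helper are illustrative assumptions; the package's actual mapping and parsing rules may differ.

```r
# Illustrative lookup table keyed by lower-case abbreviation.
# This is an assumption for the sketch, not the package's real mapping.
party_lookup <- c(
  dem = "Democratic", d = "Democratic",
  rep = "Republican", r = "Republican"
)

normalize_party <- function(x) {
  # Strip punctuation and whitespace, then match case-insensitively.
  key <- tolower(trimws(gsub("[[:punct:]]", "", x)))
  full <- unname(party_lookup[key])
  # Leave unrecognized labels unchanged so edge cases stay visible.
  ifelse(is.na(full), x, full)
}

labels <- normalize_party(c("Dem", "R", "(D)", "Green", "Rep."))
```

Returning unrecognized labels as-is, rather than coercing them to a default, makes mislabeled edge cases easier to spot with a quick table(labels).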

Is any imputation or gap-filling done?

Yes, in limited cases. Some columns, such as office_level or winner, are not reported by certain states. Where they are missing, the package infers values on a best-effort basis from other columns, such as vote counts or election names.
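
As an illustration of this kind of inference, the sketch below derives a winner flag from vote counts on a toy data frame. It assumes single-seat plurality contests and hypothetical column names, and it is not the package's actual inference logic. Multi-seat contests, which are common for school boards, would need a different rule.

```r
# Toy results; column names are hypothetical.
results <- data.frame(
  contest   = c("Mayor", "Mayor", "Clerk", "Clerk"),
  candidate = c("A", "B", "C", "D"),
  votes     = c(510L, 490L, 200L, 200L),
  stringsAsFactors = FALSE
)

# Highest vote total within each contest.
max_votes <- ave(results$votes, results$contest, FUN = max)

# Number of candidates sharing that highest total within each contest.
at_max   <- as.integer(results$votes == max_votes)
n_at_max <- ave(at_max, results$contest, FUN = sum)

# Flag the sole top vote-getter as the winner. Ties are set to NA so
# they can be reviewed by hand rather than guessed.
results$winner <- ifelse(n_at_max > 1, NA, results$votes == max_votes)
```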

Is the output validated?

The package does not validate the substantive correctness of retrieved data against any external reference. It checks that scraping operations succeeded and returns what the source provided. A validated, publicly available dataset containing these results is planned as future work.

5. Intended Uses

What is this package designed for?

DownBallotR is designed for:

Academic research on electoral behavior, candidate emergence, and down-ballot political dynamics.
Exploratory data analysis and hypothesis generation using election results.
Journalism requiring historical context or cross-jurisdictional comparison of election results.
Civic technology applications that need programmatic access to historical results for visualization or analysis.

Who is the intended user?

Researchers and analysts who are comfortable with R, who understand the limitations of scraped data, and who can evaluate data quality relative to their specific analytical needs.

6. Out-of-Scope Uses

The following uses are explicitly not supported and should not be pursued with data retrieved from this package:

Official election certification or canvassing. Results from this package reflect what was published on source websites at the time of retrieval and are not certified, audited, or guaranteed to be final.
Legal proceedings relying on election result accuracy.
Real-time election night reporting. The package is not designed for live or near-live data retrieval. Source sites may be under heavy load or update continuously on election night.
High-stakes automated decision-making based on election results without human review.
High-volume automated scraping that could disrupt access to public election reporting infrastructure.

7. Distribution and Access

How is the package distributed?

DownBallotR is distributed as an open-source R package on GitHub. It is intended for CRAN submission; see the package DESCRIPTION for the current status.

Does the package cache or redistribute data?

No. The package does not include any election data or database. All data is retrieved live from the source websites by the user. The package maintainers do not host, mirror, or redistribute election result data.

What does a user need to use the package?

R (>= 4.1.0)
An internet connection at scrape time
Python (>= 3.10), installable automatically via downballot_install_python()
~100–200 MB of disk space for the Playwright Chromium browser (first-time setup)

8. Maintenance

Who maintains the package?

The package is maintained by Graham Chickering. Bug reports and feature requests can be filed at https://github.com/gchickering21/DownBallotR/issues.

How are source website changes handled?

State election websites change structure periodically without notice. When a source changes in a way that breaks scraping, the package may either return an error or return malformed data. Users who observe unexpected results or errors are encouraged to file a bug report with the output they received so that the relevant scraper can be updated.

How frequently is the package updated?

There is no fixed release cadence. Updates are driven by source website changes, newly supported states, and user-reported issues.

Is there a versioning or changelog policy?

The package follows semantic versioning. Breaking changes to the public API will be reflected in the major version number, or in the minor version number while the package remains in 0.x development.

9. Ethical and Legal Considerations

Are there legal restrictions on the underlying data?

Election results published by government entities are generally in the public domain in the United States, but users should verify any jurisdiction-specific restrictions, licensing terms, or reuse conditions that may apply to particular sources or downstream uses.

Does the package collect or transmit user data?

The package itself does not collect, log, or transmit information about users, their queries, or the data they retrieve. However, the package documentation website uses Google Analytics to measure aggregate site traffic and understand how the documentation is being used. This website-level analytics may collect information such as page views, referral sources, device/browser metadata, and general usage patterns in accordance with Google Analytics practices. This tracking applies only to visits to the documentation website, not to use of the package within R.

Are there privacy concerns with the data?

Election results as typically published are aggregated by contest and geographic unit and do not contain voter-level information. Some sources may include candidate information such as names or party affiliation, which are part of the public record. Separately, visitors to the documentation website should be aware that web analytics tools may process limited usage data as part of standard website traffic measurement.

Could this package be used to suppress or manipulate election information?

The package retrieves and presents data as published by official sources. It does not have the ability to alter, suppress, or submit election information to any authority. Misrepresenting data retrieved by this package as official results, or selectively presenting results in misleading ways, would constitute misuse by the user rather than a feature of the package.

10. Known Limitations

The following limitations are known and should be considered by users:

Coverage is incomplete. Only the states listed in Section 2 are supported. Most states and all US territories are not currently covered.
Historical data quality is uneven. Earlier years have not been systematically validated. Missing contests, inconsistent formatting, and partial results are more likely for older elections.
Source websites can and do change. A scraper that worked at package release may break if the source website updates its structure. There is no guarantee of continued availability for any source.
Contest names are not normalized. “City Council District 4” and “City Council, Ward 4” may refer to the same or different races depending on the source and year. No ontology is applied to race names.
Sub-unit geographic names are not standardized. County names, precinct IDs, and town names are returned as they appear in the source. These may not match standard geographic reference files without additional cleaning.
Results may not be final. A scrape performed immediately after an election may retrieve unofficial or preliminary totals. Even well after election day, some sources may not clearly distinguish certified from uncertified results.
The package has no guaranteed uptime. Because it depends on live external websites, it may fail at any time due to website downtime, bot mitigation, or network issues outside the maintainer’s control.
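
For the geographic-name limitation above, a common first cleaning step is to build a normalized join key before matching against a reference file. The sketch below shows one way to do that in base R. make_county_key() and the example names are hypothetical, and the rules (upper-casing, dropping punctuation and a trailing "County" or "Parish") are assumptions that will not suit every source.

```r
# Hypothetical helper: build a join key from place names as scraped.
make_county_key <- function(x) {
  key <- toupper(trimws(x))
  key <- gsub("[[:punct:]]", "", key)           # drop periods, apostrophes
  key <- gsub("\\s+(COUNTY|PARISH)$", "", key)  # drop trailing unit word
  gsub("\\s+", " ", key)                        # collapse repeated spaces
}

scraped   <- c("Wake County", " st. john the baptist parish", "New  Hanover")
reference <- c("New Hanover", "St. John the Baptist", "Wake")

# Apply the same normalization to both sides, then match on the key.
idx <- match(make_county_key(scraped), make_county_key(reference))
```

Any scraped name with an NA in idx failed to match and is a candidate for manual review. Note that stripping punctuation also removes hyphens, so hyphenated names are only matched correctly when both sides pass through the same function.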

11. Responsible Use Recommendations

Users working with data from this package should:

Verify critical results against official sources before publication or high-stakes decisions. Official certified results are available from the relevant state or county election authority and should be used as the authoritative reference.
Record the date and time of each scrape so that results can be reproduced or explained if the source is later updated.
Cite the original source, not DownBallotR, when reporting specific election results. The package is a retrieval tool; the data originates from the source websites listed in Section 2.
Inspect output for anomalies before analysis — unexpected zeros, missing candidates, or duplicate rows may indicate a parsing issue rather than a true feature of the data.
Use db_available_years() to check confirmed coverage before specifying a year range, and treat results for years beyond the confirmed end date as experimental.
Respect the source infrastructure. Avoid automated loops that issue hundreds of requests in quick succession.
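
Two of the recommendations above, recording the scrape time and inspecting output for anomalies, can be sketched in a few lines of base R. The data frame and its column names are invented for illustration, and the checks are simple red flags rather than a complete validation.

```r
# Toy scraped output; column names are hypothetical.
results <- data.frame(
  contest   = c("Mayor", "Mayor", "Mayor", "Clerk"),
  candidate = c("A", "B", "B", "C"),
  votes     = c(510L, 0L, 0L, NA),
  stringsAsFactors = FALSE
)

# Simple red flags: exact duplicate rows, zero totals, missing totals.
checks <- list(
  duplicate_rows = sum(duplicated(results)),
  zero_votes     = sum(results$votes == 0, na.rm = TRUE),
  missing_votes  = sum(is.na(results$votes))
)

# Record when the data was retrieved (in UTC) as a column, so the
# timestamp travels with the rows when they are filtered or saved.
results$scraped_at <- format(Sys.time(), "%Y-%m-%d %H:%M:%S", tz = "UTC")
```

A non-zero count in checks does not prove a parsing problem, since a candidate can legitimately receive zero votes, but it indicates where to compare against the source page.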

12. Citation

If you use data retrieved through DownBallotR in published work, please cite both the package and the original data sources.

Package citation:

To generate the official citation (as defined in the package metadata), run:

citation("DownBallotR")

Recommended citation:

Chickering, G., & Warshaw, C. (2026). DownBallotR: Access federal, state, and local election data (R package, version 0.1.0). https://github.com/gchickering21/DownBallotR

Original data sources:

Election results should be cited individually. Official state election results are typically citable by state, office, and year from the relevant Secretary of State or State Board of Elections websites.

References

Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J. W., Wallach, H., Daumé III, H., & Crawford, K. (2021). Datasheets for datasets. Communications of the ACM, 64(12), 86–92. https://doi.org/10.1145/3458723