U.S. House Election Data Visualisation Project

US Election/Data Analytics/Data Visualisation/Web Scraping/Web App with Dash/Beautiful Soup/Selenium

Author: Regan C.H. Yin   |   Special thanks: Andy Chan, Daniel Lau
An interactive analytics tool that lets users explore the 2022 U.S. House election at state and district levels. It combines robust web scraping (Selenium + BeautifulSoup), data processing (Pandas) and visualisation (Dash + Plotly).

[Screenshot: US House 2022 interactive Dash app — choropleth map, party vote distributions and district winners.]

Executive Summary

Scope: US House 2022
Total Seats: 435
Headline: R 222 · D 213
Interactivity: Hover/filter & linked charts

The project delivers a clean, performant dashboard to compare party performance, vote distributions and district-level winners. It automates data ingestion from public election pages, cleans and aggregates results, then renders them in an intuitive interface suitable for analysts and the general public alike.

Tech Stack & Roles

Python · Selenium · BeautifulSoup · Pandas · Dash (JupyterDash) · Plotly Express

End-to-end by Regan: scraping, data engineering, visual design, and app integration.

Objectives

  • Provide a user-friendly, interactive view of the 2022 U.S. House election.
  • Enable state & district-level comparisons of winners and voting patterns.
  • Demonstrate a full pipeline: scraping → cleaning → feature engineering → dashboard.

Code Writing Procedure (End-to-End)

Part I — Web Scraping

Libraries: selenium, bs4 (BeautifulSoup), webdriver_manager, pandas.

  1. Driver setup. Configure Chrome with lightweight options, initialise webdriver.Chrome().
  2. State mapping. Define a state_dict of full state names → USPS codes.
  3. Core function. Implement get_election_data(driver, state):
    • Parse page source via BeautifulSoup.
    • Handle two layouts: multi-district vs single-district states.
    • Extract district, candidate, party, incumbent, votes, percent.
    • Normalise party labels (R/D/Ind./Libertarian/Green).
  4. State enumeration. Visit the index page to collect the 50 state links, then iterate each state’s House page; click “expand” buttons when present to reveal all candidates (a helper sketch follows the code block below).
  5. Tabular output. Convert nested lists → DataFrame with columns: State, State Code, District, Party, Candidate, Incumbent, Vote, Pct%; persist to house.csv.
# sketch of the core scraping pieces
from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd

state_dict = {"Alabama": "AL", "Alaska": "AK", ...}  # full 50-state mapping elided

def get_election_data(driver, state):
    """Parse the current page source into candidate-level result rows."""
    soup = BeautifulSoup(driver.page_source, "html.parser")
    result = []
    # handle multi- and single-district layouts, normalise parties & incumbency
    # append rows: [state, state_code, district, party, name, incumbent, votes, pct]
    return result

driver = webdriver.Chrome()  # configure lightweight Chrome options here, per step 1

# in the full pipeline the state list comes from the index page;
# the dict keys serve as a stand-in here
state_list = list(state_dict)
rows = []
for state in state_list:
    driver.get(f"https://www.politico.com/2022-election/results/{state.lower().replace(' ', '-')}/house/")
    rows += get_election_data(driver, state)
driver.quit()

house = pd.DataFrame(rows, columns=["State", "State Code", "District", "Party",
                                    "Candidate", "Incumbent", "Vote", "Pct%"])
house.to_csv("house.csv", index=False)
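
Two pieces the sketch only hints at are the party-label normalisation from step 3 and the “expand” handling from step 4. A minimal sketch of both, assuming the buttons are button elements whose visible text contains “Expand” — the XPath selector and the label map below are illustrative assumptions, not the exact page markup:

from selenium.webdriver.common.by import By

# assumed short labels -> canonical party names used downstream
PARTY_MAP = {"R": "Republican", "D": "Democratic",
             "Ind.": "Independent", "Lib.": "Libertarian", "Grn.": "Green"}

def normalise_party(raw):
    # map abbreviated labels onto canonical names; pass unknown labels through
    return PARTY_MAP.get(raw.strip(), raw.strip())

def expand_all(driver):
    # click every visible "expand"-style button so all candidates render
    for btn in driver.find_elements(By.XPATH, "//button[contains(., 'Expand')]"):
        try:
            btn.click()
        except Exception:
            pass  # a button may go stale once the DOM re-renders; skip it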

Part II — Data Cleaning & Feature Engineering

  • Sanitise numeric fields: remove commas; coerce Vote to int, Pct% to float (0–1).
  • For each state: compute Total Seats, Total Votes, Won Seats (Republican seat count), party vote sums, and Rep. Won Seats %; infer Won Party.
  • Output an aggregated seat_won table for the map + summary charts.
import pandas as pd

house = pd.read_csv("house.csv")

# sanitise numerics: strip commas from Vote, strip "%" from Pct%
house["Vote"] = pd.to_numeric(house["Vote"].astype(str).str.replace(",", ""), errors="coerce")
house = house.dropna(subset=["Vote"])  # drop rows whose vote count failed to parse
house["Vote"] = house["Vote"].astype(int)
house["Pct%"] = house["Pct%"].astype(str).str.rstrip("%").astype(float) / 100.0

seat_rows = []
for (state, code), g in house.groupby(["State", "State Code"]):
    total_seats = g["District"].nunique()
    total_votes = g["Vote"].sum()
    # the winner of each district is the candidate with the highest vote share
    district_winners = g.groupby("District").apply(lambda x: x.loc[x["Pct%"].idxmax(), "Party"])
    won_seats = (district_winners == "Republican").sum()
    republican_votes = g.loc[g["Party"] == "Republican", "Vote"].sum()
    democratic_votes = g.loc[g["Party"] == "Democratic", "Vote"].sum()
    libertarian_votes = g.loc[g["Party"].str.contains("Libertarian", na=False), "Vote"].sum()
    seat_rows.append({
        "State": state, "State Code": code, "Total Seats": total_seats, "Total Votes": total_votes,
        "Won Seats": won_seats, "Republican Votes": republican_votes, "Democratic Votes": democratic_votes,
        "Libertarian Votes": libertarian_votes, "Rep. Won Seats %": round(won_seats / total_seats * 100, 2),
        "Won Party": "Republican" if won_seats > total_seats / 2 else "Democratic",
    })
seat_won = pd.DataFrame(seat_rows)
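
The per-district winner logic can also be expressed without the Python-level loop; a vectorised sketch over the same house frame (column names as above):

# one row per district: the candidate with the highest vote share wins
winners = house.loc[house.groupby(["State", "District"])["Pct%"].idxmax()]

# Republican seat count per state, computed directly from the winner rows
rep_seats = (winners["Party"] == "Republican").groupby(winners["State"]).sum()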

Part III — Dash App (Map + Linked Charts)

  • Layout: Title, KPI line (435 seats; R 222 / D 213), USA choropleth (state code), and two right-hand panels: party vote distribution & district winner bars, both responding to map hover.
  • Top/Bottom 25: Radio selector renders ranked bar charts for % of Republican seats.
  • Callbacks: one for map hover → updates two panels; one for the Top/Bottom toggle.
from jupyter_dash import JupyterDash
from dash import html, dcc, Input, Output
import plotly.express as px

app = JupyterDash(__name__)
app.layout = html.Div([...])  # map + linked charts + toggle; a fuller layout sketch follows below

@app.callback(
    [Output("choropleth_map", "figure"), Output("bar_chart", "figure"),
     Output("district_bar_chart", "figure")],
    Input("choropleth_map", "hoverData"),
)
def update_panels(hover):
    # build map from seat_won; when hovering a state, slice seat_won & house
    # return: map fig, party vote bar, district winner bar
    ...

@app.callback(Output("top-bottom-25-bar-chart", "figure"), Input("toggle-chart", "value"))
def update_top_bottom(sel):
    # select nlargest/nsmallest on 'Rep. Won Seats %', render bar chart
    ...

app.run_server(mode="inline")  # serves the app inside the notebook
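
The html.Div([...]) placeholder elides the component tree. A minimal arrangement consistent with the panels described above — the component IDs match the callbacks, while the heading text, option labels and colour map are illustrative assumptions:

app.layout = html.Div([
    html.H1("U.S. House Election 2022"),   # title (wording assumed)
    html.P("435 seats · R 222 / D 213"),   # KPI line
    dcc.Graph(id="choropleth_map"),        # USA map keyed on State Code
    dcc.Graph(id="bar_chart"),             # party vote distribution
    dcc.Graph(id="district_bar_chart"),    # district winners
    dcc.RadioItems(id="toggle-chart",
                   options=[{"label": "Top 25", "value": "top"},
                            {"label": "Bottom 25", "value": "bottom"}],
                   value="top"),
    dcc.Graph(id="top-bottom-25-bar-chart"),
])

# the map itself can be drawn from seat_won with Plotly Express, e.g.:
fig = px.choropleth(seat_won, locations="State Code", locationmode="USA-states",
                    color="Won Party", scope="usa",
                    color_discrete_map={"Republican": "red", "Democratic": "blue"})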

Final Outcomes

  • Fully working, interactive dashboard with linked views (map ↔ state & district panels).
  • Cleaned dataset (house.csv) and engineered summary table (seat_won).
  • Insight surfaces: party dominance by state, competitiveness, and distribution of winners across districts.

Reflections & Next Steps

What went well: Reliable scraping across heterogeneous layouts; concise visual grammar; responsive callbacks.

Challenges: Dynamic pages & sporadic missing party tags required defensive parsing; hover-driven UX needed careful defaults.

Roadmap: Add Senate/Presidential modules; deploy Dash app to a managed host; integrate trendlines and forecasting.

How to Run Locally

  1. Create and activate a Python 3.10+ environment.
  2. pip install selenium webdriver-manager beautifulsoup4 pandas jupyter-dash dash plotly
  3. Run the scraper to produce house.csv, then start the Dash app (JupyterDash in a notebook, or pure Dash as sketched below).
  4. Open the served URL (Dash defaults to http://127.0.0.1:8050).
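
Outside Jupyter, swapping JupyterDash for plain Dash is a two-line change; a minimal sketch:

from dash import Dash

app = Dash(__name__)
# ... same layout and callbacks as above ...

if __name__ == "__main__":
    app.run_server(debug=True)  # newer Dash versions expose this as app.run()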