Energy burden—the proportion of household income spent on energy—is a critical metric for understanding energy poverty and inequity. However, traditional energy burden ratios present analytical challenges including difficulties with aggregation and visualization of extreme values. The emburden package for R implements Net Energy Return (Nh) methodology to address these limitations while enabling temporal analysis of household energy characteristics. This paper introduces the package’s design and demonstrates its application to comparing Low-Income Energy Affordability Data (LEAD) Tool vintages from 2018 and 2022 across geographic and demographic dimensions. The package provides functions for downloading, processing, and analyzing census tract-level energy burden data for all U.S. states, with particular attention to proper weighted aggregation and schema normalization across data vintages. We demonstrate the package’s capabilities through examples ranging from state-level summaries to fine-grained census tract comparisons, illustrating how policy-relevant insights can be extracted at multiple scales.
Household energy affordability is a persistent challenge affecting millions of households in the United States. Low-income households face disproportionate energy burdens, often spending more than 6% of their income on energy costs compared to 2-3% for higher-income households (Ross, Drehobl, and Stickles 2018; Drehobl and Ross 2016). Understanding these disparities and tracking changes over time is essential for designing effective energy assistance programs and policies.
The traditional energy burden metric, the ratio of energy expenditures (\(S\)) to gross income (\(G\)), has several analytical limitations. Because income is in the denominator, energy burden (\(E_b = S/G\)) grows without bound as household income approaches zero, which complicates aggregation and visualization. Additionally, averaging energy burden across households requires a harmonic rather than an arithmetic mean, a requirement that is not widely understood or consistently applied (Scheier and Kittner 2022).
The emburden package for R addresses these challenges by implementing Net Energy Return (NER) methodology, adapted from macro-energy systems analysis (Hall, Lambert, and Balogh 2011; Brandt, Dale, and Barnhart 2013; Carbajales-Dale et al. 2014). Net energy analysis estimates the net energy return of a process as a relationship between gross resources extracted and embodied energy directed toward extraction:
\[G = Gross\ Resource\ Extracted\]
\[S = Spending\ on\ Extraction\ Process\]
\[Net\ Energy\ Return\ (NER) = \frac{G - S}{S}\]
For households extracting income from the economy, these ratios become:
\[G_{income} = Gross\ Income\]
\[S_{energy} = Spending\ on\ Energy\]
\[NER_{household} = \frac{G_{income} - S_{energy}}{S_{energy}}\]
This metric represents the net earnings a household receives for every dollar of expenditure on secondary energy. For notational simplicity, we use \(N_h\) to denote household Net Energy Return throughout this paper, where \(N_h = NER_{household}\).
Energy burden, the traditional metric in energy poverty analysis, is defined as:
\[Energy\ Burden = E_b = \frac{S_{energy}}{G_{income}}\]
While energy burden is intuitive as a percentage, it has several mathematical limitations. The Net Energy Return transformation addresses these by preventing double-counting of energy expenditures (gross income already includes the portion spent on energy, which the numerator \(G_{income} - S_{energy}\) subtracts out) and enabling proper weighted mean aggregation:
\[\overline{N_h} = \frac{\sum (N_h \times households)}{\sum households}\]
In contrast, energy burden requires harmonic mean aggregation:
\[\overline{E_b} = \frac{1}{\overline{1/E_b}}\]
The two metrics are mathematically related through the transformation \(E_b = 1/(N_h + 1)\), allowing seamless conversion between representations.
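The equivalence between the two aggregation rules can be checked numerically. The sketch below uses three made-up cohorts (the incomes, spending, and household counts are illustrative assumptions, not LEAD values) and base R only; it does not call any emburden function.

```r
# Three hypothetical cohorts (illustrative numbers only)
cohorts <- data.frame(
  income     = c(20000, 50000, 120000),  # G: gross annual income ($)
  energy     = c(2500, 2500, 3000),      # S: annual energy spending ($)
  households = c(100, 300, 200)          # aggregation weights
)

cohorts$eb <- cohorts$energy / cohorts$income                     # E_b = S / G
cohorts$nh <- (cohorts$income - cohorts$energy) / cohorts$energy  # N_h = (G - S) / S

# Household-weighted arithmetic mean of N_h, converted back to a burden
nh_mean    <- weighted.mean(cohorts$nh, cohorts$households)
eb_from_nh <- 1 / (nh_mean + 1)

# Household-weighted harmonic mean of E_b
eb_harmonic <- 1 / weighted.mean(1 / cohorts$eb, cohorts$households)

# Household-weighted arithmetic mean of E_b, shown for contrast
eb_arithmetic <- weighted.mean(cohorts$eb, cohorts$households)

print(c(from_nh = eb_from_nh, harmonic = eb_harmonic, arithmetic = eb_arithmetic))
```

Converting the weighted mean of \(N_h\) back to a burden reproduces the household-weighted harmonic mean of \(E_b\), whereas the arithmetic mean of \(E_b\) gives a larger value for these cohorts.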
Energy poverty is commonly defined as spending greater than 10% of household income on energy (Bednar and Reames 2020):
\[E_b^{*} = \frac{S_{energy}}{G_{income}} > 10\%\]
Translated to Net Energy Return, the energy poverty threshold becomes:
\[N_h^{*} < 9: Household\ at\ Energy\ Poverty\ Line\]
This means a household earning less than $9 of net income (income remaining after energy costs) for every dollar spent on secondary energy is considered to be in energy poverty by the traditional energy burden accounting method: a Net Energy Return below 9 is equivalent to an energy burden above 10%. While this threshold is somewhat arbitrary and may not be suitable in all situations, it provides a useful benchmark for comparing results to the energy poverty literature.
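The value of 9 follows directly from the definitions given earlier. As a short worked derivation, rewriting \(N_h\) in terms of \(E_b\) and substituting the 10% poverty line gives:
\[N_h = \frac{G_{income} - S_{energy}}{S_{energy}} = \frac{G_{income}}{S_{energy}} - 1 = \frac{1}{E_b} - 1\]
\[N_h^{*} = \frac{1}{0.10} - 1 = 9\]
Because \(N_h\) decreases as \(E_b\) increases, energy burdens above 10% correspond to \(N_h\) values below 9.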
The U.S. Department of Energy’s Low-Income Energy Affordability Data (LEAD) Tool (Ma et al. 2019) provides census tract-level estimates of household energy characteristics based on American Community Survey microdata. The tool uses iterative proportional fitting to allocate households to census tracts while calibrating to utility-reported sales and revenues.
Multiple vintages of LEAD Tool data have been released:
These vintages enable temporal analysis of energy burden trends, but require careful handling of schema differences and income bracket definitions.
The emburden package is designed around several key
principles:
The emburden package provides access to three primary
datasets for household energy burden analysis:
The Low-Income Energy Affordability Data (LEAD) Tool (Ma et al. 2019) reports average income, electricity expenditures, gas expenditures, and other fuel expenditures for cohorts of households segmented by location (census tract, county, state) and household characteristics (ownership status, building age, number of units, attachment status, primary heating fuel).
The dataset is assembled using iterative proportional fitting (IPF), a widely used spatial microsimulation method to allocate households to census tracts while calibrating characteristics to known quantities. The IPF algorithm processes cross-tabulations of household responses from the American Community Survey (ACS) Public Use Microdata Samples, scaling them to match aggregate annual values from utility sales and revenues reported in Energy Information Administration forms 861 (electricity) and 176 (natural gas).
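The general idea of IPF can be sketched in a few lines of base R. The example below is a generic two-dimensional version with made-up numbers; it is not the LEAD Tool's implementation, which fits many more dimensions against ACS and EIA constraints. The function name `ipf_2d` and its arguments are hypothetical.

```r
# Generic two-dimensional IPF sketch (hypothetical helper, not LEAD code).
# `seed` must have strictly positive row and column sums.
ipf_2d <- function(seed, row_targets, col_targets, tol = 1e-8, max_iter = 1000) {
  # Both sets of margins must describe the same total
  stopifnot(isTRUE(all.equal(sum(row_targets), sum(col_targets))))
  tab <- seed
  for (iter in seq_len(max_iter)) {
    # Row step: scale each row so its sum matches the row target
    tab <- tab * (row_targets / rowSums(tab))
    # Column step: scale each column so its sum matches the column target
    tab <- sweep(tab, 2, col_targets / colSums(tab), `*`)
    # Columns now match exactly; stop once rows also match within tolerance
    if (max(abs(rowSums(tab) - row_targets)) < tol) break
  }
  tab
}

# Made-up seed cross-tabulation and target margins
seed        <- matrix(c(10, 20, 30, 40), nrow = 2)
row_targets <- c(60, 40)
col_targets <- c(45, 55)

fit <- ipf_2d(seed, row_targets, col_targets)
print(round(fit, 2))
```

The fitted table keeps the interaction pattern of the seed while matching both sets of margins, which is the behavior behind the limitation that relationships between constraint variables are inherited from the initializing dataset.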
Multiple vintages are available:
The Renewable Energy Potential of Low-Income Communities in America (REPLICA) dataset (Sigrin and Mooney 2018) adds technical rooftop solar potential and additional techno-economic variables including demographics and electricity rates. The package can merge REPLICA data with LEAD data to enrich analyses with utility type, locale classification, and solar generation potential.
A critical challenge in temporal analysis is handling schema differences between LEAD Tool vintages. The package implements automatic normalization through the following transformations:
Income bracket aggregation: The LEAD Tool provides income as a fraction of Area Median Income (AMI) or Federal Poverty Level (FPL). For AMI data, the package can aggregate detailed brackets into simplified categories matching the REPLICA schema:
For FPL data, the aggregation follows poverty line definitions:
Building type simplification: Housing units are classified as:
1 Unit: Multi-Family
These normalizations enable valid temporal comparisons despite underlying schema evolution between vintages.
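A lookup table is one simple way to express this kind of bracket aggregation. The sketch below is an assumption-laden illustration: the bracket labels, the simplified category names, and the household counts are all hypothetical and do not reproduce the package's actual mapping or the LEAD schema.

```r
# Hypothetical detailed cohort table; labels and counts are illustrative only
detailed <- data.frame(
  ami_bracket = c("0-30%", "30-60%", "60-80%", "80-100%", "100%+"),
  households  = c(120, 200, 150, 180, 350),
  stringsAsFactors = FALSE
)

# Hypothetical lookup from detailed labels to simplified categories
bracket_map <- c(
  "0-30%"   = "low",
  "30-60%"  = "low",
  "60-80%"  = "low",
  "80-100%" = "moderate",
  "100%+"   = "high"
)

detailed$simple_bracket <- unname(bracket_map[detailed$ami_bracket])

# Fail loudly if a vintage introduces a label the lookup does not cover
stopifnot(!anyNA(detailed$simple_bracket))

# Household counts are additive, so they can simply be summed per category
simplified <- aggregate(households ~ simple_bracket, data = detailed, FUN = sum)
print(simplified)
```

Only household counts are summed here; metrics such as \(N_h\) would need a household-weighted mean when brackets are merged.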
The package processes raw LEAD Tool data through several stages:
For each household cohort, the package calculates:
\[s = electricity + natural\ gas + other\ fuels\]
\[g = annual\ household\ income\]
From these base metrics, all energy burden indicators are derived using the formulas presented in Section 1.1.
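As a minimal sketch of this step, the snippet below sums the three fuel expenditures into \(s\) and derives both metrics for two made-up cohorts. The column names (`electricity`, `natural_gas`, `other_fuels`, `income`) and all values are assumptions for illustration, not the LEAD schema.

```r
# Hypothetical cohort rows; column names and values are assumptions
cohorts <- data.frame(
  electricity = c(1400, 1600),   # annual electricity spending ($)
  natural_gas = c(600, 0),       # annual natural gas spending ($)
  other_fuels = c(0, 400),       # annual spending on other fuels ($)
  income      = c(25000, 80000)  # annual household income ($)
)

# s: total annual energy spending; g: annual household income
s <- cohorts$electricity + cohorts$natural_gas + cohorts$other_fuels
g <- cohorts$income

energy_burden     <- s / g        # E_b = S / G
net_energy_return <- (g - s) / s  # N_h = (G - S) / S

print(data.frame(s, g, energy_burden, net_energy_return))
```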
The package implements proper weighted aggregation using household counts as weights. For Net Energy Return:
calculate_weighted_metrics(
data,
group_columns = c("state", "income_bracket"),
metric_name = "ner"
)
This function:
The key insight is that Net Energy Return allows arithmetic weighted means, while energy burden would require harmonic mean aggregation—a distinction that significantly impacts the validity and interpretability of aggregate statistics.
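To make the grouped aggregation concrete, the following base R sketch computes a household-weighted mean of Net Energy Return per group and converts it back to an energy burden. It is not the package's implementation of `calculate_weighted_metrics()`; the helper name `weighted_ner_by_group` and the input column names `ner` and `households` are hypothetical.

```r
# Hypothetical helper: household-weighted mean N_h per group
weighted_ner_by_group <- function(data, group_col) {
  pieces <- split(data, data[[group_col]])
  rows <- lapply(names(pieces), function(key) {
    d <- pieces[[key]]
    ner <- weighted.mean(d$ner, d$households)
    data.frame(
      group           = key,
      ner             = ner,
      energy_burden   = 1 / (ner + 1),  # E_b = 1 / (N_h + 1)
      household_count = sum(d$households)
    )
  })
  do.call(rbind, rows)
}

# Made-up cohort rows for two states
toy <- data.frame(
  state      = c("NC", "NC", "SC"),
  ner        = c(9, 19, 24),
  households = c(100, 300, 50)
)

result <- weighted_ner_by_group(toy, "state")
print(result)
```

Averaging in \(N_h\) space and converting afterwards keeps the group-level energy burden consistent with the harmonic mean requirement, without the caller having to compute a harmonic mean explicitly.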
Iterative proportional fitting has limitations as an estimation procedure. The relationship between constraint variables tends toward the average of the initializing dataset, potentially depressing variations among otherwise similar regions. This may explain the large number of households estimated to have very low incomes. Validating these estimated data would require randomized surveys along the dimensions of interest.
Additionally, the “primary heating fuel” category derives from the ACS question “Which fuel is used most for heating this house, apartment, or mobile home?” The predictive power of this question for energy expenditures is not fully understood and warrants caution in interpretation.
Though REPLICA relies on a different LEAD vintage (2017) than recent analyses (2019, 2022), the package still enables useful cross-dataset analysis. However, inferences about differences between annual estimates should account for the standard error of the data (Ma et al. 2019). Rigorous temporal analysis benefits from comparing identically processed vintages.
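One common way to account for standard errors when comparing estimates from two vintages \(a\) and \(b\) is a simple z-statistic. This is a sketch that assumes a standard error (\(SE_a\), \(SE_b\)) is available for each estimate and that the two estimates are independent, which may not hold if the underlying survey samples overlap:
\[z = \frac{\overline{N_h}^{(b)} - \overline{N_h}^{(a)}}{\sqrt{SE_a^2 + SE_b^2}}\]
Under these assumptions, values of \(|z|\) well below about 2 suggest that the difference is within sampling noise.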
The emburden package is organized into several
functional modules:
library(emburden)
# Energy metric calculations
energy_burden_func(gross_income, energy_spending)
ner_func(gross_income, energy_spending) # Net Energy Return
eroi_func(gross_income, energy_spending) # EROI
dear_func(gross_income, energy_spending) # DEAR
# Statistical aggregation
calculate_weighted_metrics(
graph_data,
group_columns = "state",
metric_name = "ner"
)
The package provides automatic data downloading and caching:
# Load census tract data (auto-downloads if not available)
nc_tracts <- load_census_tract_data(states = "NC")
# Load cohort data by income bracket
nc_ami <- load_cohort_data(
dataset = "ami",
states = "NC",
vintage = "2022"
)
# Compare vintages
comparison <- compare_energy_burden(
dataset = "ami",
states = "NC",
group_by = "state"
)
The emburden package’s primary contribution is enabling
temporal analysis of energy burden through proper schema normalization
and aggregation. This section demonstrates the package’s capabilities
through progressively detailed examples.
The compare_energy_burden() function provides the core
temporal analysis functionality:
library(emburden)
# Compare North Carolina energy burden: 2018 vs 2022
nc_comparison <- compare_energy_burden(
dataset = "ami",
states = "NC",
group_by = "income_bracket"
)
# View formatted comparison table
print(nc_comparison)
The function automatically:
The comparison object contains multiple metrics:
# Energy burden in 2018 and 2022
nc_comparison$neb_2018
nc_comparison$neb_2022
# Change in energy burden (percentage points)
nc_comparison$neb_change_pp
# Net Energy Return values
nc_comparison$ner_2018
nc_comparison$ner_2022
# Household counts
nc_comparison$households_2018
nc_comparison$households_2022
To examine overall state changes without grouping by demographic characteristics:
# Overall state comparison
nc_state <- compare_energy_burden(
dataset = "ami",
states = "NC",
group_by = "none"
)
# Extract key findings
cat(sprintf(
"North Carolina energy burden changed from %.1f%% (2018) to %.1f%% (2022)\n",
nc_state$neb_2018 * 100,
nc_state$neb_2022 * 100
))
cat(sprintf(
"Change: %+.2f percentage points\n",
nc_state$neb_change_pp * 100
))
Disaggregating by income bracket reveals which populations experienced the largest changes:
# Compare by income bracket
nc_income <- compare_energy_burden(
dataset = "ami",
states = "NC",
group_by = "income_bracket"
)
# Visualize changes
library(ggplot2)
ggplot(nc_income, aes(x = income_bracket, y = neb_change_pp * 100)) +
geom_col(fill = "steelblue") +
geom_hline(yintercept = 0, linetype = "dashed") +
labs(
title = "Change in Energy Burden by Income Bracket",
subtitle = "North Carolina, 2018 to 2022",
x = "Income Bracket (% of Area Median Income)",
y = "Change in Energy Burden (percentage points)"
) +
theme_minimal()
Typical findings show that very low-income households (0-30% AMI) experience the highest energy burdens and are most vulnerable to changes in energy costs or income levels.
Comparing multiple states reveals regional patterns and policy impacts:
# Compare Southern states
southern_states <- compare_energy_burden(
dataset = "ami",
states = c("NC", "SC", "GA", "FL"),
group_by = "state"
)
# Which states improved most? (dplyr supplies %>%, arrange(), and select())
library(dplyr)
southern_states %>%
arrange(neb_change_pp) %>%
select(state_abbr, neb_2018, neb_2022, neb_change_pp)
# Visualize state comparison
ggplot(southern_states, aes(x = reorder(state_abbr, neb_2022),
y = neb_2022 * 100)) +
geom_col(fill = "darkgreen") +
geom_point(aes(y = neb_2018 * 100), color = "red", size = 3) +
labs(
title = "Energy Burden by State: 2022 (bars) vs 2018 (points)",
x = "State",
y = "Energy Burden (%)"
) +
theme_minimal()
Energy burden often varies significantly between renters and homeowners:
# Compare by housing tenure
nc_tenure <- compare_energy_burden(
dataset = "ami",
states = "NC",
group_by = "housing_tenure"
)
# Calculate the renter-owner gap
gap_2018 <- nc_tenure$neb_2018[nc_tenure$housing_tenure == "RENTER"] -
nc_tenure$neb_2018[nc_tenure$housing_tenure == "OWNER"]
gap_2022 <- nc_tenure$neb_2022[nc_tenure$housing_tenure == "RENTER"] -
nc_tenure$neb_2022[nc_tenure$housing_tenure == "OWNER"]
cat(sprintf(
"Renter-Owner energy burden gap: %.2f pp (2018) → %.2f pp (2022)\n",
gap_2018 * 100,
gap_2022 * 100
))
Renters typically face higher energy burdens due to split-incentive problems where landlords make efficiency investment decisions but tenants pay energy bills.
For policy applications targeting households below the federal poverty line:
# Use FPL dataset instead of AMI
nc_fpl <- compare_energy_burden(
dataset = "fpl",
states = "NC",
group_by = "income_bracket"
)
# Compare poverty vs non-poverty households
nc_fpl %>%
filter(income_bracket %in% c("Below Federal Poverty Line",
"Above Federal Poverty Line")) %>%
select(income_bracket, neb_2018, neb_2022, neb_change_pp)
This analysis is particularly relevant for programs like the Low-Income Home Energy Assistance Program (LIHEAP) which target households below specific poverty thresholds.
For fine-grained spatial analysis, load tract-level data directly:
# Load 2022 census tract data
nc_tracts_2022 <- load_census_tract_data(
states = "NC",
vintage = "2022"
)
# Calculate county-level statistics
nc_counties <- calculate_weighted_metrics(
nc_tracts_2022,
group_columns = "county_name",
metric_name = "ner"
)
# Identify counties with highest energy burden
nc_counties %>%
mutate(energy_burden = 1 / (ner + 1)) %>%
arrange(desc(energy_burden)) %>%
head(10) %>%
select(county_name, energy_burden, household_count)
Census tract data enables spatial analysis and mapping applications, revealing urban-rural disparities and identifying communities in need of targeted assistance.
The ability to track energy burden changes over time has important policy implications. Programs like LIHEAP (Low-Income Home Energy Assistance Program) and WAP (Weatherization Assistance Program) target households experiencing energy insecurity, but evaluating their effectiveness requires robust temporal analysis.
The emburden package enables researchers and
policymakers to:
A persistent challenge in energy equity is the split-incentive problem: landlords make energy efficiency investment decisions, but tenants pay the energy bills. This misalignment of incentives leads to underinvestment in efficiency improvements for rental properties.
The package’s ability to analyze energy burden by housing tenure reveals the magnitude of this problem:
# Quantify the renter-owner gap
tenure_comparison <- compare_energy_burden(
dataset = "ami",
states = "all", # National analysis
group_by = "housing_tenure"
)
# Calculate disparity
renter_burden <- tenure_comparison$neb_2022[
tenure_comparison$housing_tenure == "RENTER"
]
owner_burden <- tenure_comparison$neb_2022[
tenure_comparison$housing_tenure == "OWNER"
]
disparity_ratio <- renter_burden / owner_burden
Addressing this gap requires policy interventions such as:
Users should be aware of several data limitations:
The LEAD Tool uses IPF to allocate households to census tracts, which has important implications:
Household income as reported in the ACS has known limitations:
The “primary heating fuel” categorization derives from a single ACS question and may not fully capture:
Despite these limitations, the LEAD Tool represents the most comprehensive spatial dataset available for energy burden analysis in the United States.
Several extensions would enhance the package’s capabilities:
As DOE releases new LEAD Tool vintages (potentially 2024, 2026, etc.), the package can incorporate them to enable longer-term trend analysis. This would support:
The package currently implements Net Energy Return, EROI, and DEAR. Future versions could add:
Geographic extensions could include:
Methodological extensions for policy evaluation:
Several tools exist for energy burden analysis, each with different strengths:
emburden: Focused on temporal analysis with proper aggregation methodology
The emburden package fills a gap by providing programmatic access to multiple vintages with automated schema normalization, enabling reproducible temporal analyses at scale.
The emburden package provides a robust framework for
temporal analysis of household energy burden using proper Net Energy
Return methodology. By automating data access, normalizing schema
differences, and implementing correct aggregation methods, the package
enables researchers and policymakers to track energy affordability
trends across multiple scales.
Key contributions include:
The package is available from GitHub and is licensed under AGPL-3+. Documentation, vignettes, and issue tracking are available through the package website.