---
title: "Net Energy Return Methodology"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Net Energy Return Methodology}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(emburden)
library(dplyr)
```

## Introduction

This vignette explains the mathematical foundations of the Net Energy Return (Nh) methodology for analyzing household energy burden. We'll show why this approach is both theoretically sound and computationally advantageous for aggregation across households.

## The Problem: Aggregating Ratios

Energy burden is defined as the ratio of energy spending to gross income:

**Energy Burden (EB) = S / G**

Where:
- **S** = Total energy spending (electricity + gas + other fuels)
- **G** = Gross household income

While this definition is straightforward for a single household, **aggregating energy burden across multiple households is not trivial**.

### Why Arithmetic Mean Fails

Consider three households:

```{r simple-example}
# Three households with different income/spending patterns
households <- data.frame(
  id = 1:3,
  income = c(30000, 50000, 100000),
  spending = c(3000, 3500, 4000)
)

households$eb <- energy_burden_func(households$income, households$spending)
print(households)
```

What is the "average" energy burden across these three households?

```{r wrong-aggregation}
# Attempt 1: Simple arithmetic mean (WRONG!)
mean_eb_wrong <- mean(households$eb)
print(paste("Simple mean EB:", scales::percent(mean_eb_wrong)))
```

This is incorrect because it treats all households equally, ignoring that they represent different amounts of total income and spending.

```{r weighted-wrong}
# Attempt 2: Weighted arithmetic mean (STILL WRONG!)
# Let's weight by number of households (all equal here, but principle matters)
weights <- c(100, 150, 200)  # Different household counts

eb_arithmetic_mean <- weighted.mean(households$eb, weights)
print(paste("Weighted arithmetic mean EB:", scales::percent(eb_arithmetic_mean)))
```

**Why is this still wrong?** Energy burden is a ratio, and ratios cannot be aggregated using arithmetic means. The correct approach requires the **harmonic mean**.

### The Correct Approach: Harmonic Mean

For ratios like energy burden, the mathematically correct aggregation uses the harmonic mean:

```{r harmonic-mean}
# Correct aggregation: Weighted harmonic mean
eb_harmonic <- 1 / weighted.mean(1 / households$eb, weights)
print(paste("Weighted harmonic mean EB:", scales::percent(eb_harmonic)))

# Verify by calculating from totals
total_spending <- sum(households$spending * weights)
total_income <- sum(households$income * weights)
eb_from_totals <- total_spending / total_income
print(paste("EB from totals:", scales::percent(eb_from_totals)))

# These should be identical
print(paste("Difference:", abs(eb_harmonic - eb_from_totals)))
```

The harmonic mean gives the same result as calculating energy burden from the aggregated totals, which is what we want!

## The Solution: Net Energy Return (Nh)

While the harmonic mean is correct, it has some practical drawbacks:
1. **Computational complexity**: Requires division by each individual EB value
2. **Numerical instability**: Very small EB values (e.g., 0.01) become very large (100) after inversion
3. **Less intuitive**: Harmonic means are less familiar to most analysts
4. **Error-prone**: Easy to accidentally use arithmetic mean instead

The Net Energy Return (Nh) transformation solves these issues.

### Mathematical Relationship

Net Energy Return is defined as:

**Nh = (G - S) / S**

This represents "net energy return per unit of energy spending" - similar to concepts in biophysical economics.

**Key insight**: Nh can be aggregated using simple arithmetic mean, then converted back to energy burden!

Let's verify the mathematical relationship:

```{r nh-identity}
# Starting from Nh = (G - S) / S
# Let's solve for EB = S / G

# Nh = (G - S) / S
# Nh = G/S - S/S
# Nh = G/S - 1
# Nh + 1 = G/S
# 1 / (Nh + 1) = S/G = EB

# Therefore: EB = 1 / (Nh + 1)

# Verify with our example
households$nh <- ner_func(households$income, households$spending)
households$eb_from_nh <- 1 / (households$nh + 1)

# Compare to original EB
households$identical <- all.equal(households$eb, households$eb_from_nh)
print(households[, c("id", "eb", "eb_from_nh", "nh")])
```

### Aggregation via Arithmetic Mean

Now we can aggregate using simple arithmetic mean:

```{r nh-aggregation}
# Step 1: Calculate Nh for each household
nh_values <- ner_func(households$income, households$spending)

# Step 2: Arithmetic weighted mean (simple!)
nh_mean <- weighted.mean(nh_values, weights)
print(paste("Weighted mean Nh:", round(nh_mean, 2)))

# Step 3: Convert back to EB
eb_from_nh <- 1 / (nh_mean + 1)
print(paste("EB from Nh method:", scales::percent(eb_from_nh)))

# Compare to harmonic mean result
print(paste("EB from harmonic mean:", scales::percent(eb_harmonic)))
print(paste("Difference:", abs(eb_from_nh - eb_harmonic)))
```

**Result**: Both methods give identical results, but the Nh method uses simple arithmetic mean instead of harmonic mean!

## Why This Works: The Mathematics

The key mathematical insight is that the Nh transformation linearizes the aggregation problem.

For a group of households with weights $w_i$, incomes $G_i$, and spending $S_i$:

**Harmonic mean approach**:
$$\text{EB}_{\text{agg}} = \frac{1}{\sum_i w_i \cdot \frac{1}{\text{EB}_i}} = \frac{1}{\sum_i w_i \cdot \frac{G_i}{S_i}}$$

**Nh approach**:
$$\text{Nh}_{\text{mean}} = \sum_i w_i \cdot \text{Nh}_i = \sum_i w_i \cdot \frac{G_i - S_i}{S_i}$$

$$\text{EB}_{\text{agg}} = \frac{1}{1 + \text{Nh}_{\text{mean}}}$$

Both formulations are mathematically equivalent when computing from the same underlying data.

## Computational Advantages (For Aggregation)

The Nh method provides several advantages **when aggregating across multiple households**:

```{r computational-comparison}
# Simulate larger dataset
set.seed(42)
n <- 10000
large_data <- data.frame(
  income = rlnorm(n, meanlog = 10.8, sdlog = 0.8),  # Log-normal income distribution
  spending = NA
)
large_data$spending <- pmin(
  rlnorm(n, meanlog = 8.2, sdlog = 0.5),  # Log-normal spending
  large_data$income * 0.5  # Cap at 50% of income
)
weights <- sample(50:500, n, replace = TRUE)

# Method 1: Nh with arithmetic mean
system.time({
  nh <- ner_func(large_data$income, large_data$spending)
  nh_mean <- weighted.mean(nh, weights)
  eb_nh <- 1 / (1 + nh_mean)
})

# Method 2: Harmonic mean
system.time({
  eb_direct <- energy_burden_func(large_data$income, large_data$spending)
  eb_harmonic <- 1 / weighted.mean(1 / eb_direct, weights)
})

# Verify results are identical
print(paste("EB via Nh:", scales::percent(eb_nh)))
print(paste("EB via harmonic mean:", scales::percent(eb_harmonic)))
print(paste("Difference:", abs(eb_nh - eb_harmonic)))
```

**Note**: The computational advantage applies specifically to **aggregation across households**. For single household calculations, both methods are equivalent (EB = S/G = 1/(1+Nh)) and require the same basic operations.

### Numerical Stability

The Nh method is also more numerically stable:

```{r numerical-stability}
# Households with very low energy burden
low_burden <- data.frame(
  income = c(200000, 500000, 1000000),
  spending = c(2000, 3000, 5000)  # Very low spending relative to income
)

low_burden$eb <- energy_burden_func(low_burden$income, low_burden$spending)
low_burden$nh <- ner_func(low_burden$income, low_burden$spending)

print("Energy Burden (direct):")
print(low_burden$eb)

print("\nReciprocal of EB (used in harmonic mean):")
print(1 / low_burden$eb)  # Very large numbers!

print("\nNet Energy Return (Nh):")
print(low_burden$nh)  # More reasonable range
```

Very low energy burdens (e.g., 0.01 = 1%) become very large values (100) when inverted for harmonic mean calculation. The Nh values remain in a more reasonable range, reducing numerical precision issues.

## Error from Incorrect Aggregation

Let's quantify the error introduced by incorrectly using arithmetic mean on EB values:

```{r error-demonstration}
# Use realistic North Carolina-like data
set.seed(123)
n_households <- 5000

realistic_data <- data.frame(
  income_bracket = sample(
    c("0-30% AMI", "30-50% AMI", "50-80% AMI", "80-100% AMI", "100%+ AMI"),
    n_households,
    replace = TRUE,
    prob = c(0.15, 0.12, 0.20, 0.10, 0.43)
  ),
  income = rlnorm(n_households, meanlog = 10.8, sdlog = 0.8),
  households = sample(10:100, n_households, replace = TRUE)
)

realistic_data$spending <- realistic_data$income * rlnorm(
  n_households,
  meanlog = log(0.05),
  sdlog = 0.6
)

# Calculate metrics
realistic_data <- realistic_data %>%
  mutate(
    eb = energy_burden_func(income, spending),
    nh = ner_func(income, spending),
    neb = neb_func(income, spending)
  )

# WRONG: Arithmetic mean of EB
eb_wrong <- weighted.mean(realistic_data$eb, realistic_data$households)

# CORRECT: Via Nh
nh_mean <- weighted.mean(realistic_data$nh, realistic_data$households)
eb_correct <- 1 / (1 + nh_mean)

# Calculate error
absolute_error <- eb_wrong - eb_correct
relative_error_pct <- (absolute_error / eb_correct) * 100

cat(sprintf("WRONG (arithmetic mean EB): %.2f%%\n", eb_wrong * 100))
cat(sprintf("CORRECT (via Nh method):   %.2f%%\n", eb_correct * 100))
cat(sprintf("Absolute error:             %.4f\n", absolute_error))
cat(sprintf("Relative error:             %.2f%%\n", relative_error_pct))
```

The error from using arithmetic mean instead of proper aggregation is typically 1-5% in relative terms, which can be significant for policy analysis and equity assessments.

## Practical Workflow

Here's the recommended workflow for energy burden analysis:

```{r practical-workflow}
# Step 1: Calculate Nh for all observations
data_with_metrics <- realistic_data %>%
  mutate(
    nh = ner_func(income, spending),
    neb = neb_func(income, spending)  # Same as eb, but emphasizes proper aggregation
  )

# Step 2: Aggregate by groups using arithmetic weighted mean
by_bracket <- data_with_metrics %>%
  group_by(income_bracket) %>%
  summarise(
    total_households = sum(households),
    nh_mean = weighted.mean(nh, households),
    neb = 1 / (1 + nh_mean),  # Correct aggregation
    .groups = "drop"
  ) %>%
  arrange(desc(neb))

print(by_bracket)

# Step 3: Identify high-burden households
high_burden_threshold <- 0.06  # 6% energy burden threshold
nh_threshold <- (1 / high_burden_threshold) - 1  # = 15.67

high_burden_count <- sum(
  data_with_metrics$households[data_with_metrics$nh < nh_threshold]
)
total_households <- sum(data_with_metrics$households)

cat(sprintf("\nHouseholds with >6%% energy burden: %.1f%%\n",
            (high_burden_count / total_households) * 100))
```

## Summary

### Key Principles

1. **Energy burden is a ratio** and requires harmonic mean for proper aggregation
2. **Net Energy Return (Nh)** transformation enables arithmetic mean aggregation
3. **Both methods are mathematically equivalent** and give identical results
4. **Nh method advantages** (for aggregation across households):
   - Simpler computation (arithmetic vs harmonic mean)
   - Better numerical stability
   - More interpretable (net return per dollar of energy spending)
   - Makes improper aggregation obviously wrong

### The Correct Formula

**For aggregation across households**:

```r
# Step 1: Calculate Nh for each household/cohort
data$nh <- ner_func(data$income, data$spending)

# Step 2: Weighted arithmetic mean
nh_mean <- weighted.mean(data$nh, data$weights)

# Step 3: Convert to energy burden
eb_aggregate <- 1 / (1 + nh_mean)
```

### Common Mistakes to Avoid

**❌ NEVER do this**:
```r
# WRONG: Arithmetic mean of energy burden
eb_wrong <- weighted.mean(energy_burden_func(income, spending), weights)
```

**✅ ALWAYS do this**:
```r
# CORRECT: Nh method with arithmetic mean, then convert
nh <- ner_func(income, spending)
nh_mean <- weighted.mean(nh, weights)
eb_correct <- 1 / (1 + nh_mean)
```

## References

For a quick reference guide, see `NEB_QUICKSTART.md` in the package repository.

For practical examples with real data, see `vignette("getting-started")`.

## Mathematical Appendix

### Identity Proofs

**Proof that NEB = EB at household level**:

$$\text{Nh} = \frac{G - S}{S}$$

$$\text{NEB} = \frac{1}{1 + \text{Nh}} = \frac{1}{1 + \frac{G-S}{S}} = \frac{1}{\frac{S + G - S}{S}} = \frac{1}{\frac{G}{S}} = \frac{S}{G} = \text{EB}$$

**Proof that harmonic mean of EB equals arithmetic mean via Nh**:

For weighted aggregation with weights $w_i$:

$$\text{EB}_{\text{harmonic}} = \frac{1}{\sum_i w_i \cdot \frac{1}{\text{EB}_i}}$$

Since $\text{EB}_i = S_i / G_i$:

$$\text{EB}_{\text{harmonic}} = \frac{1}{\sum_i w_i \cdot \frac{G_i}{S_i}}$$

Now via Nh method:

$$\text{Nh}_i = \frac{G_i - S_i}{S_i} = \frac{G_i}{S_i} - 1$$

$$\overline{\text{Nh}} = \sum_i w_i \cdot \text{Nh}_i$$

$$\text{EB}_{\text{via Nh}} = \frac{1}{1 + \overline{\text{Nh}}} = \frac{1}{1 + \sum_i w_i \cdot \left(\frac{G_i}{S_i} - 1\right)}$$

$$= \frac{1}{\sum_i w_i \cdot \frac{G_i}{S_i} - \sum_i w_i + 1}$$

When weights are normalized ($\sum_i w_i = 1$), this simplifies to:

$$= \frac{1}{\sum_i w_i \cdot \frac{G_i}{S_i}} = \text{EB}_{\text{harmonic}}$$

Therefore, both methods are mathematically equivalent. ∎