The calculate_regression_diagnostics() function in REPS provides regression diagnostics by period. It is designed for panel or repeated cross-section data (e.g. property transactions over time) to evaluate the quality of period-specific log-linear regressions.
For each period, it fits a log-linear regression of the form log(price) ~ covariates and reports diagnostics computed from that fit.
These diagnostics help assess model quality over time, identifying periods with issues like non-normality, low fit, heteroscedasticity, or autocorrelation.
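As a rough illustration of what per-period diagnostics of this kind can look like, the sketch below fits one log-linear model per period on simulated data and computes four checks with base R. It is not the REPS implementation: the helper period_diagnostics(), the simulated data and the choice of tests (Shapiro-Wilk for normality, a studentized Breusch-Pagan statistic, the Durbin-Watson statistic) are assumptions made for this example, and the tests REPS runs internally may differ. A p-value for autocorrelation is left out to keep the sketch short.

```r
# Hypothetical helper (not part of REPS): diagnostics for one period's
# log-linear regression, using base R only.
period_diagnostics <- function(df, formula) {
  fit <- lm(formula, data = df)
  res <- residuals(fit)
  n <- length(res)

  # Normality of residuals (Shapiro-Wilk, assumed here; needs 3 <= n <= 5000)
  norm_pvalue <- shapiro.test(res)$p.value

  # Adjusted R-squared of the fit
  r_adjust <- summary(fit)$adj.r.squared

  # Studentized Breusch-Pagan test: regress squared residuals on the
  # regressors; n * R^2 is asymptotically chi-squared under homoscedasticity
  X <- model.matrix(fit)[, -1, drop = FALSE]
  sq_res <- res^2
  aux <- lm(sq_res ~ X)
  bp_stat <- n * summary(aux)$r.squared
  bp_pvalue <- pchisq(bp_stat, df = aux$rank - 1, lower.tail = FALSE)

  # Durbin-Watson statistic: values near 2 suggest no first-order autocorrelation
  autoc_dw <- sum(diff(res)^2) / sum(res^2)

  data.frame(norm_pvalue, r_adjust, bp_pvalue, autoc_dw)
}

# Simulated data for illustration only (not data_constraxion)
set.seed(1)
n <- 200
sim <- data.frame(
  period = rep(c("2008Q1", "2008Q2"), each = n),
  floor_area = log(runif(2 * n, 60, 160)),
  dist_trainstation = runif(2 * n, 0, 10)
)
sim$price <- exp(10 + 0.9 * sim$floor_area - 0.02 * sim$dist_trainstation +
                   rnorm(2 * n, sd = 0.1))

# One row of diagnostics per period
diagnostics_sketch <- do.call(rbind, lapply(split(sim, sim$period), function(d) {
  data.frame(
    period = d$period[1],
    period_diagnostics(d, log(price) ~ floor_area + dist_trainstation)
  )
}))
diagnostics_sketch
```

Splitting by period before fitting keeps each regression independent, which matches the period-specific models described above.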
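Your dataset should include a period variable, a dependent variable (the price), and the numerical and categorical explanatory variables used in the regression.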
# Example dataset (you should already have this loaded)
head(data_constraxion)
#> period price floor_area dist_trainstation neighbourhood_code
#> 1 2008Q1 1142226 127.41917 2.887992985 E
#> 2 2008Q1 667664 88.70604 2.903955192 D
#> 3 2008Q1 636207 107.26257 8.250659447 B
#> 4 2008Q1 777841 112.65725 0.005760792 E
#> 5 2008Q1 795527 108.08537 1.842145127 E
#> 6 2008Q1 539206 97.87751 6.375981360 D
#> dummy_large_city
#> 1 0
#> 2 1
#> 3 1
#> 4 0
#> 5 0
#> 6 1
# We log-transform floor_area again (see the vignette on calculating a price index for why)
dataset <- data_constraxion
dataset$floor_area <- log(dataset$floor_area)
calculate_regression_diagnostics()
Example:
diagnostics <- calculate_regression_diagnostics(
dataset = dataset,
period_variable = "period",
dependent_variable = "price",
numerical_variables = c("floor_area", "dist_trainstation"),
categorical_variables = c("dummy_large_city", "neighbourhood_code")
)
head(diagnostics)
#> period norm_pvalue r_adjust bp_pvalue autoc_pvalue autoc_dw
#> 1 2008Q1 0.9586930 0.8633499 0.74178260 0.5842200307 2.038772
#> 2 2008Q2 0.8191076 0.8607036 0.81813032 0.9540503936 2.274047
#> 3 2008Q3 0.4560750 0.8825515 0.15220690 0.3246547621 1.924436
#> 4 2008Q4 0.9064669 0.9098143 0.97583499 0.7436197200 2.108734
#> 5 2009Q1 0.4036003 0.8624850 0.04268543 0.4948207614 2.003177
#> 6 2009Q2 0.4644423 0.9002921 0.32760619 0.0007476682 1.487031
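One way to use this table is to flag periods in which any test p-value falls below a chosen significance level. The sketch below rebuilds the six rows printed above so that it runs on its own; the threshold alpha = 0.05 and the object names are assumptions for illustration, and it presumes that each p-value belongs to a test whose null hypothesis is that the assumption holds (normal residuals, homoscedasticity, no autocorrelation).

```r
# The six rows shown above, re-entered so the example is self-contained
diagnostics_head <- data.frame(
  period = c("2008Q1", "2008Q2", "2008Q3", "2008Q4", "2009Q1", "2009Q2"),
  norm_pvalue = c(0.9586930, 0.8191076, 0.4560750, 0.9064669, 0.4036003, 0.4644423),
  bp_pvalue = c(0.74178260, 0.81813032, 0.15220690, 0.97583499, 0.04268543, 0.32760619),
  autoc_pvalue = c(0.5842200307, 0.9540503936, 0.3246547621,
                   0.7436197200, 0.4948207614, 0.0007476682),
  stringsAsFactors = FALSE
)

alpha <- 0.05  # assumed significance level
pvalue_columns <- c("norm_pvalue", "bp_pvalue", "autoc_pvalue")

# Logical matrix: TRUE where a test rejects its null at level alpha
rejected <- diagnostics_head[, pvalue_columns] < alpha

# Keep the periods with at least one rejection
flagged <- diagnostics_head[rowSums(rejected) > 0, ]
flagged$period
#> [1] "2009Q1" "2009Q2"
```

Here 2009Q1 is flagged by the heteroscedasticity test (bp_pvalue) and 2009Q2 by the autocorrelation test (autoc_pvalue). With many periods, a few rejections at the 5% level are expected by chance alone, so flagged periods are a prompt for closer inspection rather than proof of a problem.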
For convenient visualization, plot_regression_diagnostics() generates a 3x2 grid of plots.
Example:
The hedonic price index relies on a log-linear regression model, which assumes that certain statistical conditions hold. The diagnostics plot provides an overview of how well these assumptions are met across different periods.
Each subplot corresponds to a specific model assumption:
The calculate_regression_diagnostics() and plot_regression_diagnostics() functions in REPS support robust, high-quality hedonic price index modeling by systematically checking regression assumptions in each period.