The csdm package implements econometric methods for
panel data with cross-sectional dependence (CSD). In many applications,
observations across units (e.g., countries, firms, regions) are not
independent—macroeconomic shocks, trade relationships, or spillovers
create correlation across cross-sectional units. The csdm
package provides robust estimators that account for this dependence
structure, plus diagnostic tests to detect and characterize it.
This vignette demonstrates four core estimation methods and related inference tools on real panel data from the Penn World Table (PWT).
Consider a panel model with \(T\) time periods and \(N\) cross-sectional units (e.g., countries):
\[y_{it} = \alpha_i + \beta_i x_{it} + u_{it}, \quad i = 1, \ldots, N; \quad t = 1, \ldots, T\]
where:
- \(y_{it}\) is the outcome variable for unit \(i\) at time \(t\)
- \(\alpha_i\) is a unit-specific intercept
- \(\beta_i\) is a unit-specific slope (heterogeneous across units)
- \(x_{it}\) is the explanatory variable (or vector of variables)
- \(u_{it}\) is the idiosyncratic error term
The key feature is heterogeneity in slopes (\(\beta_i\) varies by unit), which allows each unit to have its own relationship between \(x\) and \(y\). Four estimators are available to fit this model under different assumptions about cross-sectional dependence.
The Mean Group (MG) estimator fits unit-specific regressions separately and averages the results:
\[\hat{\beta}_{MG} = \frac{1}{N} \sum_{i=1}^{N} \hat{\beta}_i\]
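Interpretation: The MG coefficient is the simple average of the unit-specific slopes. Under slope heterogeneity it is consistent for the average slope as \(N\) and \(T\) grow, provided the errors \(u_{it}\) are cross-sectionally independent or only weakly dependent. MG does not model cross-sectional dependence: if unobserved common factors drive both \(x_{it}\) and \(u_{it}\), the unit-level slopes, and therefore their average, are biased.
Use case: A baseline for heterogeneous panels when cross-sectional dependence is absent or weak, and a benchmark for the CSD-robust estimators below.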
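The MG formula can be illustrated with a short base-R sketch. It is not csdm's implementation: `mg_estimate()` is a hypothetical helper, the toy panel is simulated with known unit slopes 1, 2 and 3 and no noise, and the standard error uses the common nonparametric MG formula (standard deviation of the unit estimates divided by \(\sqrt{N}\)), which is an assumption about how such standard errors are usually formed.

```r
# Hypothetical helper (not csdm's code): Mean Group estimator.
# Fits one OLS regression per unit, then averages the unit coefficients.
mg_estimate <- function(data, formula, id) {
  units <- split(data, data[[id]])
  # Rows = units, columns = coefficients
  b <- t(sapply(units, function(d) coef(lm(formula, data = d))))
  n <- nrow(b)
  est <- colMeans(b)
  # Nonparametric MG standard error: dispersion of unit estimates / sqrt(N)
  se <- apply(b, 2, sd) / sqrt(n)
  list(coef = est, se = se, unit_coefs = b)
}

# Toy panel: 3 units, 10 periods, unit slopes 1, 2, 3 and no noise
set.seed(1)
toy <- data.frame(id = rep(1:3, each = 10), x = rnorm(30))
toy$y <- c(0.5, 1, 1.5)[toy$id] + c(1, 2, 3)[toy$id] * toy$x

toy_fit <- mg_estimate(toy, y ~ x, id = "id")
toy_fit$coef  # intercept 1 and slope 2: the averages of the unit values
```

Because the toy data contain no noise, each unit regression recovers its own slope exactly, and the MG slope is simply their average.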
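The Dynamic Common Correlated Effects (DCCE) estimator extends CCE by adding a lagged dependent variable and lags of the cross-sectional averages:
\[y_{it} = \alpha_i + \lambda_i y_{i,t-1} + \beta_i x_{it} + \sum_{\ell=0}^{p_T} \gamma_{i\ell}' \bar{z}_{t-\ell} + v_{it}, \quad \bar{z}_t = (\bar{y}_t, \bar{x}_t')'\]
where \(\lambda_i\) is the unit-specific autoregressive coefficient, \(\bar{y}_t\) and \(\bar{x}_t\) are cross-sectional averages over units at time \(t\), and \(p_T\) is the number of lags of those averages. DCCE is well suited to dynamic panel models (e.g., when studying the persistence of outcomes over time).
Interpretation: \(\lambda_i\) captures dynamic adjustment within units, \(\beta_i\) is the short-run (contemporaneous) effect of \(x_{it}\), and the long-run effect is \(\beta_i / (1 - \lambda_i)\); the \(\gamma_{i\ell}\) terms absorb unobserved common factors.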
Use case: When the outcome has substantial persistence (lagged effects) and cross-sectional dependence is suspected.
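Cross-sectional augmentation amounts to adding period-by-period averages over units, and their lags, as extra regressors in each unit's regression. The sketch below builds those columns in base R; `add_csa()` and the `csa_` column names are hypothetical, not csdm internals. The rule of thumb \(p_T = \lfloor T^{1/3} \rfloor\) (Chudik and Pesaran) is shown for reference: for \(T = 38\) it gives 3, the number of lags used in the examples below, and three lags leave 35 usable periods per country (15 × 35 = 525 observations).

```r
# Hypothetical helper (not part of csdm): append cross-sectional averages
# (CSAs) of `vars` and their lags 1..lags to a long-format panel.
add_csa <- function(data, id, time, vars, lags = 0) {
  data  <- data[order(data[[id]], data[[time]]), ]
  times <- sort(unique(data[[time]]))
  pos   <- match(data[[time]], times)        # period index of each row
  for (v in vars) {
    # Mean over units in each period, ordered by time
    avg <- tapply(data[[v]], data[[time]], mean, na.rm = TRUE)
    avg <- as.numeric(avg[as.character(times)])
    for (l in 0:lags) {
      # Shift the averaged series back by l periods (NA for the first l)
      shifted <- c(rep(NA, l), head(avg, length(avg) - l))
      name <- if (l == 0) paste0("csa_", v) else paste0("csa_", v, "_lag", l)
      data[[name]] <- shifted[pos]
    }
  }
  data
}

# Rule-of-thumb number of CSA lags, floor(T^(1/3)), for T = 38
p_T <- floor(38^(1/3))  # 3

# Toy panel: 2 units, 5 years
toy <- data.frame(id = rep(1:2, each = 5), year = rep(2001:2005, 2),
                  y = c(1, 2, 3, 4, 5, 3, 4, 5, 6, 7))
aug <- add_csa(toy, "id", "year", vars = "y", lags = 1)
aug                                # csa_y = 2..6; csa_y_lag1 = NA, 2..5
nrow(aug[complete.cases(aug), ])   # 8: the CSA lag costs one period per unit
```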
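The CS-ARDL model extends the ARDL framework with cross-sectional augmentation. In error-correction form:
\[\Delta y_{it} = \alpha_i + \phi_i (y_{i,t-1} - \theta_i x_{i,t-1}) + \beta_i \Delta x_{it} + \sum_{\ell=0}^{p_T} \gamma_{i\ell}' \bar{z}_{t-\ell} + v_{it}\]
This model combines autoregressive and distributed-lag dynamics. It separates short-run effects (\(\beta_i\)) from the long-run cointegrating relationship (\(\theta_i\)), while controlling for common factors through the cross-sectional averages \(\bar{z}_t = (\bar{y}_t, \bar{x}_t')'\) and their lags. If \(\lambda_i\) denotes the coefficient on \(y_{i,t-1}\) in the levels ARDL, then \(\phi_i = -(1 - \lambda_i)\).
Interpretation:
- \(\theta_i\) is the long-run equilibrium (cointegrating) coefficient
- \(\beta_i\) is the short-run (impact) effect of a change in \(x_{it}\)
- \(\phi_i\) governs the speed of reversion to equilibrium; a negative value means deviations are corrected over time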
Use case: When studying long-run relationships in non-stationary panels with complex short-run dynamics.
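The mapping between levels ARDL(1,1) coefficients and the error-correction parameters can be checked numerically. In this sketch `ardl_to_ecm()` is a hypothetical helper and the series are simulated; the identity holds exactly for any data generated by the ARDL recursion, and cross-sectional averages are left out because they pass through the reparameterization unchanged.

```r
# Hypothetical helper: map ARDL(1,1) coefficients of
#   y_t = a + lambda * y_{t-1} + b0 * x_t + b1 * x_{t-1}
# into the error-correction form above.
ardl_to_ecm <- function(lambda, b0, b1) {
  list(
    phi       = -(1 - lambda),             # speed of adjustment
    theta     = (b0 + b1) / (1 - lambda),  # long-run coefficient
    short_run = b0                         # impact effect of a change in x
  )
}

# Simulate one unit from the ARDL(1,1) recursion
set.seed(2)
n <- 50
a <- 0.3; lambda <- 0.6; b0 <- 0.5; b1 <- 0.2
x <- cumsum(rnorm(n))
y <- numeric(n)
for (s in 2:n) y[s] <- a + lambda * y[s - 1] + b0 * x[s] + b1 * x[s - 1]

# The error-correction form reproduces the first differences of y
ecm <- ardl_to_ecm(lambda, b0, b1)
idx <- 2:n
dy  <- y[idx] - y[idx - 1]
ecm_rhs <- a + ecm$phi * (y[idx - 1] - ecm$theta * x[idx - 1]) +
  ecm$short_run * (x[idx] - x[idx - 1])
max(abs(dy - ecm_rhs))  # zero up to floating-point error

# Mean coefficients printed for log_ck in the CS-ARDL example below
ck <- ardl_to_ecm(lambda = -0.1158, b0 = 0.7188, b1 = -0.4974)
ck$phi    # -1.1158, the reported adjustment term
ck$theta  # about 0.198
```

The reported long-run estimate for log_ck in that example (0.0450) differs from `ck$theta`, which is what one would expect if long-run effects are computed country by country and then averaged rather than built from averaged coefficients.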
The PWT_60_07 dataset contains macroeconomic indicators
for 93 countries covering 1960–2007 (48 years). Key variables
include:
- id: Country identifier
- year: Calendar year (1960–2007)
- log_rgdpo: Log real GDP per capita
- log_hc: Log human capital index
- log_ck: Log capital stock
- log_ngd: Log of population growth plus 0.05 (the \(n + g + \delta\) term of augmented Solow growth regressions, with \(g + \delta\) set to 0.05)

data(PWT_60_07, package = "csdm")
head(PWT_60_07, 10)
#> id year log_rgdpo log_hc log_ck log_ngd
#> 1 1 1960 7.780284 0.7042058 11.33559 NA
#> 2 1 1961 7.792448 0.7096307 11.39625 -2.714639
#> 3 1 1962 7.800655 0.7150558 11.45449 -2.719555
#> 4 1 1963 7.751311 0.7204807 11.44691 -2.723755
#> 5 1 1964 7.786854 0.7259058 11.48774 -2.727269
#> 6 1 1965 7.879184 0.7313308 11.54133 -2.730137
#> 7 1 1966 7.885357 0.7386493 11.57771 -2.737341
#> 8 1 1967 7.903803 0.7459679 11.61129 -2.744694
#> 9 1 1968 7.923327 0.7532863 11.64887 -2.745329
#> 10 1 1969 8.004197 0.7606049 11.71465 -2.739602
str(PWT_60_07)
#> Classes 'tbl_df', 'tbl' and 'data.frame': 4464 obs. of 6 variables:
#> $ id : num 1 1 1 1 1 1 1 1 1 1 ...
#> ..- attr(*, "format.stata")= chr "%12.0g"
#> $ year : num 1960 1961 1962 1963 1964 ...
#> ..- attr(*, "label")= chr "(firstnm) year"
#> ..- attr(*, "format.stata")= chr "%9.0g"
#> $ log_rgdpo: num 7.78 7.79 7.8 7.75 7.79 ...
#> ..- attr(*, "format.stata")= chr "%9.0g"
#> $ log_hc : num 0.704 0.71 0.715 0.72 0.726 ...
#> ..- attr(*, "format.stata")= chr "%9.0g"
#> $ log_ck : num 11.3 11.4 11.5 11.4 11.5 ...
#> ..- attr(*, "format.stata")= chr "%9.0g"
#> $ log_ngd : num NA -2.71 -2.72 -2.72 -2.73 ...
#> ..- attr(*, "format.stata")= chr "%9.0g"
# For computational speed in this vignette, use a subset:
# first 15 countries, 1970-2007
first_15_ids <- unique(PWT_60_07$id)[1:15]
df <- subset(PWT_60_07, id %in% first_15_ids & year >= 1970 & year <= 2007)
The full panel is balanced in its row structure (93 countries × 48 years = 4,464 rows), although some values, such as log_ngd in 1960, are missing. The subset has 15 × 38 = 570 rows. We estimate growth-model regressions of log GDP (log_rgdpo) on human capital (log_hc), the capital stock (log_ck), and the population-growth term (log_ngd), and then test the residuals for cross-sectional dependence.
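Balance can be checked structurally by counting observations per unit and period. `is_balanced()` is a hypothetical helper shown on toy data; with the vignette data loaded, `is_balanced(df$id, df$year)` should apply the same check to the subset.

```r
# Hypothetical helper: TRUE if every unit is observed exactly once in every period
is_balanced <- function(id, time) {
  counts <- table(id, time)   # units x periods contingency table
  all(counts == 1)
}

balanced   <- data.frame(id = rep(1:2, each = 3), year = rep(2000:2002, 2))
unbalanced <- data.frame(id = c(1, 1, 1, 2, 2),
                         year = c(2000, 2001, 2002, 2000, 2002))

is_balanced(balanced$id, balanced$year)      # TRUE
is_balanced(unbalanced$id, unbalanced$year)  # FALSE: unit 2 has no 2001 row
```

Row balance does not rule out missing values inside variables (log_ngd is NA in 1960 for the first country), so `complete.cases()` is a useful second check before estimation.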
To install the csdm package from CRAN, run:
install.packages("csdm")
To install the latest development version from GitHub, run:
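All models are fitted with csdm(), which automatically
detects the input structure and applies the appropriate methodology. The
key arguments are id and time, which name the
cross-sectional and time-period identifiers, and model, which
chooses the estimator (for example "mg", "dcce", or "cs_ardl" in the examples below).
For estimators with cross-sectional augmentation, csa (built with csdm_csa())
lists the variables whose cross-sectional averages are added and the number of lags of
those averages, and lr (built with csdm_lr()) sets the dynamic specification,
including the number of lags of the dependent variable (ylags) and of the regressors (xdlags).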
# MG: Separate regression per country, then average coefficients
fit_mg <- csdm(
log_rgdpo ~ log_hc + log_ck + log_ngd,
data = df,
id = "id",
time = "year",
model = "mg"
)
print(fit_mg)
#> csdm fit (mg)
#> Formula: log_rgdpo ~ log_hc + log_ck + log_ngd
#> N: 15, T: 38
#> Estimate Std.Error
#> (Intercept) 6.5609 0.8648
#> log_hc 0.1725 0.8530
#> log_ck 0.3056 0.1359
#> log_ngd 0.7800 0.3777
summary(fit_mg)
#> csdm summary: Mean Group Model (MG)
#> Formula: log_rgdpo ~ log_hc + log_ck + log_ngd
#> N: 15, T: 38
#> Number of obs: 570
#> R-squared (mg): 0.9449
#> CD = 3.2379, p = 0.0012
#> (For additional CD diagnostics, use cd_test())
#>
#> Mean Group:
#> Coef. Std. Err. z P>|z| Signif. CI 2.5% CI 97.5%
#> (Intercept) 6.5609 0.8648 7.5867 0.0000 *** 4.8659 8.2558
#> log_hc 0.1725 0.8530 0.2022 0.8398 -1.4993 1.8443
#> log_ck 0.3056 0.1359 2.2485 0.0245 * 0.0392 0.5720
#> log_ngd 0.7800 0.3777 2.0652 0.0389 * 0.0398 1.5203
#>
#> Mean Group Variables: log_hc, log_ck, log_ngd
#> Cross Sectional Averaged Variables: none (lags=0)
#>
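#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Interpretation: On average across the 15 countries, log_ck (0.3056, p = 0.025) and log_ngd (0.7800, p = 0.039) have positive coefficients that are significant at the 5% level, while the human capital coefficient (0.1725, p = 0.84) is not distinguishable from zero. The standard errors reflect the dispersion of the country-specific estimates, that is, cross-country heterogeneity in these relationships. The residual CD statistic (3.2379, p = 0.0012) shows that cross-sectional dependence remains after MG estimation, which motivates the CSD-robust estimators below.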
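The z, P>|z| and confidence-interval columns of the summary follow from Coef. and Std. Err. under a normal approximation. The snippet reproduces the log_ck row from the printed, rounded values; it is a consistency check rather than csdm's code, so the results agree with the table only up to rounding in the last digit.

```r
# Rebuild the z, p-value and 95% CI columns for the log_ck row of the MG
# summary from the printed (rounded) estimate and standard error.
coef_ck <- 0.3056
se_ck   <- 0.1359

z  <- coef_ck / se_ck                            # z statistic
p  <- 2 * pnorm(-abs(z))                         # two-sided normal p-value
ci <- coef_ck + c(-1, 1) * qnorm(0.975) * se_ck  # 95% confidence interval

round(c(z = z, p = p, lower = ci[1], upper = ci[2]), 4)
```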
# DCCE: Include dynamics and cross-sectional means
# Use lagged dependent variable to capture dynamic adjustment
fit_dcce <- csdm(
log_rgdpo ~ log_hc + log_ck + log_ngd,
data = df,
id = "id",
time = "year",
model = "dcce",
csa = csdm_csa(
vars = c("log_rgdpo", "log_hc", "log_ck", "log_ngd"),
lags = 3
),
lr = csdm_lr(type = "ardl", ylags = 1, xdlags = 0)
)
print(fit_dcce)
#> csdm fit (dcce)
#> Formula: log_rgdpo ~ log_hc + log_ck + log_ngd
#> N: 15, T: 38
#> Estimate Std.Error
#> (Intercept) 9.2888 8.0386
#> log_hc -1.9558 1.5659
#> log_ck 0.6666 0.2424
#> log_ngd -0.3178 1.1271
#> lag1_log_rgdpo -0.0745 0.0555
summary(fit_dcce)
#> csdm summary: Dynamic Common Correlated Error Model (DCCE)
#> Formula: log_rgdpo ~ log_hc + log_ck + log_ngd
#> N: 15, T: 38
#> Number of obs: 525
#> R-squared (mg): 0.9861
#> CD = -2.392, p = 0.0168
#> (For additional CD diagnostics, use cd_test())
#>
#> Mean Group:
#> Coef. Std. Err. z P>|z| Signif. CI 2.5% CI 97.5%
#> (Intercept) 9.2888 8.0386 1.1555 0.2479 -6.4666 25.0441
#> log_hc -1.9558 1.5659 -1.2490 0.2117 -5.0249 1.1132
#> log_ck 0.6666 0.2424 2.7499 0.0060 ** 0.1915 1.1417
#> log_ngd -0.3178 1.1271 -0.2819 0.7780 -2.5268 1.8913
#> lag1_log_rgdpo -0.0745 0.0555 -1.3424 0.1795 -0.1834 0.0343
#>
#> Mean Group Variables: log_hc, log_ck, log_ngd, lag1_log_rgdpo
#> Cross Sectional Averaged Variables: log_rgdpo, log_hc, log_ck, log_ngd (lags=3)
#>
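#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Interpretation: The DCCE model adds lagged GDP (lag1_log_rgdpo) to capture dynamic adjustment. In this 15-country subset its mean group estimate is small, negative, and insignificant (-0.0745, p = 0.18), so once the cross-sectional averages and their lags are included there is little evidence of own-country persistence. Among the regressors, only log_ck is significant (0.6666, p = 0.006). The coefficients on the regressors are short-run elasticities; to compute long-run effects, divide them by \((1 - \hat{\lambda})\), where \(\hat{\lambda}\) is the lag coefficient. The residual CD statistic (-2.392, p = 0.0168) indicates that some cross-sectional dependence remains.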
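The long-run conversion can be applied directly to the printed mean group coefficients. This is a sketch with a hypothetical helper, `long_run()`; dividing averaged coefficients only approximates the average of country-specific long-run effects, so treat the numbers as a rough summary.

```r
# Long-run effect in an ARDL(1, 0) model: beta / (1 - lambda)
long_run <- function(beta, lambda) beta / (1 - lambda)

# Mean group estimates printed for the DCCE fit above
beta_sr    <- c(log_hc = -1.9558, log_ck = 0.6666, log_ngd = -0.3178)
lambda_hat <- -0.0745   # coefficient on lag1_log_rgdpo

lr <- long_run(beta_sr, lambda_hat)
round(lr, 4)   # log_ck: about 0.620
```

Because \(\hat{\lambda}\) is negative here, \(1 - \hat{\lambda} > 1\), and the implied long-run effects are slightly smaller in absolute value than the short-run coefficients.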
# CS-ARDL: Separate short-run and long-run dynamics
# Includes lagged dependent and lagged regressors
fit_csardl <- csdm(
log_rgdpo ~ log_hc + log_ck + log_ngd,
data = df,
id = "id",
time = "year",
model = "cs_ardl",
csa = csdm_csa(
vars = c("log_rgdpo", "log_hc", "log_ck", "log_ngd"),
lags = 3
),
lr = csdm_lr(type = "ardl", ylags = 1, xdlags = 1)
)
print(fit_csardl)
#> csdm fit (cs_ardl)
#> Formula: log_rgdpo ~ log_hc + log_ck + log_ngd
#> N: 15, T: 38
#> Estimate Std.Error
#> (Intercept) 8.1181 7.0545
#> log_hc -1.9945 4.6810
#> log_ck 0.7188 0.2091
#> log_ngd -0.2987 1.6467
#> lag1_log_rgdpo -0.1158 0.0696
#> lag1_log_hc 1.4323 4.3137
#> lag1_log_ck -0.4974 0.3607
#> lag1_log_ngd 0.4333 1.4199
summary(fit_csardl)
#> csdm summary: Cross-Sectional ARDL (CS-ARDL)
#> Formula: log_rgdpo ~ log_hc + log_ck + log_ngd
#> N: 15, T: 38
#> Number of obs: 525
#> R-squared (mg): 0.989
#>
#> CD = -1.3462, p = 0.1782
#> (For additional CD diagnostics, use cd_test())
#>
#> Short Run Est.
#> Coef. Std. Err. z P>|z| Signif. CI 2.5% CI 97.5%
#> (Intercept) 8.1181 7.0545 1.1508 0.2498 -5.7085 21.9447
#> log_hc -1.9945 4.6810 -0.4261 0.6700 -11.1691 7.1801
#> log_ck 0.7188 0.2091 3.4374 0.0006 *** 0.3089 1.1286
#> log_ngd -0.2987 1.6467 -0.1814 0.8561 -3.5261 2.9287
#> lag1_log_rgdpo -0.1158 0.0696 -1.6635 0.0962 . -0.2522 0.0206
#> lag1_log_hc 1.4323 4.3137 0.3320 0.7399 -7.0223 9.8869
#> lag1_log_ck -0.4974 0.3607 -1.3792 0.1678 -1.2043 0.2095
#> lag1_log_ngd 0.4333 1.4199 0.3052 0.7602 -2.3497 3.2163
#>
#> Adjust. Term
#> Coef. Std. Err. z P>|z| Signif. CI 2.5% CI 97.5%
#> lr_log_rgdpo -1.1158 0.0696 -16.0333 0 *** -1.2522 -0.9794
#>
#> Long Run Est.
#> Coef. Std. Err. z P>|z| Signif. CI 2.5% CI 97.5% n_used
#> lr_log_hc -0.5487 1.7292 -0.3173 0.7510 -3.9379 2.8404 15
#> lr_log_ck 0.0450 0.4807 0.0936 0.9254 -0.8971 0.9871 15
#> lr_log_ngd 0.0906 0.4461 0.2030 0.8391 -0.7838 0.9650 15
#>
#> Mean Group Variables: lag1_log_rgdpo, log_hc, log_ck, log_ngd, lag1_log_hc, lag1_log_ck, lag1_log_ngd
#> Cross Sectional Averaged Variables: log_rgdpo, log_hc, log_ck, log_ngd (lags=3)
#> Long Run Variables: log_hc, log_ck, log_ngd
#> Cointegration variable(s): log_rgdpo
#>
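#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Interpretation: The CS-ARDL model reports short-run coefficients (the immediate response to changes in the regressors), an adjustment term, and long-run coefficients (the equilibrium effect after full adjustment). The only significant short-run regressor is log_ck (0.7188, p < 0.001). The adjustment term, -1.1158, equals \(-(1 - \hat{\lambda})\) for the lag coefficient \(\hat{\lambda} = -0.1158\); it is strongly significant and indicates rapid (slightly overshooting) correction of deviations from the long-run relationship. None of the long-run coefficients is significant, and all are smaller in absolute value than the corresponding short-run coefficients. They also differ from ratios of the mean short-run coefficients (for log_ck, (0.7188 - 0.4974)/1.1158 ≈ 0.198 versus the reported 0.0450), which is consistent with long-run effects being computed country by country and then averaged (n_used = 15). The residual CD test does not reject independence (CD = -1.3462, p = 0.1782).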
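Why a mean of country-level long-run effects can differ from a long-run effect built from averaged coefficients is easiest to see with a small example. The three countries' coefficients below are made up purely for illustration and do not come from the fit above.

```r
# Made-up unit-level ARDL(1,1) estimates for three countries
lambda <- c(-0.40, 0.10, 0.50)   # coefficient on the lagged dependent variable
b0     <- c(0.80, 0.60, 0.40)    # contemporaneous regressor
b1     <- c(-0.50, -0.40, -0.10) # lagged regressor

# Mean group of unit-level long-run effects: average of ratios
theta_i  <- (b0 + b1) / (1 - lambda)
mg_theta <- mean(theta_i)

# Long-run effect from averaged short-run coefficients: ratio of averages
ratio_of_means <- (mean(b0) + mean(b1)) / (1 - mean(lambda))

c(mg_theta = mg_theta, ratio_of_means = ratio_of_means)  # about 0.346 vs 0.286
```

Because the long-run effect is a nonlinear function of the short-run coefficients, the two summaries generally disagree, and the gap grows with cross-country heterogeneity in \(\lambda_i\).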
After fitting a model, we can test whether its residuals exhibit cross-sectional dependence using the Pesaran CD test and related variants. These tests check whether the residuals \(u_{it}\) are correlated across units. Such correlation violates a key assumption: it can invalidate standard errors and, when the underlying common factors are correlated with the regressors, bias the coefficient estimates.
All CD tests share the null hypothesis that the residuals are cross-sectionally independent.
The Pesaran CD statistic is:
\[CD = \sqrt{\frac{2}{N(N-1)}} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \hat{\rho}_{ij} \sqrt{T}\]
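where \(\hat{\rho}_{ij}\) is the sample correlation between the residual series of units \(i\) and \(j\). Under the null, the statistic is approximately standard normal.
Interpretation: Because CD sums the pairwise correlations, a large positive (negative) value indicates predominantly positive (negative) correlation, while correlations of opposite signs can offset each other and reduce power. The test is simple to compute and remains usable when \(N\) is fixed and \(T \to \infty\), but it tends to over-reject when applied to residuals from models with estimated common factors or cross-sectional averages, which motivates the variants below.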
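The CD formula is straightforward to compute from a \(T \times N\) matrix of residuals. This sketch assumes a balanced panel with no missing residuals; `pesaran_cd()` is a hypothetical helper, not `cd_test()`, and the residuals are simulated with and without a common shock.

```r
# Pesaran CD statistic for a balanced panel.
# E: T x N matrix of residuals (rows = periods, columns = units).
pesaran_cd <- function(E) {
  n_t     <- nrow(E)
  n_units <- ncol(E)
  rho <- cor(E)                        # N x N pairwise correlations
  s   <- sum(rho[upper.tri(rho)])      # sum over pairs i < j
  cd  <- sqrt(2 * n_t / (n_units * (n_units - 1))) * s
  c(statistic = cd, p.value = 2 * pnorm(-abs(cd)))
}

set.seed(42)
E_indep  <- matrix(rnorm(38 * 15), nrow = 38)  # independent residuals
f        <- rnorm(38)                          # a common shock
E_factor <- E_indep + f                        # f is added to every column
pesaran_cd(E_indep)
pesaran_cd(E_factor)   # large positive CD: strong common correlation
```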
The CDw statistic applies unit-level random sign weights to the cross-sectional correlations:
\[CD_w = \sqrt{\frac{2}{N(N-1)}} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} w_i w_j \hat{\rho}_{ij} \sqrt{T}\]
where the weights \(w_i \in \{-1,1\}\) are independent random sign flips (Rademacher draws) assigned at the unit level and held fixed within a replication. Juodis and Reese (2022) show that this random weighting corrects the over-rejection of the standard CD test when the residuals come from models with estimated time effects or common factors.
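A single-draw sketch of the CDw formula, under the same balanced-panel assumption. `cd_w()` is hypothetical; how `cd_test()` draws, reuses, or combines weights is not shown in this vignette, so read this only as an illustration of the formula and of why a seed is needed.

```r
# CDw sketch: random unit-level signs w_i in {-1, 1} multiply the pairwise
# correlations before summing. Balanced T x N residual matrix assumed.
cd_w <- function(E, w = sample(c(-1, 1), ncol(E), replace = TRUE)) {
  n_t     <- nrow(E)
  n_units <- ncol(E)
  rho_w <- outer(w, w) * cor(E)         # w_i * w_j * rho_ij
  s     <- sum(rho_w[upper.tri(rho_w)]) # sum over pairs i < j
  cd    <- sqrt(2 * n_t / (n_units * (n_units - 1))) * s
  c(statistic = cd, p.value = 2 * pnorm(-abs(cd)))
}

set.seed(7)
E <- matrix(rnorm(38 * 15), nrow = 38) + rnorm(38)  # residuals with a common shock

# The weights are random, so fixing the seed makes the result reproducible
set.seed(1234); first  <- cd_w(E)
set.seed(1234); second <- cd_w(E)
identical(first, second)  # TRUE
```

Random signs make the weighted correlations average out to zero unless they are large, which is why CDw is much less sensitive than CD to the small correlations induced by estimated factors.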
CDw+ augments CDw with a power-enhancement component in the spirit of Fan, Liao, and Yao (2015):
\[CD_w^+ = CD_w + PE\]
where \(PE \ge 0\) is built from the pairwise correlations whose absolute value exceeds a threshold of order \(\sqrt{\log N / T}\). Because \(PE\) is negligible under the null but large when some pairs of units are strongly correlated, CDw+ has more power against sparse dependence, and large positive values lead to rejection.
CD* is the bias-corrected CD statistic of Pesaran and Xie (2021), designed for residuals from panels with latent common factors:
\[CD^* = \frac{CD + \sqrt{T/2}\,\hat{\theta}_N}{1 - \hat{\theta}_N}\]
where \(\hat{\theta}_N\) is a bias-correction term computed from principal-component estimates of the factor loadings of the residuals. Like CD, CD* is approximately standard normal under the null.
The cd_test() function takes a fitted model and
computes the test(s) selected by type: a single statistic such as
"CD" or "CDw", or "all" for every variant. CDw and
CDw+ rely on random sign draws, so calling set.seed() before
cd_test() makes their values reproducible across runs.
# Test MG residuals for CSD
cd_mg <- cd_test(fit_mg, type = "CD")
print(cd_mg)
#> Cross-sectional dependence tests
#> N = 15, T = 38
#>
#> statistic p.value
#> CD 3.238 0.001
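# Test CCE residuals for CSD
# (fit_cce is assumed to be a CCE fit of the same model created earlier;
#  it is not defined in this section)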
set.seed(1234)
cd_cce <- cd_test(fit_cce, type = "all")
print(cd_cce)
#> Cross-sectional dependence tests
#> N = 15, T = 38
#>
#> statistic p.value
#> CD -2.676 0.007
#> CDw -1.532 0.126
#> CDw+ 150.431 0.000
#> CDstar -2.835 0.005
# Test DCCE residuals for CSD
set.seed(1234)
cd_dcce <- cd_test(fit_dcce, type = "CDw")
print(cd_dcce)
#> Cross-sectional dependence tests
#> N = 15, T = 35
#>
#> statistic p.value
#> CDw 1.092 0.275
# Test CS-ARDL residuals for CSD
set.seed(1234)
cd_csardl <- cd_test(fit_csardl, type = "all")
print(cd_csardl)
#> Cross-sectional dependence tests
#> N = 15, T = 35
#>
#> statistic p.value
#> CD -1.346 0.178
#> CDw -0.111 0.911
#> CDw+ 135.834 0.000
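#> CDstar -0.081 0.935
Interpreting Results:
In this example, the MG residuals, which are not augmented with cross-sectional averages, clearly reject cross-sectional independence (CD = 3.238, p = 0.001). For the CCE residuals, CD and CD* still reject at the 5% level while CDw does not; for DCCE, CDw does not reject (p = 0.275); and for CS-ARDL, CD, CDw, and CD* all fail to reject. CDw+ rejects strongly for both CCE and CS-ARDL, which may indicate that some pairs of countries remain strongly correlated even when the average correlation is close to zero. Overall, cross-sectional augmentation removes much of the dependence detected in the MG residuals, which supports using CSD-robust methods such as CCE, DCCE, and CS-ARDL for this panel.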
Chudik, A., & Pesaran, M. H. (2015). Common correlated effects estimation of heterogeneous dynamic panel data models with weakly exogenous regressors. Journal of Econometrics, 188(2), 393–420.
Ditzen, J. (2018). Estimating dynamic common-correlated effects in Stata. The Stata Journal, 18(3), 585–617.
Fan, J., Liao, Y., & Yao, J. (2015). Power enhancement in high-dimensional cross-sectional tests. Econometrica, 83(4), 1497–1541.
Juodis, A., & Reese, S. (2022). The incidental parameters problem in testing for remaining cross-section correlation. Journal of Business & Economic Statistics, 40(3), 1191–1203.
Pesaran, M. H. (2007). A simple unit root test in the presence of cross-section dependence. Journal of Applied Econometrics, 22(2), 265–312.
Pesaran, M. H., & Xie, Y. (2021). A bias-corrected CD test for error cross-sectional dependence in panel data models with latent factors. Cambridge Working Papers in Economics.
For further details on the theoretical foundations and
implementation of CSD-robust methods, see the documentation for
?csdm, ?cd_test, and
?summary.csdm_fit.