--- title: "Basic Regressions with mverse" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Basic Regressions with mverse} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} references: - id: multiverse title: "multiverse: R package for creating explorable multiverse analysis" type: entry URL: https://mucollective.github.io/multiverse/ accessed: year: 2020 author: - given: Abhraneel family: Sarma - given: Matthew family: Kay - id: boston title: Hedonic prices and the demand for clean air author: - family: Harrison Jr given: David - family: Rubinfeld given: Daniel L container-title: Journal of environmental economics and management type: article-journal volume: 5 number: 1 pages: 81-102 issued: year: 1978 publisher: Elsevier --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, fig.width = 7 ) Sys.setenv(LANG = "en") ``` This vignette describes the workflow of linear regression modeling in the multiverse with the following functions: * `formula_branch()`, `add_formula_branch`: create branches for regression formulas and add them to a `mverse` object. * `lm_mverse()`: fit a simple linear model with the given formula branches and family branches. * `summary()`: provide a summary of the fitted models in different branches. * `spec_curve()`: display the specification curve of a model. ```{r load, warning=FALSE, message=FALSE} library(mverse) ``` We will use the Boston housing dataset {@boston} as an example. This dataset has 506 observations on 14 variables. This dataset is extensively used in regression analyses and algorithm benchmarks. The objective is to predict the median value of a home (`medv`) with the feature variables. ```{r} dplyr::glimpse(MASS::Boston) # using kable for displaying data in html ``` ## Simple Linear Regression with `mverse` In order to perform a linear regression in the multiverse, we create a formula branch with all the models we wish to explore, add it the `mverse` object, and execute `lm` on each universe by calling `lm_mverse`. Create a multiverse with `mverse`. ```{r lm} mv <- create_multiverse(MASS::Boston) ``` We can explore models of the median value of home prices `medv` on different combinations of the following explanatory variables: proportion of adults without some high school education and proportion of male workers classified as laborers (`lstat`), average number of rooms per dwelling (`rm`), per capita crime rate (`crim`), and property tax (`tax`). Create the models with `formula_branch()` ```{r} formulas <- formula_branch(medv ~ log(lstat) * rm, medv ~ log(lstat) * tax, medv ~ log(lstat) * tax * rm) ``` Add the models to the multiverse `mv`. ```{r} mv <- mv |> add_formula_branch(formulas) ``` Fit `lm()` across `mv` using `lm_mverse()`. ```{r} lm_mverse(mv) ``` By default, `summary` will give the estimates of parameters for each model. You can also output other information by changing the `output` parameter. ```{r summary_lm} summary(mv) ``` Changing `output` to `df` yields the degrees of freedom table. ```{r} summary(mv, output = "df") ``` Other options include F (`output = "f"`) statistics ```{r} summary(mv, output = "f") ``` and $R^2$ (`output = "r"`). ```{r} # output R-squared by `r.squared` or "r" summary(mv, output = "r") ``` Finally, we can display how the effect of number of rooms in a dwelling `log(lstat)` using `spec_curve`. ```{r fig.height=5} spec_summary(mv, var = "log(lstat)") |> spec_curve(label = "code") + ggplot2::labs("Significant at 0.05") ```