---
title: "Basic usage of edfinr"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Basic usage of edfinr}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  message = FALSE,
  warning = FALSE,
  eval = TRUE
)
```

## Introduction

The `edfinr` package provides a simple, consistent interface for accessing comprehensive education finance data for U.S. school districts. This vignette will help you get started with the package's core functionality.

```{r setup, include = FALSE}
library(edfinr)
library(dplyr)
library(ggplot2)
```

```{r, eval = FALSE}
library(edfinr)
library(tidyverse)
```

## Core Function: get_finance_data()

The primary function in `edfinr` is `get_finance_data()`, which provides access to comprehensive school finance data from school years 2011-12 through 2021-22. The function combines data from multiple sources:

- **Financial data**: Revenue and expenditure data from the National Center for Education Statistics (NCES) version of the F-33 survey.
- **Enrollment**: Student counts from the NCES Common Core of Data.
- **Demographics**: Poverty estimates from the U.S. Census Bureau Small Area Income and Poverty Estimates (SAIPE).
- **Community characteristics**: Income and education data from the American Community Survey (ACS).
- **Inflation adjustments**: Consumer Price Index for All Urban Consumers (CPI-U) data for constant dollar calculations.

## Basic Usage

The simplest way to use `get_finance_data()` is to specify a year and state. For example, to get finance data for Kentucky school districts from the 2015-16 school year:

```{r example-1}
# download "skinny" dataset for a single year and a single state
ky_sy16 <- get_finance_data(yr = 2016, geo = "KY")

# view the structure of the returned data
glimpse(ky_sy16)
```

## Dataset Types: Skinny vs. Full

By default, `get_finance_data()` returns a "skinny" dataset with 41 essential variables covering:

- District identifiers and characteristics.
- Total revenues by source (local, state, federal).
- Current expenditures.
- Key demographic and economic indicators.

For more detailed analysis, you can request the "full" dataset with 89 variables that includes:

- All skinny dataset variables.
- Detailed expenditure data.
- Data on spending of temporary pandemic-related federal funding.

```{r example-2}
# download the full dataset with detailed expenditure data for a single year/state
ky_full_sy16 <- get_finance_data(yr = "2016", geo = "KY", dataset_type = "full")

# view additional variables in "full" dataset
names(ky_full_sy16)[42:89]
```

## Multiple Years and States

The `get_finance_data()` function makes it easy to access data across multiple years and states:

```{r example-3}
# get data for multiple states across multiple years
sec_data <- get_finance_data(
  yr = "2018:2022",  # years 2018 through 2022
  geo = "AL,AR,FL,GA,KY,LA,MS,MO,OK,SC,TN,TX"  # comma-separated state codes
)

# get the most recent year of data for all states
us_sy22 <- get_finance_data(yr = 2022, geo = "all")
```

## Working with the Data

Once you've retrieved the data, you can use standard data manipulation tools to analyze it. Here are some common analysis patterns:

### Analyzing Local vs. Total Revenue Per-Pupil

```{r analysis-1}
# download 2022 data for connecticut
ct_sy22 <- get_finance_data(yr = "2022", geo = "CT")

# plot local revenue vs. total revenue w/ urbanicity + enrollment
ggplot(ct_sy22) +
  geom_point(aes(
    x = rev_local_pp,
    y = rev_total_pp,
    color = urbanicity,
    size = enroll),
    alpha = .6) +
  scale_size_area(
    max_size = 10,
    labels = scales::label_comma()
  ) +
  scale_x_continuous(labels = scales::label_dollar()) +
  scale_y_continuous(labels = scales::label_dollar()) +
  labs(
    title = "Connecticut Districts' Local vs. Total Revenue Per-Pupil, SY2021-22",
    x = "Local Revenue Per-Pupil",
    y = "Total Revenue Per-Pupil",
    size = "Enrollment",
    color = "Urbanicity") +
  theme_bw()
```

### Analyzing Revenue Sources by Urbanicity

```{r analysis-2}
# compare revenue sources across districts
revenue_analysis <- ct_sy22 |>
  mutate(
    pct_local = rev_local / rev_total,
    pct_state = rev_state / rev_total,
    pct_federal = rev_fed / rev_total
  ) |>
  select(dist_name, urbanicity, enroll, pct_local, pct_state, pct_federal) |>
  group_by(urbanicity) |>
  summarize(
    avg_pct_local = mean(pct_local, na.rm = TRUE),
    avg_pct_state = mean(pct_state, na.rm = TRUE),
    avg_pct_federal = mean(pct_federal, na.rm = TRUE),
    n_districts = n(),
    enrollment = sum(enroll, na.rm = TRUE)
  )

print(revenue_analysis)
```

## Additional Resources

For more information about the data and methods used in this package:

- Use `list_variables()` to see all available variables and their descriptions.
- Use `get_states()` to see valid state codes.
- See the "CPI Adjustments" vignette for information about inflation adjustments.
- See the "Data Sources and Methods" vignette for detailed methodology.
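
The two helper functions named above can be explored interactively before building a query. The chunk below is a minimal sketch, assuming both can be called with no arguments, as they are written above. The structure of what they return is not described in this vignette, so the sketch only previews the results with `head()` and `str()` rather than relying on particular column names. The chunk is marked `eval = FALSE` so that it is shown but not run when the vignette is built.

```{r helper-functions, eval = FALSE}
library(edfinr)

# assumption: list_variables() can be called with no arguments
vars <- list_variables()
head(vars)  # preview the first few entries
str(vars)   # inspect the structure of the returned object

# assumption: get_states() can also be called with no arguments
states <- get_states()
head(states)  # preview the valid state codes
```

The state codes previewed this way are the values to pass to the `geo` argument of `get_finance_data()`.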