---
title: "Health Data from PNAD Continua with healthbR"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Health Data from PNAD Continua with healthbR}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
```

## Overview

The **PNAD Continua (Pesquisa Nacional por Amostra de Domicilios Continua)** is IBGE's continuous household sample survey. Its core questionnaire focuses on labor and income, but it also includes **supplementary health-related modules** that are applied in specific quarters.

The `healthbR` package provides access to four health-related supplementary modules:

| Module | Description | Years |
|--------|-------------|-------|
| `deficiencia` | Persons with disabilities | 2019, 2022, 2024 |
| `habitacao` | Housing characteristics (sanitation, water) | 2012--2024 |
| `moradores` | General resident characteristics | 2012--2024 |
| `aps` | Primary health care access | 2022 (Q2) |

## Getting started

```{r setup}
library(healthbR)
library(dplyr)
```

### Module information

```{r}
pnadc_info()
```

### Available modules and years

```{r}
pnadc_modules()
#> # A tibble: 4 x 4
#>   module      name_pt                             name_en               years
#>
#> 1 deficiencia Pessoas com deficiencia             Persons with disab...
#> 2 habitacao   Caracteristicas dos domicilios      Housing character...
#> 3 moradores   Caracteristicas gerais dos morad... General character...
#> 4 aps         Atencao primaria a saude            Primary health care

# Years for a specific module
pnadc_years("deficiencia")
#> [1] 2019 2022 2024
```

## Downloading data

### Basic download

```{r}
# Disability module, 2022
df <- pnadc_data(module = "deficiencia", year = 2022)
```

### Select variables

```{r}
df <- pnadc_data(
  module = "deficiencia",
  year = 2022,
  vars = c("UF", "V2007", "V2009", "V2010", "G001", "G003", "G006")
)
```

### Multiple years

```{r}
# Housing conditions across all available years
df <- pnadc_data(module = "habitacao")
```

## Exploring variables

```{r}
# List variable names
pnadc_variables(module = "deficiencia", year = 2022)

# Full dictionary with positions and widths
pnadc_dictionaries(module = "deficiencia", year = 2022)
```

## Survey design

PNAD Continua uses a complex sample design. The survey design variables (`UPA`, `Estrato`, `V1028`) are always included in the output. Set `as_survey = TRUE` to return a survey design object:

```{r}
# Requires the srvyr package
library(srvyr)

svy <- pnadc_data(
  module = "deficiencia",
  year = 2022,
  as_survey = TRUE
)

# Use srvyr verbs for proper variance estimation.
# Estimators such as survey_mean() must be called inside summarise().
svy |>
  group_by(UF) |>
  summarise(
    # Share answering G001 == "1" (cannot see at all), by state
    cannot_see = survey_mean(G001 == "1", na.rm = TRUE)
  )
```

## Example: Disability prevalence

```{r}
df <- pnadc_data(module = "deficiencia", year = 2022)

# G001: "Tem dificuldade permanente de enxergar" (vision difficulty)
# 1 = Sim, nao consegue de modo algum (cannot at all)
# 2 = Sim, muita dificuldade (great difficulty)
# 3 = Sim, alguma dificuldade (some difficulty)
# 4 = Nao, nenhuma dificuldade (no difficulty)

# Unweighted distribution of responses in the sample
vision <- df |>
  filter(G001 %in% c("1", "2", "3", "4")) |>
  count(G001) |>
  mutate(pct = n / sum(n) * 100)
```

## Example: Housing sanitation over time

```{r}
df <- pnadc_data(module = "habitacao", year = c(2016, 2019, 2022))

# Analyze water supply and sanitation trends
# Variables vary by year -- check pnadc_variables() for each edition
```

## Example: Primary health care access

```{r}
# APS module only available for 2022 Q2
df <- pnadc_data(module = "aps", year = 2022)
```

## Cache and performance

```{r}
# Check cache
pnadc_cache_status()

# Clear cache
pnadc_clear_cache()

# Lazy evaluation
lazy_df <- pnadc_data(
  module = "deficiencia",
  year = 2022,
  lazy = TRUE,
  backend = "arrow"
)
```

## Further reading

- PNAD Continua on IBGE (`www.ibge.gov.br/estatisticas/sociais/trabalho/17270-pnad-continua`)
- PNAD Continua microdata documentation (`ftp.ibge.gov.br`)
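
## Example: Weighted versus unweighted shares

The tabulation in the disability example above counts sample rows, so its percentages describe the sample rather than the population. As a minimal sketch of the difference, the block below compares unweighted and weighted shares on a made-up data frame. The toy values and the helper `weighted_share()` are hypothetical and not part of `healthbR`; the sketch assumes `V1028` holds the person weight, as suggested by its listing among the survey design variables. It gives point estimates only. For standard errors, use the `as_survey = TRUE` route.

```r
# Toy data standing in for pnadc_data() output (values are made up).
toy <- data.frame(
  G001  = c("4", "4", "3", "4", "2", "4", "1", "3"),
  V1028 = c(100, 300, 50, 250, 50, 150, 25, 75)
)

# Hypothetical helper: percentage share of each category,
# unweighted (row counts) and weighted (sum of weights).
weighted_share <- function(category, weight) {
  n <- tapply(weight, category, length)  # rows per category
  w <- tapply(weight, category, sum)     # total weight per category
  data.frame(
    category       = names(n),
    pct_unweighted = as.numeric(n) / sum(n) * 100,
    pct_weighted   = as.numeric(w) / sum(w) * 100
  )
}

shares <- weighted_share(toy$G001, toy$V1028)
shares
```

In this toy data, category `"4"` accounts for half of the rows but 80% of the total weight, which is the kind of gap that makes weighted estimates the safer default for population statements.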