---
title: "Chronic Disease Risk Factors from VIGITEL with healthbR"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Chronic Disease Risk Factors from VIGITEL with healthbR}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
```

## Overview

**VIGITEL (Vigilancia de Fatores de Risco e Protecao para Doencas Cronicas por Inquerito Telefonico)** is an annual telephone survey conducted by the Brazilian Ministry of Health since 2006. It monitors risk and protective factors for chronic non-communicable diseases among adults (18+) in all 26 state capitals and the Federal District.

| Topic | Examples |
|-------|----------|
| **Tobacco** | Smoking prevalence, cessation |
| **Alcohol** | Consumption patterns, binge drinking |
| **Diet** | Fruit/vegetable intake, ultra-processed foods |
| **Physical activity** | Leisure, commuting, sedentary behavior |
| **Chronic diseases** | Diabetes, hypertension, obesity self-report |
| **Preventive exams** | Mammography, Pap smear, colonoscopy |

Each annual edition interviews approximately 54,000 adults via landline telephone, with post-stratification weighting (`pesorake`) to match the adult population of each city.

## Getting started

```{r setup}
library(healthbR)
library(dplyr)
```

### Check available years

```{r}
vigitel_years()
#> [1] 2006 2007 2008 ... 2023 2024
```

### Survey information

```{r}
vigitel_info()
```

## Downloading data

### All years at once

VIGITEL is distributed as a single consolidated file covering 2006--2024. By default, all years are downloaded:

```{r}
df <- vigitel_data()
```

### Specific years

```{r}
df <- vigitel_data(year = 2020:2024)
```

### Select variables

```{r}
df <- vigitel_data(
  year = 2024,
  vars = c("cidade", "sexo", "idade", "pesorake", "q6", "q7", "q9")
)
```

### Data format

Two formats are available: Stata (`.dta`, default) and CSV.
The Stata format preserves variable labels:

```{r}
df_dta <- vigitel_data(format = "dta")  # default, with labels
df_csv <- vigitel_data(format = "csv")  # alternative
```

## Exploring variables

### Data dictionary

```{r}
vigitel_dictionary()
```

### Search variables

```{r}
vigitel_variables()
```

## Example: Smoking prevalence over time

```{r}
# Download smoking-related variables
df <- vigitel_data(
  year = 2006:2024,
  vars = c("ano", "cidade", "sexo", "pesorake", "q6")
)

# q6: "Atualmente, o(a) sr(a) fuma?" ("Do you currently smoke?"; 1 = yes, 2 = no)
# Confirm variable codes and value coding with vigitel_dictionary()
smoking <- df |>
  filter(q6 %in% c("1", "2")) |>
  # Ensure the weight is numeric before summing
  mutate(pesorake = as.numeric(pesorake)) |>
  group_by(ano) |>
  summarise(
    smokers = sum(pesorake[q6 == "1"], na.rm = TRUE),
    total = sum(pesorake, na.rm = TRUE),
    prevalence = smokers / total * 100
  )
```

## Example: Obesity by capital city

```{r}
df <- vigitel_data(
  year = 2024,
  vars = c("cidade", "sexo", "pesorake", "q8", "q9")
)

# q8 = weight (kg), q9 = height (cm)
# BMI >= 30 = obesity
obesity <- df |>
  # Convert to numeric first so the filter compares numbers, not strings
  mutate(
    weight_kg = as.numeric(q8),
    height_cm = as.numeric(q9)
  ) |>
  filter(!is.na(weight_kg), !is.na(height_cm), height_cm > 0) |>
  mutate(
    bmi = weight_kg / (height_cm / 100)^2,
    obese = bmi >= 30
  ) |>
  group_by(cidade) |>
  summarise(
    prevalence = weighted.mean(obese, as.numeric(pesorake), na.rm = TRUE) * 100
  ) |>
  arrange(desc(prevalence))
```

## Cache and performance

Data is automatically cached in partitioned parquet format (when `arrow` is installed). Subsequent calls read from the cache instead of downloading again:

```{r}
# First call downloads (~30 seconds)
df <- vigitel_data(year = 2024)

# Second call loads from cache (instant)
df <- vigitel_data(year = 2024)

# Check cache status
vigitel_cache_status()

# Clear cache if needed
vigitel_clear_cache()
```

### Lazy evaluation

For large analyses, use lazy evaluation to query the data without loading all of it into memory:

```{r}
lazy_df <- vigitel_data(lazy = TRUE, backend = "arrow")
```

## Further reading

- VIGITEL official page (`svs.aids.gov.br/daent/cgdnt/vigitel`)
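
## Appendix: checking the weighted prevalence on toy data

The smoking example computes prevalence as a ratio of summed weights, which should equal a weighted mean of a 0/1 smoker indicator. The sketch below checks that equivalence on a small made-up data frame. The values are invented for illustration and are not VIGITEL data; the column names (`ano`, `q6`, `pesorake`) simply mirror those used in the examples above, and nothing here calls healthbR.

```{r}
library(dplyr)

# Toy data: invented values, NOT real VIGITEL records
toy <- data.frame(
  ano      = c(2023, 2023, 2023, 2024, 2024, 2024),
  q6       = c("1", "2", "2", "1", "1", "2"),
  pesorake = c(1.0, 2.0, 1.0, 0.5, 1.5, 2.0)
)

toy_prev <- toy |>
  group_by(ano) |>
  summarise(
    # Ratio of summed weights, as in the smoking example
    ratio = sum(pesorake[q6 == "1"]) / sum(pesorake) * 100,
    # Same quantity expressed as a weighted mean of a 0/1 indicator
    wmean = weighted.mean(q6 == "1", pesorake) * 100
  )

# Both formulations should agree for every year
stopifnot(isTRUE(all.equal(toy_prev$ratio, toy_prev$wmean)))
```

With these toy weights, both columns give 25 for 2023 and 50 for 2024. Neither formulation yields standard errors; those would require a survey-design approach that this vignette does not cover.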