
healthbR provides easy access to Brazilian public health data directly from R. The package downloads, caches, and processes data from official sources, returning clean, analysis-ready tibbles following tidyverse conventions.

Survey and census modules:

| Module | Description | Years |
|---|---|---|
| VIGITEL | Surveillance of Risk Factors for Chronic Diseases by Telephone Survey | 2006–2024 |
| PNS | National Health Survey (microdata + SIDRA API) | 2013, 2019 |
| PNAD Continua | Continuous National Household Sample Survey | 2012–2024 |
| POF | Household Budget Survey (food security, consumption, anthropometry) | 2002–2018 |
| Censo | Population denominators via SIDRA API | 1970–2022 |

DATASUS modules:

| Module | Description | Granularity | Years |
|---|---|---|---|
| SIM | Mortality Information System (deaths) | Annual/UF | 1996–2024 |
| SINASC | Live Birth Information System | Annual/UF | 1996–2024 |
| SIH | Hospital Information System (admissions) | Monthly/UF | 2008–2024 |
| SIA | Outpatient Information System (13 file types) | Monthly/type/UF | 2008–2024 |
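
The Censo module is described above as a source of population denominators, which is what turns a count from a module such as SIM into a rate. A minimal sketch of that arithmetic follows. The numbers are made up for illustration and the column names are assumptions, not healthbR output.

```r
# Made-up counts for illustration only; not real data.
toy <- data.frame(
  uf = c("A", "B"),
  deaths = c(50, 120),
  population = c(100000, 400000)
)

# Crude rate per 100,000 inhabitants = deaths / population * 100,000.
toy$rate_per_100k <- toy$deaths / toy$population * 1e5
print(toy)
```
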

DATASUS modules download `.dbc` files (compressed DBF) and decompress them internally using vendored C code – no external dependencies required.

You can install the development version of healthbR from GitHub:

# install.packages("pak")
pak::pak("SidneyBissoli/healthbR")

library(healthbR)
# see all available data sources
list_sources()

All DATASUS modules follow a consistent API: `*_years()`, `*_info()`, `*_variables()`, `*_dictionary()`, `*_data()`, `*_cache_status()`, and `*_clear_cache()`.

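
Because the function names share one pattern, the name for any module and action can be built programmatically. The sketch below relies only on the naming convention stated above. `module_fn()` is a hypothetical helper, not part of healthbR, and the lookup at the end runs only if healthbR is installed.

```r
# Hypothetical helper (not part of healthbR): build a function name
# from a module prefix and an action, following the pattern above.
module_fn <- function(module, action) {
  paste0(module, "_", action)
}

modules <- c("sim", "sinasc", "sih", "sia")
year_fns <- vapply(modules, module_fn, character(1), action = "years")
print(unname(year_fns))

# Look up the exported function only when healthbR is available.
if (requireNamespace("healthbR", quietly = TRUE)) {
  sim_years_fn <- getExportedValue("healthbR", module_fn("sim", "years"))
  print(is.function(sim_years_fn))
}
```
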
# mortality data -- deaths in Acre, 2022
obitos <- sim_data(year = 2022, uf = "AC")
# filter by cause of death (CID-10 prefix)
obitos_cardio <- sim_data(year = 2022, uf = "AC", cause = "I")
# live births in Acre, 2022
nascimentos <- sinasc_data(year = 2022, uf = "AC")
# hospital admissions in Acre, January 2022
internacoes <- sih_data(year = 2022, month = 1, uf = "AC")
# filter by diagnosis (CID-10 prefix)
intern_resp <- sih_data(year = 2022, month = 1, uf = "AC", diagnosis = "J")
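
The `cause` and `diagnosis` arguments are described above as CID-10 (ICD-10) prefix filters. The sketch below shows what prefix matching means, using a toy data frame rather than real healthbR output. The column name `causabas`, the rows, and the helper `filter_by_prefix()` are assumptions for illustration, and healthbR's internal filtering may differ.

```r
# Toy records; the codes are ICD-10 categories, the rows are illustrative only.
obitos_toy <- data.frame(
  id = 1:5,
  causabas = c("I219", "J189", "I64", "C349", "J440"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows whose code starts with the given prefix.
filter_by_prefix <- function(data, column, prefix) {
  data[startsWith(data[[column]], prefix), , drop = FALSE]
}

cardio <- filter_by_prefix(obitos_toy, "causabas", "I")  # circulatory chapter
resp <- filter_by_prefix(obitos_toy, "causabas", "J")    # respiratory chapter
print(cardio$causabas)
print(resp$causabas)
```

In this sketch, a longer prefix such as `"I2"` narrows the match to codes I20 through I29.
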
# outpatient production in Acre, January 2022
ambulatorial <- sia_data(year = 2022, month = 1, uf = "AC")
# different file type (e.g., high-cost medications)
medicamentos <- sia_data(year = 2022, month = 1, uf = "AC", type = "AM")

# VIGITEL telephone survey
vigitel <- vigitel_data(year = 2024)
# PNS national health survey
pns <- pns_data(year = 2019)
# PNAD Continua
pnadc <- pnadc_data(year = 2023, quarter = 1)
# POF household budget survey
pof <- pof_data(year = 2018, register = "morador")
# Census population
pop <- censo_populacao(year = 2022, territorial_level = "state")

# list variables for any module
sim_variables()
sia_variables(search = "sexo")
# data dictionary with category labels
sim_dictionary("SEXO")
sia_dictionary("PA_RACACOR")

All modules cache downloaded data automatically. Install `arrow` for optimized Parquet caching:

install.packages("arrow")

Each module provides cache management functions:

# check what is cached
sim_cache_status()
sih_cache_status()
sia_cache_status()
# clear cache for a module
sim_clear_cache()

All data is downloaded from official Brazilian government repositories:

DATASUS FTP (ftp://ftp.datasus.gov.br/dissemin/publicos/)

If you use healthbR in your research, please cite it:

citation("healthbR")

Contributions are welcome! Please open an issue to discuss proposed changes or submit a pull request.

Please note that the healthbR project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

MIT © Sidney da Silva Pereira Bissoli