The healthbR package provides access to eight DATASUS
information systems, covering mortality, live births, hospital
admissions, outpatient production, notifiable diseases, the health
facility registry, vaccination data, and primary care coverage:

| Module | Function | Source document | Granularity | Years |
|---|---|---|---|---|
| SIM | `sim_data()` | Declaracao de Obito (DO) | Annual/UF | 1996–2024 |
| SINASC | `sinasc_data()` | Declaracao de Nascido Vivo (DN) | Annual/UF | 1996–2024 |
| SIH | `sih_data()` | AIH (Autorizacao de Internacao Hospitalar) | Monthly/UF | 2008–2024 |
| SIA | `sia_data()` | BPA / APAC | Monthly/type/UF | 2008–2024 |
| SINAN | `sinan_data()` | Ficha de Notificacao | Annual/National | 2007–2024 |
| CNES | `cnes_data()` | Cadastro de Estabelecimentos | Monthly/type/UF | 2005–2024 |
| SI-PNI | `sipni_data()` | PNI (doses, cobertura, microdados) | Annual/UF | 1994–2025 |
| SISAB | `sisab_data()` | Cobertura da Atencao Primaria | Monthly | 2007–present |

All seven modules share the same infrastructure:

- Downloads from `ftp.datasus.gov.br` with automatic retry and exponential backoff.
- Caching in Parquet (when `arrow` is installed) or `.rds` format.
- A consistent set of functions: `*_years()`, `*_info()`, `*_variables()`, `*_dictionary()`, `*_data()`, `*_cache_status()`, and `*_clear_cache()`.

Each module provides the same set of helper functions. Here is a quick tour using SIM as an example:

```r
# available years
sim_years()
#> [1] 1996 1997 1998 ... 2023

# module information (data source, key variables, usage tips)
sim_info()

# list all variables with descriptions
sim_variables()

# search for a specific variable
sim_variables(search = "causa")

# data dictionary with category labels
sim_dictionary("SEXO")
```

The same pattern works for `sinasc_*()`, `sih_*()`, `sia_*()`, `sinan_*()`, `cnes_*()`, and `sipni_*()`.

The SIM (Sistema de Informacoes sobre Mortalidade) contains individual death records based on the Declaracao de Obito (DO).
The cause parameter filters by underlying cause of death
(CAUSABAS) using CID-10 prefix matching:
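
Prefix matching means that a shorter code selects a broader group: `"I"` keeps the whole circulatory chapter, while `"I21"` keeps only acute myocardial infarction. The base R sketch below mimics that behaviour on a toy vector. It is not healthbR's internal code. In the package itself the filter is passed as an argument, for example `sim_data(year = 2022, uf = "SP", cause = "I")`.

```r
# Toy underlying-cause codes as they might appear in CAUSABAS
# (CID-10 codes stored without the dot). Invented for illustration.
causabas <- c("I219", "I210", "I64", "C349", "J189")

# Whole chapter I (diseases of the circulatory system)
chapter_i <- causabas[startsWith(causabas, "I")]
chapter_i

# A single three-character category (I21)
ami <- causabas[startsWith(causabas, "I21")]
ami
```

The longer the prefix, the narrower the selection, so it is worth starting broad and tightening the code once the counts look plausible.
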

| Variable | Description |
|---|---|
| `CAUSABAS` | Underlying cause of death (CID-10) |
| `DTOBITO` | Date of death |
| `SEXO` | Sex (M = Male, F = Female, I = Unknown) |
| `IDADE` | Age (encoded: 1st digit = unit, digits 2-3 = value) |
| `CODMUNRES` | Municipality of residence (IBGE code) |

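
The encoded `IDADE` field has to be split before it can be used as a number. The sketch below does this in base R. The unit codes (0 = minutes, 1 = hours, 2 = days, 3 = months, 4 = years, 5 = 100 years or more) follow the usual DATASUS convention and are an assumption here, so check them against the module's data dictionary before relying on them. `decode_idade()` is a hypothetical helper, not part of healthbR.

```r
# Hypothetical helper: convert the encoded IDADE field to completed years.
# Assumed unit codes: 0-3 = under one year, 4 = years, 5 = 100 + value.
decode_idade <- function(idade) {
  idade <- as.character(idade)
  unit  <- substr(idade, 1, 1)                                # 1st digit
  value <- suppressWarnings(as.numeric(substr(idade, 2, 3)))  # digits 2-3

  years <- rep(NA_real_, length(idade))
  years[unit %in% c("0", "1", "2", "3")] <- 0   # minutes, hours, days, months

  is_year <- unit %in% "4"
  years[is_year] <- value[is_year]

  is_100 <- unit %in% "5"
  years[is_100] <- 100 + value[is_100]

  years   # any other unit code (or a missing value) stays NA
}

ages <- decode_idade(c("445", "305", "501", "999", NA))
ages
```

Codes with an unrecognised unit digit are left as `NA` rather than guessed.
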
The SINASC (Sistema de Informacoes sobre Nascidos Vivos) contains individual live birth records from the Declaracao de Nascido Vivo (DN).
The anomaly parameter filters by the CODANOMAL variable
using CID-10 prefix matching:

| Variable | Description |
|---|---|
| `DTNASC` | Date of birth |
| `SEXO` | Sex (1 = Male, 2 = Female, 0 = Unknown) |
| `PESO` | Birth weight (grams) |
| `IDADEMAE` | Mother's age |
| `CODMUNRES` | Municipality of residence (IBGE code) |
| `CODANOMAL` | Congenital anomaly code (CID-10) |

```r
library(healthbR)
library(dplyr)

# live births in Acre, 2022
nasc_ac <- sinasc_data(year = 2022, uf = "AC")

# classify births by weight group (PESO is converted to numeric first)
nasc_ac |>
  mutate(peso_num = as.numeric(PESO)) |>
  filter(!is.na(peso_num), peso_num > 0) |>
  mutate(weight_group = case_when(
    peso_num < 1500 ~ "Very low (<1500g)",
    peso_num < 2500 ~ "Low (1500-2499g)",
    peso_num < 4000 ~ "Normal (2500-3999g)",
    TRUE ~ "High (>=4000g)"
  )) |>
  count(weight_group)
```

The SIH (Sistema de Informacoes Hospitalares) contains individual hospital admission records from the AIH (Autorizacao de Internacao Hospitalar). Unlike SIM and SINASC, data is organized monthly.

SIH data is monthly, with one file per UF per month. Use the `month` parameter to control which months to download:
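
To get a feel for how `uf` and `month` determine the size of a download, the sketch below enumerates the UF/month combinations a request would cover. It is plain base R and does not call healthbR. The helper name `count_monthly_files()` is hypothetical, and the one-file-per-UF-per-month layout is taken from the description above.

```r
# All 27 federative units (26 states plus DF), by two-letter abbreviation.
ufs_all <- c("AC", "AL", "AP", "AM", "BA", "CE", "DF", "ES", "GO",
             "MA", "MT", "MS", "MG", "PA", "PB", "PR", "PE", "PI",
             "RJ", "RN", "RS", "RO", "RR", "SC", "SP", "SE", "TO")

# Hypothetical helper: number of monthly files implied by a request,
# assuming one file per UF per month.
count_monthly_files <- function(uf = ufs_all, month = 1:12) {
  nrow(expand.grid(uf = uf, month = month, stringsAsFactors = FALSE))
}

full_year  <- count_monthly_files()                        # all states, 12 months
one_state  <- count_monthly_files(uf = "AC", month = 1:3)  # Acre, first quarter
c(full_year = full_year, one_state = one_state)
```

Narrowing the request to one state and three months cuts the count from 324 files to 3, which is why filtering at download time matters for the monthly modules.
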
The diagnosis parameter filters by the principal
diagnosis (DIAG_PRINC) using CID-10 prefix matching:

| Variable | Description |
|---|---|
| `DIAG_PRINC` | Principal diagnosis (CID-10) |
| `DT_INTER` | Admission date |
| `SEXO` | Sex (1 = Male, 3 = Female, 0 = Unknown) |
| `MORTE` | In-hospital death (1 = Yes, 0 = No) |
| `VAL_TOT` | Total value (R$) |
| `DIAS_PERM` | Length of stay (days) |

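
These variables are enough for a first descriptive summary. The sketch below computes in-hospital mortality, mean length of stay, and total cost for respiratory admissions on a toy data frame. The rows are invented for illustration, and storing the fields as character is an assumption about how the raw files are read.

```r
# Toy admissions using the SIH column names listed above.
aih <- data.frame(
  DIAG_PRINC = c("J189", "J459", "I219", "J180"),
  MORTE      = c("0", "0", "1", "1"),
  DIAS_PERM  = c("3", "1", "10", "6"),
  VAL_TOT    = c("850.40", "312.10", "5400.00", "1200.50"),
  stringsAsFactors = FALSE
)

# Keep respiratory admissions (CID-10 chapter J) by prefix.
resp <- aih[startsWith(aih$DIAG_PRINC, "J"), ]

# Convert to numeric before aggregating.
died <- as.numeric(resp$MORTE) == 1

summary_resp <- data.frame(
  admissions     = nrow(resp),
  deaths         = sum(died),
  mortality_pct  = 100 * mean(died),
  mean_stay_days = mean(as.numeric(resp$DIAS_PERM)),
  total_cost_brl = sum(as.numeric(resp$VAL_TOT))
)
summary_resp
```
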
The SIA (Sistema de Informacoes Ambulatoriais) contains outpatient production records. Like SIH, data is monthly, but SIA also has 13 file types covering different categories of outpatient care.
| Code | Name | Description |
|---|---|---|
| PA | Producao Ambulatorial | BPA consolidated (default) |
| BI | Boletim Individualizado | BPA individualized |
| AD | APAC Laudos Diversos | High-complexity authorizations |
| AM | APAC Medicamentos | High-cost medications |
| AN | APAC Nefrologia | Nephrology procedures |
| AQ | APAC Quimioterapia | Oncology chemotherapy |
| AR | APAC Radioterapia | Oncology radiotherapy |
| AB | APAC Cirurgia Bariatrica | Bariatric surgery |
| ACF | APAC Confeccao de Fistula | Arteriovenous fistula |
| ATD | APAC Tratamento Dialitico | Dialysis |
| AMP | APAC Acompanhamento Multiprofissional | Multiprofessional follow-up |
| SAD | RAAS Atencao Domiciliar | Home care services |
| PS | RAAS Psicossocial | CAPS and psychosocial services |

| Variable | Description |
|---|---|
| `PA_PROC_ID` | Procedure code (SIGTAP) |
| `PA_CIDPRI` | Principal diagnosis (CID-10) |
| `PA_SEXO` | Sex (1 = Male, 2 = Female) |
| `PA_IDADE` | Patient age |
| `PA_VALAPR` | Approved value (R$) |
| `PA_QTDAPR` | Approved quantity |

The SINAN (Sistema de Informacao de Agravos de Notificacao) contains individual notification records for 31 compulsorily notifiable diseases. Unlike other DATASUS modules, SINAN files are national (one file per disease per year, covering all of Brazil).
SINAN covers 31 diseases. Use sinan_diseases() to see
all available codes:

```r
# list the available disease codes
sinan_diseases()

# dengue notifications, 2022 (default disease)
dengue <- sinan_data(year = 2022)

# tuberculosis notifications, 2020-2022
tb <- sinan_data(year = 2020:2022, disease = "TUBE")

# select specific variables
sinan_data(year = 2022, disease = "DENG",
           vars = c("DT_NOTIFIC", "CS_SEXO", "NU_IDADE_N",
                    "ID_MUNICIP", "CLASSI_FIN"))
```

Since files are national, filter by UF after download.

| Variable | Description |
|---|---|
| `DT_NOTIFIC` | Notification date |
| `ID_AGRAVO` | Disease code (CID-10) |
| `CS_SEXO` | Sex (M = Male, F = Female, I = Unknown) |
| `NU_IDADE_N` | Age (encoded: 1st digit = unit, digits 2-4 = value) |
| `ID_MUNICIP` | Municipality of notification (IBGE code) |
| `CLASSI_FIN` | Final classification (1 = Confirmed, 2 = Discarded) |
| `EVOLUCAO` | Outcome (1 = Cure, 2 = Death from disease) |

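
Because SINAN files are national, restricting the data to one state happens after download. One way to do this, sketched on toy rows, is to use the first two digits of the municipality code, which identify the state in IBGE codes (12 is Acre, 35 is Sao Paulo). This assumes `ID_MUNICIP` is stored as a character IBGE code. The real files may also carry a dedicated UF column, which would be simpler to use.

```r
# Toy notifications; municipality codes are 6-digit IBGE codes.
notif <- data.frame(
  ID_MUNICIP = c("120040", "355030", "120020", "330455"),
  CS_SEXO    = c("F", "M", "M", "F"),
  stringsAsFactors = FALSE
)

# The first two digits of an IBGE municipality code are the state code.
uf_code <- substr(notif$ID_MUNICIP, 1, 2)

# Keep only notifications from Acre (IBGE state code 12).
notif_ac <- notif[uf_code == "12", ]
notif_ac
```
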
The CNES (Cadastro Nacional de Estabelecimentos de Saude) is the national registry of all health facilities in Brazil. Like SIH and SIA, data is organized monthly (one file per type/UF/month), and there are 13 file types covering different aspects of the registry.
| Code | Name | Description |
|---|---|---|
| ST | Estabelecimentos | Facility registry (default) |
| LT | Leitos | Hospital beds |
| PF | Profissional | Health professionals |
| DC | Dados Complementares | Complementary facility data |
| EQ | Equipamentos | Health equipment |
| SR | Servico Especializado | Specialized services |
| HB | Habilitacao | Facility certifications |
| EP | Equipes | Health teams |
| RC | Regra Contratual | Contractual rules |
| IN | Incentivos | Financial incentives |
| EE | Estab. de Ensino | Teaching facilities |
| EF | Estab. Filantropico | Philanthropic facilities |
| GM | Gestao e Metas | Management and targets |

| Variable | Description |
|---|---|
| `CNES` | Facility CNES code |
| `CODUFMUN` | Municipality (UF + IBGE 6-digit code) |
| `TP_UNID` | Facility type (22 categories) |
| `VINC_SUS` | SUS-linked (0 = No, 1 = Yes) |
| `TP_GESTAO` | Management type (M = Municipal, E = State, D = Dual) |
| `ESFERA_A` | Administrative sphere (1-4) |

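The SI-PNI (Sistema de Informacao do Programa Nacional de Imunizacoes) provides vaccination data from two sources:

- FTP: aggregated doses applied (DPNI) and vaccination coverage (CPNI).
- API: individual-level microdata via OpenDataSUS (2020 onward).

`sipni_data()` transparently routes to the correct source based on the requested year.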
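
The routing rule can be pictured as a simple year check. The sketch below is illustrative only: `sipni_source()` is a hypothetical function, and the 2020 cutoff is taken from the "2020+" note in the table that follows, not from the package source.

```r
# Hypothetical illustration of year-based routing.
# Assumption: years from 2020 onward are served by the API, earlier ones by FTP.
sipni_source <- function(year, api_from = 2020) {
  ifelse(year >= api_from, "API", "FTP")
}

years <- 2018:2021
routes <- sipni_source(years)
routes

# A range that spans the cutoff would be split into two groups of years.
by_source <- split(years, routes)
by_source
```
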
| Code | Name | Description |
|---|---|---|
| DPNI | Doses Aplicadas | Doses applied per municipality, age group, vaccine, and dose type (FTP, default) |
| CPNI | Cobertura Vacinal | Vaccination coverage per municipality and vaccine (FTP) |
| API | Microdados | Individual-level microdata via OpenDataSUS (2020+, automatic) |

```r
# FTP: doses applied in Acre, 2019 (default type = "DPNI")
doses_ac <- sipni_data(year = 2019, uf = "AC")
doses_ac

# FTP: vaccination coverage
cob_ac <- sipni_data(year = 2019, type = "CPNI", uf = "AC")

# API: individual-level microdata, Acre, January 2024
micro_ac <- sipni_data(year = 2024, uf = "AC", month = 1)
micro_ac
```

Key variables for DPNI (doses applied):

| Variable | Description |
|---|---|
| `IMUNO` | Vaccine code (immunobiological) |
| `QT_DOSE` | Number of doses applied |
| `DOSE` | Dose type (1st, 2nd, booster, etc.) |
| `FX_ETARIA` | Age group (coded) |
| `MUNIC` | Municipality (IBGE 6-digit code) |
| `ANOMES` | Year and month (YYYYMM) |

Key variables for CPNI (vaccination coverage):

| Variable | Description |
|---|---|
| `IMUNO` | Vaccine code |
| `QT_DOSE` | Number of doses applied |
| `POP` | Target population |
| `COBERT` | Vaccination coverage (%) |
| `MUNIC` | Municipality (IBGE 6-digit code) |

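
Coverage is conventionally the number of doses applied divided by the target population, times 100. The toy check below recomputes it from `QT_DOSE` and `POP`. Whether the published `COBERT` column uses exactly this formula and rounding is an assumption, so small differences from the official figures are possible.

```r
# Toy coverage rows; the vaccine code "072" is a placeholder, not a real lookup.
cpni <- data.frame(
  MUNIC   = c("120040", "120020"),
  IMUNO   = c("072", "072"),
  QT_DOSE = c("4500", "980"),
  POP     = c("5000", "1400"),
  stringsAsFactors = FALSE
)

# coverage (%) = doses applied / target population * 100
cpni$cobert_calc <- round(
  100 * as.numeric(cpni$QT_DOSE) / as.numeric(cpni$POP), 2
)
cpni[, c("MUNIC", "cobert_calc")]
```
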
A key strength of healthbR is the ability to combine
data from different DATASUS modules and Census denominators in a single
workflow. Below are three practical examples.
Calculate the crude cardiovascular mortality rate per 100,000 population:

```r
library(healthbR)
library(dplyr)

# step 1: count cardiovascular deaths in Sao Paulo, 2022
obitos_cardio <- sim_data(year = 2022, uf = "SP", cause = "I")
n_obitos <- nrow(obitos_cardio)

# step 2: get population denominator from Census 2022
pop_sp <- censo_populacao(year = 2022, territorial_level = "state") |>
  filter(grepl("Paulo", territorial_unit))

# step 3: calculate rate per 100,000 population
taxa_mortalidade <- n_obitos / pop_sp$population * 100000
taxa_mortalidade
```

Compare the number of live births and deaths in a state:
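
A minimal sketch of that comparison follows. To stay runnable offline it uses toy data frames in place of the real downloads. With the package, `obitos` and `nascimentos` would come from calls such as `sim_data(year = 2022, uf = "AC")` and `sinasc_data(year = 2022, uf = "AC")`. `compare_vital()` is a hypothetical helper.

```r
# Hypothetical helper: one row per record is assumed in both inputs,
# so row counts equal event counts.
compare_vital <- function(nascimentos, obitos) {
  n_births <- nrow(nascimentos)
  n_deaths <- nrow(obitos)
  data.frame(
    live_births      = n_births,
    deaths           = n_deaths,
    natural_increase = n_births - n_deaths,
    births_per_death = n_births / n_deaths
  )
}

# Toy stand-ins for the SINASC and SIM downloads (invented rows).
nascimentos <- data.frame(DTNASC  = c("01012022", "15032022", "30062022", "11112022"))
obitos      <- data.frame(DTOBITO = c("05022022", "20082022"))

vital <- compare_vital(nascimentos, obitos)
vital
```
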
Compare volumes and costs of respiratory care (CID-10 chapter J) between hospital and outpatient settings:

```r
# hospital admissions for respiratory diseases, January 2022
intern_resp <- sih_data(year = 2022, month = 1, uf = "AC", diagnosis = "J")

# outpatient production for respiratory diseases, January 2022
ambul_resp <- sia_data(year = 2022, month = 1, uf = "AC", diagnosis = "J")

# compare volumes
n_internacoes <- nrow(intern_resp)
n_ambulatorial <- nrow(ambul_resp)

# compare costs
custo_intern <- sum(as.numeric(intern_resp$VAL_TOT), na.rm = TRUE)
custo_ambul <- sum(as.numeric(ambul_resp$PA_VALAPR), na.rm = TRUE)

tibble::tibble(
  setting = c("Hospital (SIH)", "Outpatient (SIA)"),
  records = c(n_internacoes, n_ambulatorial),
  total_cost_brl = c(custo_intern, custo_ambul)
)
```

All DATASUS modules cache downloaded data automatically. When the
arrow package is installed, data is saved in Parquet format
(fast and compact); otherwise, .rds is used as fallback.
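
The Parquet-or-`.rds` fallback can be sketched as follows. This is an illustration of the idea, not the package's actual cache code: `cache_write()` and `cache_read()` are hypothetical helpers, and the file name is made up. The only external calls used are `arrow::write_parquet()` and `arrow::read_parquet()`, and only when `arrow` is installed.

```r
# Hypothetical helper: write a data frame as Parquet when arrow is
# available, otherwise fall back to .rds. Returns the path written.
cache_write <- function(df, path_stem) {
  if (requireNamespace("arrow", quietly = TRUE)) {
    path <- paste0(path_stem, ".parquet")
    arrow::write_parquet(df, path)
  } else {
    path <- paste0(path_stem, ".rds")
    saveRDS(df, path)
  }
  path
}

# Hypothetical helper: read a cached file back based on its extension.
cache_read <- function(path) {
  if (grepl("\\.parquet$", path)) {
    as.data.frame(arrow::read_parquet(path))
  } else {
    readRDS(path)
  }
}

df   <- data.frame(uf = c("AC", "SP"), n = c(10L, 20L), stringsAsFactors = FALSE)
path <- cache_write(df, file.path(tempdir(), "sim_2022_example"))
back <- cache_read(path)
back
```
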
Each module provides `*_cache_status()` and `*_clear_cache()` for inspecting and clearing its cache.

A few ways to keep downloads small:

- Use `uf` to download only the states you need instead of all 27 (SIM, SINASC, SIH, SIA, CNES).
- Use `month` (SIH, SIA, CNES) to limit monthly downloads. Downloading a full year for all states requires 324 files per module (27 UFs x 12 months).
- Use `vars` to keep only the variables you need, reducing memory usage.
- `type`, `uf`, and `month`.
- `month` to limit months.

Related resources:

- `datasus.saude.gov.br`: online tabulation tool for DATASUS data
- `ftp.datasus.gov.br`: public FTP server with raw data files
- `wiki.saude.gov.br/sigtap`: procedure code table for SUS (SIA/SIH)