The SINAN (Sistema de Informacao de Agravos de Notificacao) is Brazil’s national notifiable disease surveillance system, managed by the Ministry of Health through DATASUS. It records individual notification forms for compulsory-notification diseases.
The healthbR package provides access to SINAN microdata
from the DATASUS FTP:
| Feature | Details |
|---|---|
| Coverage | National (one file per disease per year) |
| Diseases | 31 notifiable disease codes |
| Years | 2007–2024 (final + preliminary) |
| Unit | One row per notification record |
| Format | .dbc files, decompressed internally |
SINAN covers 31 notifiable diseases. Use
sinan_diseases() to browse them:
# all available diseases
sinan_diseases()
# search by name or code
sinan_diseases(search = "dengue")
sinan_diseases(search = "sifilis")
sinan_diseases(search = "tuberculose")
Common disease codes:
| Code | Disease |
|---|---|
| DENG | Dengue |
| CHIK | Chikungunya |
| ZIKA | Zika |
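| TUBE | Tuberculosis (tuberculose) |
| HANS | Leprosy (hanseniase) |
| HEPA | Viral hepatitis (hepatites virais) |
| SIFA | Acquired syphilis (sifilis adquirida) |
| SIFC | Congenital syphilis (sifilis congenita) |
| LEPT | Leptospirosis (leptospirose) |
| MENI | Meningitis (meningite) |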
SINAN files are national (not per-state). To filter
by geographic unit, use the SG_UF_NOT (UF of notification)
or ID_MUNICIP (municipality code) columns after
download:
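A minimal sketch of that filter, run on a small hand-made data frame so it works offline. The object `notif` and its rows are illustrative stand-ins for the output of sinan_data(). The IBGE codes are real (35 = São Paulo state, 355030 = São Paulo municipality). The columns are assumed to be stored as character; if your data holds them as numbers, compare against numbers instead.

```r
library(dplyr)

# Toy stand-in for a SINAN download; real files carry many more columns.
notif <- tibble(
  SG_UF_NOT  = c("35", "35", "33", "31"),
  ID_MUNICIP = c("355030", "350950", "330455", "310620"),
  CLASSI_FIN = c("1", "2", "1", "1")
)

# Notifications made in São Paulo state (IBGE UF code 35).
sp_state <- notif |>
  filter(SG_UF_NOT == "35")

# Notifications made in the municipality of São Paulo (6-digit code 355030).
sp_city <- notif |>
  filter(ID_MUNICIP == "355030")

# The first two digits of a municipality code are the UF code,
# so state-level counts can also be derived from ID_MUNICIP.
by_state <- notif |>
  mutate(uf_from_municip = substr(ID_MUNICIP, 1, 2)) |>
  count(uf_from_municip, name = "notifications")
```

Filtering after download keeps the cached national file reusable for other states.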
| Variable | Description |
|---|---|
| DT_NOTIFIC | Notification date |
| ID_AGRAVO | Disease code (ICD-10, "CID-10" in Portuguese) |
| SG_UF_NOT | State (UF) of notification (IBGE code) |
| ID_MUNICIP | Municipality of notification (IBGE 6 digits) |
| CS_SEXO | Sex (M/F/I) |
| NU_IDADE_N | Age (encoded: 1st digit = unit, remaining digits = value; unit 4 = years) |
| CS_RACA | Race/color (1=White, 2=Black, 3=Yellow, 4=Brown, 5=Indigenous) |
| CLASSI_FIN | Final classification (1=Confirmed, 2=Discarded) |
| EVOLUCAO | Outcome (1=Cured, 2=Death by disease, 3=Death other causes) |
| CRITERIO | Confirmation criteria (1=Lab, 2=Clinical-epi) |
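The NU_IDADE_N encoding in the table can be unpacked with a small helper. This is a sketch, not part of healthbR: `decode_sinan_age()` is a hypothetical name, and the unit codes (1 = hours, 2 = days, 3 = months, 4 = years) are assumed from the SINAN data dictionary, so check them against the dictionary for your disease. The value is read as every digit after the first, which avoids hard-coding the field width.

```r
# Hypothetical helper (not part of healthbR): convert NU_IDADE_N to age in years.
# Assumed unit codes: 1 = hours, 2 = days, 3 = months, 4 = years.
decode_sinan_age <- function(x) {
  x <- as.character(x)
  unit <- substr(x, 1, 1)
  # All digits after the unit digit; non-numeric input becomes NA.
  value <- suppressWarnings(as.numeric(substring(x, 2)))

  # Divisor that converts each unit to years; unknown units yield NA.
  divisor <- c("1" = 24 * 365.25, "2" = 365.25, "3" = 12, "4" = 1)
  unname(value / divisor[unit])
}

# Example inputs: 25 years, 6 months, 10 days, and a missing value.
ages <- decode_sinan_age(c("4025", "3006", "2010", NA))
```

Returning fractional years keeps infants in the youngest age group instead of dropping them as NA.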
SINAN publishes both final (definitive) and preliminary data. By
default, sinan_years() returns only final years:
# final data only (default)
sinan_years(status = "final")
# preliminary data
sinan_years(status = "preliminary")
# both
sinan_years(status = "all")
Preliminary data (2023–2024) may still be revised by the Ministry of Health.
library(healthbR)
library(dplyr)
tb <- sinan_data(year = 2022, disease = "TUBE")
# decode age: a first digit of 4 means the value is in years
tb_age <- tb |>
  filter(CLASSI_FIN == "1") |>
  mutate(
    age_unit = substr(NU_IDADE_N, 1, 1),
    # value = every digit after the unit digit
    age_value = as.integer(substring(NU_IDADE_N, 2)),
    age_years = ifelse(age_unit == "4", age_value, NA_integer_),
    age_group = cut(age_years,
                    breaks = c(0, 15, 30, 45, 60, Inf),
                    labels = c("<15", "15-29", "30-44", "45-59", "60+"),
                    right = FALSE)
  )
tb_age |>
  filter(!is.na(age_group)) |>
  count(CS_SEXO, age_group) |>
  tidyr::pivot_wider(names_from = CS_SEXO, values_from = n)
Combine SINAN data with Census population to calculate incidence rates:
# step 1: confirmed dengue by UF
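dengue_uf <- sinan_data(year = 2022, disease = "DENG") |>
  # CLASSI_FIN codes are disease-specific; for dengue, 10, 11 and 12 are the
  # confirmed categories and 5 means discarded (check the data dictionary)
  filter(CLASSI_FIN %in% c("10", "11", "12")) |>
  count(SG_UF_NOT, name = "cases")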
# step 2: population from Census 2022
pop <- censo_populacao(year = 2022, territorial_level = "state")
# step 3: calculate incidence rate per 100,000
# incidence <- dengue_uf |>
#   left_join(pop, by = ...) |>
#   mutate(rate_100k = (cases / population) * 100000) |>
#   arrange(desc(rate_100k))
By default, sinan_data() parses columns to appropriate types (dates, integers):
# parsed types (default)
dengue <- sinan_data(year = 2022, disease = "DENG")
class(dengue$DT_NOTIFIC) # Date
class(dengue$NU_ANO) # integer
# raw character columns (backward-compatible)
dengue_raw <- sinan_data(year = 2022, disease = "DENG", parse = FALSE)
# override specific columns
dengue_custom <- sinan_data(
  year = 2022,
  disease = "DENG",
  col_types = list(DT_NOTIFIC = "character")
)
Downloaded data is cached locally for faster future access. If the arrow package is installed, data is cached in Parquet format for faster loading, and lazy evaluation is also available.
SINAN portal: portalsinan.saude.gov.br