The SI-PNI (Sistema de Informação do Programa Nacional de Imunizações) is Brazil’s national immunization information system, managed by the Ministry of Health. It tracks vaccination doses applied and coverage rates across the country.
The `healthbR` package provides access to SI-PNI data from two sources:
| Source | Years | Data type | Granularity | Format |
|---|---|---|---|---|
| FTP DATASUS | 1994–2019 | Aggregated counts | Annual per UF | .DBF files |
| OpenDataSUS CSV | 2020–2025 | Individual-level microdata | Monthly national | CSV bulk downloads |
`sipni_data()` automatically routes each request to the correct source based on the requested year.
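The routing rule can be sketched in a few lines of base R. `sipni_source_for_year()` is a hypothetical helper written only for this illustration, not a `healthbR` function; it simply encodes the year ranges from the table above (1994–2019 via FTP, 2020–2025 via CSV).

```r
# Hypothetical helper (not part of healthbR) illustrating year-based routing:
# years up to 2019 map to the FTP DATASUS aggregated files,
# years from 2020 on map to the OpenDataSUS CSV microdata.
sipni_source_for_year <- function(year) {
  year <- as.integer(year)
  # reject years outside the documented coverage (and NA)
  if (any(is.na(year) | year < 1994 | year > 2025)) {
    stop("SI-PNI data is documented for 1994-2025 only")
  }
  ifelse(year <= 2019, "FTP", "CSV")
}

sipni_source_for_year(c(2010, 2019, 2020, 2024))
```

A request such as `c(2019, 2024)` therefore touches both sources, which is why column names differ within a mixed request.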
| Feature | FTP (1994–2019) | CSV (2020–2025) |
|---|---|---|
| Record type | Aggregated (dose counts per municipality/vaccine/age) | Individual (one row per vaccination dose) |
| File types | DPNI (doses) or CPNI (coverage) | Single type (microdata) |
| Variables | 7–12 per type | ~47 per record |
| File size | Small (~100 KB per UF/year) | Large (~1.4 GB ZIP per month, national) |
| Naming | UPPERCASE column names | snake_case column names |
The default type (`"DPNI"`) downloads aggregated dose counts (1994–2019):
| Variable | Description |
|---|---|
| ANO | Reference year |
| UF | UF code (IBGE 2 digits) |
| MUNIC | Municipality code (IBGE 6 digits) |
| IMUNO | Immunobiological code |
| DOSE | Dose type (1st, 2nd, booster, etc.) |
| QT_DOSE | Number of doses applied |
| FX_ETARIA | Age group (coded) |
The `"CPNI"` type provides vaccination coverage rates per municipality:

```r
library(healthbR)

# vaccination coverage in Acre, 2019
ac_coverage <- sipni_data(year = 2019, type = "CPNI", uf = "AC")
ac_coverage
```

| Variable | Description |
|---|---|
| ANO | Reference year |
| UF | UF code (IBGE 2 digits) |
| MUNIC | Municipality code (IBGE 6 digits) |
| IMUNO | Immunobiological code |
| QT_DOSE | Number of doses applied |
| POP | Target population |
| COBERT | Vaccination coverage (%) |
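Coverage is a ratio, so municipal `COBERT` values should not simply be averaged to obtain a UF-level figure. Below is a minimal base R sketch, assuming `COBERT` follows the usual definition (doses applied divided by target population, times 100) and that `QT_DOSE` and `POP` may arrive as character columns (the examples in this vignette convert `QT_DOSE` with `as.integer()`). The rows are made-up toy values, not real SI-PNI data, and the `IMUNO` codes `"A"` and `"B"` are placeholders.

```r
# Toy CPNI-like rows (made-up numbers; IMUNO codes "A"/"B" are placeholders)
cpni <- data.frame(
  MUNIC   = c("120040", "120040", "120020"),
  IMUNO   = c("A", "B", "A"),
  QT_DOSE = c("900", "450", "80"),
  POP     = c("1000", "500", "100"),
  stringsAsFactors = FALSE
)

# Convert character counts to numbers before doing arithmetic
cpni$doses <- as.numeric(cpni$QT_DOSE)
cpni$pop   <- as.numeric(cpni$POP)

# Municipal coverage, assuming COBERT = QT_DOSE / POP * 100
cpni$cov_munic <- 100 * cpni$doses / cpni$pop

# UF-level coverage per vaccine: ratio of sums, not mean of ratios
uf_cov <- aggregate(cbind(doses, pop) ~ IMUNO, data = cpni, FUN = sum)
uf_cov$coverage <- round(100 * uf_cov$doses / uf_cov$pop, 2)
uf_cov
```

For vaccine `"A"` the ratio of sums gives 89.09%, whereas averaging the two municipal values (90% and 80%) would give 85%, overweighting the smaller municipality.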
For years 2020 and later, SI-PNI provides individual-level microdata (one row per vaccination dose). The `type` parameter is ignored for these years:

```r
# microdata for Acre, January 2024
ac_micro <- sipni_data(year = 2024, uf = "AC", month = 1)
ac_micro
```

| Variable | Description |
|---|---|
| sigla_uf_estabelecimento | UF of the health facility |
| codigo_municipio_estabelecimento | Municipality code of the health facility (IBGE) |
| tipo_sexo_paciente | Sex (M/F) |
| numero_idade_paciente | Patient age |
| nome_raca_cor_paciente | Race/color (descriptive) |
| descricao_vacina | Vaccine name |
| descricao_dose_vacina | Dose description |
| data_vacina | Vaccination date |
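Because each microdata row is one dose, counting rows per date gives daily dose totals. A base R sketch on made-up rows; it assumes `data_vacina` arrives as ISO 8601 text (`YYYY-MM-DD`), which should be checked against the real files before relying on it.

```r
# Made-up microdata rows (one row per dose); not real SI-PNI records
micro <- data.frame(
  data_vacina = c("2024-01-02", "2024-01-02", "2024-01-15", "2024-01-30"),
  stringsAsFactors = FALSE
)

# Parse dates; the format string is an assumption about the source files
micro$data_vacina <- as.Date(micro$data_vacina, format = "%Y-%m-%d")
stopifnot(!anyNA(micro$data_vacina))  # fail loudly if the format differs

# Daily dose counts: one output row per vaccination date
daily <- as.data.frame(table(data_vacina = micro$data_vacina),
                       responseName = "doses")
daily
```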
For years 2020 and later, each month is a separate national CSV file, distributed as a ~1.4 GB ZIP archive. Use `month` to select specific months:

```r
# single month
jan <- sipni_data(year = 2024, uf = "AC", month = 1)

# first quarter
q1 <- sipni_data(year = 2024, uf = "AC", month = 1:3)

# all 12 months (default; downloads ~17 GB in total)
full_year <- sipni_data(year = 2024, uf = "AC")
```

For FTP data (1994–2019), the `month` parameter is ignored because FTP files are annual.
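The ~17 GB figure follows from the per-month file size: 12 × 1.4 GB ≈ 16.8 GB. The hypothetical helper below (not part of `healthbR`) makes the same back-of-the-envelope estimate for a given `month` selection. It assumes every monthly file is about 1.4 GB and that the UF filter is applied after the national file is downloaded, so `uf` does not change the estimate.

```r
# Hypothetical helper: rough download volume (GB) for one requested year.
# Assumes ~1.4 GB per monthly national file for 2020+ and treats the
# small annual FTP files (~100 KB per UF) as negligible.
estimate_sipni_download_gb <- function(year, month = 1:12, gb_per_month = 1.4) {
  if (year <= 2019) {
    return(0)  # FTP years: month is ignored and files are tiny
  }
  length(unique(month)) * gb_per_month
}

estimate_sipni_download_gb(2024, month = 1)    # single month
estimate_sipni_download_gb(2024, month = 1:3)  # first quarter
estimate_sipni_download_gb(2024)               # full year (default)
```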

```r
library(dplyr)

# annual doses by vaccine in Acre, 2019 (FTP aggregated data)
ac_2019 <- sipni_data(year = 2019, uf = "AC")

# decode immunobiological names
imuno_labels <- sipni_dictionary("IMUNO") |>
  select(code, label)

doses_by_vaccine <- ac_2019 |>
  group_by(IMUNO) |>
  summarize(total_doses = sum(as.integer(QT_DOSE), na.rm = TRUE),
            .groups = "drop") |>
  left_join(imuno_labels, by = c("IMUNO" = "code")) |>
  arrange(desc(total_doses))

doses_by_vaccine
```

```r
# vaccination microdata for Acre, January 2024
ac_jan <- sipni_data(year = 2024, uf = "AC", month = 1)

# vaccines administered
ac_jan |>
  count(descricao_vacina, sort = TRUE)

# doses by sex
ac_jan |>
  count(tipo_sexo_paciente)

# age distribution
ac_jan |>
  mutate(age = as.integer(numero_idade_paciente)) |>
  filter(!is.na(age)) |>
  mutate(age_group = cut(age,
                         breaks = c(0, 5, 12, 18, 30, 60, Inf),
                         right = FALSE)) |>
  count(age_group)
```

When requesting years that span both sources (e.g., 2019 and 2024), `sipni_data()` fetches each year from FTP or CSV, respectively, and combines the results. Note that column names and structure differ between the two sources (see the comparison table above).
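One way to make the two sources comparable is to collapse the 2020+ microdata back into dose counts per municipality, the shape of the FTP `QT_DOSE` data. A base R sketch on made-up rows; it assumes `codigo_municipio_estabelecimento` holds a 6- or 7-digit IBGE code, and keeps the first six digits to match the 6-digit FTP `MUNIC` column. The `descricao_vacina` values are descriptive names rather than `IMUNO` codes, so matching vaccines across sources would still need a lookup table, which is not shown here.

```r
# Made-up microdata rows (one row per dose); not real SI-PNI records
micro <- data.frame(
  codigo_municipio_estabelecimento = c("1200401", "1200401", "1200203"),
  descricao_vacina = c("Vacina X", "Vacina X", "Vacina Y"),
  stringsAsFactors = FALSE
)

# Keep the first 6 digits so codes line up with the FTP MUNIC column
micro$MUNIC <- substr(micro$codigo_municipio_estabelecimento, 1, 6)

# Each row is one dose, so summing a column of 1s gives dose counts
micro$QT_DOSE <- 1L
doses_by_munic <- aggregate(QT_DOSE ~ MUNIC + descricao_vacina,
                            data = micro, FUN = sum)
doses_by_munic
```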
Downloaded data is cached locally for faster future access. If the `arrow` package is installed, data is cached in Parquet format. You can also use lazy evaluation.
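The lazy pattern can be illustrated with plain `arrow` and `dplyr` on a temporary Parquet file. This is a generic sketch, not the `healthbR` caching API: the cache location and whatever option enables lazy results in `sipni_data()` are not shown here. The point is that `open_dataset()` only scans the file when `collect()` runs, so the filter and summary are evaluated by Arrow and only the small result is returned to R.

```r
library(arrow)
library(dplyr)

# Toy table standing in for a cached microdata file (made-up rows)
toy <- data.frame(
  sigla_uf_estabelecimento = c("AC", "AC", "AM"),
  descricao_vacina         = c("Vacina X", "Vacina Y", "Vacina X")
)
path <- tempfile(fileext = ".parquet")
write_parquet(toy, path)

# open_dataset() returns a lazy Dataset; the verbs below only build a query
query <- open_dataset(path) |>
  filter(sigla_uf_estabelecimento == "AC") |>
  group_by(descricao_vacina) |>
  summarise(doses = n())

# Nothing is computed until collect() brings the result into R
result <- collect(query)
result
```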
OpenDataSUS (dadosabertos.saude.gov.br)