ibger provides a tidyverse-friendly interface to the IBGE Aggregate Data API (version 3). This is the same API that powers SIDRA — the automatic data retrieval system for all surveys and censuses conducted by the Brazilian Institute of Geography and Statistics (IBGE).
Each SIDRA table corresponds to an aggregate in the API. With ibger you can browse aggregates, inspect their metadata, and retrieve tidy data — all from R.
Use ibge_aggregates() to list every aggregate grouped by
survey. Optional filters let you narrow the search:
# All aggregates
ibge_aggregates()
#> ✔ 1420 aggregates found.
#> # A tibble: 1,420 × 4
#> survey_id survey_name aggregate_id aggregate_name
#> <chr> <chr> <chr> <chr>
#> 1 AB Abate de animais 1705 Animais abatidos …
#> 2 AB Abate de animais 1706 Peso total das ca…
#> ...
# Monthly aggregates only
ibge_aggregates(periodicity = "P5")
# Aggregates with municipality-level data
ibge_aggregates(level = "N6")
Once you have an aggregate ID, ibge_metadata() tells you
everything about its structure:
The print method shows a structured summary:
── Animais abatidos ──
ID: 1705
Survey: Pesquisa Trimestral do Abate de Animais
Periodicity: trimestral (200101 to 202404)
Territorial levels: N1, N2, N3
── Variables (2) ──
284: Número de informantes (Unidades)
285: Cabeças abatidas (Cabeças)
── Classifications (1) ──
12529: Tipo de rebanho bovino (9 categories)
115236: Total [level 0]
115237: Bois [level 1]
115238: Vacas [level 1]
...
Each component is accessible directly:
meta$variables
#> # A tibble: 2 × 3
#> id name unit
#> <chr> <chr> <chr>
#> 1 284 Número de informantes Unidades
#> 2 285 Cabeças abatidas Cabeças
meta$classifications
#> # A tibble: 1 × 3
#> id name categories
#> <chr> <chr> <list>
#> 1 12529 Tipo de rebanho bovino <tibble [9 × 4]>
# Unnest to see every category
tidyr::unnest(meta$classifications, categories)
# Geographic levels
meta$territorial_level
#> $administrative
#> [1] "N1" "N2" "N3"
# Time range
meta$periodicity
#> $frequency [1] "trimestral"
#> $start [1] "200101"
#> $end [1] "202404"
ibge_variables() is the main workhorse. It sends a
single request and returns a tidy tibble:
ibge_variables(1705, localities = "BR")
#> ✔ 12 records retrieved.
#> # A tibble: 12 × 9
#> variable_id variable_name variable_unit classification_12529
#> <chr> <chr> <chr> <chr>
#> 1 284 Número de inform… Unidades Total
#> 2 285 Cabeças abatidas Cabeças Total
#> ...
#> locality_id locality_name locality_level period value
#> <chr> <chr> <chr> <chr> <chr>
#> 1 1 Brasil Brasil 202303 2584
#> 2 1 Brasil Brasil 202303 7802044
#> ...
The localities parameter accepts several convenient
formats:
# Country total
ibge_variables(1705, localities = "BR")
# All states
ibge_variables(8884, localities = "N3")
# Specific states (RJ = 33, SP = 35)
ibge_variables(8884, localities = list(N3 = c(33, 35)))
# Mix levels: metropolitan areas + a specific municipality
ibge_variables(1705, localities = list(N7 = c(3501, 3301), N6 = 5208707))
The geographic level codes follow the IBGE convention:
| Code | Level | Example |
|---|---|---|
| N1 | Brazil | "BR" or list(N1 = 1) |
| N2 | Major region | list(N2 = 1) (North) |
| N3 | State (UF) | list(N3 = 33) (Rio de Janeiro) |
| N6 | Municipality | list(N6 = 3550308) (São Paulo/SP) |
| N7 | Metropolitan area | list(N7 = 3501) (RM São Paulo) |
Tip: Not every aggregate is available at every level. Aggregate 1705 has data for N1, N2, and N3 but not N6. Use ibge_metadata() to check.
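To make the list form of localities more concrete, here is a minimal base-R sketch of how such a list could be serialized into the bracket syntax of the API's `localidades` query parameter. The helper name `localities_to_query()` is hypothetical and is not part of ibger; how the package builds its requests internally is not shown in this article.

```r
# Hypothetical helper (not an ibger function): turn a localities argument
# into the bracket syntax of the API's `localidades` query parameter.
localities_to_query <- function(localities) {
  # A single string such as "BR" or "N3" is passed through unchanged here.
  if (is.character(localities) && length(localities) == 1L) {
    return(localities)
  }
  stopifnot(is.list(localities), !is.null(names(localities)))

  parts <- vapply(
    names(localities),
    function(level) {
      # format() avoids scientific notation for large numeric codes.
      ids <- format(localities[[level]], scientific = FALSE, trim = TRUE)
      paste0(level, "[", paste(ids, collapse = ","), "]")
    },
    character(1)
  )

  # Different levels are joined with a pipe.
  paste(parts, collapse = "|")
}

localities_to_query(list(N7 = c(3501, 3301), N6 = 5208707))
#> [1] "N7[3501,3301]|N6[5208707]"
```

Each list name is a level code and each element holds the locality IDs for that level, which is why mixing levels only requires adding another named element.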
Periods follow the API convention — negative values mean “last N”:
# Last 6 periods (the default)
ibge_variables(1705, periods = -6, localities = "BR")
# Last 12 periods
ibge_variables(1705, periods = -12, localities = "BR")
# Specific period codes
ibge_variables(8884, periods = c(202301, 202302, 202303), localities = "BR")
# Range (inclusive)
ibge_variables(8884, periods = "202101-202304", localities = "BR")
# Range + extra period
ibge_variables(8884, periods = "202101-202106|202301", localities = "BR")
Note: Negative values cannot be mixed with specific periods. How a period code is read depends on the aggregate's periodicity: 202001 could mean January 2020 (monthly), Q1 2020 (quarterly), or S1 2020 (semi-annual).
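The ambiguity of period codes can be illustrated with a small base-R sketch. The function `describe_period()` and its periodicity labels ("monthly", "quarterly", "semiannual") are hypothetical names chosen for this example; they are not part of ibger and do not match the labels the API returns in its metadata.

```r
# Hypothetical helper (not an ibger function): read a six-digit period code
# under an assumed periodicity.
describe_period <- function(code,
                            periodicity = c("monthly", "quarterly", "semiannual")) {
  periodicity <- match.arg(periodicity)
  stopifnot(length(code) == 1L)
  # Convert numbers without risking scientific notation.
  if (is.numeric(code)) code <- sprintf("%.0f", code)
  stopifnot(grepl("^[0-9]{6}$", code))

  year <- substr(code, 1, 4)
  part <- as.integer(substr(code, 5, 6))

  switch(
    periodicity,
    monthly    = paste(month.name[part], year),
    quarterly  = paste0("Q", part, " ", year),
    semiannual = paste0("S", part, " ", year)
  )
}

describe_period(202001, "monthly")
#> [1] "January 2020"
describe_period(202001, "quarterly")
#> [1] "Q1 2020"
```

The same code yields different dates, so it is worth checking the aggregate's periodicity in its metadata before building a period range.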
Many aggregates break their data further by classifications (dimensions). For instance, aggregate 1712 (crop production) has a classification for the type of product (226) and another for the producer condition (218).
# Single category: pineapple (4844) from product classification (226)
ibge_variables(
aggregate = 1712,
localities = "BR",
classification = list("226" = 4844)
)
# Multiple categories
ibge_variables(
aggregate = 1712,
localities = "BR",
classification = list("226" = c(4844, 96608, 96609))
)
# Multiple classifications
ibge_variables(
aggregate = 1712,
localities = "BR",
classification = list("226" = c(4844, 96608), "218" = 4780)
)
# All categories of a classification (can be large!)
ibge_variables(
aggregate = 1712,
periods = -1,
localities = "BR",
classification = list("226" = "all")
)
When no classification is specified, the API returns the Total category (ID = 0), an aggregate across all categories.
Before sending any request, ibge_variables() and
ibge_localities() validate your parameters against the
aggregate’s metadata. If something doesn’t match, you get a clear error
with the allowed values:
# N3 (states) is not available for aggregate 1705
ibge_variables(1705, localities = "N3")
#> Error:
#> ! Geographic level(s) "N3" not available for aggregate 1705.
#> ℹ Available levels: "N1", "N6", and "N7".
# Period out of range
ibge_variables(1705, periods = 199901, localities = "BR")
#> Error:
#> ! Period(s) "199901" out of range for aggregate 1705.
#> ℹ Valid range: "201202" to "202001" (monthly).
# Non-existent variable
ibge_variables(1705, variable = 999, localities = "BR")
#> Error:
#> 355 - IPCA15 - Variação mensal (%)
#> 356 - IPCA15 - Variação acumulada no ano (%)
#> 1120 - IPCA15 - Variação acumulada em 12 meses (%)
#> 357 - IPCA15 - Peso mensal (%)
Metadata is fetched once per session and cached. To force a refresh:
Skip validation entirely with validate = FALSE:
Beyond aggregate-level data, ibger also provides access to the IBGE Metadata API (v2), which catalogs IBGE’s surveys with institutional and methodological information such as status, category, collection frequency, and thematic classifications.
This is useful when you want to understand what surveys exist and how they are structured before diving into specific aggregates.
# List all 98 IBGE surveys
ibge_surveys()
#> # A tibble: 98 × 8
#> id name status category ...
#> <chr> <chr> <chr> <chr>
#> 1 AC Pesquisa Anual da Indústria da Cons… Ativa Estrutural
#> 2 AA Pesquisa Nacional de Saúde do Escol… Ativa Especial
#> ...
# Filter active short-term ("Conjuntural") surveys
library(dplyr)
ibge_surveys(thematic_classifications = FALSE) |>
filter(status == "Ativa", category == "Conjuntural")
# Check which periods have metadata for the Censo Demográfico
ibge_survey_periods("CD")
#> # A tibble: 9 × 3
#> year month order
#> <int> <int> <int>
#> 1 2022 NA 0
#> 2 2010 NA 0
#> ...
# Get full institutional metadata for a specific period
meta <- ibge_survey_metadata("CD", year = 2022)
meta
#> ── CD ──
#> Status: Ativa
#> Category: Estrutural
#> ...
#> ── Metadata occurrences (1) ──
#> Use `meta$occurrences` to explore the full metadata.
# Explore methodology fields
names(meta$occurrences[[1]])
Survey codes are validated before each request. If you use a wrong code, the error suggests similar alternatives:
Each request can return at most 100,000 values, computed as:
categories × periods × localities ≤ 100,000
If exceeded, the API returns HTTP 500. Split your request into smaller chunks when working with many localities or categories.
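One way to stay under the limit is to estimate the request size up front and split the locality IDs into batches. The sketch below uses only base R; `estimate_values()` and `chunk_localities()` are hypothetical helpers, not ibger functions, and the counts in the example (5,570 localities, 9 categories, 12 periods) are made up for illustration.

```r
# Hypothetical helpers (not ibger functions), applying the formula above:
# categories x periods x localities <= 100,000.
estimate_values <- function(n_categories, n_periods, n_localities) {
  n_categories * n_periods * n_localities
}

chunk_localities <- function(ids, n_categories, n_periods, limit = 100000) {
  per_locality <- n_categories * n_periods
  # A single locality must fit in one request for chunking to help.
  stopifnot(per_locality <= limit)
  # Largest number of localities that keeps one request within the limit.
  chunk_size <- floor(limit / per_locality)
  split(ids, ceiling(seq_along(ids) / chunk_size))
}

ids <- seq_len(5570)  # placeholder IDs, illustrative only
estimate_values(n_categories = 9, n_periods = 12, n_localities = length(ids))
#> [1] 601560

chunks <- chunk_localities(ids, n_categories = 9, n_periods = 12)
length(chunks)
#> [1] 7
```

Each element of `chunks` could then be passed as the IDs for one request, with the results row-bound afterwards.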
The value column may contain special characters instead
of numbers:
| Value | Meaning |
|---|---|
| - | Numeric zero (not from rounding) |
| .. | Not applicable |
| ... | Data not available |
| X | Suppressed to avoid identifying individual respondents |
These come through as character strings in the value
column. Use parse_ibge_value() to convert to numeric in one
step:
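The exact behavior of parse_ibge_value() is not documented in this article. As a rough base-R illustration of what such a conversion involves, the sketch below assumes that "-" maps to 0 and that every other symbol becomes NA. The function `to_numeric_sketch()` is a hypothetical stand-in, not the package's implementation.

```r
# Hypothetical stand-in (not ibger's parse_ibge_value()): convert the
# character `value` column to numeric.
# Assumption: "-" means zero; "..", "..." and "X" become NA.
to_numeric_sketch <- function(x) {
  out <- rep(NA_real_, length(x))
  # "-" is a true numeric zero according to the table above.
  out[x %in% "-"] <- 0
  # Only strings that look like plain numbers are converted.
  is_number <- grepl("^-?[0-9]+(\\.[0-9]+)?$", x)
  out[is_number] <- as.numeric(x[is_number])
  out
}

to_numeric_sketch(c("2584", "-", "..", "...", "X", "12.5"))
#> [1] 2584.0    0.0     NA     NA     NA   12.5
```

Mapping the non-numeric symbols to NA loses the distinction between "not applicable", "not available", and "suppressed", so keep the original column if that distinction matters for your analysis.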