The SIM (Sistema de Informações sobre Mortalidade) is Brazil’s national mortality information system, managed by the Ministry of Health through DATASUS. It records individual death certificates (Declaração de Óbito), with the cause of death coded in ICD-10 (CID-10 in Portuguese).
| Feature | Details |
|---|---|
| Coverage | By state (UF): all 27 federative units (26 states plus the Federal District) |
| Years | 1996–2024 (CID-10 era) |
| Unit | One row per death certificate |
| Format | .dbc files from DATASUS FTP |
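Filter by cause using CID-10 code prefixes on the underlying cause (`CAUSABAS`). For example, `I` selects diseases of the circulatory system (I00-I99) and `I21` selects acute myocardial infarction.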
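A minimal base-R sketch of prefix filtering. It assumes `CAUSABAS` holds CID-10 codes as character strings without the dot (e.g. `"I219"`); the sample codes are illustrative, and `has_prefix()` is a hypothetical helper, not part of the package.

```r
# Illustrative CAUSABAS values (CID-10 codes without the dot);
# in practice these would come from sim_data()
causabas <- c("I219", "I639", "C349", "X700", "J189", "E149")

# Hypothetical helper: TRUE where a code starts with any of the given prefixes
has_prefix <- function(codes, prefixes) {
  Reduce(`|`, lapply(prefixes, function(p) startsWith(codes, p)))
}

circulatory <- has_prefix(causabas, "I")              # chapter IX, I00-I99
ami         <- has_prefix(causabas, "I21")            # acute myocardial infarction
ami_stroke  <- has_prefix(causabas, c("I21", "I63"))  # AMI or cerebral infarction
# Ranges that do not align with a single prefix can use a regular expression
suicide     <- grepl("^X(6[0-9]|7[0-9]|8[0-4])", causabas)  # X60-X84, self-harm

causabas[circulatory]
```

With dplyr, the same expressions work inside `filter()`, for example `filter(startsWith(CAUSABAS, "I21"))`.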
The `IDADE` variable uses a 3-digit encoding: the first digit indicates the unit and the remaining two digits give the value.
| First digit | Unit | Example |
|---|---|---|
| 0 | Minutes | 005 = 5 minutes |
| 1 | Hours | 112 = 12 hours |
| 2 | Days | 215 = 15 days |
| 3 | Months | 306 = 6 months |
| 4 | Years | 445 = 45 years |
| 5 | 100+ years | 502 = 102 years |
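The decoding rule in the table can be sketched in base R as follows. `decode_idade()` is a hypothetical helper, not the package's implementation of `decode_age`; it assumes a 365.25-day year for minutes, hours and days, and 12 months per year. Any first digit outside 0-5 is returned as `NA`.

```r
# Hypothetical sketch: convert 3-digit IDADE codes to age in years
decode_idade <- function(idade) {
  code  <- as.integer(idade)   # "005" -> 5, "445" -> 445
  unit  <- code %/% 100        # first digit: unit of measure
  value <- code %% 100         # last two digits: amount in that unit

  # Divisors that convert each unit to years (assumed conventions)
  divisor <- c(`0` = 60 * 24 * 365.25,  # minutes
               `1` = 24 * 365.25,       # hours
               `2` = 365.25,            # days
               `3` = 12,                # months
               `4` = 1)                 # years
  years <- value / divisor[as.character(unit)]  # unknown units give NA

  # Unit 5: ages of 100 or more, where the last two digits count years above 100
  over_100 <- unit == 5 & !is.na(unit)
  years[over_100] <- 100 + value[over_100]
  unname(years)
}

decode_idade(c("306", "445", "502"))  # 0.5, 45, 102
```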
By default, `decode_age = TRUE` decodes `IDADE` and adds an `age_years` column with the age expressed in years.
Commonly used variables:

| Variable | Description |
|---|---|
| CAUSABAS | Underlying cause of death (CID-10) |
| DTOBITO | Date of death |
| SEXO | Sex (1=Male, 2=Female, 0=Unknown) |
| IDADE | Age (3-digit encoded) |
| RACACOR | Race/color (1=White, 2=Black, 3=Yellow, 4=Brown, 5=Indigenous) |
| CODMUNRES | Municipality of residence (6-digit IBGE code) |
| LINHAA to LINHAD | Cause-of-death lines A to D (Part I of the certificate) |
| ESCMAE | Mother’s education level |
| ESTCIV | Marital status |
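A base-R sketch of turning the coded variables into labelled columns, using the codes listed in the table above. The sample rows are illustrative. It assumes the codes arrive as character strings and that `DTOBITO` is a `ddmmyyyy` string as in the raw .dbc files; if `sim_data()` already returns a date, skip that step.

```r
# Illustrative raw records, not real data
obitos <- data.frame(
  SEXO      = c("1", "2", "0", "2"),
  RACACOR   = c("1", "4", NA, "5"),
  CODMUNRES = c("355030", "330455", "355030", "130260"),
  DTOBITO   = c("15032022", "01012022", "31122022", "28022022"),
  stringsAsFactors = FALSE
)

# Label sex and race/color with the codes from the variable table
obitos$sexo <- factor(obitos$SEXO, levels = c("1", "2", "0"),
                      labels = c("Male", "Female", "Unknown"))
obitos$raca_cor <- factor(obitos$RACACOR, levels = as.character(1:5),
                          labels = c("White", "Black", "Yellow", "Brown", "Indigenous"))

# The first two digits of an IBGE municipality code are the UF code (35 = SP)
obitos$uf_res <- substr(obitos$CODMUNRES, 1, 2)

# Assumption: DTOBITO is stored as ddmmyyyy text
obitos$data_obito <- as.Date(obitos$DTOBITO, format = "%d%m%Y")

table(obitos$sexo)
```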
Combine SIM deaths with Census population denominators to compute mortality rates:
```r
library(dplyr)

# deaths by age group (São Paulo, 2022)
deaths <- sim_data(year = 2022, uf = "SP") |>
  filter(!is.na(age_years)) |>
  mutate(age_group = cut(age_years,
    breaks = c(0, 1, 5, 15, 30, 45, 60, 80, Inf),
    right = FALSE # left-closed intervals: [0,1), [1,5), ..., [80,Inf)
  )) |>
  count(age_group, name = "deaths")

# population from Census 2022 (geo_code "35" = São Paulo)
pop <- censo_populacao(year = 2022, territorial_level = "state", geo_code = "35")

# join and calculate rates per 100,000
```

Data source: DATASUS (datasus.saude.gov.br).
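The rate step itself is a join followed by deaths × 100,000 / population. The sketch below uses made-up numbers and assumes a population table with an `age_group` column whose labels match the `cut()` output; whether `censo_populacao()` can return that age breakdown is not shown here. With only a state total, you can compute a crude rate (all deaths divided by the total population, times 100,000), but not age-specific rates.

```r
# Made-up counts, for illustration only
deaths <- data.frame(
  age_group = c("[0,1)", "[1,5)", "[60,80)"),
  deaths    = c(120, 30, 900)
)
pop <- data.frame(
  age_group  = c("[0,1)", "[1,5)", "[60,80)"),
  population = c(60000, 250000, 1500000)
)

# Inner join on the shared age_group key, then rate per 100,000
rates <- merge(deaths, pop, by = "age_group")
rates$rate_per_100k <- rates$deaths * 1e5 / rates$population
rates
```

In dplyr, the equivalent is `left_join(deaths, pop, by = "age_group") |> mutate(rate_per_100k = deaths * 1e5 / population)`.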