---
title: "Real-world example: IPCA inflation"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Real-world example: IPCA inflation}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
```

## Introduction

This vignette walks through a realistic analysis using ibger: downloading
IPCA (Índice Nacional de Preços ao Consumidor Amplo) inflation data and
preparing it for visualization and reporting.

IPCA is Brazil's official consumer price index, calculated monthly by IBGE.
The data is available through aggregate **7060**.

## Step 1 — Explore the aggregate

```{r}
library(ibger)
library(dplyr)
library(tidyr)
library(ggplot2)

meta <- ibge_metadata(7060)
meta
```

Key information from the metadata:

- **Periodicity**: monthly, from 202001 onwards
- **Geographic levels**: N1 (Brazil), N6 (municipality), N7 (metropolitan area)
- **Variables**: monthly change (63), year-to-date (69), 12-month
  cumulative (2265), and monthly weight (66)
- **Classification 315**: product groups — 365 categories organized in a
  hierarchy from the general index down to individual items

```{r}
# See what variables are available
meta$variables

# Peek at the classification categories
tidyr::unnest(meta$classifications, categories) |>
  head(20)
```

## Step 2 — Monthly IPCA for Brazil

Let's get the monthly variation for the last 24 months:

```{r}
ipca_br <- ibge_variables(
  aggregate  = 7060,
  variable   = 63,
  periods    = -24,
  localities = "BR"
)

ipca_br
```

The `value` column is returned as character because the API mixes numbers
with special non-numeric codes (see `?parse_ibge_value`). Use
`parse_ibge_value()` to convert it to numeric:

```{r}
ipca_br <- ipca_br |>
  mutate(
    value = parse_ibge_value(value),
    date  = as.Date(paste0(period, "01"), format = "%Y%m%d")
  )
```
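
The date conversion relies on monthly period codes having the form `YYYYMM`.
Appending `"01"` yields a complete `YYYYMMDD` string that `as.Date()` can
parse. A minimal base-R sketch, using hand-written period codes rather than
API output:

```{r}
# Illustrative period codes in YYYYMM form (not fetched from the API)
period <- c("202401", "202402", "202412")

# Append a day so that as.Date() sees a complete YYYYMMDD string
date <- as.Date(paste0(period, "01"), format = "%Y%m%d")

format(date)
#> [1] "2024-01-01" "2024-02-01" "2024-12-01"
```

Anchoring every month to its first day is only a plotting convenience; the
underlying observation still refers to the whole month.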

### Plot the monthly variation

```{r}
ggplot(ipca_br, aes(date, value)) +
  geom_col(fill = "#2e86c1", alpha = 0.8) +
  geom_hline(yintercept = 0, linewidth = 0.3) +
  labs(
    title    = "IPCA — Monthly variation (%)",
    subtitle = "Brazil, last 24 months",
    x = NULL, y = "Variation (%)",
    caption  = "Source: IBGE via ibger"
  ) +
  theme_minimal()
```

## Step 3 — Compare accumulation measures

Get all three variation variables at once:

```{r}
ipca_vars <- ibge_variables(
  aggregate  = 7060,
  variable   = c(63, 69, 2265),
  periods    = -12,
  localities = "BR"
)

ipca_vars <- ipca_vars |>
  mutate(
    value = parse_ibge_value(value),
    date  = as.Date(paste0(period, "01"), format = "%Y%m%d")
  )

ggplot(ipca_vars, aes(date, value, colour = variable_name)) +
  geom_line(linewidth = 0.8) +
  geom_point(size = 1.5) +
  labs(
    title  = "IPCA — Three measures of variation",
    x = NULL, y = "%", colour = NULL,
    caption = "Source: IBGE via ibger"
  ) +
  theme_minimal() +
  theme(legend.position = "bottom")
```
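
The three series are related by compounding: cumulative figures accumulate
monthly changes multiplicatively rather than by simple addition. The sketch
below uses made-up monthly rates (not real IPCA values) to illustrate the
arithmetic. IBGE derives the published cumulative figures from the underlying
index numbers, so compounding rounded monthly rates should be expected to
match them only approximately.

```{r}
# Made-up monthly changes in percent (illustrative, not real IPCA data)
monthly <- c(0.42, 0.83, 0.16, 0.38, 0.46, 0.21)

# Convert each rate to a growth factor, multiply cumulatively, convert back
cumulative <- (cumprod(1 + monthly / 100) - 1) * 100

round(cumulative, 2)
#> [1] 0.42 1.25 1.42 1.80 2.27 2.48

# A plain sum slightly understates the compounded result
sum(monthly)
#> [1] 2.46
```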

## Step 4 — Breakdown by product group

Classification 315 organizes IPCA items hierarchically. The top-level
groups (level-1 categories) include Food and beverages (7170), Housing
(7445), Household goods (7486), and Transportation (7625).

First, find the category IDs you need:

```{r}
cats <- tidyr::unnest(meta$classifications, categories)

# Level-1 groups (just below the general index)
cats |> filter(category_level == "1")
```

Now query specific groups:

```{r}
groups <- ibge_variables(
  aggregate      = 7060,
  variable       = 63,
  periods        = -12,
  localities     = "BR",
  classification = list("315" = c(7170, 7445, 7486, 7558, 7625, 7660, 7712, 7766, 7786))
)

groups <- groups |>
  mutate(
    value = parse_ibge_value(value),
    date  = as.Date(paste0(period, "01"), format = "%Y%m%d")
  )

ggplot(groups, aes(date, value, fill = classification_315)) +
  geom_col(position = "dodge", alpha = 0.85) +
  labs(
    title = "IPCA by product group — Monthly variation",
    x = NULL, y = "%", fill = NULL,
    caption = "Source: IBGE via ibger"
  ) +
  theme_minimal() +
  theme(legend.position = "bottom", legend.text = element_text(size = 7))
```

## Step 5 — Metropolitan area comparison

Aggregate 7060 is available at level N7 (metropolitan areas). Compare
inflation across several of them:

```{r}
# Check available metropolitan areas
metros <- ibge_localities(7060, level = "N7")
metros
```

Pick a few and compare:

```{r}
ipca_metros <- ibge_variables(
  aggregate  = 7060,
  variable   = 2265,
  periods    = -12,
  localities = list(N7 = c(3501, 3301, 2901, 4101, 1501))
)

ipca_metros <- ipca_metros |>
  mutate(
    value = parse_ibge_value(value),
    date  = as.Date(paste0(period, "01"), format = "%Y%m%d")
  )

ggplot(ipca_metros, aes(date, value, colour = locality_name)) +
  geom_line(linewidth = 0.8) +
  labs(
    title   = "IPCA — 12-month cumulative by metro area",
    x = NULL, y = "%", colour = NULL,
    caption = "Source: IBGE via ibger"
  ) +
  theme_minimal() +
  theme(legend.position = "bottom")
```

## Step 6 — Building a complete dataset

For a more complete analysis, request several variables in a single call
and reshape the result. For instance, download the general index
(category 7169) for all metro areas and pivot it to one column per variable:

```{r}
all_metros <- ibge_variables(
  aggregate      = 7060,
  variable       = c(63, 69, 2265),
  periods        = -12,
  localities     = "N7",
  classification = list("315" = 7169)
)

all_metros <- all_metros |>
  mutate(value = parse_ibge_value(value)) |>
  select(variable_name, locality_name, period, value) |>
  pivot_wider(names_from = variable_name, values_from = value)

all_metros
```

## Tips for large queries

IPCA has 365 categories in classification 315. Querying all categories
for all periods and all metro areas can easily exceed the 100,000-value
limit. Strategies:

1. **Reduce periods**: use `-1` or `-3` instead of `-12`
2. **Reduce localities**: query one metro at a time
3. **Reduce categories**: pick only the groups you need
4. **Loop and bind**: query in chunks and combine with `dplyr::bind_rows()`
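
To judge whether a request is likely to hit the limit, multiply the sizes of
its dimensions. The sketch below is back-of-the-envelope arithmetic that
assumes the limit counts one value per combination of variable, category,
period, and locality. `estimate_values()` is a hypothetical helper, not part
of ibger, and the count of 16 metro areas is an assumption for illustration
(check `ibge_localities(7060, level = "N7")` for the actual number).

```{r}
# Hypothetical helper: one value per variable x category x period x locality
estimate_values <- function(n_variables, n_categories, n_periods, n_localities) {
  n_variables * n_categories * n_periods * n_localities
}

limit <- 100000

# 3 variables, all 365 categories, 12 periods, 16 localities (assumed count)
big <- estimate_values(3, 365, 12, 16)
big
#> [1] 210240
big > limit
#> [1] TRUE

# Same categories, but one variable, one period, Brazil only
estimate_values(1, 365, 1, 1)
#> [1] 365
```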

```{r}
# Example: query all categories for just 1 period, Brazil only
full_breakdown <- ibge_variables(
  aggregate      = 7060,
  variable       = 63,
  periods        = -1,
  localities     = "BR",
  classification = list("315" = "all")
)

nrow(full_breakdown)
```
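
The loop-and-bind strategy can be sketched as follows. `chunk_ids()` is a
hypothetical helper written for this example, not an ibger function; it
splits a vector of category IDs into consecutive groups of at most `size`
elements. The commented lines show how each chunk might then be passed to
`ibge_variables()` and the results combined.

```{r}
# Hypothetical helper: split a vector into consecutive chunks of at most `size`
chunk_ids <- function(ids, size) {
  split(ids, ceiling(seq_along(ids) / size))
}

chunks <- chunk_ids(c(7170, 7445, 7486, 7558, 7625, 7660, 7712), size = 3)
lengths(chunks)
#> 1 2 3
#> 3 3 1

# Each chunk could then be queried separately and the results combined:
# results <- lapply(chunks, function(ids) {
#   ibge_variables(
#     aggregate      = 7060,
#     variable       = 63,
#     periods        = -12,
#     localities     = "N7",
#     classification = list("315" = ids)
#   )
# })
# full <- dplyr::bind_rows(results)
```

Choosing the chunk size is a trade-off: smaller chunks stay further from the
limit but require more requests.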

## Handling special values

The `value` column returned by `ibge_variables()` is always character,
because the IBGE API uses special codes for certain data conditions.
Use `parse_ibge_value()` to convert to numeric in one step:

```{r}
ibge_variables(7060, localities = "BR") |>
  mutate(value = parse_ibge_value(value))
```

The function handles all IBGE conventions:

| Code  | Becomes      | Meaning                                              |
|-------|--------------|------------------------------------------------------|
| `-`   | `0`          | Numeric zero (not from rounding)                     |
| `..`  | `NA`         | Not applicable                                       |
| `...` | `NA`         | Data not available                                   |
| `X`   | `NA`         | Suppressed to protect confidentiality                |

```{r}
parse_ibge_value(c("1.5", "10", "-", "..", "...", "X"))
#> [1]  1.5 10.0  0.0   NA   NA   NA
```
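
To make the mapping in the table concrete, here is a minimal base-R sketch of
how such a conversion could be written. `parse_value_sketch()` is a
hypothetical function for illustration only; it is not the implementation
behind `parse_ibge_value()`, which may differ in detail.

```{r}
# Hypothetical re-creation of the table above; not ibger's actual code
parse_value_sketch <- function(x) {
  out <- rep(NA_real_, length(x))

  is_zero    <- !is.na(x) & x == "-"                      # "-" means a true zero
  is_missing <- is.na(x) | x %in% c("..", "...", "X")     # codes that become NA
  is_number  <- !is_zero & !is_missing

  out[is_zero]   <- 0
  out[is_number] <- as.numeric(x[is_number])
  out
}

parsed <- parse_value_sketch(c("1.5", "10", "-", "..", "...", "X"))
parsed
#> [1]  1.5 10.0  0.0   NA   NA   NA
```

Because suppressed and unavailable values become `NA`, downstream summaries
need `na.rm = TRUE` (for example `mean(parsed, na.rm = TRUE)`) to ignore them.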