Tidy R access to the Anthropic Economic Index dataset.
In early 2025 Anthropic published a new kind of economic dataset. Until then, almost everything we knew about how AI was actually being used in the economy came from one of two places: surveys (asking people what they used AI for) or indirect proxies (working out which jobs theoretically had the most AI-exposed tasks, and assuming usage followed exposure). The Anthropic Economic Index (AEI) took a more direct approach. Anthropic took millions of real Claude conversations, classified each one against the U.S. Department of Labor’s O*NET task taxonomy and its Standard Occupational Classification system, and reported the resulting usage shares as open data. The methodology is documented in Handa et al. (2025). The privacy-preserving system that does the classification, in which Claude itself summarises other Claude conversations under tight rules, is described in Tamkin et al. (2024).
The dataset is published on Hugging Face as a recurring set of snapshots, all released under CC-BY-4.0. Five releases shipped between February 2025 and March 2026, covering Claude 3.5 Sonnet through Opus 4.5/4.6. Each release reports, for the time window it covers, the share of classified conversations that mapped to each O*NET task and each SOC occupational group, plus a split between automation-style interactions (where the user delegates a task to Claude) and augmentation-style interactions (where the user works through a task with Claude). From the September 2025 release onwards, the data is also broken down by country and US state, and a hierarchical clustering of request types produced by Clio is shipped as JSON. From the January 2026 release onwards, a set of derived “economic primitives” sits alongside the raw shares.
The data is open, the methodology is documented, and Anthropic ships replication notebooks alongside each release. But for anyone working in R, the dataset was inconvenient to use:
data/output/ layout. Code written for one
release would silently break on the next.
aieconindex is the R-side answer to those gaps. It lists
releases, fetches the raw and enriched usage tables, retrieves task
statements and request hierarchies, exposes country and US-state slices
through a single function, caches downloads locally, and produces
ready-made citations that include the methodological source paper by
default. Schema differences across releases are handled internally;
pinning a release id keeps downstream pipelines reproducible. No API key
is required. The package has three runtime dependencies (cli,
httr2, jsonlite) plus base R.
A companion paper (R Journal style) lives under paper/rj/
in this repo.
From GitHub (development version):
# install.packages("remotes")
remotes::install_github("charlescoverdale/aieconindex")
CRAN release: planned.
The package is pure R with three runtime dependencies
(cli, httr2, jsonlite) and the
base tools, stats, and utils
packages. R 4.1.0 or later is required. No API key is needed.
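Before installing, it can help to confirm that the local setup meets the requirements listed above. The following is a minimal base-R sketch; check_aei_requirements() is a hypothetical helper written for illustration, not a function exported by the package.

```r
# Hypothetical helper (not part of aieconindex): report whether the running
# R version and the three runtime dependencies are available locally.
check_aei_requirements <- function(min_r = "4.1.0",
                                   deps = c("cli", "httr2", "jsonlite")) {
  # TRUE when the running R is at least the minimum version
  r_ok <- getRversion() >= min_r
  # requireNamespace() returns FALSE rather than erroring when a package is absent
  deps_ok <- vapply(deps, requireNamespace, logical(1), quietly = TRUE)
  c(r_version = r_ok, deps_ok)
}

# Named logical vector: one entry for R itself, one per dependency
check_aei_requirements()
```

Any FALSE entry points at what needs installing or upgrading before the package will load.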
library(aieconindex)
# 1. See what's available
aei_releases()
#> # AEI: releases · 5 rows
#> release_id release_date model
#> 1 release_2026_03_24 2026-03-24 Claude Opus 4.5/4.6
#> 2 release_2026_01_15 2026-01-15 Claude Sonnet 4.5
#> 3 release_2025_09_15 2025-09-15 Claude Sonnet 4
#> 4 release_2025_03_27 2025-03-27 Claude 3.7 Sonnet
#> 5 release_2025_02_10 2025-02-10 Claude 3.5 Sonnet
#> ...
# 2. Look inside a release
aei_files("2025-09-15", recursive = TRUE)
# 3. Fetch the canonical usage table
df <- aei_index("2025-09-15", source = "claude_ai", variant = "enriched")
# 4. Slice to a country
uk <- aei_geography("2025-09-15", country = "GBR")
# 5. Cite the dataset
aei_cite("2025-09-15", format = "bibtex")
| Function | Returns |
|---|---|
| aei_releases(live = TRUE) | Available releases (live + bundled metadata) as an aei_tbl |
| aei_files(release, recursive = TRUE) | Recursive file tree for a release as an aei_tbl with path, type, size_bytes |
aei_releases(live = FALSE) # offline-safe (uses bundled metadata)
aei_files("latest") # tree of the most recent release
aei_files("2025-03-27", recursive = FALSE) # top-level only
| Function | Returns |
|---|---|
| aei_index(release, source, variant) | Canonical usage table as an aei_tbl |
| aei_download(release, path) | CSVs as aei_tbl, JSON as parsed list, other extensions as local path |
aei_index() is a convenience wrapper that locates the
canonical usage CSV by file-pattern matching (the AEI naming convention
has shifted across releases). Arguments:
source is one of "claude_ai" (Claude.ai
consumer product traffic) or "1p_api" (first-party API
traffic). Not all releases include both.
variant is one of "raw" (counts and
percentages straight from Anthropic’s pipeline) or
"enriched" (joined to O*NET / SOC metadata, with derived
per-capita and tier metrics). Older releases may only ship one
variant.
df_raw <- aei_index("2026-03-24", source = "claude_ai", variant = "raw")
df_enriched <- aei_index("2025-09-15", source = "claude_ai", variant = "enriched")
df_api <- aei_index("2026-03-24", source = "1p_api", variant = "raw")
aei_download() is the lower-level escape hatch. Pass any
path returned by aei_files():
soc <- aei_download("2025-03-27", "SOC_Structure.csv")
hierarchy <- aei_download("2025-09-15",
"data/output/request_hierarchy_tree_claude_ai.json")
report <- aei_download("2026-01-15", "aei_v4_appendix.pdf") # returns local path
| Function | Returns |
|---|---|
| aei_clusters(release, source) | Request-hierarchy tree (Clio output) as a parsed nested list |
| aei_tasks(release) | O*NET task statements bundled with the release as an aei_tbl |
| aei_geography(release, country, geography) | Country or US-state filter on the enriched table |
# Clio-derived request hierarchy (from 2025-09-15 onwards)
tree <- aei_clusters("2025-09-15", source = "claude_ai")
# Bundled O*NET task statements (ships in 2025-03-27)
tasks <- aei_tasks("2025-03-27")
# UK country slice (geographic facets ship from 2025-09-15 onwards)
uk <- aei_geography("2025-09-15", country = "GBR")
# Australia country slice
au <- aei_geography("2025-09-15", country = "AUS")
# US state-level breakdown
us_states <- aei_geography("2025-09-15", geography = "state_us")
aei_geography() filters the enriched long-format table
on the geography column. Country codes in the enriched data are
ISO 3166-1 alpha-3 ("GBR", "AUS", "USA"). Releases before
2025-09-15 do not contain geographic data, and the function
raises an informative error for them.
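The filter described above amounts to ordinary row subsetting on a long-format table. The sketch below runs on a toy data frame: the column names geo_id, geography, variable and value follow names used elsewhere in this README, but the rows, the "usage_pct" variable name, the "CA" state code, and the filter_geography() helper are all invented for illustration and are not the package's implementation.

```r
# Toy long-format table; every value here is made up.
toy <- data.frame(
  geo_id    = c("GBR", "GBR", "AUS", "USA", "CA"),
  geography = c("country", "country", "country", "country", "state_us"),
  variable  = c("usage_pct", "usage_per_capita_index", "usage_pct",
                "usage_pct", "usage_pct"),
  value     = c(3.1, 1.4, 1.2, 21.6, 5.0),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the filtering step.
filter_geography <- function(x, country = NULL, geography = "country") {
  # Keep rows at the requested geographic level
  keep <- x$geography == geography
  # Optionally narrow to one or more country codes
  if (!is.null(country)) {
    keep <- keep & x$geo_id %in% toupper(country)
  }
  x[keep, , drop = FALSE]
}

uk_rows    <- filter_geography(toy, country = "GBR")        # both GBR rows
state_rows <- filter_geography(toy, geography = "state_us") # the single state row
```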
| Function | Returns |
|---|---|
| aei_compare(release_a, release_b, ...) | Release-on-release diff with value_a, value_b, delta, pct_change |
| aei_link(x, y, by, type) | Generic merge that preserves the aei_tbl class; for splicing AEI to user-supplied data on a shared key |
| aei_concentration(x, share_col, group_cols, top_n) | HHI, top-N concentration ratio, Shannon entropy on usage shares |
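For readers less familiar with the three concentration measures named in the table, here is a minimal base-R sketch of how they are conventionally defined on a vector of shares: the HHI is the sum of squared shares, the top-N ratio is the sum of the N largest shares, and Shannon entropy is minus the sum of p * log(p). The concentration_sketch() helper is hypothetical; the package's own scaling (for example a 0 to 10,000 HHI, or a different log base) may differ.

```r
# Hypothetical helper (not the package's implementation).
concentration_sketch <- function(share, top_n = 3) {
  stopifnot(is.numeric(share), all(share >= 0), sum(share) > 0)
  p  <- share / sum(share)   # normalise so the shares sum to 1
  nz <- p[p > 0]             # treat 0 * log(0) as 0 by dropping zero shares
  k  <- min(top_n, length(p))
  list(
    hhi     = sum(p^2),                                    # Herfindahl-Hirschman index on a 0-1 scale
    top_n   = sum(sort(p, decreasing = TRUE)[seq_len(k)]), # combined share of the k largest entries
    entropy = -sum(nz * log(nz))                           # Shannon entropy in nats
  )
}

# Three tasks holding 50%, 30% and 20% of usage (invented numbers):
# HHI is 0.38 and the top-2 ratio is 0.8.
res <- concentration_sketch(c(50, 30, 20), top_n = 2)
```

A higher HHI or top-N ratio means usage is concentrated in fewer tasks; higher entropy means it is spread more evenly.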
# How did the cluster shares move between Sept 2025 and March 2026?
diff <- aei_compare("2025-09-15", "2026-03-24")
head(diff[order(-abs(diff$delta)), ])
# Splice AEI country shares to your own GDP-per-capita table
overlay <- data.frame(
geo_id = c("GBR", "AUS", "USA"),
gdp_pc = c(48000, 65000, 80000)
)
joined <- aei_link(aei_geography("2025-09-15"), overlay, by = "geo_id")
# How concentrated is UK Claude.ai usage across O*NET tasks?
uk <- aei_geography("2025-09-15", country = "GBR")
uk_tasks <- uk[uk$facet == "onet_task" & uk$variable == "onet_task_pct", ]
aei_concentration(uk_tasks)
aei_link() is the entry point for “bring your own data”
workflows. It’s a thin wrapper over base::merge() that
preserves the aei_tbl class and provenance metadata,
supports left / inner / full joins, and warns when a join produces zero
rows. Use it to link the AEI to occupational crosswalks (SOC ↔︎ ANZSCO
↔︎ ISCO ↔︎ SOC2020 UK), to national labour-force survey data (ONS, BLS
OEWS, ABS), or to any other external table keyed on country code or task
identifier.
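To make the "thin wrapper over base::merge()" idea concrete, the following base-R sketch shows one way such a wrapper could keep a subclass and a provenance attribute through a join and warn on an empty result. link_sketch() and the link_demo_tbl class are hypothetical, the data is invented, and the real aei_link() may be implemented differently.

```r
# Hypothetical sketch of a class-preserving join (not the package's code).
link_sketch <- function(x, y, by, type = c("left", "inner", "full")) {
  type <- match.arg(type)
  out <- merge(
    x, y, by = by,
    all.x = type %in% c("left", "full"),  # keep unmatched rows of x
    all.y = type == "full"                # keep unmatched rows of y
  )
  if (nrow(out) == 0) {
    warning("join produced zero rows; check that the key values overlap")
  }
  # merge() does not reliably keep extra classes or attributes,
  # so restore them from x by hand.
  attr(out, "aei_query") <- attr(x, "aei_query")
  class(out) <- class(x)
  out
}

# Invented example data
shares <- structure(
  data.frame(geo_id = c("GBR", "AUS", "NZL"), value = c(3.1, 1.2, 0.3)),
  class = c("link_demo_tbl", "data.frame"),
  aei_query = list(release = "release_2025_09_15")
)
overlay <- data.frame(geo_id = c("GBR", "AUS"), gdp_pc = c(48000, 65000))

# Left join: NZL is kept with gdp_pc set to NA
joined <- link_sketch(shares, overlay, by = "geo_id", type = "left")
```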
| Function | Returns |
|---|---|
| aei_cite(release, format, method = TRUE) | Citation in plain text, BibTeX, or bibentry form |
By default aei_cite() includes the methodological source
paper, Handa et al. (2025), alongside the dataset citation, because
attribution under the dataset’s CC-BY licence is required for any
redistribution. Set method = FALSE to return only the dataset citation.
aei_cite() # text, project-wide, with paper
aei_cite("2025-09-15", format = "bibtex") # BibTeX, both refs
aei_cite("2026-03-24", format = "bibentry") # bibentry object (multi-entry)
aei_cite(format = "text", method = FALSE) # dataset only
| Function | Returns |
|---|---|
| aei_cache_dir() | Path of the cache directory (override-aware) |
| aei_cache_info() | List with dir, n_files, size_bytes, size_human, files |
| aei_cache_clear() | Clears the cache; invisible NULL |
All data-returning functions emit an aei_tbl: a
data.frame subclass with provenance metadata stored in the
aei_query attribute. The metadata carries
endpoint, the resolved release identifier, the source URL,
and the fetch timestamp; it is preserved across row and column
subsetting.
df <- aei_index("2025-09-15")
attr(df, "aei_query")
#> $endpoint "index"
#> $release "release_2025_09_15"
#> $facet "raw/claude_ai"
#> $source_url "https://huggingface.co/datasets/Anthropic/EconomicIndex/.../aei_raw_claude_ai_*.csv"
#> $fetched_at "2026-04-28 18:34:00 BST"
# Custom print method shows the provenance header
print(df)
#> # AEI: index · release=release_2025_09_15 · facet=raw/claude_ai · 12345 rows
#> ...
# Subsetting preserves the class and attribute
sub <- df[df$value > 1, ]
class(sub)
#> [1] "aei_tbl" "data.frame"
The class inherits from data.frame, so any function that
takes a data frame works without conversion. Drop the class with
as.data.frame() if you need a plain frame.
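The general S3 pattern behind a provenance-carrying data.frame subclass can be sketched in a few lines of base R. Everything below (the demo_tbl class, the demo_query attribute, new_demo_tbl()) is a hypothetical toy written for illustration; it is not the package's aei_tbl code, which may differ in detail.

```r
# Hypothetical constructor for a toy provenance-carrying data frame.
new_demo_tbl <- function(df, query) {
  stopifnot(is.data.frame(df), is.list(query))
  attr(df, "demo_query") <- query
  class(df) <- c("demo_tbl", "data.frame")
  df
}

# Subsetting method: delegate to the data.frame method, then re-attach the
# class and attribute whenever the result is still a data frame.
`[.demo_tbl` <- function(x, ...) {
  query <- attr(x, "demo_query")
  out <- NextMethod()
  if (is.data.frame(out)) {
    attr(out, "demo_query") <- query
    class(out) <- c("demo_tbl", "data.frame")
  }
  out
}

# Print method: a one-line provenance header, then the plain frame.
print.demo_tbl <- function(x, ...) {
  q <- attr(x, "demo_query")
  cat("# demo: release=", q$release, ", ", nrow(x), " rows\n", sep = "")
  print(as.data.frame(x), ...)
  invisible(x)
}

tbl <- new_demo_tbl(
  data.frame(task = c("a", "b", "c"), value = c(0.5, 1.5, 2.5)),
  query = list(release = "release_2025_09_15")
)
rows <- tbl[tbl$value > 1, ]  # row subset keeps class and attribute
cols <- tbl["value"]          # column subset keeps them too
```

Because the subclass sits in front of "data.frame" in the class vector, as.data.frame() strips it again, which matches the escape hatch described above.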
Pin a release for production. The default
release = "latest" resolves to the most recent release at
call time, which is fine for exploration but unsuitable for reproducible
pipelines. Pin a release identifier explicitly:
RELEASE <- "2025-09-15" # or "release_2025_09_15"
df <- aei_index(RELEASE, source = "claude_ai", variant = "enriched")
Replicate an Anthropic figure. Anthropic ships
Python replication notebooks (v2_report_replication.ipynb)
inside several releases. To replicate the augmentation-vs-automation
headline figure in R:
df <- aei_download("2025-03-27", "automation_vs_augmentation_v2.csv")
df$family <- ifelse(df$interaction_type %in% c("directive", "feedback loop"),
"Automation", "Augmentation")
Country exposure ranking. Top O*NET tasks for the UK by share of Claude.ai usage:
uk <- aei_geography("2025-09-15", country = "GBR")
top <- subset(uk, facet == "onet_task" & variable == "onet_task_pct")
top <- top[order(-top$value), ][1:15, c("cluster_name", "value")]
Cross-country comparison. Per-capita usage index for selected economies:
df <- aei_index("2025-09-15", source = "claude_ai", variant = "enriched")
country_overall <- subset(df,
geography == "country" &
variable == "usage_per_capita_index" &
cluster_name == "not_classified" &
level == 0
)
country_overall <- country_overall[order(-country_overall$value), ]
Cite in a paper. Drop the BibTeX form straight into
your .bib:
cat(aei_cite("2025-09-15", format = "bibtex"), file = "refs.bib", append = TRUE)
The package recognises every release published to Hugging Face up to 2026-03-24 and discovers any newer releases automatically via the Hugging Face tree API.
| Release | Headline model | Notes |
|---|---|---|
| release_2025_02_10 | Claude 3.5 Sonnet | Initial release; O*NET task mappings; automation vs augmentation |
| release_2025_03_27 | Claude 3.7 Sonnet | Cluster-level insights; v2 report replication notebook |
| release_2025_09_15 | Claude Sonnet 4 | Geographic + first-party API data added; long-format schema |
| release_2026_01_15 | Claude Sonnet 4.5 | Economic primitives added |
| release_2026_03_24 | Claude Opus 4.5/4.6 | Learning curves added |
Each release ships its own data_documentation.md on
Hugging Face. The package’s aei_releases() blends bundled
metadata (model, report URL) with a live Hugging Face listing.
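Release identifiers appear in two spellings in this README, the date form ("2025-09-15") and the long form ("release_2025_09_15"). The sketch below shows one way the two could be mapped onto a single canonical form. normalise_release() is a hypothetical helper, not the package's resolver, and it deliberately leaves out "latest", which needs a live or bundled release listing to resolve.

```r
# Hypothetical helper: accept either spelling and return the long form.
normalise_release <- function(x) {
  stopifnot(is.character(x), length(x) == 1L)
  # Already in long form: return unchanged
  if (grepl("^release_[0-9]{4}_[0-9]{2}_[0-9]{2}$", x)) {
    return(x)
  }
  # Date form: swap hyphens for underscores and add the prefix
  if (grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x)) {
    return(paste0("release_", gsub("-", "_", x, fixed = TRUE)))
  }
  stop("unrecognised release identifier: ", x)
}

normalise_release("2025-09-15")          # "release_2025_09_15"
normalise_release("release_2026_03_24")  # returned unchanged
```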
Downloaded files are cached under the path returned by
aei_cache_dir(), which defaults to
tools::R_user_dir("aieconindex", "cache"). Override before
the first call:
options(aieconindex.cache_dir = "/your/preferred/path")
The cache is keyed by release identifier and relative path, so re-downloads are byte-identical to the original.
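As an illustration of what keying a cache by release identifier and relative path can look like, here is a minimal base-R sketch. cached_fetch() and fake_fetch() are hypothetical, the file name toy.csv is invented, and no network access is involved; the package's own cache layer may be organised differently.

```r
# Hypothetical sketch of a release-and-path keyed file cache.
cached_fetch <- function(release, rel_path, fetch, cache_dir) {
  # One file on disk per (release, relative path) pair
  dest <- file.path(cache_dir, release, rel_path)
  if (!file.exists(dest)) {
    dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)
    fetch(dest)  # caller-supplied function that writes the file to `dest`
  }
  dest
}

# Stand-in for a real download: writes a tiny CSV and counts its calls.
n_downloads <- 0
fake_fetch <- function(dest) {
  n_downloads <<- n_downloads + 1
  writeLines(c("geo_id,value", "GBR,3.1"), dest)
}

cache_dir <- file.path(tempdir(), "aei-cache-sketch")
unlink(cache_dir, recursive = TRUE)  # start from an empty cache

p1 <- cached_fetch("release_2025_09_15", "data/output/toy.csv", fake_fetch, cache_dir)
p2 <- cached_fetch("release_2025_09_15", "data/output/toy.csv", fake_fetch, cache_dir)
n_downloads  # 1: the second call was served from disk
```

Including the release identifier in the path means the same relative path in two releases never collides.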
aei_cache_info()
#> $dir "/Users/.../aieconindex/cache"
#> $n_files 3
#> $size_bytes 126839425
#> $size_human "121.0 MB"
#> $files <data.frame: 3 rows>
aei_cache_clear() # removes all cached files
The usage CSVs in the latest release are around 100 MB each, so the first call to a fresh release is bandwidth-heavy. Subsequent calls are served from disk.
Anthropic ships its own replication code as Jupyter notebooks
(Python) inside several releases: for example,
release_2025_03_27/v2_report_replication.ipynb reproduces
the figures in the corresponding report PDF.
aieconindex is a complement to that workflow, not a
port. The package gives you typed, cached, R-side access to the same
source CSVs and JSONs, leaving downstream analysis to you. If you want
to reproduce a specific Anthropic figure, the notebook is the most
reliable starting point. If you want to feed AEI data into an existing R
pipeline (joining with ONS Labour Force Survey, BLS OEWS, or ABS Labour
Force data; weighting by national working-age employment), this package
is the most direct route.
The Hugging Face Python datasets library can also load
the dataset
(datasets.load_dataset("Anthropic/EconomicIndex"));
aieconindex is the R-side equivalent for that workflow.
data_documentation.md for the authoritative
caveat list.
data/output/
layout. aei_index() and aei_geography() paper
over this with file-pattern heuristics. If Anthropic restructures again,
the heuristics will break and the package will need an update.
"latest" to remain
reproducible across re-runs.
tools::R_user_dir("aieconindex", "cache")) avoids
re-downloading, but the first call to a new release is
bandwidth-heavy.
| Package | Description |
|---|---|
| inequality | Inequality and poverty measurement (labour-market distributional context) |
| ons | UK labour market data (employment, wages by occupation) |
| fred | US labour market data (employment, productivity, occupational wages) |
| readoecd | OECD international labour and skills data |
Please cite both the package and the underlying dataset.
citation("aieconindex")
For the dataset specifically, aei_cite() returns
ready-made strings:
aei_cite("2025-09-15", format = "bibtex")
If you use the AEI in academic work, also cite Handa et al. (2025),
arXiv:2503.04761, the methodological source paper.
aei_cite() includes it by default.
Issues and pull requests are welcome at https://github.com/charlescoverdale/aieconindex/issues. Useful contributions for v0.2 include:
For Anthropic-introduced schema changes that break
aei_index() or aei_geography(), please open an
issue with a sample of the new file structure (output of
aei_files(<new_release>)).
This package is released under the MIT License.
The underlying Anthropic Economic Index dataset is released by
Anthropic under Creative Commons
Attribution 4.0 International (CC-BY-4.0). When using this package
to retrieve or redistribute that data, attribution to Anthropic and to
Handa et al. (2025) is
required. Use aei_cite() for ready-made citation
strings.
The bundled O*NET and SOC reference data (when accessed through the AEI) inherit their respective licences. See the O*NET licensing page and the SOC documentation.
This product uses the Anthropic Economic Index data but is not endorsed or certified by Anthropic.