The dataset package helps you create semantically rich, machine-readable, and interoperable datasets in R. It introduces S3 classes that extend data frames, vectors, and bibliographic entries with formal metadata structures inspired by established metadata standards.
The goal is to preserve metadata when reusing statistical and repository datasets, improve interoperability, and make it easy to turn tidy data frames into web-ready, publishable datasets that comply with ISO and W3C standards.
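To see why dedicated S3 classes help preserve metadata, consider the following minimal base R sketch. It does not use the dataset package: the class name with_unit and its helper are hypothetical, introduced only for illustration. The sketch shows that a plain vector loses a custom attribute when subset, while an S3 class with its own `[` method can carry the attribute along.

```r
# Base R only: a numeric vector with a unit stored as a plain attribute.
gdp <- structure(c(3897, 7365), unit = "million euros")
plain_unit <- attr(gdp[1], "unit")   # NULL: `[` drops custom attributes

# Hypothetical S3 class `with_unit` (an assumption for this sketch,
# not a class provided by the dataset package).
with_unit <- function(x, unit) {
  structure(x, unit = unit, class = "with_unit")
}

# A `[` method that re-attaches the unit after subsetting.
`[.with_unit` <- function(x, i) {
  with_unit(unclass(x)[i], attr(x, "unit"))
}

gdp2 <- with_unit(c(3897, 7365), "million euros")
kept_unit <- attr(gdp2[1], "unit")   # the unit survives subsetting
```

The classes in the package follow the same general idea, with richer metadata (labels, units, namespaces) than this single attribute.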
You can install the latest released version of dataset from CRAN with:
install.packages("dataset")
To install the development version from GitHub, use pak or remotes:
# install.packages("pak")
pak::pak("dataobservatory-eu/dataset")

# install.packages("remotes")
remotes::install_github("dataobservatory-eu/dataset")
library(dataset)

df <- dataset_df(
  country = defined(
    c("AD", "LI"),
    label = "Country",
    namespace = "https://www.geonames.org/countries/$1/"
  ),
  gdp = defined(
    c(3897, 7365),
    label = "GDP",
    unit = "million euros"
  ),
  dataset_bibentry = dublincore(
    title = "GDP Dataset",
    creator = person("Jane", "Doe", role = "aut"),
    publisher = "Small Repository"
  )
)

print(df)
#> Doe (2025): GDP Dataset [dataset]
#>   rowid     country   gdp
#>   <defined> <defined> <defined>
#> 1 obs1      AD        3897
#> 2 obs2      LI        7365
Export as RDF triples:
dataset_to_triples(df, format = "nt")
#> [1] "<http://example.com/dataset#obsobs1> <http://example.com/prop/country> <https://www.geonames.org/countries/AD/> ."
#> [2] "<http://example.com/dataset#obsobs2> <http://example.com/prop/country> <https://www.geonames.org/countries/LI/> ."
#> [3] "<http://example.com/dataset#obsobs1> <http://example.com/prop/gdp> \"3897\"^^<xsd:decimal> ."
#> [4] "<http://example.com/dataset#obsobs2> <http://example.com/prop/gdp> \"7365\"^^<xsd:decimal> ."
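The structure of these lines can be reproduced by hand. The following base R sketch is not the package's implementation: nt_triple() is a hypothetical helper, and the sketch assumes that the $1 placeholder in the namespace is replaced by each coded value, as the output above suggests. Each statement is a subject, a predicate, and an object (an IRI or a typed literal), terminated by a period.

```r
# Hypothetical helper (not part of the dataset package): build N-Triples
# statements from subject IRIs, a predicate IRI, and objects.
nt_triple <- function(s, p, o, datatype = NULL) {
  object <- if (is.null(datatype)) {
    sprintf("<%s>", o)                      # object is an IRI
  } else {
    sprintf("\"%s\"^^<%s>", o, datatype)    # object is a typed literal
  }
  sprintf("<%s> <%s> %s .", s, p, object)
}

subjects <- paste0("http://example.com/dataset#obs", c("obs1", "obs2"))

# Assumption: "$1" in the namespace stands for the coded value.
ns <- "https://www.geonames.org/countries/$1/"
country_iri <- vapply(
  c("AD", "LI"),
  function(code) sub("$1", code, ns, fixed = TRUE),
  character(1),
  USE.NAMES = FALSE
)

country_nt <- nt_triple(subjects, "http://example.com/prop/country", country_iri)
gdp_nt <- nt_triple(
  subjects, "http://example.com/prop/gdp",
  c(3897, 7365), datatype = "xsd:decimal"
)

writeLines(c(country_nt, gdp_nt))
```

writeLines() prints the statements without the escaped quotes that appear when a character vector is printed directly.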
Inspect the automatically recorded provenance:
provenance(df)
#> [1] "<http://example.com/dataset_prov.nt> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/ns/prov#Bundle> ."
#> [2] "<http://example.com/dataset#> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/ns/prov#Entity> ."
#> [3] "<http://example.com/dataset#> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/linked-data/cube#DataSet> ."
#> [4] "_:doejane <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/ns/prov#Agent> ."
#> [5] "<https://doi.org/10.32614/CRAN.package.dataset> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/ns/prov#SoftwareAgent> ."
#> [6] "<http://example.com/creation> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/ns/prov#Activity> ."
#> [7] "<http://example.com/creation> <http://www.w3.org/ns/prov#generatedAtTime> \"2025-08-25T21:44:14Z\"^^<xsd:dateTime> ."
We welcome contributions and discussion!
This project follows the rOpenSci Code of Conduct. By participating, you are expected to uphold these guidelines.