Package {osdc}


Type: Package
Title: Open Source Diabetes Classifier for Danish Registers
Version: 0.11.3
Description: The algorithm first uses Danish register data to identify a population of individuals with any type of diabetes, defined as individuals with two or more inclusion events. It then splits this population into individuals with type 1 diabetes and individuals with type 2 diabetes, by identifying those with type 1 diabetes and classifying the remainder of the diabetes population as having type 2 diabetes.
License: MIT + file LICENSE
URL: https://github.com/steno-aarhus/osdc, https://steno-aarhus.github.io/osdc/
BugReports: https://github.com/steno-aarhus/osdc/issues
Depends: R (≥ 4.2.0)
Imports: checkmate, cli, codeCollection, dplyr (≥ 1.2.0), dbplyr (≥ 2.5.1), duckplyr (≥ 1.1.3), fabricatr, lifecycle, lubridate (≥ 1.9.5), purrr (≥ 1.2.1), rlang (≥ 1.1.7), stats, tidyselect (≥ 1.2.1), utils
Suggests: glue, knitr, quarto, rmarkdown, spelling, stringr, testthat (≥ 3.0.0), tidyr, tibble, arrow (≥ 22.0.0.1), DBI (≥ 1.3.0)
VignetteBuilder: quarto
Config/testthat/edition: 3
Encoding: UTF-8
Language: en-US
Config/roxygen2/version: 8.0.0
NeedsCompilation: no
Packaged: 2026-06-04 11:29:43 UTC; luke
Author: Signe Kirk Brødbæk [aut], Anders Aasted Isaksen [aut], Luke William Johnston [aut, cre], Steno Diabetes Center Aarhus [cph], Aarhus University [cph]
Maintainer: Luke William Johnston <lwjohnst@gmail.com>
Repository: CRAN
Date/Publication: 2026-06-04 15:30:02 UTC

osdc: Open Source Diabetes Classifier for Danish Registers

Description


The algorithm first uses Danish register data to identify a population of individuals with any type of diabetes, defined as individuals with two or more inclusion events. It then splits this population into individuals with type 1 diabetes and individuals with type 2 diabetes, by identifying those with type 1 diabetes and classifying the remainder of the diabetes population as having type 2 diabetes.
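The two-stage logic above can be sketched in base R on a toy table of inclusion events. This is a minimal illustration under stated assumptions, not the package's implementation: the columns `pnr` and `date` and the vector `t1d_pnrs` are hypothetical, and the real inclusion events and type 1 diabetes criteria are described in vignette("algorithm").

```r
# Hypothetical inclusion events: one row per event, keyed by person (pnr).
events <- data.frame(
  pnr = c("A", "A", "B", "C", "C", "C"),
  date = as.Date(c(
    "2001-03-01", "2002-06-15", "2003-01-10",
    "2004-02-02", "2004-05-05", "2005-07-07"
  ))
)

# Stage 1: the diabetes population is everyone with two or more events.
n_events <- table(events$pnr)
diabetes_pnrs <- names(n_events)[n_events >= 2]

# Stage 2: `t1d_pnrs` is a hypothetical stand-in for whoever meets the
# type 1 criteria; everyone else in the population is classified as type 2.
t1d_pnrs <- "C"
classified <- data.frame(
  pnr = diabetes_pnrs,
  type = ifelse(diabetes_pnrs %in% t1d_pnrs, "T1D", "T2D")
)
```

Person "B" has only one event and so never enters the diabetes population, which is why type classification only applies to "A" and "C".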

Author(s)

Maintainer: Luke William Johnston lwjohnst@gmail.com (ORCID)

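Authors:

Signe Kirk Brødbæk
Anders Aasted Isaksen

Other contributors:

Steno Diabetes Center Aarhus [copyright holder]
Aarhus University [copyright holder]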

See Also

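Useful links:

https://github.com/steno-aarhus/osdc
https://steno-aarhus.github.io/osdc/
Report bugs at https://github.com/steno-aarhus/osdc/issues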


A list of the algorithmic logic underlying osdc.

Description

This nested list contains the logic details of the algorithm.

Usage

algorithm()

Format

A list of nested lists, each with these named elements:

register

Optional. The register that this logic applies to.

title

The title to use when displaying the logic in tables.

logic

The logic itself.

comments

Some additional comments on the logic.

Value

A nested list with the algorithmic logic. Contains fields register, title, logic, and comments.

See Also

See vignette("algorithm") for a full description of this logic.

Examples

algorithm()$is_hba1c_over_threshold
algorithm()$is_gld_code$logic
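Because algorithm() returns a plain nested list, its elements can be tabulated with base R. The sketch below uses a small hypothetical list, `logic_list`, shaped like the documented format (register, title, logic, comments); its entries are invented for illustration and are not the package's actual logic. The helper `get_field()` is also hypothetical, and it assumes each field holds a single character string.

```r
# Hypothetical list shaped like algorithm() output; the contents are made up.
logic_list <- list(
  is_example_a = list(
    register = "lmdb",
    title = "Example A",
    logic = "condition_a",
    comments = "Illustrative only."
  ),
  is_example_b = list(
    # `register` is optional, so this element omits it.
    title = "Example B",
    logic = "condition_b",
    comments = ""
  )
)

# Pull one field from every element, using NA when the field is absent.
get_field <- function(lst, field) {
  vapply(
    lst,
    function(x) if (is.null(x[[field]])) NA_character_ else x[[field]],
    character(1),
    USE.NAMES = FALSE
  )
}

logic_table <- data.frame(
  name = names(logic_list),
  register = get_field(logic_list, "register"),
  title = get_field(logic_list, "title"),
  logic = get_field(logic_list, "logic")
)
```

Handling the missing `register` explicitly matters because the field is documented as optional.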

Classify diabetes status using Danish registers.

Description

This function requires each source of register data to be represented as a single DuckDB object in R (e.g. a connection to Parquet files). Each DuckDB object must contain a single table covering all years of that data source, or at least all the years you have access to and are interested in.

Usage

classify_diabetes(
  lpr,
  hsr,
  lab_forsker,
  bef,
  lmdb,
  stable_inclusion_start_date = "1998-01-01"
)

Arguments

lpr

The unified LPR register, as created by join_registers().

hsr

The unified health services registers (SYSI and SSSY), as created by join_registers().

lab_forsker

The laboratory results register for research (lab_forsker).

bef

The BEF table from the civil register

lmdb

The LMDB table from the prescription register

stable_inclusion_start_date

Cutoff date after which inclusion events are considered true incident diabetes cases. Defaults to "1998-01-01", which is one year after the date from which pregnancy events in the Patient Register are considered valid for dropping purchases of glucose-lowering drugs related to gestational diabetes. This default assumes that the user has LPR and LMDB data starting no later than 1 January 1997. If the user's LPR and LMDB data start later, set this parameter to one year after the start of the user's data coverage.
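As a sketch of the rule described above, the cutoff can be derived by adding one year to the start of LPR and LMDB coverage and then compared with each person's first inclusion event. The helper `stable_start_from_coverage()` and the `first_inclusion` data are hypothetical; only the one-year offset and the cutoff rule come from this documentation, and treating the cutoff day itself as included is an assumption of this sketch.

```r
# Hypothetical helper: one year after the start of LPR and LMDB coverage.
stable_start_from_coverage <- function(coverage_start) {
  start <- as.Date(coverage_start)
  # seq() with by = "1 year" steps forward one calendar year.
  seq(start, by = "1 year", length.out = 2)[2]
}

# With coverage from 1997-01-01 this reproduces the default cutoff.
cutoff <- stable_start_from_coverage("1997-01-01")

# Hypothetical first inclusion event per person.
first_inclusion <- data.frame(
  pnr = c("A", "B"),
  date = as.Date(c("1997-06-01", "2005-03-15"))
)

# Events from the cutoff onward are flagged as incident (inclusive: assumed).
first_inclusion$incident <- first_inclusion$date >= cutoff
```

The result could then be supplied as stable_inclusion_start_date = as.character(cutoff), matching the character format of the default.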

Value

The same object type as the input data, i.e. a duckplyr::duckdb_tibble().

See Also

See vignette("osdc", package = "osdc") for more details on how to use this function.


Create a synthetic dataset of edge case inputs

Description

This function generates a list of tibbles representing the Danish health registers and the data necessary to run the algorithm. The dataset contains 23 individual cases (PNRs), each designed to test a specific logical branch of the diabetes classification algorithm, including inclusion, exclusion, censoring, and type classification rules.

The generated data is used in testthat tests to ensure the algorithm behaves as expected under a wide range of conditions, but it is also intended to be explored by users to better understand how the algorithm logic works.

Usage

edge_cases()

Value

A named list of 11 tibble::tibble() objects, each representing a different health register: bef, lmdb, lpr_adm, lpr_diag, lpr3a_kontakt, lpr3a_diagnose, lpr3f_kontakter, lpr3f_diagnoser, sysi, sssy, and lab_forsker.

Examples

edge_cases()

Join prepared registers

Description

Join prepared registers

Usage

join_registers(register_list)

Arguments

register_list

A list of prepared registers, e.g. the output of prepare_lpr2().

Value

A single object with all rows from each register in register_list.

Examples

register_data <- simulate_registers(c(
  "lpr_adm",
  "lpr_diag",
  "lpr3f_kontakter",
  "lpr3f_diagnoser",
  "sssy",
  "sysi"
))
join_registers(list(
  prepare_lpr2(register_data$lpr_adm, register_data$lpr_diag),
  prepare_lpr3f(
    register_data$lpr3f_kontakter,
    register_data$lpr3f_diagnoser
  )
))
join_registers(list(register_data$sysi, register_data$sssy))
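The Value section describes the result as a single object with all rows from each register in register_list. Conceptually, that is a row-wise stack of tables sharing the same columns. The base R sketch below illustrates the idea with do.call(rbind, ...) on two small hypothetical tables; it is an illustration of the concept, not how join_registers() is implemented, and the column names are made up.

```r
# Two hypothetical prepared registers with identical columns.
reg_a <- data.frame(
  pnr = c("A", "B"),
  date = as.Date(c("2001-01-01", "2002-02-02"))
)
reg_b <- data.frame(
  pnr = "C",
  date = as.Date("2003-03-03")
)

# Stack every row of every register into one data frame.
joined <- do.call(rbind, list(reg_a, reg_b))
```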

List of non-cases to test the diabetes classification algorithm

Description

This function generates a list of tibbles representing the Danish health registers and the data necessary to run the algorithm. The dataset contains individuals who should not be included in the final classified cohort.

Usage

non_cases()

Details

The generated data is used in testthat tests to ensure the algorithm behaves as expected under a wide range of conditions, but it is also intended to be explored by users to better understand how the algorithm logic works and to be shown in the documentation.

Value

A named list of 11 tibble::tibble() objects, each representing a different health register: bef, lmdb, lpr_adm, lpr_diag, lpr3a_kontakt, lpr3a_diagnose, lpr3f_kontakter, lpr3f_diagnoser, sysi, sssy, and lab_forsker.

Examples

non_cases()

Description of the different non-cases included in non_cases()

Description

Each case would otherwise be classified as having diabetes; the metadata here describes the specific reason it is excluded from classification.

Usage

non_cases_metadata()

Value

A named list of character strings, where each name corresponds to a non-case PNR in the dataset generated by non_cases().

Examples

non_cases_metadata()

Prepare and join the two LPR2 registers to extract diabetes and pregnancy diagnoses.

Description

Prepare and join the two LPR2 registers to extract diabetes and pregnancy diagnoses.

Usage

prepare_lpr2(lpr_adm, lpr_diag)

Arguments

lpr_adm

The LPR2 register containing hospital admissions.

lpr_diag

The LPR2 register containing diabetes diagnoses.

Value

The same type as the input data, as a duckplyr::duckdb_tibble(), with the following columns:

See Also

See the vignette("algorithm") for the logic used to filter these patients.


Prepare and join the two LPR3A registers to extract diabetes and pregnancy diagnoses.

Description

Prepare and join the two LPR3A registers to extract diabetes and pregnancy diagnoses.

Usage

prepare_lpr3a(lpr3a_kontakt, lpr3a_diagnose)

Arguments

lpr3a_kontakt

The LPR3A register containing hospital contacts/admissions.

lpr3a_diagnose

The LPR3A register containing diabetes diagnoses.

Value

The same type as the input data, as a duckplyr::duckdb_tibble(), with the following columns:

See Also

See the vignette("algorithm") for the logic used to filter these patients.


Prepare and join the two LPR3F registers to extract diabetes and pregnancy diagnoses.

Description

Prepare and join the two LPR3F registers to extract diabetes and pregnancy diagnoses.

Usage

prepare_lpr3f(lpr3f_kontakter, lpr3f_diagnoser)

Arguments

lpr3f_kontakter

The LPR3F register containing hospital contacts/admissions.

lpr3f_diagnoser

The LPR3F register containing diabetes diagnoses.

Value

The same type as the input data, as a duckplyr::duckdb_tibble(), with the following columns:

See Also

See the vignette("algorithm") for the logic used to filter these patients.


Register variables (with descriptions) required for the osdc algorithm.

Description

Register variables (with descriptions) required for the osdc algorithm.

Usage

registers()

Value

A list of the registers and variables required by osdc. Each register item contains the official Danish name of the register, the start year, the end year, and the variables with their descriptions. Each variable is described by 4 items:

name

The official name of the variable found in the register.

danish_description

The official Danish description of the variable.

english_description

The translated English description of the variable.

data_type

The data type of the variable, e.g. "character".
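The structure described above can be illustrated with a small hypothetical list and flattened into one table of variables. The register-level field names (`name`, `start_year`, `end_year`, `variables`), the storage of variables as a data frame, and all values are assumptions made for illustration; only the variable-level field names come from this documentation.

```r
# Hypothetical registers() output; register-level field names are assumed.
mock_registers <- list(
  register_x = list(
    name = "Example register X",
    start_year = 2000,
    end_year = 2020,
    # Variable-level fields use the documented names.
    variables = data.frame(
      name = c("var_a", "var_b"),
      danish_description = c("Danish description a", "Danish description b"),
      english_description = c("English description a", "English description b"),
      data_type = c("character", "Date")
    )
  ),
  register_y = list(
    name = "Example register Y",
    start_year = 2010,
    end_year = 2020,
    variables = data.frame(
      name = "var_c",
      danish_description = "Danish description c",
      english_description = "English description c",
      data_type = "numeric"
    )
  )
)

# Flatten into one table of variables, tagged with the register they belong to.
variable_table <- do.call(rbind, lapply(names(mock_registers), function(reg) {
  data.frame(register = reg, mock_registers[[reg]]$variables)
}))
```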

Source

Many of the details within the registers() metadata come from the full official list of registers from Statistics Denmark (DST): https://www.dst.dk/extranet/forskningvariabellister/Oversigt%20over%20registre.html

Examples

registers()

Simulate a fake data frame of one or more Danish registers

Description

Simulate a fake data frame of one or more Danish registers

Usage

simulate_registers(registers, n = 1000)

Arguments

registers

The name(s) of the register(s) to simulate.

n

The number of rows to simulate for the resulting register.

Value

A named list of simulated registers, each as a tibble::tibble().

Examples

simulate_registers(c("bef", "sysi"))
simulate_registers("bef")