Data Imputation

Bill Denney

2024-06-19

Imputation may be required for noncompartmental analysis (NCA) calculations. Typical imputations may require setting the concentration before the first dose to zero or shifting actual time predose concentrations to the beginning of the dosing interval.

PKNCA supports imputation either for the full analysis dataset or per calculation interval.

The current list of imputation methods built into PKNCA can be found by looking at ?PKNCA_impute_method:

library(PKNCA)
cat(paste(
  "*", ls("package:PKNCA", pattern = "^PKNCA_impute_method")
), sep = "\n")
#> * PKNCA_impute_method_start_cmin
#> * PKNCA_impute_method_start_conc0
#> * PKNCA_impute_method_start_predose

How does imputation occur?

(You can skip this section if you don’t desire the details of the methods of imputation.)

Imputation occurs just before calculations are performed within PKNCA. Imputation occurs only on a single interval definition at a time, so the same group (usually meaning the same subject with the same analyte) at the same time range can have different imputations for different parameter calculations.

The reason that this is done is to ensure that there are no unintentional modifications to the data. As an example, if an AUC0-24 were calculated on Day 1 and Day 2 of a study with actual times, the nominal 24 hour sample may be collected at 23.5 hours. It may be preferable to keep the 23.5 hour sample at 23.5 hours for the Day 1 calculation, and at the same time, it may be preferred to shift the same 23.5 hr sample to 24 hours (time 0 on Day 2) for the Day 2 calculation.

How to select imputation methods to use

The selection of imputation methods uses a string of text with commas or spaces (or both) separating the imputation methods to use. No imputation will be performed if the imputation method is requested as NA or "".

Imputation method functions are named PKNCA_impute_method_[method name]. For example, the method to impute a concentration of 0 at time 0 is named PKNCA_impute_method_start_conc0. When specifying the imputation method to use, give the [method name] part of the function name. So for the example above, use "start_conc0".

To specify more than one, give all the methods in order with a comma or space separating them. For example, to first move a predose concentration up to the time of dosing and then set time 0 to concentration 0, use "start_predose,start_conc0", and the two methods will be applied in order.

Imputation for the full dataset

If an imputation applies to the full dataset, it can be provided in the impute argument to PKNCAdata():

library(PKNCA)
# Remove time 0 to illustrate that imputation works
d_conc <- as.data.frame(datasets::Theoph)[!datasets::Theoph$Time == 0, ]
conc_obj <- PKNCAconc(d_conc, conc~Time|Subject)
d_dose <- unique(datasets::Theoph[datasets::Theoph$Time == 0,
                                  c("Dose", "Time", "Subject")])
dose_obj <- PKNCAdose(d_dose, Dose~Time|Subject)
data_obj <- PKNCAdata(conc_obj, dose_obj, impute = "start_predose,start_conc0")
nca_obj <- pk.nca(data_obj)
summary(nca_obj)
#>  start end  N     auclast        cmax               tmax   half.life aucinf.obs
#>      0  24 12 74.6 [24.2]           .                  .           .          .
#>      0 Inf 12           . 8.65 [17.0] 1.14 [0.630, 3.55] 8.18 [2.12] 115 [28.4]
#> 
#> Caption: auclast, cmax, aucinf.obs: geometric mean and geometric coefficient of variation; tmax: median and range; half.life: arithmetic mean and standard deviation; N: number of subjects

Imputation by calculation interval

If an imputation applies to specific intervals, the column in the interval data.frame can be provided in the impute argument to PKNCAdata():

library(PKNCA)
# Remove time 0 to illustrate that imputation works
d_conc <- as.data.frame(datasets::Theoph)[!datasets::Theoph$Time == 0, ]
conc_obj <- PKNCAconc(d_conc, conc~Time|Subject)
d_dose <- unique(datasets::Theoph[datasets::Theoph$Time == 0,
                                  c("Dose", "Time", "Subject")])
dose_obj <- PKNCAdose(d_dose, Dose~Time|Subject)

d_intervals <-
  data.frame(
    start=0, end=c(24, 24.1),
    auclast=TRUE,
    impute=c(NA, "start_conc0")
  )

data_obj <- PKNCAdata(conc_obj, dose_obj, intervals = d_intervals, impute = "impute")
nca_obj <- pk.nca(data_obj)
#> Warning: Requesting an AUC range starting (0) before the first measurement
#> (0.27) is not allowed
#> Warning: Requesting an AUC range starting (0) before the first measurement (0.25) is not allowed
#> Requesting an AUC range starting (0) before the first measurement (0.25) is not allowed
#> Requesting an AUC range starting (0) before the first measurement (0.25) is not allowed
#> Warning: Requesting an AUC range starting (0) before the first measurement (0.27) is not allowed
#> Requesting an AUC range starting (0) before the first measurement (0.27) is not allowed
#> Warning: Requesting an AUC range starting (0) before the first measurement
#> (0.35) is not allowed
#> Warning: Requesting an AUC range starting (0) before the first measurement
#> (0.3) is not allowed
#> Warning: Requesting an AUC range starting (0) before the first measurement
#> (0.25) is not allowed
#> Warning: Requesting an AUC range starting (0) before the first measurement
#> (0.37) is not allowed
#> Warning: Requesting an AUC range starting (0) before the first measurement
#> (0.25) is not allowed
#> Warning: Requesting an AUC range starting (0) before the first measurement
#> (0.3) is not allowed
# PKNCA does not impute time 0 by default, so AUClast in the 0-24 interval is
# not calculated
summary(nca_obj)
#>  start  end  N     auclast
#>      0 24.0 12          NC
#>      0 24.1 12 76.4 [23.0]
#> 
#> Caption: auclast: geometric mean and geometric coefficient of variation; N: number of subjects

Advanced: Writing your own imputation functions

Writing your own imputation function is intended to be a simple process. To create an imputation function requires the following steps:

  1. Write a function where the name starts with PKNCA_impute_method_ and the remainder of the function name is a brief description of the method. (Such as PKNCA_impute_method_start_conc0.)
  2. The function should have 4 arguments: conc, time, ..., and options.
  3. The function should return a single data.frame with two columns named conc and time. The rows in the data.frame must be sorted by time.

In addition to the above, the function may take named arguments of: