| Title: | Regression with Interval-Censored Covariates | 
| Version: | 0.1.3 | 
| Description: | Provides functions to simulate and analyze data for a regression model with an interval censored covariate, as described in Morrison et al. (2021) <doi:10.1111/biom.13472>. | 
| License: | MIT + file LICENSE | 
| Encoding: | UTF-8 | 
| RoxygenNote: | 7.1.2 | 
| VignetteBuilder: | knitr | 
| Config/testthat/edition: | 3 | 
| Imports: | biglm, dplyr, lubridate, magrittr, stats, pryr, arm, ggplot2, scales | 
| Suggests: | spelling, rmarkdown, knitr, testthat, markdown, pander | 
| Language: | en-US | 
| URL: | https://d-morrison.github.io/rwicc/, https://github.com/d-morrison/rwicc | 
| BugReports: | https://github.com/d-morrison/rwicc/issues | 
| NeedsCompilation: | no | 
| Packaged: | 2022-03-09 01:15:52 UTC; dmorrison | 
| Author: | Douglas Morrison  | 
| Maintainer: | Douglas Morrison <dmorrison01@ucla.edu> | 
| Repository: | CRAN | 
| Date/Publication: | 2022-03-09 21:40:06 UTC | 
Convert a pair of simple logistic regression coefficients into a P(Y|T) curve
Description
Converts a pair of simple logistic regression coefficients into a function that returns the curve P(Y=1|T=t).
Usage
build_phi_function_from_coefs(coefs)
Arguments
coefs | 
 numeric vector of coefficients  | 
Value
function(t) P(Y=1|T=t)
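As a rough illustration of what the returned function represents, the sketch below builds a logistic curve from an intercept and a slope using base R's `plogis`. The helper name `phi_from_coefs_sketch` is hypothetical, and reading `coefs` as (intercept, slope) on the logit scale is an assumption based on the `theta` argument described elsewhere in this manual; the package's own implementation may differ.

```r
# Hypothetical stand-in for build_phi_function_from_coefs();
# assumes coefs = c(intercept, slope) on the logit scale.
phi_from_coefs_sketch <- function(coefs) {
  function(t) stats::plogis(coefs[1] + coefs[2] * t)
}

phi <- phi_from_coefs_sketch(c(0.986, -3.88))
phi(c(0, 0.5, 1))  # P(Y = 1 | T = t) evaluated at t = 0, 0.5, and 1
```

Returning a closure keeps the coefficients bound to the curve, so the result can be passed directly to plotting or integration routines.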
compute mean window period duration from simple logistic regression coefficients
Description
compute mean window period duration from simple logistic regression coefficients
Usage
compute_mu(theta)
Arguments
theta | 
 numeric vector of coefficients  | 
Value
numeric scalar: mean window period duration
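One common way to define a mean window period is the area under the curve P(Y=1|T=t) for t >= 0. The sketch below computes that area for a simple logistic curve with a negative slope, using a closed form and a numerical check. The name `mu_sketch` is hypothetical; whether `compute_mu` uses exactly this formula, and whether it reports the result in the units of T or converts to days, is not stated here and is treated as an assumption.

```r
# Hypothetical sketch: mean window period as the area under P(Y = 1 | T = t)
# for t >= 0, assuming a simple logistic curve with a negative slope.
mu_sketch <- function(theta) {
  stopifnot(length(theta) == 2, theta[2] < 0)
  # Closed form of the integral of plogis(theta[1] + theta[2] * t) over [0, Inf)
  -log1p(exp(theta[1])) / theta[2]
}

theta <- c(0.986, -3.88)
mu_closed <- mu_sketch(theta)

# Numerical check of the closed form
mu_numeric <- stats::integrate(
  function(t) stats::plogis(theta[1] + theta[2] * t),
  lower = 0, upper = Inf
)$value
all.equal(mu_closed, mu_numeric, tolerance = 1e-4)
```

The result is expressed in the same time units as T.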
Fit a logistic regression model with an interval-censored covariate
Description
This function fits a logistic regression model for a binary outcome Y with an interval-censored covariate T, using an EM algorithm, as described in Morrison et al (2021); doi: 10.1111/biom.13472.
Usage
fit_joint_model(
  participant_level_data,
  obs_level_data,
  model_formula = stats::formula(Y ~ T),
  mu_function = compute_mu,
  bin_width = 1,
  denom_offset = 0.1,
  EM_toler_loglik = 0.1,
  EM_toler_est = 1e-04,
  EM_max_iterations = Inf,
  glm_tolerance = 1e-07,
  glm_maxit = 20,
  initial_S_estimate_location = 0.25,
  coef_change_metric = "max abs rel diff coefs",
  verbose = FALSE
)
Arguments
participant_level_data | 
 a data.frame or tibble of participant-level data, such as the pt_data tibble returned by simulate_interval_censoring (columns ID, E, L, R)  | 
obs_level_data | 
 a data.frame or tibble of observation-level data, such as the obs_data tibble returned by simulate_interval_censoring (columns ID, O, Y)  | 
model_formula | 
 the functional form for the regression model for p(y|t) (as a formula() object)  | 
mu_function | 
 a function taking a vector of regression coefficient estimates as input and outputting an estimate of mu (mean duration of MAA-positive infection).  | 
bin_width | 
 the number of days between possible seroconversion dates (should be an integer)  | 
denom_offset | 
 an offset value added to the denominator of the hazard estimates to improve numerical stability  | 
EM_toler_loglik | 
 the convergence cutoff for the log-likelihood criterion ("Delta_L" in the paper)  | 
EM_toler_est | 
 the convergence cutoff for the parameter estimate criterion ("Delta_theta" in the paper)  | 
EM_max_iterations | 
 the number of EM iterations to perform before giving up if still not converged.  | 
glm_tolerance | 
 the convergence cutoff for the glm fit in the M step  | 
glm_maxit | 
 the maximum number of iterations for the glm fit in the M step  | 
initial_S_estimate_location | 
 determines how seroconversion date is guessed to initialize the algorithm; can be any decimal between 0 and 1; 0.5 = midpoint imputation, 0.25 = 1st quartile, 0 = last negative, etc.  | 
coef_change_metric | 
 a string indicating the type of parameter estimate criterion to use (default: "max abs rel diff coefs")  | 
verbose | 
 whether to print algorithm progress details to the console  | 
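The `initial_S_estimate_location` argument can be read as a fraction of the way from the last negative test (L) to the first positive test (R). The sketch below shows that interpolation on a single interval. The helper name `initial_S_sketch` is hypothetical, and any further processing inside `fit_joint_model` (for example, rounding to the `bin_width` grid) is not represented.

```r
# Hypothetical sketch of the initial seroconversion-date guess: a point a fixed
# fraction of the way from L (last negative test) to R (first positive test).
initial_S_sketch <- function(L, R, location = 0.25) {
  stopifnot(location >= 0, location <= 1)
  L + location * (R - L)
}

L <- as.Date("2001-03-01")
R <- as.Date("2001-05-24")  # 84 days after L
initial_S_sketch(L, R, location = 0.5)   # midpoint imputation
initial_S_sketch(L, R, location = 0.25)  # first quartile of the interval
initial_S_sketch(L, R, location = 0)     # last negative test date
```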
Value
a list with the following elements:
- Theta: the estimated regression coefficients for the model of p(Y|T)
- Mu: the estimated mean window period (a transformation of Theta)
- Omega: a table with the estimated parameters for the model of p(S|E)
- converged: indicator of whether the algorithm reached its cutoff criteria before reaching the specified maximum iterations (1 = reached cutoffs, 0 = not)
- iterations: the number of EM iterations completed before the algorithm stopped
- convergence_metrics: the four convergence metrics
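To make the parameter estimate criterion more concrete, the sketch below computes a maximum absolute relative change between two successive coefficient vectors and compares it with `EM_toler_est`. This is only a guess at what a metric named "max abs rel diff coefs" might compute: the function name `max_abs_rel_diff` and the formula are assumptions, not the package's code.

```r
# Hypothetical sketch of a "max abs rel diff coefs"-style stopping rule;
# the exact formula used by fit_joint_model() is an assumption here.
max_abs_rel_diff <- function(new_coefs, old_coefs) {
  max(abs((new_coefs - old_coefs) / old_coefs))
}

old_theta <- c(0.9800, -3.8700)  # illustrative values from one EM iteration
new_theta <- c(0.9801, -3.8702)  # illustrative values from the next iteration
change <- max_abs_rel_diff(new_theta, old_theta)
change < 1e-04  # compare against EM_toler_est
```

A relative criterion of this kind is scale-free, so it treats an intercept near 1 and a slope near -4 comparably.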
References
Morrison, Laeyendecker, and Brookmeyer (2021). "Regression with interval-censored covariates: Application to cross-sectional incidence estimation". Biometrics. doi: 10.1111/biom.13472.
Examples
## Not run: 
# simulate data:
study_data <- simulate_interval_censoring()
# fit model:
EM_algorithm_outputs <- fit_joint_model(
  obs_level_data = study_data$obs_data,
  participant_level_data = study_data$pt_data
)
## End(Not run)
Fit model using midpoint imputation
Description
Fit model using midpoint imputation
Usage
fit_midpoint_model(
  participant_level_data,
  obs_level_data,
  maxit = 1000,
  tolerance = 1e-08
)
Arguments
participant_level_data | 
 a data.frame or tibble of participant-level data, such as the pt_data tibble returned by simulate_interval_censoring (columns ID, E, L, R)  | 
obs_level_data | 
 a data.frame or tibble of observation-level data, such as the obs_data tibble returned by simulate_interval_censoring (columns ID, O, Y)  | 
maxit | 
 maximum iterations, passed to the underlying model-fitting routine  | 
tolerance | 
 convergence criterion, passed to the underlying model-fitting routine  | 
Value
a vector of logistic regression coefficient estimates
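The idea behind midpoint imputation can be sketched with base R alone: replace each unknown seroconversion date S with the midpoint of its interval [L, R], compute T as the time from S to each observation, and fit an ordinary logistic regression. The toy data, variable names, and use of `glm` below are assumptions for illustration; this is not the package's implementation.

```r
# Hypothetical sketch of midpoint imputation on toy data (not the package's code).
set.seed(1)
n <- 200
pt <- data.frame(
  ID = seq_len(n),
  L = as.Date("2001-01-01") + sample(0:300, n, replace = TRUE)
)
pt$R <- pt$L + 84  # first positive test 84 days after the last negative test

# One biomarker sample per participant, taken 0 to 730 days after R
obs <- data.frame(ID = pt$ID, O = pt$R + sample(0:730, n, replace = TRUE))

# Impute S as the midpoint of [L, R], then compute T = years from S to O
S_mid <- pt$L + (pt$R - pt$L) / 2
obs$T <- as.numeric(obs$O - S_mid[match(obs$ID, pt$ID)]) / 365.25

# Toy outcomes drawn from a logistic curve, then an ordinary logistic regression
obs$Y <- rbinom(n, 1, plogis(0.986 - 3.88 * obs$T))
theta_mid <- coef(glm(Y ~ T, family = binomial, data = obs))
theta_mid
```

Because the imputed dates are treated as known, this approach ignores the uncertainty about S within each interval.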
Examples
sim_data = simulate_interval_censoring(
  theta = c(0.986, -3.88),
  study_cohort_size = 4500,
  preconversion_interval_length = 365,
  hazard_alpha = 1,
  hazard_beta = 0.5)
theta_est_midpoint = fit_midpoint_model(
  obs_level_data = sim_data$obs_data,
  participant_level_data = sim_data$pt_data
)
Fit model using uniform imputation
Description
Fit model using uniform imputation
Usage
fit_uniform_model(
  participant_level_data,
  obs_level_data,
  maxit = 1000,
  tolerance = 1e-08,
  n_imputations = 10
)
Arguments
participant_level_data | 
 a data.frame or tibble of participant-level data, such as the pt_data tibble returned by simulate_interval_censoring (columns ID, E, L, R)  | 
obs_level_data | 
 a data.frame or tibble of observation-level data, such as the obs_data tibble returned by simulate_interval_censoring (columns ID, O, Y)  | 
maxit | 
 maximum iterations, passed to the underlying model-fitting routine  | 
tolerance | 
 convergence criterion, passed to the underlying model-fitting routine  | 
n_imputations | 
 number of imputed data sets to create  | 
Value
a vector of logistic regression coefficient estimates
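Uniform imputation can be sketched in a similar way: draw each seroconversion date uniformly from its interval [L, R], refit the logistic regression on each imputed data set, and combine the fits. The toy data and variable names below are hypothetical, and averaging the coefficients across imputations is one simple pooling choice assumed here; the package may combine the imputed data sets differently.

```r
# Hypothetical sketch of uniform imputation on toy data (not the package's code).
set.seed(2)
n <- 200
L <- as.Date("2001-01-01") + sample(0:300, n, replace = TRUE)
R <- L + 84
O <- R + sample(0:730, n, replace = TRUE)

# Toy "true" seroconversion dates and outcomes from a logistic curve
S_true <- L + runif(n) * as.numeric(R - L)
Y <- rbinom(n, 1, plogis(0.986 - 3.88 * as.numeric(O - S_true) / 365.25))

n_imputations <- 10
fits <- replicate(n_imputations, {
  S_imp <- L + runif(n) * as.numeric(R - L)  # draw S uniformly on [L, R]
  T_imp <- as.numeric(O - S_imp) / 365.25    # years from imputed S to O
  coef(glm(Y ~ T_imp, family = binomial))
})
theta_uniform <- rowMeans(fits)  # average coefficients across imputations
theta_uniform
```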
Examples
sim_data = simulate_interval_censoring(
  theta = c(0.986, -3.88),
  study_cohort_size = 4500,
  preconversion_interval_length = 365,
  hazard_alpha = 1,
  hazard_beta = 0.5)
theta_est_uniform = fit_uniform_model(
  obs_level_data = sim_data$obs_data,
  participant_level_data = sim_data$pt_data
)
plot estimated and true CDFs for seroconversion date distribution
Description
plot estimated and true CDFs for seroconversion date distribution
Usage
plot_CDF(true_hazard_alpha, true_hazard_beta, omega.hat)
Arguments
true_hazard_alpha | 
 The data-generating hazard at the start of the study  | 
true_hazard_beta | 
 The change in data-generating hazard per calendar year  | 
omega.hat | 
 tibble of estimated discrete hazards  | 
Value
a ggplot
Examples
## Not run: 
hazard_alpha = 1
hazard_beta = 0.5
study_data <- simulate_interval_censoring(
  hazard_alpha = hazard_alpha,
  hazard_beta = hazard_beta)
# fit model:
EM_algorithm_outputs <- fit_joint_model(
  obs_level_data = study_data$obs_data,
  participant_level_data = study_data$pt_data
)
plot1 = plot_CDF(
  true_hazard_alpha = hazard_alpha,
  true_hazard_beta = hazard_beta,
  omega.hat = EM_algorithm_outputs$Omega)
print(plot1)
## End(Not run)
Plot true and estimated curves for P(Y=1|T=t)
Description
Plot true and estimated curves for P(Y=1|T=t)
Usage
plot_phi_curves(
  theta_true,
  theta.hat_joint,
  theta.hat_midpoint,
  theta.hat_uniform
)
Arguments
theta_true | 
 the coefficients of the data-generating model P(Y=1|T=t)  | 
theta.hat_joint | 
 the estimated coefficients from the joint model  | 
theta.hat_midpoint | 
 the estimated coefficients from midpoint imputation  | 
theta.hat_uniform | 
 the estimated coefficients from uniform imputation  | 
Value
a ggplot
Examples
## Not run: 
theta_true = c(0.986, -3.88)
hazard_alpha = 1
hazard_beta = 0.5
sim_data = simulate_interval_censoring(
  theta = theta_true,
  study_cohort_size = 4500,
  preconversion_interval_length = 365,
  hazard_alpha = hazard_alpha,
  hazard_beta = hazard_beta)
# extract the participant-level and observation-level simulated data:
sim_participant_data = sim_data$pt_data
sim_obs_data = sim_data$obs_data
rm(sim_data)
# joint model:
EM_algorithm_outputs = fit_joint_model(
  obs_level_data = sim_obs_data,
  participant_level_data = sim_participant_data,
  bin_width = 7,
  verbose = FALSE)
# midpoint imputation:
theta_est_midpoint = fit_midpoint_model(
  obs_level_data = sim_obs_data,
  participant_level_data = sim_participant_data
)
# uniform imputation:
theta_est_uniform = fit_uniform_model(
  obs_level_data = sim_obs_data,
  participant_level_data = sim_participant_data
)
plot2 = plot_phi_curves(
  theta_true = theta_true,
  theta.hat_uniform = theta_est_uniform,
  theta.hat_midpoint = theta_est_midpoint,
  theta.hat_joint = EM_algorithm_outputs$Theta)
print(plot2)
## End(Not run)
rwicc: Regression with Interval-Censored Covariates
Description
The rwicc package implements a regression model with an interval-censored covariate using an EM algorithm, as described in Morrison et al. (2021); doi: 10.1111/biom.13472.
rwicc functions
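The main rwicc functions are simulate_interval_censoring, which simulates a data set, and fit_joint_model, which fits the regression model.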
References
Morrison, Laeyendecker, and Brookmeyer (2021). "Regression with interval-censored covariates: Application to cross-sectional incidence estimation". Biometrics. doi: 10.1111/biom.13472.
Inverse survival function for time-to-event variable with linear hazard function
Description
This function determines the seroconversion date corresponding to a provided probability of survival. See doi: 10.1111/biom.13472, Supporting Information, Section A.4.
Usage
seroconversion_inverse_survival_function(u, e, hazard_alpha, hazard_beta)
Arguments
u | 
 a vector of seroconversion survival probabilities  | 
e | 
 a vector of time differences between study start and enrollment (in years)  | 
hazard_alpha | 
 the instantaneous hazard of seroconversion on the study start date  | 
hazard_beta | 
 the change in hazard per year after study start date  | 
Value
numeric vector of time differences between study start and seroconversion (in years)
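The inversion can be sketched under the assumption that the hazard is linear in time since study start, h(t) = hazard_alpha + hazard_beta * t, and that survival is measured from the enrollment time e. The cumulative hazard from e to s is then hazard_alpha * (s - e) + hazard_beta * (s^2 - e^2) / 2, and setting it equal to -log(u) gives a quadratic in s. The function name `inverse_survival_sketch` is hypothetical, and the exact parameterization in Supporting Information Section A.4 may differ.

```r
# Hypothetical sketch, assuming hazard h(t) = alpha + beta * t (t in years since
# study start) and survival measured from the enrollment time e.
inverse_survival_sketch <- function(u, e, hazard_alpha, hazard_beta) {
  stopifnot(hazard_beta != 0)
  # Solve alpha * (s - e) + beta * (s^2 - e^2) / 2 = -log(u) for s >= e
  (-hazard_alpha +
     sqrt((hazard_alpha + hazard_beta * e)^2 - 2 * hazard_beta * log(u))) /
    hazard_beta
}

# Round-trip check: plugging s back into the survival function recovers u
u <- c(0.9, 0.5, 0.1)
e <- c(0, 0.25, 1)
s <- inverse_survival_sketch(u, e, hazard_alpha = 1, hazard_beta = 0.5)
cum_hazard <- 1 * (s - e) + 0.5 * (s^2 - e^2) / 2
all.equal(exp(-cum_hazard), u)
```

Applying such a function to uniform random draws of u is the usual inverse-transform way to simulate event times.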
References
Morrison, Laeyendecker, and Brookmeyer (2021). "Regression with interval-censored covariates: Application to cross-sectional incidence estimation". Biometrics. doi: 10.1111/biom.13472.
Simulate a dataset with interval-censored seroconversion dates
Description
simulate_interval_censoring generates a simulated data set from a
data-generating model based on the typical structure of a cohort study of HIV
biomarker progression, as described in Morrison et al (2021); doi: 10.1111/biom.13472.
Usage
simulate_interval_censoring(
  study_cohort_size = 4500,
  hazard_alpha = 1,
  hazard_beta = 0.5,
  preconversion_interval_length = 84,
  theta = c(0.986, -3.88),
  probability_of_ever_seroconverting = 0.05,
  years_in_study = 10,
  max_scheduling_offset = 7,
  days_from_study_start_to_recruitment_end = 365,
  study_start_date = lubridate::ymd("2001-01-01")
)
Arguments
study_cohort_size | 
 the number of participants to simulate (N_0 in the paper)  | 
hazard_alpha | 
 the hazard (instantaneous risk) of seroconversion at the start date of the cohort study for those participants at risk of seroconversion  | 
hazard_beta | 
 the change in hazard per calendar year  | 
preconversion_interval_length | 
 the number of days between tests for seroconversion  | 
theta | 
 the parameters of a logistic model (with linear functional form) specifying the probability of MAA-positive biomarkers as a function of time since seroconversion  | 
probability_of_ever_seroconverting | 
 the probability that each participant is at risk of HIV seroconversion  | 
years_in_study | 
 the duration of follow-up for each participant  | 
max_scheduling_offset | 
 the maximum divergence of pre-seroconversion followup visits from the prescribed schedule  | 
days_from_study_start_to_recruitment_end | 
 the length of the recruitment period  | 
study_start_date | 
 the date when the study starts recruitment ("d_0" in the main text). The value of this parameter does not affect the simulation results; it is only necessary as a reference point for generating E, L, R, O, and S.  | 
Value
A list containing the following two tibbles:
- pt_data: a tibble of participant-level information, with the following columns:
  - ID: participant ID
  - E: enrollment date
  - L: date of last HIV test prior to seroconversion
  - R: date of first HIV test after seroconversion
- obs_data: a tibble of longitudinal observations, with the following columns:
  - ID: participant ID
  - O: dates of biomarker sample collection
  - Y: MAA classifications of biomarker samples
References
Morrison, Laeyendecker, and Brookmeyer (2021). "Regression with interval-censored covariates: Application to cross-sectional incidence estimation". Biometrics. doi: 10.1111/biom.13472.
Examples
study_data <- simulate_interval_censoring()
participant_characteristics <- study_data$pt_data
longitudinal_observations <- study_data$obs_data