| Type: | Package |
| Title: | Deriving Clinically Interpretable Cutoffs for Disease Biomarkers |
| Version: | 0.1.0 |
| Maintainer: | Bhrigu Kumar Rajbongshi <kumarbhrigu536@gmail.com> |
| Description: | Provides a reproducible pipeline for deriving two clinically meaningful cutoffs for disease biomarkers using a unified two-stage framework. The package integrates finite mixture modeling with risk prediction using biomarker plus clinical features, followed by decision curve analysis to evaluate clinical utility. Outputs include biomarker density plots, risk calibration curves, decision curves, and summary tables of diagnostic performance. Designed for researchers in bio-statistics, neurology, and data science, this package emphasizes reproducibility, transparency, and clear clinical relevance. |
| Imports: | dplyr, mixtools, stats, pROC, xgboost, ggplot2, caret, patchwork, gridExtra |
| License: | MIT + file LICENSE |
| Encoding: | UTF-8 |
| LazyData: | true |
| Depends: | R (≥ 4.0.0) |
| RoxygenNote: | 7.3.2 |
| NeedsCompilation: | no |
| Suggests: | rmarkdown, DiagrammeR, knitr, devtools |
| VignetteBuilder: | knitr |
| Author: | Bhrigu Kumar Rajbongshi [aut, cre, ctb], Seyed Ehsan Saffari [aut, ctb], Nastaran Marzban [aut, ctb] |
| Packaged: | 2026-06-20 14:38:03 UTC; bhrigu |
| Repository: | CRAN |
| Date/Publication: | 2026-06-24 10:00:13 UTC |
Adjust biomarker score with confounder-adjusted risk model
Description
Function for biomarker score with confounder-adjusted risk model
Usage
adjust_score(
data_set,
biomarker,
outcome,
confounders = NULL,
k = 2,
maxit = 1000,
epsilon = 1e-08,
CompCase = FALSE,
xgb_params = NULL,
validate = FALSE,
test_size = 0.2,
seed = 123
)
Arguments
data_set |
Data frame containing biomarker, outcome, and confounders. |
biomarker |
Character, name of biomarker column. |
outcome |
Character, name of binary outcome column (0/1). |
confounders |
Character vector of confounder column names (default = NULL). |
k |
Integer, number of mixture components (default = 2). |
maxit |
Integer, maximum EM iterations. |
epsilon |
Numeric, convergence tolerance for EM. |
CompCase |
Logical, whether to perform complete case analysis (default = FALSE). |
xgb_params |
Optional list of hyperparameters passed to |
validate |
Logical, whether to perform a train/test split. If TRUE, model performance is evaluated on a held-out test set. Validation is only performed when confounders are provided and an xgboost model is fitted. |
test_size |
Numeric, proportion of data used for the test set.
Only used when |
seed |
Integer, random seed for reproducibility of the train/test split. |
Details
This function implements a two-stage modeling pipeline:
Stage 1: Finite Mixture Model (FMM)
Fits a 2-component Gaussian mixture model to the biomarker distribution using the Expectation-Maximization (EM) algorithm (up to
maxititerations, toleranceepsilon).Estimates component means (
\mu_1, \mu_2), standard deviations (\sigma_1, \sigma_2), and mixing proportions (\pi).Computes the posterior probability of disease for each subject:
\hat{p}(x) = \frac{\hat{\pi} \cdot N(x|\hat{\mu}_2,\hat{\sigma}_2)}{\hat{\pi} \cdot N(x|\hat{\mu}_2,\hat{\sigma}_2) + (1-\hat{\pi}) \cdot N(x|\hat{\mu}_1,\hat{\sigma}_1)}where
N(x|\mu,\sigma)is the Gaussian density.The resulting
p_diseaserepresents the latent probability of disease given the biomarker value.
Stage 2: Confounder-Adjusted Risk Model (xgboost)
Trains a binary logistic xgboost model using predictors:
p_disease(from FMM) plus specified confounders.Model complexity (number of boosting iterations) is selected via cross-validation with early stopping.
Produces an
adjusted_riskscore in [0,1] for each subject, representing the confounder-adjusted probability of disease.
The disease component is identified using which.max(mu), assuming that
higher biomarker values correspond to disease. If this assumption does not
hold for your dataset, results may be misleading. A warning is issued if the assumption may not hold.
Value
A list containing:
Mixture parameters (
mu,sigma,lambda).Fitted xgboost model object.
Training AUC (
train_auc).Cross-validated AUC from xgboost CV (
cv_auc).Test/validation AUC on held-out data (
test_auc, only ifvalidate = TRUE).Augmented dataset with columns:
p_disease(posterior probability from FMM) andadjusted_risk(confounder-adjusted risk score).
Examples
n <- 200
set.seed(123)
# outcome
DISEASE <- rbinom(n, 1, 0.35)
# biomarker depends on disease
PTAU <- rnorm(n, mean = 25 + 10 * DISEASE, sd = 8)
# confounders (optional noise)
AGE <- round(rnorm(n, 73, 7))
SEX <- sample(c(0, 1), n, replace = TRUE)
BMI <- rnorm(n, 25, 4)
HTN <- rbinom(n, 1, 0.3)
DM <- rbinom(n, 1, 0.2)
df <- data.frame(PTAU, AGE, SEX, BMI, HTN, DM, DISEASE)
res <- adjust_score(
df,
biomarker = "PTAU", outcome = "DISEASE",
confounders = c("AGE","SEX","BMI","HTN","DM"), validate = TRUE)
head(res$data[,c("PTAU","p_disease","adjusted_risk")])
Alzheimer’s Biomarker Dataset
Description
This dataset contains simulated patient-level data for Alzheimer’s disease biomarker analysis. It includes demographic variables, clinical covariates, biomarker values, and disease status. The dataset is suitable for testing finite mixture models, cutoff derivation, and performance evaluation.
Usage
data(alzheimer_data)
Format
A data frame with 500 rows and 7 variables:
- PTAU
numeric: Plasma p-tau biomarker concentration (pg/mL).- AGE
numeric: Age in years (approx. 57-96).- SEX
integer: Sex (0 = Female, 1 = Male).- BMI
numeric: Body Mass Index (kg/m^2).- HTN
integer: Hypertension status (0 = no, 1 = yes).- DM
integer: Diabetes status (0 = no, 1 = yes).- DISEASE
integer: Alzheimer’s disease status (0 = no disease, 1 = disease).
Details
The dataset mimics typical biomarker distributions observed in Alzheimer’s cohorts. Covariates (AGE, SEX, BMI, HTN, DM) act as confounders in biomarker-disease associations. Useful for demonstrating the two-cutoff approach in biomarker classification.
Compare cutoff methods
Description
Compares rule-out and rule-in cutoffs, classification results, and performance metrics across multiple cutoff methods.
Usage
compare_performance(percentile_result, sensspec_result, outcome)
Arguments
percentile_result |
Output list from |
sensspec_result |
Output list from |
outcome |
Character, name of binary disease outcome column (0/1). |
Value
Data frame summarizing cutoffs and performance metrics for each method.
Examples
n <- 200
set.seed(123)
# outcome
DISEASE <- rbinom(n, 1, 0.35)
# biomarker depends on disease
PTAU <- rnorm(n, mean = 25 + 10 * DISEASE, sd = 8)
# confounders (optional noise)
AGE <- round(rnorm(n, 73, 7))
SEX <- sample(c(0,1), n, replace = TRUE)
BMI <- rnorm(n, 25, 4)
HTN <- rbinom(n, 1, 0.3)
DM <- rbinom(n, 1, 0.2)
df <- data.frame(PTAU, AGE, SEX, BMI, HTN, DM, DISEASE)
res <- adjust_score(
df,
biomarker = "PTAU",
outcome = "DISEASE",
confounders = c("AGE","SEX","BMI","HTN","DM")
)
cutoffs_p <- derive_cutoffs_percentile(res, biomarker = "PTAU")
cutoffs_r <- derive_cutoffs_sensspec(res, biomarker = "PTAU", outcome = "DISEASE")
compare_performance(cutoffs_p, cutoffs_r, outcome = "DISEASE")
Decision Curve Analysis (DCA)
Description
This function evaluates the clinical utility of the confounder-adjusted risk model by quantifying net benefit across a range of decision thresholds and comparing against two reference strategies: treating all patients or treating none.
Usage
dca_analysis(
adjust_result,
outcome,
thresholds = seq(0.01, 0.6, by = 0.01),
risk_col = "adjusted_risk",
ylim_lower = NULL
)
Arguments
adjust_result |
Output list from |
outcome |
Character, name of binary disease outcome column (0/1). |
thresholds |
Numeric vector of risk thresholds (default = 0.01 to 0.60 by 0.01). |
risk_col |
Character, name of the risk score column to use (default = "adjusted_risk"). Allows generalisation beyond the default adjusted risk output. |
ylim_lower |
Numeric, lower bound for the y-axis in the decision
curve plot. If |
Details
Concept
DCA is a method to assess whether a predictive model improves clinical decision-making compared to default strategies.
Net benefit (NB) balances true positives against false positives, weighted by the risk threshold:
NB(t) = \frac{TP(t)}{N} - \frac{FP(t)}{N} \cdot \frac{t}{1-t}where
TP(t)andFP(t)are counts at thresholdt, andNis the total sample size.Interpretation: The model is clinically useful at thresholds where
NB(model) > \max(NB(treat\ all), NB(treat\ none)).
Implementation
Input: adjusted risk scores from
adjust_score()and true disease labels.Thresholds: defaults to 0.01-0.60 in steps of 0.01 (
1\%to60\%risk).Strategies compared:
Model: net benefit using adjusted risk predictions.
Treat All: assume all subjects are positive.
Treat None: assume all subjects are negative.
Output: a data frame of net benefit values across thresholds and a ggplot decision curve showing the three strategies.
Value
A list with:
-
data: Data frame of net benefit values for each strategy across thresholds. -
plot: ggplot object showing the decision curve.
Examples
n <- 200
set.seed(123)
# outcome
DISEASE <- rbinom(n, 1, 0.35)
# biomarker depends on disease
PTAU <- rnorm(n, mean = 25 + 10 * DISEASE, sd = 8)
# confounders (optional noise)
AGE <- round(rnorm(n, 73, 7))
SEX <- sample(c(0, 1), n, replace = TRUE)
BMI <- rnorm(n, 25, 4)
HTN <- rbinom(n, 1, 0.3)
DM <- rbinom(n, 1, 0.2)
df <- data.frame(PTAU, AGE, SEX, BMI, HTN, DM, DISEASE)
res <- adjust_score(df,
biomarker = "PTAU",
outcome = "DISEASE",
confounders = c("AGE","SEX","BMI","HTN","DM"))
dca_out <- dca_analysis(res, outcome = "DISEASE")
head(dca_out$data)
print(dca_out$plot)
Derive percentile-based biomarker cutoffs using adjusted risk
Description
Implements a two-cutoff approach to classify subjects into rule-out, rule-in, or indeterminate zones using confounder-adjusted risk scores.
Usage
derive_cutoffs_percentile(
adjust_result,
biomarker,
low_thresh = 0.05,
high_thresh = 0.7,
low_pct = 0.8,
high_pct = 0.2
)
Arguments
adjust_result |
Output list from |
biomarker |
Character, name of biomarker column. |
low_thresh |
Numeric, threshold for defining the low-risk group (default = 0.05). Very small datasets may produce too few low-risk subjects at extreme thresholds, triggering fallback cutoffs. |
high_thresh |
Numeric, threshold for defining the high-risk group (default = 0.70). Very small datasets may produce too few high-risk subjects at extreme thresholds, triggering fallback cutoffs. |
low_pct |
Numeric, percentile used to derive the rule-out cutoff within the low-risk group (default = 0.80). |
high_pct |
Numeric, percentile used to derive the rule-in cutoff within the high-risk group (default = 0.20). |
Details
Method
The function uses
adjusted_risk(fromadjust_score()) to define clinically meaningful risk groups.Low-risk group: subjects with adjusted risk <
low_thresh(default 0.05), representing individuals unlikely to have disease. The rule-out cutoff (c_L) is defined as the 80th percentile of biomarker values within this group.High-risk group: subjects with adjusted risk greater than
high_thresh(default 0.70), representing individuals likely to have disease. The rule-in cutoff (c_H) is defined as the 20th percentile of biomarker values within this group.If either group is too small, fallback cutoffs are used based on the global biomarker distribution (5th and 95th percentiles).
Final classification: biomarker <
c_L"Negative"; biomarker >c_H"Positive"; otherwise "Indeterminate".
Value
List with rule-out cutoff (c_L), rule-in cutoff (c_H),
classification vector, and augmented dataset.
Author(s)
Bhrigu Kumar Rajbongshi, Seyed Ehsan Saffari and Nastaran Marzban
References
Giacomucci, G., Crucitti, C., Ingannato, A., et al. (2025). The two cut-offs approach for plasma p-tau217 in detecting Alzheimer's disease. Alzheimer's & Dementia: Diagnosis, Assessment & Disease Monitoring, 17:e70116.
Hazan, J., Liu, K.Y., Isaacs, J.D., Howard, R. (2025). Cut-points and gray zones: The challenges of integrating Alzheimer's disease plasma biomarkers into clinical practice. Alzheimer's & Dementia, 21:e70113.
Examples
n <- 500
set.seed(123)
# outcome
DISEASE <- rbinom(n, 1, 0.35)
# biomarker depends on disease
PTAU <- rnorm(n, mean = 25 + 10 * DISEASE, sd = 8)
# confounders (optional noise)
AGE <- round(rnorm(n, 73, 7))
SEX <- sample(c(0, 1), n, replace = TRUE)
BMI <- rnorm(n, 25, 4)
HTN <- rbinom(n, 1, 0.3)
DM <- rbinom(n, 1, 0.2)
df <- data.frame(PTAU, AGE, SEX, BMI, HTN, DM, DISEASE)
res <- adjust_score(
df,
biomarker = "PTAU",
outcome = "DISEASE",
confounders = c("AGE","SEX","BMI","HTN","DM")
)
cutoffs <- derive_cutoffs_percentile(res, biomarker = "PTAU")
cutoffs$c_L; cutoffs$c_H
table(cutoffs$class)
Derive ROC-based biomarker cutoffs using adjusted risk
Description
Implements a ROC-guided two-cutoff approach to classify subjects into rule-out, rule-in, or indeterminate zones using confounder-adjusted risk scores.
Usage
derive_cutoffs_sensspec(
adjust_result,
biomarker,
outcome,
sens_target = 0.8,
spec_target = 0.8,
loess_span = 0.35
)
Arguments
adjust_result |
Output list from |
biomarker |
Character, name of biomarker column. |
outcome |
Character, name of binary disease outcome column (0/1).
If missing, the outcome stored in |
sens_target |
Numeric, target sensitivity for rule-out threshold (default = 0.80). |
spec_target |
Numeric, target specificity for rule-in threshold (default = 0.80). |
loess_span |
Numeric, smoothing parameter for LOESS regression (default = 0.35). Smaller values give more flexibility, larger values give smoother fits. |
Details
Method
Compute ROC curve with predictor =
adjusted_riskand outcome = true disease status.Identify thresholds:
Rule-out threshold (
T_L): lowest risk threshold with sensitivity\geqsens_target(default 0.80).Rule-in threshold (
T_H): highest risk threshold with specificity\geqspec_target(default 0.80).
Map thresholds back to biomarker scale by fitting a LOESS regression (
adjusted_risk~ biomarker, default span = 0.35 (user-adjustable via loess_span)) and interpolating biomarker values atT_LandT_H.Classification:
biomarker <
c_L= "Negative" (rule-out).biomarker >
c_H= "Positive" (rule-in).otherwise = "Indeterminate".
Value
List with rule-out cutoff (c_L), rule-in cutoff (c_H),
classification vector, and augmented dataset.
Author(s)
Bhrigu Kumar Rajbongshi, Seyed Ehsan Saffari and Nastaran Marzban
References
Giacomucci, G., Crucitti, C., Ingannato, A., et al. (2025). The two cut-offs approach for plasma p-tau217 in detecting Alzheimer's disease. Alzheimer's & Dementia: Diagnosis, Assessment & Disease Monitoring, 17:e70116.
Hazan, J., Liu, K.Y., Isaacs, J.D., Howard, R. (2025). Cut-points and gray zones: The challenges of integrating Alzheimer's disease plasma biomarkers into clinical practice. Alzheimer's & Dementia, 21:e70113.
Examples
n <- 500
set.seed(123)
# outcome
DISEASE <- rbinom(n, 1, 0.35)
# biomarker depends on disease
PTAU <- rnorm(n, mean = 25 + 10 * DISEASE, sd = 8)
# confounders (optional noise)
AGE <- round(rnorm(n, 73, 7))
SEX <- sample(c(0,1), n, replace = TRUE)
BMI <- rnorm(n, 25, 4)
HTN <- rbinom(n, 1, 0.3)
DM <- rbinom(n, 1, 0.2)
df <- data.frame(PTAU, AGE, SEX, BMI, HTN, DM, DISEASE)
res <- adjust_score(
df,
biomarker = "PTAU",
outcome = "DISEASE",
confounders = c("AGE","SEX","BMI","HTN","DM")
)
cutoffs <- derive_cutoffs_sensspec(res, biomarker = "PTAU", outcome = "DISEASE")
cutoffs$c_L; cutoffs$c_H
table(cutoffs$class)
Evaluate classification performance
Description
Computes diagnostic performance metrics based on classification results from cutoff functions (percentile-based or ROC-based).
Usage
evaluate_performance(cutoff_result, outcome)
Arguments
cutoff_result |
Output list from |
outcome |
Character, name of binary disease outcome column (0/1). |
Details
The function uses the classification vector returned by
derive_cutoffs_percentile() or derive_cutoffs_sensspec()
and compares it against the true disease labels to compute:
-
NPV (Negative Predictive Value): TN / (TN + FN), proportion of true negatives among predicted negatives.
-
PPV (Positive Predictive Value): TP / (TP + FP), proportion of true positives among predicted positives.
-
Overall Sensitivity: TP / (TP + FN) across the full cohort.
-
Overall Specificity: TN / (TN + FP) across the full cohort.
-
% in Middle Zone: Proportion of subjects classified as indeterminate (those with biomarker values between c_L and c_H).
Indeterminate subjects (those with biomarker values between the rule-out
cutoff c_L and the rule-in cutoff c_H) are excluded from the
sensitivity and specificity calculations. They do not contribute to the
counts of true positives, true negatives, false positives, or false negatives.
This design is intentional to reflect clinical practice, where patients in
the gray zone are not classified as definitively diseased or non-diseased.
As a result, sensitivity and specificity are computed only on subjects
outside the indeterminate zone, while the proportion of indeterminate cases
is reported separately.
Value
Data frame with performance metrics.
Examples
n <- 500
set.seed(123)
# outcome
DISEASE <- rbinom(n, 1, 0.35)
# biomarker depends on disease
PTAU <- rnorm(n, mean = 25 + 10 * DISEASE, sd = 8)
# confounders (optional noise)
AGE <- round(rnorm(n, 73, 7))
SEX <- sample(c(0, 1), n, replace = TRUE)
BMI <- rnorm(n, 25, 4)
HTN <- rbinom(n, 1, 0.3)
DM <- rbinom(n, 1, 0.2)
df <- data.frame(PTAU, AGE, SEX, BMI, HTN, DM, DISEASE)
res <- adjust_score(
df,
biomarker = "PTAU",
outcome = "DISEASE",
confounders = c("AGE","SEX","BMI","HTN","DM"))
cutoffs <- derive_cutoffs_percentile(res, biomarker = "PTAU")
# cutoffs <- derive_cutoffs_sensspec(res, biomarker = "PTAU")
perf <- evaluate_performance(cutoffs, outcome = "DISEASE")
Plot two-cutoff classification
Description
Visualizes biomarker values with rule-out and rule-in cutoffs, using results from percentile-based or ROC-based cutoff functions.
Usage
plot_two_cutoff(
cutoff_result,
biomarker,
outcome,
panel_title = NULL,
negative_label = "Control",
positive_label = "Disease",
ylab = NULL,
add_table = NULL,
point_alpha = 0.55
)
Arguments
cutoff_result |
Output list from
|
biomarker |
Character, name of biomarker column. |
outcome |
Character, name of binary disease outcome column (0/1). |
panel_title |
Optional character vector of panel titles. |
negative_label |
Label for non-diseased group (default = "Control"). |
positive_label |
Label for diseased group (default = "Disease"). |
ylab |
Y-axis label (default = "Plasma biomarker"). |
add_table |
Logical, whether to add confusion table below plot (default = NULL). |
point_alpha |
Numeric, transparency for points (default = 0.55). |
Details
Subjects are classified as:
Negative (rule-out): biomarker <
c_LPositive (rule-in): biomarker >
c_HIndeterminate: between
c_Landc_H
The plot shows jittered biomarker values by true group, with horizontal
dashed lines indicating the rule-out (c_L) and rule-in (c_H) thresholds.
Points are colored by predicted classification.
Value
Confusion tables for each panel (invisible).
Author(s)
Bhrigu Kumar Rajbongshi, Seyed Ehsan Saffari and Nastaran Marzban
Examples
n <- 500
set.seed(123)
# outcome
DISEASE <- rbinom(n, 1, 0.35)
# biomarker depends on disease
PTAU <- rnorm(n, mean = 25 + 10 * DISEASE, sd = 8)
# confounders (optional noise)
AGE <- round(rnorm(n, 73, 7))
SEX <- sample(c(0, 1), n, replace = TRUE)
BMI <- rnorm(n, 25, 4)
HTN <- rbinom(n, 1, 0.3)
DM <- rbinom(n, 1, 0.2)
df <- data.frame(PTAU, AGE, SEX, BMI, HTN, DM, DISEASE)
res <- adjust_score(
df,
biomarker = "PTAU",
outcome = "DISEASE",
confounders = c("AGE","SEX","BMI","HTN","DM"))
cutoffs <- derive_cutoffs_percentile(res, biomarker = "PTAU")
plot_two_cutoff(cutoffs, biomarker = "PTAU", outcome = "DISEASE")
Print method for derive_cutoffs_percentile objects
Description
Provides a concise summary of percentile-based cutoff results.
Usage
## S3 method for class 'cutoff.percentile'
print(x, ...)
Arguments
x |
An object returned by |
... |
Additional arguments (ignored) |
Value
Returns print output for the mentioned object
Print method for derive_cutoffs_sensspec objects
Description
Provides a concise summary of ROC-based cutoff results.
Usage
## S3 method for class 'cutoff.sensspec'
print(x, ...)
Arguments
x |
An object returned by |
... |
Additional arguments (ignored). |
Value
Returns print output for the mentioned object
Print Method for twocutoff.adjscore Objects
Description
The print.twocutoff.adjscore function provides a concise summary of the output
from the adjust_score function. It displays mixture parameters, cutoff values,
and basic classification counts.
Usage
## S3 method for class 'twocutoff.adjscore'
print(x, ...)
Arguments
x |
An object of class |
... |
Additional arguments (not used). |
Value
The object is returned invisibly after printing a summary.