Title: Mixture-of-Experts Wishart Models for Covariance Data
Version: 1.0
Description: Methods for maximum likelihood and Bayesian estimation for the Wishart mixture model and the mixture-of-experts Wishart (MoE-Wishart) model. The package provides four inference algorithms for these models, each implemented using the expectation–maximization (EM) algorithm for maximum likelihood estimation and a fully Bayesian approach via Gibbs-within-Metropolis–Hastings sampling.
Maintainer: Zhi Zhao <zhi.zhao@medisin.uio.no>
URL: https://github.com/zhizuio/moewishart
BugReports: https://github.com/zhizuio/moewishart/issues
License: GPL-3
VignetteBuilder: knitr
Depends: R (≥ 4.1.0)
Encoding: UTF-8
RoxygenNote: 7.3.2
Imports: utils, stats, loo
Suggests: rmarkdown, knitr
NeedsCompilation: no
Packaged: 2026-02-16 10:10:46 UTC; zhiz
Author: The Tien Mai [aut], Zhi Zhao [aut, cre]
Repository: CRAN
Date/Publication: 2026-02-19 19:50:02 UTC

Information criteria for Wishart mixtures and MoE models

Description

Compute AIC, BIC, and ICL for EM fits; and PSIS-LOO expected log predictive density (elpd_loo) for Bayesian fits. Supports mixturewishart (finite mixture) and moewishart (MoE with covariates in gating).

Usage

computeIC(fit)

Arguments

fit

A fitted object returned by mixturewishart() or moewishart().

Details

For EM fits:

Parameter counting k:

For Bayesian fits:

Notes:

Value

- For method="em": a list with fields AIC, BIC, ICL. - For method="bayes": a list with fields ICL and elpd of class '"loo"' as returned by [loo::loo()] that contains fields 'estimates', 'pointwise', 'diagnostics'

Examples


# Bayesian example (MoE)

# simulate data
set.seed(123)
n <- 500 # subjects
p <- 2
# True gating coefficients (last column zero)
set.seed(123)
Xq <- 3
K <- 3
betas <- matrix(runif(Xq * K, -2, 2), nrow = Xq, ncol = K)
betas[, K] <- 0
dat <- simData(n, p,
  Xq = 3, K = 3, betas = betas,
  pis = c(0.35, 0.40, 0.25),
  nus = c(8, 16, 3)
)
set.seed(123)
fit <- moewishart(
  dat$S,
  X = cbind(1, dat$X), K = 3,
  mh_sigma = c(0.2, 0.1, 0.1), # RW-MH variances (length K)
  mh_beta = c(0.2, 0.2), # RW-MH variances (length K-1)
  niter = 100, burnin = 50
)
computeIC(fit)


density of Wishart distribution

Description

Compute the (log) density of a p-dimensional Wishart distribution W_p(\nu, \Sigma) at an SPD matrix S. Returns either the log-density or the density depending on logarithm.

Usage

dWishart(S, nu, Sigma, detS_val = NULL, logarithm = TRUE)

Arguments

S

Numeric p \times p SPD matrix at which to evaluate the density.

nu

Numeric. Degrees of freedom \nu (must exceed p-1).

Sigma

Numeric p \times p SPD scale matrix \Sigma.

detS_val

Optional numeric. Precomputed \log|S| to reuse; if NULL, it is computed internally.

logarithm

Logical. If TRUE, return log-density; otherwise return density.

Details

Let S \sim W_p(\nu, \Sigma) with degrees of freedom \nu and scale matrix \Sigma (SPD). The density is:

f(S \mid \nu, \Sigma) = \frac{|S|^{(\nu - p - 1)/2}\, \exp\{-\tfrac{1}{2}\,\mathrm{tr}(\Sigma^{-1}S)\}} {2^{\nu p/2}\,|\Sigma|^{\nu/2}\,\Gamma_p(\nu/2)},

where \Gamma_p(\cdot) is the multivariate gamma function and p is the dimension.

Note that (i) detS_val can be supplied to avoid recomputing \log|S|, which is useful inside EM/MCMC loops, and (ii) small diagonal jitter is added internally to S and \Sigma when computing determinants or solves for numerical stability.

Constraints: (i) S and \Sigma must be SPD, and (ii) the Wishart requires \nu > p - 1.

Value

A numeric scalar: the log-density if logarithm = TRUE, otherwise the density.

Examples


set.seed(123)
p <- 3
# Construct an SPD Sigma
A <- matrix(rnorm(p * p), p, p)
Sigma <- crossprod(A) + diag(p) * 0.5
# Draw a Wishart matrix using base stats::rWishart()
W <- drop(rWishart(1, df = p + 5, Sigma = Sigma))
# Evaluate log-density at W
dWishart(W, nu = p + 5, Sigma = Sigma, logarithm = TRUE)


Log multivariate gamma function

Description

Compute the log of the multivariate gamma function \log \Gamma_p(a) for dimension p and parameter a.

Usage

lmvgamma(a, p)

Arguments

a

Numeric. Argument of \Gamma_p(\cdot) (often \nu/2 in Wishart contexts).

p

Integer. Dimension p of the multivariate gamma.

Details

The multivariate gamma function \Gamma_p(a) is defined by:

\Gamma_p(a) = \pi^{\,p(p-1)/4} \prod_{j=1}^{p} \Gamma\!\left(a + \frac{1-j}{2}\right).

Constraints: (i) p \in \{1,2,\dots\} (positive integer), and (ii) a > (p-1)/2 to keep all gamma terms finite (as in the Wishart normalization constant).

Value

A numeric scalar equal to \log \Gamma_p(a).

Examples


# Dimension
p <- 3
# Evaluate log multivariate gamma at a = nu/2
nu <- p + 5
lmvgamma(a = nu / 2, p = p)


EM/Bayesian estimation for Wishart mixture model

Description

Fit finite mixtures of Wishart-distributed SPD matrices using either a Bayesian sampler or the EM algorithm. The input S_list is a list of p \times p SPD matrices. Under component k, S_i \mid z_i=k \sim W_p(\nu_k, \Sigma_k) with degrees of freedom \nu_k and SPD scale matrix \Sigma_k. Mixture weights \pi_k sum to 1.

Usage

mixturewishart(
  S_list,
  K,
  niter = 3000,
  burnin = 1000,
  method = "bayes",
  thin = 1,
  alpha = NULL,
  nu0 = NULL,
  Psi0 = NULL,
  init_pi = NULL,
  init_nu = NULL,
  init_Sigma = NULL,
  marginal.z = TRUE,
  estimate_nu = TRUE,
  nu_prior_a = 2,
  nu_prior_b = 0.1,
  mh_sigma = 1,
  n_restarts = 3,
  restart_iters = 20,
  tol = 1e-06,
  verbose = TRUE
)

Arguments

S_list

List of length n of SPD matrices, each p \times p. These are the observed matrices modeled by a mixture of Wisharts.

K

Integer. Number of mixture components.

niter

Integer. Total iterations. Bayesian mode: total MCMC iterations (including burn-in). EM mode: maximum EM iterations (alias to maxiter).

burnin

Integer. Number of burn-in iterations (Bayesian mode).

method

Character; one of c("bayes","em"). Selects sampler or optimizer.

thin

Integer. Thinning interval for saving draws (Bayesian).

alpha

Numeric vector length K (Dirichlet prior on \pi) or NULL to default to rep(1, K) (Bayesian).

nu0

Numeric. Inverse-Wishart prior df for \Sigma_k (Bayesian). Default: p + 2.

Psi0

Numeric p \times p SPD matrix. Inverse-Wishart prior scale for \Sigma_k (Bayesian). Default: diag(p).

init_pi

Optional numeric vector length K summing to 1. EM initialization for mixture weights. If NULL, random or data-driven initialization is used.

init_nu

Optional numeric vector length K of initial degrees of freedom. Used in both modes.

init_Sigma

Optional list of K SPD matrices (each p \times p). EM initialization for \Sigma_k.

marginal.z

Logical. If TRUE, integrates out \pi when sampling z (collapsed step) in Bayesian mode. If FALSE, samples z conditional on current \pi.

estimate_nu

Logical. If TRUE, estimate/update \nu_k (MH in Bayesian mode; Newton/EM in EM). If FALSE, \nu_k are fixed.

nu_prior_a

Numeric. Prior hyperparameter a for \nu_k (Bayesian), used when estimate_nu = TRUE.

nu_prior_b

Numeric. Prior hyperparameter b for \nu_k (Bayesian), used when estimate_nu = TRUE.

mh_sigma

Numeric scalar or length-K vector. Proposal sd for MH updates on \log(\nu_k) (Bayesian, when estimating \nu).

n_restarts

Integer. Number of random restarts for EM. Ignored in Bayesian mode.

restart_iters

Integer. Number of short EM iterations per restart used to select a good initialization. Ignored in Bayesian mode.

tol

Numeric. Convergence tolerance on absolute change of log-likelihood (EM), also used internally elsewhere.

verbose

Logical. If TRUE, print progress information.

Details

Mixture mixture model: p(S_i) = \sum_{k=1}^K \pi_k \, f_W(S_i \mid \nu_k, \Sigma_k).

Algorithms:

  1. method = "bayes": Samples latent labels z, weights \pi, component scales \Sigma_k, and optionally \nu_k. Uses a Dirichlet prior for \pi, inverse- Wishart prior for \Sigma_k, and a prior on \nu_k when estimate_nu = TRUE. Degrees-of-freedom are updated via MH on \log(\nu_k) with proposal sd mh_sigma. Can integrate out \pi when sampling z if marginal.z = TRUE.

  2. method = "em": Maximizes the observed-data log- likelihood via EM. The E-step computes responsibilities via Wishart log-densities. The M-step updates \pi_k and \Sigma_k; optionally updates \nu_k when estimate_nu = TRUE. Supports multiple random restarts.

Note that (i) All matrices in S_list must be SPD. Small ridge terms may be added internally for stability, and (ii) Multiple EM restarts are recommended for robustness on difficult datasets.

Value

A list whose structure depends on method:

Examples


# simulate data
set.seed(123)
n <- 500 # subjects
p <- 2
dat <- simData(n, p,
  K = 3,
  pis = c(0.35, 0.40, 0.25),
  nus = c(8, 16, 3)
)

set.seed(123)
fit <- mixturewishart(
  dat$S,
  K = 3,
  mh_sigma = c(0.2, 0.1, 0.1), # tune this for MH acceptance 20-40%
  niter = 100, burnin = 50
)

# Posterior means for degrees of freedom of Wishart distributions:
nu_mcmc <- fit$nu[-c(1:fit$burnin), ]
colMeans(nu_mcmc)


EM/Bayesian estimation for Wishart MoE model

Description

Fit a mixture-of-experts model for symmetric positive-definite (SPD) matrices with covariate-dependent mixing proportions (gating network). Components are Wishart-distributed. Supports Bayesian sampling and EM-based maximum-likelihood estimation.

Usage

moewishart(
  S_list,
  X,
  K,
  niter = 3000,
  burnin = 1000,
  method = "bayes",
  thin = 1,
  nu0 = NULL,
  Psi0 = NULL,
  init_nu = NULL,
  estimate_nu = TRUE,
  nu_prior_a = 2,
  nu_prior_b = 0.1,
  mh_sigma = 0.1,
  mh_beta = 0.05,
  sigma_beta = 10,
  init = NULL,
  tol = 1e-06,
  ridge = 1e-08,
  verbose = TRUE
)

Arguments

S_list

List of length n of SPD matrices, each p \times p. These are the observed responses modeled by the MoE.

X

Numeric matrix n \times q of covariates for the gating network. Include an intercept column if desired.

K

Integer. Number of mixture components (experts).

niter

Integer. Total iterations. Bayesian mode: total MCMC iterations (including burn-in). EM mode: maximum EM iterations.

burnin

Integer. Number of burn-in iterations (Bayesian mode).

method

Character; one of c("bayes", "em"). Selects sampler or optimizer.

thin

Integer. Thinning interval for saving draws (Bayesian).

nu0

Numeric. Inverse-Wishart prior df for \Sigma_k (Bayesian). Default: p + 2 if NULL.

Psi0

Numeric p \times p SPD matrix. Inverse-Wishart prior scale for \Sigma_k (Bayesian). Default: diag(p) if NULL.

init_nu

Optional numeric vector length K of initial dfs \nu_k. Used for initialization.

estimate_nu

Logical. If TRUE, estimate \nu_k (MH in Bayesian; Newton/EM in EM). If FALSE, keep \nu_k fixed at init_nu.

nu_prior_a

Numeric. Prior hyperparameter a for \nu_k (Bayesian), used when estimate_nu = TRUE.

nu_prior_b

Numeric. Prior hyperparameter b for \nu_k (Bayesian), used when estimate_nu = TRUE.

mh_sigma

Numeric scalar or length-K vector. Proposal sd for MH updates on \log(\nu_k) (Bayesian, when estimating \nu).

mh_beta

Numeric scalar or length-K-1 vector. Proposal sd for MH updates of the free B columns (Bayesian).

sigma_beta

Numeric. Prior sd of the Gaussian prior on B (Bayesian).

init

Optional list with fields for EM initialization, e.g., beta, Sigma, nu. See return structure.

tol

Numeric. Convergence tolerance on absolute change of log-likelihood (EM), also used internally.

ridge

Numeric. Small diagonal ridge added to \Sigma_k updates in EM for numerical stability.

verbose

Logical. If TRUE, print progress information.

Details

MoE-Wishart Model:

Algorithms:

  1. Bayesian (method = "bayes"): Metropolis-within-Gibbs sampler for z, \Sigma_k, optional \nu_k, and B. Gaussian priors on B with sd sigma_beta. Proposals use mh_sigma for \log(\nu_k) and mh_beta for B.

  2. EM (method = "em"): E-step responsibilities using Wishart log-densities and softmax gating. M-step updates \Sigma_k, optional \nu_k, and B via weighted multinomial logistic regression (BFGS).

Note that: (i) include an intercept column in X; none is added by default, and (ii) all S_list elements must be SPD. A small ridge may be added for stability.

Value

A list whose fields depend on method:

Examples


# simulate data
set.seed(123)
n <- 500 # subjects
p <- 2
# True gating coefficients (last column zero)
set.seed(123)
Xq <- 3
K <- 3
betas <- matrix(runif(Xq * K, -2, 2), nrow = Xq, ncol = K)
betas[, K] <- 0
dat <- simData(n, p,
  Xq = 3, K = 3, betas = betas,
  pis = c(0.35, 0.40, 0.25),
  nus = c(8, 16, 3)
)

set.seed(123)
fit <- moewishart(
  dat$S,
  X = cbind(1, dat$X), K = 3,
  mh_sigma = c(0.2, 0.1, 0.1), # RW-MH variances (length K)
  mh_beta = c(0.2, 0.2), # RW-MH variances (length K-1)
  niter = 500, burnin = 200
)

# Posterior means for degrees of freedom of Wishart distributions:
nu_mcmc <- fit$nu[-c(1:fit$burnin), ]
colMeans(nu_mcmc)


Dirichlet random sampling

Description

Generate random draws from a Dirichlet distribution with parameter vector \alpha \in \mathbb{R}_+^K. Each draw is a length-K probability vector on the simplex.

Usage

rdirichlet(n, alpha)

Arguments

n

Integer. Number of independent Dirichlet draws to generate.

alpha

Numeric vector of positive concentration parameters \alpha = (\alpha_1,\dots,\alpha_K). Its length K defines the dimension of the simplex.

Details

Definition: If Y_k \sim \mathrm{Gamma}(\alpha_k, 1) independently for k=1,\dots,K (shape \alpha_k, rate 1), then the normalized vector X_k = Y_k / \sum_{j=1}^K Y_j follows \mathrm{Dirichlet}(\alpha).

Note that alpha must be a numeric vector with strictly positive entries.

Value

A numeric matrix of size n \times K, where each row is an independent Dirichlet draw that sums to 1.

Examples


set.seed(123)
# 3-dimensional Dirichlet with asymmetric concentration
alpha <- c(2, 5, 3)
rdirichlet(5, alpha)


Fast sampler for the inverse-Wishart distribution

Description

Draw a random sample from an inverse-Wishart distribution \mathcal{IW}_p(\nu, \Psi) using the identity S \sim \mathcal{IW}_p(\nu, \Psi) iff S^{-1} \sim W_p(\nu, \Psi^{-1}). This implementation accepts \Psi^{-1} directly for speed.

Usage

sampleIW(nu, Psi_inv)

Arguments

nu

Numeric. Degrees of freedom \nu of the inverse-Wishart (must exceed p - 1).

Psi_inv

Numeric p \times p SPD matrix equal to \Psi^{-1}, the inverse of the inverse-Wishart scale matrix.

Details

Sampling scheme:

Parameterization:

Note that: (i) internally calls rWishart(1, df = nu, Sigma = Psi_inv), and (ii) returns solve(W); if numerical issues arise, consider adding a small ridge to \Psi^{-1} prior to sampling.

Value

A p \times p SPD matrix S distributed as \mathcal{IW}_p(\nu, \Psi).

Examples


set.seed(123)
p <- 3
# Construct an SPD scale matrix Psi
A <- matrix(rnorm(p * p), p, p)
Psi <- crossprod(A) + diag(p) * 0.5
Psi_inv <- solve(Psi)

# Draw one inverse-Wishart sample
S <- sampleIW(nu = p + 5, Psi_inv = Psi_inv)
S


Simulate data from a Wishart mixture or mixture-of-experts model

Description

Generate synthetic SPD matrices from either: (i) a finite mixture of Wishart components with fixed mixing proportions, or (ii) a mixture-of-experts (MoE) where mixing proportions depend on covariates via a softmax gating model.

Usage

simData(
  n = 200,
  p = 2,
  Xq = 0,
  K = NA,
  betas = NULL,
  pis = c(0.4, 0.6),
  nus = c(8, 12),
  Sigma = NULL
)

Arguments

n

Integer. Number of observations to simulate.

p

Integer. Dimension of the Wishart distribution (matrix size p \times p).

Xq

Integer. Number of covariates for the gating network (MoE case). If Xq = 0, a standard mixture (no covariates) is simulated.

K

Integer. Number of latent components. Required when Xq > 0. If Xq = 0, defaults to length(pis).

betas

Numeric matrix Xq \times K of gating coefficients used when Xq > 0. If NULL, random coefficients are generated and the last column is set to zero (reference class).

pis

Numeric vector of length K giving fixed mixture proportions when Xq = 0. Ignored when Xq > 0.

nus

Numeric vector length K, degrees of freedom \nu_k for each component (must exceed p - 1).

Sigma

Optional list length K of SPD scale matrices \Sigma_k (each p \times p). If NULL, defaults are generated based on K and p.

Details

Models:

Simulation steps:

  1. Construct pis:

    • If Xq = 0, replicate the provided pis over n rows.

    • If Xq > 0, generate X ~ N(0, I) and compute softmax probabilities using betas (last column set to zero by default identifiability).

  2. If Sigma is not provided, create default \Sigma_k matrices (SPD) depending on K and p.

  3. Sample labels z_i \sim \mathrm{Categorical}(\pi_i).

  4. Draw S_i from W_p(\nu_{z_i}, \Sigma_{z_i}) via rWishart.

Note that: (i) in the MoE case, no intercept is automatically added to X. Use Xq to include desired covariates; the default betas is randomly generated with betas[, K] = 0, and (ii) provided Sigma must be a list of SPD p \times p matrices. Provided nus must exceed p - 1.

Value

A list with the following elements:

Examples


# simulate data from mixture model (no covariates)
set.seed(123)
n <- 200 # subjects
p <- 10
dat <- simData(n, p,
  K = 3,
  pis = c(0.35, 0.40, 0.25),
  nus = c(8, 12, 3)
)
str(dat)