| Title: | Inference of Elliptical Distributions and Copulas |
|---|---|
| Description: | Provides functions for the simulation and the nonparametric estimation of elliptical distributions, meta-elliptical copulas and trans-elliptical distributions, following the article Derumigny and Fermanian (2022) <doi:10.1016/j.jmva.2022.104962>. |
| Authors: | Alexis Derumigny [aut, cre] (ORCID: <https://orcid.org/0000-0002-6163-8097>), Jean-David Fermanian [aut] (ORCID: <https://orcid.org/0000-0001-5960-5555>), Victor Ryan [aut], Rutger van der Spek [ctb] |
| Maintainer: | Alexis Derumigny <[email protected]> |
| License: | GPL-3 |
| Version: | 0.1.4.2 |
| Built: | 2026-05-31 07:55:52 UTC |
| Source: | https://github.com/alexisderumigny/elliptcopulas |
This function estimates the conditional covariance matrix
of a multivariate response variable given a conditioning variable
, using kernel smoothing. Three different estimators are provided.
CondCovEst( dataMatrix, observedZ, gridZ, h, Kernel = "epanechnikov", type = "grid_mean" )CondCovEst( dataMatrix, observedZ, gridZ, h, Kernel = "epanechnikov", type = "grid_mean" )
dataMatrix |
a matrix of size |
observedZ |
vector with |
gridZ |
vector of points |
h |
bandwidth of the kernel. |
Kernel |
name of the kernel. Possible choices are
|
type |
indicates which estimator to use. Possible choices are " |
The kernel weights are defined as
where is the chosen kernel and is the bandwidth.
The supported estimators are:
grid_mean"Covariance estimator using the conditional mean evaluated at the grid point:
obs_mean"Covariance estimator using the conditional mean evaluated at each observation:
pairwise"Pairwise covariance estimator:
Computational complexity:
grid_mean: O(n * d^2 * length(gridZ)).
obs_mean: O(n * d^2 * length(gridZ)).
pairwise: O(n^2 * d^2 * length(gridZ)).
An array of dimension ,
containing the estimated conditional covariance
matrices of the -dimensional random variable at each point of gridZ.
# Comparison between the estimated and true conditional covariance n = 10000 Z = runif(n, -2, 2) sigma12 = 0.3 * Z X1 = rnorm(n) X2 = sigma12 * X1 + sqrt(1 - sigma12^2) * rnorm(n) X = cbind(X1, X2) gridZ = seq(-2, 2, length.out = 50) h = 0.2 Sigma_est = CondCovEst(X, Z, gridZ, h, type = 1) cov_X1X2 = sapply(1:length(gridZ), function(i) Sigma_est[1,2,i]) true_cov = 0.3 * gridZ plot(gridZ, cov_X1X2, type = "l", col = "blue", lwd = 2, ylab = "Cov(X1,X2|Z)", xlab = "Z", ylim = range(c(cov_X1X2, true_cov))) lines(gridZ, true_cov, col = "red", lwd = 2, lty = 2) legend("topleft", legend = c("Estimated", "True"), col = c("blue", "red"), lty = c(1,2), lwd = 2)# Comparison between the estimated and true conditional covariance n = 10000 Z = runif(n, -2, 2) sigma12 = 0.3 * Z X1 = rnorm(n) X2 = sigma12 * X1 + sqrt(1 - sigma12^2) * rnorm(n) X = cbind(X1, X2) gridZ = seq(-2, 2, length.out = 50) h = 0.2 Sigma_est = CondCovEst(X, Z, gridZ, h, type = 1) cov_X1X2 = sapply(1:length(gridZ), function(i) Sigma_est[1,2,i]) true_cov = 0.3 * gridZ plot(gridZ, cov_X1X2, type = "l", col = "blue", lwd = 2, ylab = "Cov(X1,X2|Z)", xlab = "Z", ylim = range(c(cov_X1X2, true_cov))) lines(gridZ, true_cov, col = "red", lwd = 2, lty = 2) legend("topleft", legend = c("Estimated", "True"), col = c("blue", "red"), lty = c(1,2), lwd = 2)
This function estimates the conditional density generator of a
continuous elliptical distribution. For a -dimensional response
conditional on , the conditional density is of the form
where , , is the conditional mean, is the
conditional covariance matrix, and is the conditional density generator.
CondEllGenEst( dataMatrix, observedZ, mu, sigma, gridZ, grid, h, Kernel = "epanechnikov", a = 1, h_psi = NULL, mpfr = FALSE, precBits = 100, dopb = TRUE )CondEllGenEst( dataMatrix, observedZ, mu, sigma, gridZ, grid, h, Kernel = "epanechnikov", a = 1, h_psi = NULL, mpfr = FALSE, precBits = 100, dopb = TRUE )
dataMatrix |
a matrix of size |
observedZ |
vector with |
mu |
matrix of size |
sigma |
array of size |
gridZ |
vector of points |
grid |
vector of |
h |
bandwidth of the kernel for kernel smoothing over |
Kernel |
name of the kernel. Possible choices are
|
a |
tuning parameter controlling bias at |
h_psi |
optional bandwidth for kernel smoothing over
|
mpfr |
if |
precBits |
number of precBits used for floating point precision
(only used if |
dopb |
a Boolean value.
If |
Given a sample , the conditional density
generator at a point and covariate value is estimated by
where
and are tuning parameters, is a kernel
function, and are Nadaraya-Watson weights:
A matrix of size , containing the
estimated conditional density generator at each and .
Liebscher, E. (2005). A semiparametric density estimator based on elliptical distributions. Journal of Multivariate Analysis, 92(1), 205-222.
# Conditional generator with Z-dependent mean/covariance n = 5000 d = 3 Z = runif(n) data_X = matrix(NA, n, d) for (i in 1:n) { mean_i = rep(Z[i], d) sigma_i = matrix(c(Z[i], Z[i]/2, Z[i]/3, Z[i]/2, Z[i], Z[i]/2, Z[i]/3, Z[i]/2, Z[i]), nrow = d) data_X[i,] = EllDistrSim( n = 1, d = d, A = chol(sigma_i), mu = mean_i, density_R2 = function(x) { (2*pi)^(-d/2) * exp(-x/2) * (x > 0) } ) } h <- 1.06 * sd(Z) * n^(-1/5) gridZ <- c(0.2, 0.8) mu_est = CondMeanEst(data_X, Z, gridZ, h) Sigma_est = CondCovEst(data_X, Z, gridZ, h, type = "grid_mean") grid = seq(0, 8, length.out = 80) g_est = CondEllGenEst( dataMatrix = data_X, observedZ = Z, mu = mu_est, sigma = Sigma_est, gridZ = gridZ, grid = grid, h = h, Kernel = "epanechnikov", a = 1 ) g_true <- function(r){ (2*pi)^(-d/2) * exp(-r/2) } g_grid_true = g_true(grid) plot(grid, g_est[,1], type="l", col="blue", lwd=2, main="Conditional Generator: Estimated vs True", xlab="u", ylab="g(u | Z)") lines(grid, g_est[,2], col="red", lwd=2) lines(grid, g_grid_true, col="black", lwd=2, lty=2) legend("topright", legend=c("Estimated Z = 0.2", "Estimated Z = 0.8", "True Gaussian generator"), col=c("blue", "red", "black"), lwd=2, lty=c(1,1,2))# Conditional generator with Z-dependent mean/covariance n = 5000 d = 3 Z = runif(n) data_X = matrix(NA, n, d) for (i in 1:n) { mean_i = rep(Z[i], d) sigma_i = matrix(c(Z[i], Z[i]/2, Z[i]/3, Z[i]/2, Z[i], Z[i]/2, Z[i]/3, Z[i]/2, Z[i]), nrow = d) data_X[i,] = EllDistrSim( n = 1, d = d, A = chol(sigma_i), mu = mean_i, density_R2 = function(x) { (2*pi)^(-d/2) * exp(-x/2) * (x > 0) } ) } h <- 1.06 * sd(Z) * n^(-1/5) gridZ <- c(0.2, 0.8) mu_est = CondMeanEst(data_X, Z, gridZ, h) Sigma_est = CondCovEst(data_X, Z, gridZ, h, type = "grid_mean") grid = seq(0, 8, length.out = 80) g_est = CondEllGenEst( dataMatrix = data_X, observedZ = Z, mu = mu_est, sigma = Sigma_est, gridZ = gridZ, grid = grid, h = h, Kernel = "epanechnikov", a = 1 ) g_true <- function(r){ (2*pi)^(-d/2) * exp(-r/2) } g_grid_true = g_true(grid) plot(grid, g_est[,1], type="l", col="blue", lwd=2, main="Conditional Generator: Estimated vs True", xlab="u", ylab="g(u | Z)") lines(grid, g_est[,2], col="red", lwd=2) lines(grid, g_grid_true, col="black", lwd=2, lty=2) legend("topright", legend=c("Estimated Z = 0.2", "Estimated Z = 0.8", "True Gaussian generator"), col=c("blue", "red", "black"), lwd=2, lty=c(1,1,2))
This function estimates the conditional mean
of a multivariate response variable using kernel smoothing.
CondMeanEst(dataMatrix, observedZ, gridZ, h, Kernel = "epanechnikov")CondMeanEst(dataMatrix, observedZ, gridZ, h, Kernel = "epanechnikov")
dataMatrix |
a matrix of size |
observedZ |
vector with |
gridZ |
vector of points |
h |
bandwidth of the kernel. |
Kernel |
name of the kernel. Possible choices are
|
The estimator is the kernel-weighted average
where the normalized kernel weights are
Here, is the kernel function, and the bandwidth.
A matrix of size ,
containing the estimated conditional means
of the -dimensional random variable at each point of gridZ.
# Comparison between the estimated and true conditional mean n = 500 d = 1 Z = runif(n, -2, 2) X = matrix(2 * Z + rnorm(n), ncol = d) gridZ = seq(-2, 2, length.out = 50) h = 0.3 CondMean_estimates = CondMeanEst(X, Z, gridZ, h) true_mean = 2 * gridZ plot(gridZ, CondMean_estimates, type = "l", col = "blue", lwd = 2, ylab = "Conditional Mean", xlab = "Z") lines(gridZ, true_mean, col = "red", lwd = 2, lty = 2) legend("topleft", legend = c("Estimated", "True"), col = c("blue", "red"), lty = c(1, 2), lwd = 2)# Comparison between the estimated and true conditional mean n = 500 d = 1 Z = runif(n, -2, 2) X = matrix(2 * Z + rnorm(n), ncol = d) gridZ = seq(-2, 2, length.out = 50) h = 0.3 CondMean_estimates = CondMeanEst(X, Z, gridZ, h) true_mean = 2 * gridZ plot(gridZ, CondMean_estimates, type = "l", col = "blue", lwd = 2, ylab = "Conditional Mean", xlab = "Z") lines(gridZ, true_mean, col = "red", lwd = 2, lty = 2) legend("topleft", legend = c("Estimated", "True"), col = c("blue", "red"), lty = c(1, 2), lwd = 2)
An elliptical random vector X of density
can always be written as for some positive random variable
and a random vector on the -dimensional sphere.
Furthermore, there is a one-to-one mapping between g_d
and its one-dimensional marginal g_1.
Convert_gd_To_g1(grid, g_d, d) Convert_g1_To_Fg1(grid, g_1) Convert_g1_To_Qg1(grid, g_1) Convert_g1_To_f1(grid, g_1) Convert_gd_To_fR2(grid, g_d, d)Convert_gd_To_g1(grid, g_d, d) Convert_g1_To_Fg1(grid, g_1) Convert_g1_To_Qg1(grid, g_1) Convert_g1_To_f1(grid, g_1) Convert_gd_To_fR2(grid, g_d, d)
grid |
the grid on which the values of the functions in parameter are given. |
g_d |
the |
d |
the dimension of the random vector. |
g_1 |
the |
One of the following
g_1 the -dimensional density generator.
Fg1 the -dimensional marginal cumulative distribution function.
Qg1 the -dimensional marginal quantile function
(approximately equal to the inverse function of Fg1).
f1 the density of a -dimensional margin if
and is the identity matrix.
fR2 the density function of .
The function Convert_gd_To_g1 returns a numerical vector of (approximated) values
of g_1 on the same grid as gd.
In all other cases, a function is returned (see the examples section).
DensityGenerator.normalize
to compute the normalized version of a given -dimensional generator.
grid = seq(0,100,by = 0.01) g_d = DensityGenerator.normalize(grid = grid, grid_g = 1/(1+grid^3), d = 3) g_1 = Convert_gd_To_g1(grid = grid, g_d = g_d, d = 3) Fg_1 = Convert_g1_To_Fg1(grid = grid, g_1 = g_1) Qg_1 = Convert_g1_To_Qg1(grid = grid, g_1 = g_1) f1 = Convert_g1_To_f1(grid = grid, g_1 = g_1) fR2 = Convert_gd_To_fR2(grid = grid, g_d = g_d, d = 3) plot(grid, g_d, type = "l", xlim = c(0,10)) plot(grid, g_1, type = "l", xlim = c(0,10)) plot(Fg_1, xlim = c(-3,3)) plot(Qg_1, xlim = c(0.01,0.99)) plot(f1, xlim = c(-3,3)) plot(fR2, xlim = c(0,3))grid = seq(0,100,by = 0.01) g_d = DensityGenerator.normalize(grid = grid, grid_g = 1/(1+grid^3), d = 3) g_1 = Convert_gd_To_g1(grid = grid, g_d = g_d, d = 3) Fg_1 = Convert_g1_To_Fg1(grid = grid, g_1 = g_1) Qg_1 = Convert_g1_To_Qg1(grid = grid, g_1 = g_1) f1 = Convert_g1_To_f1(grid = grid, g_1 = g_1) fR2 = Convert_gd_To_fR2(grid = grid, g_d = g_d, d = 3) plot(grid, g_d, type = "l", xlim = c(0,10)) plot(grid, g_1, type = "l", xlim = c(0,10)) plot(Fg_1, xlim = c(-3,3)) plot(Qg_1, xlim = c(0.01,0.99)) plot(f1, xlim = c(-3,3)) plot(fR2, xlim = c(0,3))
The function DensityGenerator.normalize transforms an elliptical copula generator
into an elliptical copula generator,generating the same distribution
and which is normalized to follow the normalization constraint
as well as the identification constraint
The function DensityGenerator.check checks, for a given generator,
whether these two constraints are satisfied.
DensityGenerator.normalize(grid, grid_g, d, verbose = 0, b = 1) DensityGenerator.check(grid, grid_g, d, b = 1)DensityGenerator.normalize(grid, grid_g, d, verbose = 0, b = 1) DensityGenerator.check(grid, grid_g, d, b = 1)
grid |
the regularly spaced grid on which the values of the generator are given. |
grid_g |
the values of the |
d |
the dimension of the space. |
verbose |
if 1, prints the estimated (alpha, beta) such that
|
b |
the target value for the identification constraint. |
DensityGenerator.normalize returns
the normalized generator, as a list of values on the same grid.
DensityGenerator.check returns (invisibly) a vector of two booleans
where the first element is TRUE if the normalization constraint is satisfied
and the second element is TRUE if the identification constraint is satisfied.
Derumigny, A., & Fermanian, J. D. (2022). Identifiability and estimation of meta-elliptical copula generators. Journal of Multivariate Analysis, article 104962. doi:10.1016/j.jmva.2022.104962.
EllCopSim() for the simulation of elliptical copula samples,
EllCopEst() for the estimation of elliptical copula,
conversion functions for the conversion between different representation
of the generator of an elliptical copula.
This function estimates the density generator of a (meta-)elliptical copula
using the iterative procedure described in (Derumigny and Fermanian, 2022).
This iterative procedure consists in alternating a step of estimating the data
via Liebscher's procedure EllDistrEst() and estimating the quantile function
of the underlying elliptical distribution to transform the data back to the unit cube.
EllCopEst( dataU, Sigma_m1, h, grid = seq(0, 10, by = 0.01), niter = 10, a = 1, Kernel = "epanechnikov", verbose = 1, startPoint = "identity", prenormalization = FALSE )EllCopEst( dataU, Sigma_m1, h, grid = seq(0, 10, by = 0.01), niter = 10, a = 1, Kernel = "epanechnikov", verbose = 1, startPoint = "identity", prenormalization = FALSE )
dataU |
the data matrix on the |
Sigma_m1 |
the inverse of the correlation matrix of the components of data |
h |
bandwidth of the kernel for Liebscher's procedure |
grid |
the grid at which the density generator is estimated. |
niter |
the number of iterations |
a |
tuning parameter to improve the performance at 0. See Liebscher (2005), Example p.210 |
Kernel |
kernel used for the smoothing.
Possible choices are |
verbose |
if 1, prints the progress of the iterations.
If 2, prints the normalization constants used at each iteration,
as computed by |
startPoint |
is the given starting point of the procedure
|
prenormalization |
if |
a list of two elements:
g_d_norm: the estimated elliptical copula generator at each point of the grid;
list_path_gdh: the list of estimated elliptical copula generator at each iteration.
Derumigny, A., & Fermanian, J. D. (2022). Identifiability and estimation of meta-elliptical copula generators. Journal of Multivariate Analysis, article 104962. doi:10.1016/j.jmva.2022.104962.
Liebscher, E. (2005). A semiparametric density estimator based on elliptical distributions. Journal of Multivariate Analysis, 92(1), 205. doi:10.1016/j.jmva.2003.09.007
EllDistrEst for the estimation of elliptical distributions,
EllCopSim for the simulation of elliptical copula samples,
EllCopLikelihood for the computation of the likelihood of a given generator,
DensityGenerator.normalize to compute the normalized version of a given generator.
# Simulation from a Gaussian copula grid = seq(0,10,by = 0.01) g_d = DensityGenerator.normalize(grid, grid_g = exp(-grid), d = 3) n = 10 # To have a nice estimation, we suggest to use rather n=200 # (around 20s of computation time) U = EllCopSim(n = n, d = 3, grid = grid, g_d = g_d) result = EllCopEst(dataU = U, grid, Sigma_m1 = diag(3), h = 0.1, a = 0.5) plot(grid, g_d, type = "l", xlim = c(0,2)) lines(grid, result$g_d_norm, col = "red", xlim = c(0,2)) # Adding missing observations n_NA = 2 U_NA = U for (i in 1:n_NA){ U_NA[sample.int(n,1), sample.int(3,1)] = NA } resultNA = EllCopEst(dataU = U_NA, grid, Sigma_m1 = diag(3), h = 0.1, a = 0.5) lines(grid, resultNA$g_d_norm, col = "blue", xlim = c(0,2))# Simulation from a Gaussian copula grid = seq(0,10,by = 0.01) g_d = DensityGenerator.normalize(grid, grid_g = exp(-grid), d = 3) n = 10 # To have a nice estimation, we suggest to use rather n=200 # (around 20s of computation time) U = EllCopSim(n = n, d = 3, grid = grid, g_d = g_d) result = EllCopEst(dataU = U, grid, Sigma_m1 = diag(3), h = 0.1, a = 0.5) plot(grid, g_d, type = "l", xlim = c(0,2)) lines(grid, result$g_d_norm, col = "red", xlim = c(0,2)) # Adding missing observations n_NA = 2 U_NA = U for (i in 1:n_NA){ U_NA[sample.int(n,1), sample.int(3,1)] = NA } resultNA = EllCopEst(dataU = U_NA, grid, Sigma_m1 = diag(3), h = 0.1, a = 0.5) lines(grid, resultNA$g_d_norm, col = "blue", xlim = c(0,2))
Computes the likelihood
for a vector on the unit cube
and for a -dimensional generator whose univariate density and quantile functions
are respectively and .
This is to the likelihood of the copula associated with the elliptical distribution
having density .
EllCopLikelihood(grid, g_d, pointsToCompute, Sigma_m1, log = TRUE)EllCopLikelihood(grid, g_d, pointsToCompute, Sigma_m1, log = TRUE)
grid |
the discretization grid on which the generator is given. |
g_d |
the values of the |
pointsToCompute |
the points |
Sigma_m1 |
the inverse correlation matrix of the elliptical distribution. |
log |
if |
a vector (of length 1 if pointsToCompute is a vector) of likelihoods
associated with each observation.
Derumigny, A., & Fermanian, J. D. (2022). Identifiability and estimation of meta-elliptical copula generators. Journal of Multivariate Analysis, article 104962. doi:10.1016/j.jmva.2022.104962.
EllCopEst for the estimation of elliptical copula,
EllCopEst for the estimation of elliptical copula.
grid = seq(0,50,by = 0.01) gdnorm = DensityGenerator.normalize(grid = grid, grid_g = exp(-grid/2), d = 3) gdnorm2 = DensityGenerator.normalize(grid = grid, grid_g = 1/(1+grid^2), d = 3) X = EllCopSim(n = 30, d = 3, grid = grid, g_d = gdnorm) logLik = EllCopLikelihood(grid , g_d = gdnorm , X, Sigma_m1 = diag(3), log = TRUE) logLik2 = EllCopLikelihood(grid , g_d = gdnorm2 , X, Sigma_m1 = diag(3), log = TRUE) print(c(sum(logLik), sum(logLik2)))grid = seq(0,50,by = 0.01) gdnorm = DensityGenerator.normalize(grid = grid, grid_g = exp(-grid/2), d = 3) gdnorm2 = DensityGenerator.normalize(grid = grid, grid_g = 1/(1+grid^2), d = 3) X = EllCopSim(n = 30, d = 3, grid = grid, g_d = gdnorm) logLik = EllCopLikelihood(grid , g_d = gdnorm , X, Sigma_m1 = diag(3), log = TRUE) logLik2 = EllCopLikelihood(grid , g_d = gdnorm2 , X, Sigma_m1 = diag(3), log = TRUE) print(c(sum(logLik), sum(logLik2)))
Simulation from an elliptical copula model
EllCopSim( n, d, grid, g_d, A = diag(d), Sigma = NULL, genR = list(method = "pinv") )EllCopSim( n, d, grid, g_d, A = diag(d), Sigma = NULL, genR = list(method = "pinv") )
n |
number of observations. |
d |
dimension of X. |
grid |
grid on which values of density generator are known. |
g_d |
vector of values of the density generator on the |
A, Sigma
|
A is the square-root of the covariance matrix |
genR |
additional arguments for the generation of the squared radius. It must be a list with a component method:
|
a matrix of size (n,d) with n observations
of the d-dimensional elliptical copula.
Derumigny, A., & Fermanian, J. D. (2022). Identifiability and estimation of meta-elliptical copula generators. Journal of Multivariate Analysis, article 104962. doi:10.1016/j.jmva.2022.104962.
EllDistrSim for the simulation of elliptical distributions samples,
EllCopEst for the estimation of elliptical copula,
EllCopLikelihood for the computation of the likelihood of a given generator,
DensityGenerator.normalize to compute the normalized version of a given generator.
# Simulation from a Gaussian copula grid = seq(0,5,by = 0.01) X = EllCopSim(n = 20, d = 2, grid = grid, g_d = exp(-grid/2)) X = EllCopSim(n = 20, d = 2, grid = grid, g_d = exp(-grid/2), genR = list(method = "MH", niter = 500) ) plot(X)# Simulation from a Gaussian copula grid = seq(0,5,by = 0.01) X = EllCopSim(n = 20, d = 2, grid = grid, g_d = exp(-grid/2)) X = EllCopSim(n = 20, d = 2, grid = grid, g_d = exp(-grid/2), genR = list(method = "MH", niter = 500) ) plot(X)
A continuous elliptical distribution has a density of the form
where ,
is the mean,
is a positive-definite matrix
and a function , called the
density generator of .
The goal is to estimate the derivatives of at some point ,
by kernel smoothing, following Section 3 of (Ryan and Derumigny, 2024).
EllDistrDerivEst( X, mu = 0, Sigma_m1 = diag(NCOL(X)), grid, h, Kernel = "gaussian", a = 1, k, mpfr = FALSE, precBits = 100, dopb = TRUE )EllDistrDerivEst( X, mu = 0, Sigma_m1 = diag(NCOL(X)), grid, h, Kernel = "gaussian", a = 1, k, mpfr = FALSE, precBits = 100, dopb = TRUE )
X |
a matrix of size |
mu |
mean of X. This can be the true value or an estimate. It must be
a vector of dimension |
Sigma_m1 |
inverse of the covariance matrix of X.
This can be the true value or an estimate. It must be
a matrix of dimension |
grid |
grid of values on which to estimate the density generator. |
h |
bandwidth of the kernel. Can be either a number or a vector of the
size |
Kernel |
name of the kernel. Possible choices are
|
a |
tuning parameter to improve the performance at 0. |
k |
highest order of the derivative of the generator that is to be
estimated. For example, |
mpfr |
if |
precBits |
number of precBits used for floating point precision
(only used if |
dopb |
a Boolean value.
If |
Note that this function may be rather slow for higher-order derivatives. Furthermore, it is likely that the number of observations needs to be quite high for the higher-order derivatives to be estimated well enough.
a matrix of size length(grid) * (kmax + 1)
with the estimated value of the generator and all its derivatives
at all orders until and including kmax, at all points of the grid.
Alexis Derumigny, Victor Ryan
Victor Ryan, Alexis Derumigny
Ryan, V., & Derumigny, A. (2024). On the choice of the two tuning parameters for nonparametric estimation of an elliptical distribution generator arxiv:2408.17087.
EllDistrEst for the nonparametric estimation of the
elliptical distribution density generator itself,
EllDistrSim for the simulation of elliptical distribution samples.
This function uses the internal functions compute_etahat
and compute_matrix_alpha.
# Comparison between the estimated and true generator of the Gaussian distribution n = 50000 d = 3 X = matrix(rnorm(n * d), ncol = d) grid = seq(0, 5, by = 0.1) a = 1.5 gprimeEst = EllDistrDerivEst(X = X, grid = grid, a = a, h = 0.09, k = 1)[,2] plot(grid, gprimeEst, type = "l") # Computation of true values g = exp(-grid/2)/(2*pi)^{3/2} gprime = (-1/2) * exp(-grid/2)/(2*pi)^{3/2} lines(grid, gprime, col = "red")# Comparison between the estimated and true generator of the Gaussian distribution n = 50000 d = 3 X = matrix(rnorm(n * d), ncol = d) grid = seq(0, 5, by = 0.1) a = 1.5 gprimeEst = EllDistrDerivEst(X = X, grid = grid, a = a, h = 0.09, k = 1)[,2] plot(grid, gprimeEst, type = "l") # Computation of true values g = exp(-grid/2)/(2*pi)^{3/2} gprime = (-1/2) * exp(-grid/2)/(2*pi)^{3/2} lines(grid, gprime, col = "red")
This function uses Liebscher's algorithm to estimate the density generator of an elliptical distribution by kernel smoothing. A continuous elliptical distribution has a density of the form
where ,
is the mean,
is a positive-definite matrix
and a function , called the
density generator of .
The goal is to estimate at some point , by
where
,
is the Gamma function,
and are tuning parameters (respectively the bandwidth and a
parameter controlling the bias at ),
, is a kernel function and
for a sample .
EllDistrEst( X, mu = 0, Sigma_m1 = diag(d), grid, h, Kernel = "epanechnikov", a = 1, mpfr = FALSE, precBits = 100, dopb = TRUE )EllDistrEst( X, mu = 0, Sigma_m1 = diag(d), grid, h, Kernel = "epanechnikov", a = 1, mpfr = FALSE, precBits = 100, dopb = TRUE )
X |
a matrix of size |
mu |
mean of X. This can be the true value or an estimate. It must be
a vector of dimension |
Sigma_m1 |
inverse of the covariance matrix of X.
This can be the true value or an estimate. It must be
a matrix of dimension |
grid |
grid of values of |
h |
bandwidth of the kernel. Can be either a number or a vector of the
size |
Kernel |
name of the kernel. Possible choices are
|
a |
tuning parameter to improve the performance at 0.
Can be either a number or a vector of the
size |
mpfr |
if |
precBits |
number of precBits used for floating point precision
(only used if |
dopb |
a Boolean value.
If |
the values of the density generator of the elliptical copula,
estimated at each point of the grid.
Alexis Derumigny, Rutger van der Spek
Liebscher, E. (2005). A semiparametric density estimator based on elliptical distributions. Journal of Multivariate Analysis, 92(1), 205. doi:10.1016/j.jmva.2003.09.007
The function is introduced in Liebscher (2005), Example p.210.
EllDistrSim for the simulation of elliptical distribution samples.
estim_tilde_AMSE for the estimation of a component of
the asymptotic mean-square error (AMSE) of this estimator
, assuming has been optimally chosen.
EllDistrEst.adapt for the adaptive nonparametric estimation
of the generator of an elliptical distribution.
EllDistrDerivEst for the nonparametric estimation of the
derivatives of the generator.
EllCopEst for the estimation of elliptical copulas
density generators.
# Comparison between the estimated and true generator of the Gaussian distribution X = matrix(rnorm(500*3), ncol = 3) grid = seq(0,5,by=0.1) g_3 = EllDistrEst(X = X, grid = grid, a = 0.7, h=0.05) g_3mpfr = EllDistrEst(X = X, grid = grid, a = 0.7, h=0.05, mpfr = TRUE, precBits = 20) plot(grid, g_3, type = "l") lines(grid, exp(-grid/2)/(2*pi)^{3/2}, col = "red") # In higher dimensions d = 250 X = matrix(rnorm(500*d), ncol = d) grid = seq(0, 400, by = 25) true_g = exp(-grid/2) / (2*pi)^{d/2} g_d = EllDistrEst(X = X, grid = grid, a = 100, h=40) g_dmpfr = EllDistrEst(X = X, grid = grid, a = 100, h=40, mpfr = TRUE, precBits = 10000) ylim = c(min(c(true_g, as.numeric(g_dmpfr[which(g_dmpfr>0)]))), max(c(true_g, as.numeric(g_dmpfr)), na.rm=TRUE) ) plot(grid, g_dmpfr, type = "l", col = "red", ylim = ylim, log = "y") lines(grid, g_d, type = "l") lines(grid, true_g, col = "blue")# Comparison between the estimated and true generator of the Gaussian distribution X = matrix(rnorm(500*3), ncol = 3) grid = seq(0,5,by=0.1) g_3 = EllDistrEst(X = X, grid = grid, a = 0.7, h=0.05) g_3mpfr = EllDistrEst(X = X, grid = grid, a = 0.7, h=0.05, mpfr = TRUE, precBits = 20) plot(grid, g_3, type = "l") lines(grid, exp(-grid/2)/(2*pi)^{3/2}, col = "red") # In higher dimensions d = 250 X = matrix(rnorm(500*d), ncol = d) grid = seq(0, 400, by = 25) true_g = exp(-grid/2) / (2*pi)^{d/2} g_d = EllDistrEst(X = X, grid = grid, a = 100, h=40) g_dmpfr = EllDistrEst(X = X, grid = grid, a = 100, h=40, mpfr = TRUE, precBits = 10000) ylim = c(min(c(true_g, as.numeric(g_dmpfr[which(g_dmpfr>0)]))), max(c(true_g, as.numeric(g_dmpfr)), na.rm=TRUE) ) plot(grid, g_dmpfr, type = "l", col = "red", ylim = ylim, log = "y") lines(grid, g_d, type = "l") lines(grid, true_g, col = "blue")
A continuous elliptical distribution has a density of the form
where ,
is the mean,
is a positive-definite matrix
and a function , called the
density generator of .
The goal is to estimate at some point , by
where
,
is the Gamma function,
and are tuning parameters (respectively the bandwidth and a
parameter controlling the bias at ),
, is a kernel function and
for a sample .
This function computes "optimal asymptotic" values for the bandwidth
and the tuning parameter from a first step bandwidth that the user
needs to provide.
EllDistrEst.adapt( X, mu = 0, Sigma_m1 = diag(NCOL(X)), grid, h_firstStep, grid_a = NULL, Kernel = "gaussian", mpfr = FALSE, precBits = 100, dopb = TRUE )EllDistrEst.adapt( X, mu = 0, Sigma_m1 = diag(NCOL(X)), grid, h_firstStep, grid_a = NULL, Kernel = "gaussian", mpfr = FALSE, precBits = 100, dopb = TRUE )
X |
a matrix of size |
mu |
mean of X. This can be the true value or an estimate. It must be
a vector of dimension |
Sigma_m1 |
inverse of the covariance matrix of X.
This can be the true value or an estimate. It must be
a matrix of dimension |
grid |
vector containing the values at which we want the generator to be estimated. |
h_firstStep |
a vector of size If |
grid_a |
the grid of possible values of |
Kernel |
name of the kernel. Possible choices are
|
mpfr |
if |
precBits |
number of precBits used for floating point precision
(only used if |
dopb |
a Boolean value.
If |
a list with the following elements:
g a vector of size n1 = length(grid).
Each component of this vector is an estimator of
where x[i] is the -th element of the grid.
best_a a vector of the same size as grid indicating
for each value of the grid what is the optimal choice of found by
our algorithm (which is used to estimate ).
best_h a vector of the same size as grid indicating
for each value of the grid what is the optimal choice of found by
our algorithm (which is used to estimate ).
first_step_g first step estimator of g, computed using
the tuning parameters best_a and h_firstStep[2].
AMSE_estimated an estimator of the part of the asymptotic MSE
that only depends on .
Alexis Derumigny, Victor Ryan
Ryan, V., & Derumigny, A. (2024). On the choice of the two tuning parameters for nonparametric estimation of an elliptical distribution generator arxiv:2408.17087.
EllDistrEst for the nonparametric estimation of the
elliptical distribution density generator,
EllDistrSim for the simulation of elliptical distribution samples.
estim_tilde_AMSE which is used in this function. It estimates
a component of the asymptotic mean-square error (AMSE) of the nonparametric
estimator of the elliptical density generator assuming has been optimally
chosen.
n = 500 d = 3 X = matrix(rnorm(n * d), ncol = d) grid = seq(0, 5, by = 0.1) result = EllDistrEst.adapt(X = X, grid = grid, h = 0.05) plot(grid, result$g, type = "l") lines(grid, result$first_step_g, col = "blue") # Computation of true values g = exp(-grid/2)/(2*pi)^{3/2} lines(grid, g, type = "l", col = "red") plot(grid, result$best_a, type = "l", col = "red") plot(grid, result$best_h, type = "l", col = "red") sum((g - result$g)^2, na.rm = TRUE) < sum((g - result$first_step_g)^2, na.rm = TRUE)n = 500 d = 3 X = matrix(rnorm(n * d), ncol = d) grid = seq(0, 5, by = 0.1) result = EllDistrEst.adapt(X = X, grid = grid, h = 0.05) plot(grid, result$g, type = "l") lines(grid, result$first_step_g, col = "blue") # Computation of true values g = exp(-grid/2)/(2*pi)^{3/2} lines(grid, g, type = "l", col = "red") plot(grid, result$best_a, type = "l", col = "red") plot(grid, result$best_h, type = "l", col = "red") sum((g - result$g)^2, na.rm = TRUE) < sum((g - result$first_step_g)^2, na.rm = TRUE)
This function uses the decomposition
where is the mean of , is the random radius,
is the square-root of the covariance matrix of ,
and is a uniform random variable of the d-dimensional unit sphere.
Note that is generated using the Metropolis-Hasting algorithm.
EllDistrSim( n, d, A = diag(d), Sigma = NULL, mu = 0, density_R2, genR = list(method = "pinv") )EllDistrSim( n, d, A = diag(d), Sigma = NULL, mu = 0, density_R2, genR = list(method = "pinv") )
n |
number of observations. |
d |
dimension of |
A, Sigma
|
A is the square-root of the covariance matrix |
mu |
mean of |
density_R2 |
density of the random variable Note that this function must return Note that this function does not need to be normalized (i.e. its integral may
be different from 1); in this case, it must be a multiple of the density
function of |
genR |
additional arguments for the generation of the squared radius. It must be a list with a component method:
|
a matrix of dimensions (n, d) of simulated observations.
EllCopSim for the simulation of elliptical copula samples,
EllCopEst for the estimation of elliptical distributions,
EllDistrSimCond for the conditional simulation of
elliptically distributed random vectors given some observe components.
# Sample from the 3-dimensional standard normal distribution X = EllDistrSim(n = 200, d = 3, density_R2 = function(x){stats::dchisq(x=x,df=3)}) plot(X[,1], X[,2]) hist(X[,1]) cov(X) X = EllDistrSim(n = 200, d = 3, density_R2 = function(x){stats::dchisq(x=x,df=3)}, genR = list(method = "MH", niter = 500)) plot(X[,1], X[,2]) # Sample from the t distribution with 6 degrees of freedom df = 6 d = 2 cov1 = rbind(c(1, 0.5), c(0.5, 1)) density_R2_student_t <- function(r2){ s_d = 2 * pi^(d / 2) / gamma(d / 2) # Surface area of the unit ball in R^d return ( (r2 > 0) * r2^(d/2 - 1) * (s_d / 2) * gamma((df + d)/2) / ( gamma(df/2) * gamma(1/2)^d * (df - 2) ) * (1 + r2 / (df - 2) )^(-(df + d)/2) ) } X = EllDistrSim(n = 1000, d = d, Sigma = cov1, mu = c(2, 6), density_R2 = density_R2_student_t) hist(X[,1]) plot(X[,1], X[,2]) cov(X)# Sample from the 3-dimensional standard normal distribution X = EllDistrSim(n = 200, d = 3, density_R2 = function(x){stats::dchisq(x=x,df=3)}) plot(X[,1], X[,2]) hist(X[,1]) cov(X) X = EllDistrSim(n = 200, d = 3, density_R2 = function(x){stats::dchisq(x=x,df=3)}, genR = list(method = "MH", niter = 500)) plot(X[,1], X[,2]) # Sample from the t distribution with 6 degrees of freedom df = 6 d = 2 cov1 = rbind(c(1, 0.5), c(0.5, 1)) density_R2_student_t <- function(r2){ s_d = 2 * pi^(d / 2) / gamma(d / 2) # Surface area of the unit ball in R^d return ( (r2 > 0) * r2^(d/2 - 1) * (s_d / 2) * gamma((df + d)/2) / ( gamma(df/2) * gamma(1/2)^d * (df - 2) ) * (1 + r2 / (df - 2) )^(-(df + d)/2) ) } X = EllDistrSim(n = 1000, d = d, Sigma = cov1, mu = c(2, 6), density_R2 = density_R2_student_t) hist(X[,1]) plot(X[,1], X[,2]) cov(X)
Simulation of elliptically symmetric random vectors conditionally to some observed part.
EllDistrSimCond( n, xobs, d, Sigma = diag(d), mu = 0, density_R2_, genR = list(method = "pinv") )EllDistrSimCond( n, xobs, d, Sigma = diag(d), mu = 0, density_R2_, genR = list(method = "pinv") )
n |
number of observations to be simulated from the conditional distribution. |
xobs |
observed value of X that we condition on.
|
d |
dimension of the random vector |
Sigma |
(unconditional) covariance matrix |
mu |
(unconditional) mean |
density_R2_ |
(unconditional) density of the squared radius. |
genR |
additional arguments for the generation of the squared radius. It must be a list with a component method:
|
a matrix of size (n,d) of simulated observations.
Cambanis, S., Huang, S., & Simons, G. (1981). On the Theory of Elliptically Contoured Distributions, Journal of Multivariate Analysis. (Corollary 5, p.376)
EllDistrSim for the (unconditional) simulation of
elliptically distributed random vectors.
d = 3 Sigma = rbind(c(1, 0.8, 0.9), c(0.8, 1, 0.7), c(0.9, 0.7, 1)) mu = c(0, 0, 0) result = EllDistrSimCond(n = 100, xobs = c(NA, 2, NA), d = d, Sigma = Sigma, mu = mu, density_R2_ = function(x){stats::dchisq(x=x,df=3)}) plot(result) result2 = EllDistrSimCond(n = 1000, xobs = c(1.3, 2, NA), d = d, Sigma = Sigma, mu = mu, density_R2_ = function(x){stats::dchisq(x=x,df=3)}) hist(result2)d = 3 Sigma = rbind(c(1, 0.8, 0.9), c(0.8, 1, 0.7), c(0.9, 0.7, 1)) mu = c(0, 0, 0) result = EllDistrSimCond(n = 100, xobs = c(NA, 2, NA), d = d, Sigma = Sigma, mu = mu, density_R2_ = function(x){stats::dchisq(x=x,df=3)}) plot(result) result2 = EllDistrSimCond(n = 1000, xobs = c(1.3, 2, NA), d = d, Sigma = Sigma, mu = mu, density_R2_ = function(x){stats::dchisq(x=x,df=3)}) hist(result2)
has been optimally chosenA continuous elliptical distribution has a density of the form
where ,
is the mean,
is a positive-definite matrix
and a function , called the
density generator of .
The goal is to estimate at some point , by
where
,
is the Gamma function,
and are tuning parameters (respectively the bandwidth and a
parameter controlling the bias at ),
, is a kernel function and
for a sample .
Thanks to Proposition 2.2 in (Ryan and Derumigny, 2024), the asymptotic
mean square error of can be decomposed into
a product of a constant (that depends on the true ) and a term that
depends on and . This function computes this term. It can be
useful to find out the best value of the parameter to be used.
estim_tilde_AMSE( X, mu = 0, Sigma_m1 = diag(NCOL(X)), grid, h, Kernel = "gaussian", a = 1, mpfr = FALSE, precBits = 100, dopb = TRUE )estim_tilde_AMSE( X, mu = 0, Sigma_m1 = diag(NCOL(X)), grid, h, Kernel = "gaussian", a = 1, mpfr = FALSE, precBits = 100, dopb = TRUE )
X |
a matrix of size |
mu |
mean of X. This can be the true value or an estimate. It must be
a vector of dimension |
Sigma_m1 |
inverse of the covariance matrix of X.
This can be the true value or an estimate. It must be
a matrix of dimension |
grid |
grid of values of |
h |
bandwidth of the kernel. Can be either a number or a vector of the
size |
Kernel |
name of the kernel. Possible choices are
|
a |
tuning parameter to improve the performance at 0.
Can be either a number or a vector of the
size |
mpfr |
if |
precBits |
number of precBits used for floating point precision
(only used if |
dopb |
a Boolean value.
If |
a vector of the same size as the grid, with the corresponding value
for the .
Alexis Derumigny, Victor Ryan
Ryan, V., & Derumigny, A. (2024). On the choice of the two tuning parameters for nonparametric estimation of an elliptical distribution generator arxiv:2408.17087.
# Comparison between the estimated and true generator of the Gaussian distribution n = 50000 d = 3 X = matrix(rnorm(n * d), ncol = d) grid = seq(0, 5, by = 0.1) a = 1.5 AMSE_est = estim_tilde_AMSE(X = X, grid = grid, a = a, h = 0.09) plot(grid, abs(AMSE_est), type = "l") # Computation of true values g = exp(-grid/2)/(2*pi)^{3/2} gprime = (-1/2) *exp(-grid/2)/(2*pi)^{3/2} A = a^(d/2) psia = -a + (A + grid^(d/2))^(2/d) psiaprime = grid^(d/2 - 1) * (A + grid^(d/2))^(2/d - 1) psiasecond = psiaprime * ( (d-2)/2 ) * grid^{-1} * A * ( grid^(d/2) + A )^(-1) rhoprimexi = ((d-2) * grid^((d-4)/2) * psiaprime - 2 * grid^((d-2)/2) * psiasecond) / (2 * psiaprime^3) * g + grid^((d-2)/2) / (psiaprime^2) * gprime AMSE = rhoprimexi / psiaprime lines(grid, abs(AMSE), col = "red") # Comparison as a function of $a$ n = 50000 d = 3 X = matrix(rnorm(n * d), ncol = d) grid = 0.1 vec_a = c(0.001, 0.002, 0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.8, 1, 1.5, 2) AMSE_est = rep(NA, length = length(vec_a)) for (i in 1:length(vec_a)){ AMSE_est[i] = estim_tilde_AMSE(X = X, grid = grid, a = vec_a[i], h = 0.09, dopb = FALSE) } plot(vec_a, abs(AMSE_est), type = "l", log = "x") # Computation of true values a = vec_a g = exp(-grid/2)/(2*pi)^{3/2} gprime = (-1/2) *exp(-grid/2)/(2*pi)^{3/2} A = a^(d/2) psia = -a + (A + grid^(d/2))^(2/d) psiaprime = grid^(d/2 - 1) * (A + grid^(d/2))^(2/d - 1) psiasecond = psiaprime * ( (d-2)/2 ) * grid^{-1} * A * ( grid^(d/2) + A )^(-1) rhoprimexi = ((d-2) * grid^((d-4)/2) * psiaprime - 2 * grid^((d-2)/2) * psiasecond) / (2 * psiaprime^3) * g + grid^((d-2)/2) / (psiaprime^2) * gprime AMSE = rhoprimexi / psiaprime yliminf = min(c(abs(AMSE_est), abs(AMSE))) ylimsup = max(c(abs(AMSE_est), abs(AMSE))) plot(vec_a, abs(AMSE_est), type = "l", log = "xy", ylim = c(yliminf, ylimsup)) lines(vec_a, abs(AMSE), col = "red")# Comparison between the estimated and true generator of the Gaussian distribution n = 50000 d = 3 X = matrix(rnorm(n * d), ncol = d) grid = seq(0, 5, by = 0.1) a = 1.5 AMSE_est = estim_tilde_AMSE(X = X, grid = grid, a = a, h = 0.09) plot(grid, abs(AMSE_est), type = "l") # Computation of true values g = exp(-grid/2)/(2*pi)^{3/2} gprime = (-1/2) *exp(-grid/2)/(2*pi)^{3/2} A = a^(d/2) psia = -a + (A + grid^(d/2))^(2/d) psiaprime = grid^(d/2 - 1) * (A + grid^(d/2))^(2/d - 1) psiasecond = psiaprime * ( (d-2)/2 ) * grid^{-1} * A * ( grid^(d/2) + A )^(-1) rhoprimexi = ((d-2) * grid^((d-4)/2) * psiaprime - 2 * grid^((d-2)/2) * psiasecond) / (2 * psiaprime^3) * g + grid^((d-2)/2) / (psiaprime^2) * gprime AMSE = rhoprimexi / psiaprime lines(grid, abs(AMSE), col = "red") # Comparison as a function of $a$ n = 50000 d = 3 X = matrix(rnorm(n * d), ncol = d) grid = 0.1 vec_a = c(0.001, 0.002, 0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.8, 1, 1.5, 2) AMSE_est = rep(NA, length = length(vec_a)) for (i in 1:length(vec_a)){ AMSE_est[i] = estim_tilde_AMSE(X = X, grid = grid, a = vec_a[i], h = 0.09, dopb = FALSE) } plot(vec_a, abs(AMSE_est), type = "l", log = "x") # Computation of true values a = vec_a g = exp(-grid/2)/(2*pi)^{3/2} gprime = (-1/2) *exp(-grid/2)/(2*pi)^{3/2} A = a^(d/2) psia = -a + (A + grid^(d/2))^(2/d) psiaprime = grid^(d/2 - 1) * (A + grid^(d/2))^(2/d - 1) psiasecond = psiaprime * ( (d-2)/2 ) * grid^{-1} * A * ( grid^(d/2) + A )^(-1) rhoprimexi = ((d-2) * grid^((d-4)/2) * psiaprime - 2 * grid^((d-2)/2) * psiasecond) / (2 * psiaprime^3) * g + grid^((d-2)/2) / (psiaprime^2) * gprime AMSE = rhoprimexi / psiaprime yliminf = min(c(abs(AMSE_est), abs(AMSE))) ylimsup = max(c(abs(AMSE_est), abs(AMSE))) plot(vec_a, abs(AMSE_est), type = "l", log = "xy", ylim = c(yliminf, ylimsup)) lines(vec_a, abs(AMSE), col = "red")
Estimate Kendall's tau matrix using averaging estimators. Under
the structural assumption that Kendall's tau matrix is block-structured
with constant values in each off-diagonal block, this function estimates
Kendall's tau matrix “fast”, in the sense that each interblock
coefficient is estimated in time ,
where N is the amount of pairs that are averaged.
KTMatrixEst(dataMatrix, blockStructure = NULL, averaging = "no", N = NULL)KTMatrixEst(dataMatrix, blockStructure = NULL, averaging = "no", N = NULL)
dataMatrix |
matrix of size |
blockStructure |
list of vectors.
Each vector corresponds to one group of variables
and contains the indexes of the variables that belongs to this group.
|
averaging |
type of averaging used for fast estimation. Possible choices are
|
N |
number of entries to average (n the random case.
By default, |
matrix with dimensions depending on averaging.
If averaging = no,
the function returns a matrix of dimension (n,n)
which estimates the Kendall's tau matrix.
Else, the function returns a matrix of dimension
(length(blockStructure) , length(blockStructure))
giving the estimates of the Kendall's tau for each block with ones on the diagonal.
Rutger van der Spek, Alexis Derumigny
van der Spek, R., & Derumigny, A. (2022). Fast estimation of Kendall's Tau and conditional Kendall's Tau matrices under structural assumptions. arxiv:2204.03285.
# Estimating off-diagonal block Kendall's taus matrixCor = matrix(c(1 , 0.5, 0.3 ,0.3, 0.3, 0.5, 1, 0.3, 0.3, 0.3, 0.3, 0.3, 1, 0.5, 0.5, 0.3, 0.3, 0.5, 1, 0.5, 0.3, 0.3, 0.5, 0.5, 1), ncol = 5 , nrow = 5) dataMatrix = mvtnorm::rmvnorm(n = 100, mean = rep(0, times = 5), sigma = matrixCor) blockStructure = list(1:2, 3:5) estKTMatrix = list() estKTMatrix$all = KTMatrixEst(dataMatrix = dataMatrix, blockStructure = blockStructure, averaging = "all") estKTMatrix$row = KTMatrixEst(dataMatrix = dataMatrix, blockStructure = blockStructure, averaging = "row") estKTMatrix$diag = KTMatrixEst(dataMatrix = dataMatrix, blockStructure = blockStructure, averaging = "diag") estKTMatrix$random = KTMatrixEst(dataMatrix = dataMatrix, blockStructure = blockStructure, averaging = "random", N = 2) InterBlockCor = lapply(estKTMatrix, FUN = function(x) {sin(x[1,2] * pi / 2)}) # Estimation of the correlation between variables of the first group # and of the second group print(unlist(InterBlockCor)) # to be compared with the true value: 0.3.# Estimating off-diagonal block Kendall's taus matrixCor = matrix(c(1 , 0.5, 0.3 ,0.3, 0.3, 0.5, 1, 0.3, 0.3, 0.3, 0.3, 0.3, 1, 0.5, 0.5, 0.3, 0.3, 0.5, 1, 0.5, 0.3, 0.3, 0.5, 0.5, 1), ncol = 5 , nrow = 5) dataMatrix = mvtnorm::rmvnorm(n = 100, mean = rep(0, times = 5), sigma = matrixCor) blockStructure = list(1:2, 3:5) estKTMatrix = list() estKTMatrix$all = KTMatrixEst(dataMatrix = dataMatrix, blockStructure = blockStructure, averaging = "all") estKTMatrix$row = KTMatrixEst(dataMatrix = dataMatrix, blockStructure = blockStructure, averaging = "row") estKTMatrix$diag = KTMatrixEst(dataMatrix = dataMatrix, blockStructure = blockStructure, averaging = "diag") estKTMatrix$random = KTMatrixEst(dataMatrix = dataMatrix, blockStructure = blockStructure, averaging = "random", N = 2) InterBlockCor = lapply(estKTMatrix, FUN = function(x) {sin(x[1,2] * pi / 2)}) # Estimation of the correlation between variables of the first group # and of the second group print(unlist(InterBlockCor)) # to be compared with the true value: 0.3.
This function estimates the parameters of a trans-elliptical distribution which is a distribution whose copula is (meta-)elliptical, with arbitrary margins, using the procedure proposed in (Derumigny & Fermanian, 2022).
TEllDistrEst( X, estimatorCDF = function(x){ force(x) return( function(y){(stats::ecdf(x)(y) - 1/(2*length(x))) }) }, h, verbose = 1, grid, ...)TEllDistrEst( X, estimatorCDF = function(x){ force(x) return( function(y){(stats::ecdf(x)(y) - 1/(2*length(x))) }) }, h, verbose = 1, grid, ...)
X |
the matrix of observations of the variables |
estimatorCDF |
the way of estimating the marginal cumulative distribution functions.
It should be either a function that takes in parameter a vector of observations
and returns an estimated cdf (i.e. a function) or a list of such functions
to be applied on the data.
In this case, it is required that the length of the list should be the same
as the number of columns of |
h |
bandwidth for the non-parametric estimation of the density generator. |
verbose |
if 1, prints the progress of the iterations.
If 2, prints the normalizations constants used at each iteration,
as computed by |
grid |
grid of values on which to estimate the density generator |
... |
other parameters to be passed to |
This function returns a list with three components:
listEstCDF: a list of estimated marginal CDF given by estimatorCDF;
corMatrix: the estimated correlation matrix:
estEllCopGen: the estimated generator of the meta-elliptical copula.
Derumigny, A., & Fermanian, J. D. (2022). Identifiability and estimation of meta-elliptical copula generators. Journal of Multivariate Analysis, article 104962. doi:10.1016/j.jmva.2022.104962.
cor = matrix(c(1, 0.5, 0.2, 0.5, 1, 0.8, 0.2, 0.8, 1), byrow = TRUE, nrow = 3) grid = seq(0,10,by = 0.01) g_d = DensityGenerator.normalize(grid, grid_g = exp(-grid), d = 3) n = 10 # To have a nice estimation, we suggest to use rather n=200 # (around 20s of computation time) U = EllCopSim(n = n, d = 3, grid = grid, g_d = g_d, A = chol(cor)) X = matrix(nrow = n, ncol = 3) X[,1] = stats::qnorm(U[,1], mean = 2) X[,2] = stats::qt(U[,2], df = 5) X[,3] = stats::qt(U[,3], df = 8) result = TEllDistrEst(X, h = 0.1, grid = grid) plot(grid, g_d, type = "l", xlim = c(0,2)) lines(grid, result$estiEllCop$g_d_norm, col = "red") print(result$corMatrix) # Adding missing observations n_NA = 2 X_NA = X for (i in 1:n_NA){ X_NA[sample.int(n,1), sample.int(3,1)] = NA } resultNA = TEllDistrEst(X_NA, h = 0.1, grid = grid, verbose = 1) lines(grid, resultNA$estiEllCopGen, col = "blue")cor = matrix(c(1, 0.5, 0.2, 0.5, 1, 0.8, 0.2, 0.8, 1), byrow = TRUE, nrow = 3) grid = seq(0,10,by = 0.01) g_d = DensityGenerator.normalize(grid, grid_g = exp(-grid), d = 3) n = 10 # To have a nice estimation, we suggest to use rather n=200 # (around 20s of computation time) U = EllCopSim(n = n, d = 3, grid = grid, g_d = g_d, A = chol(cor)) X = matrix(nrow = n, ncol = 3) X[,1] = stats::qnorm(U[,1], mean = 2) X[,2] = stats::qt(U[,2], df = 5) X[,3] = stats::qt(U[,3], df = 8) result = TEllDistrEst(X, h = 0.1, grid = grid) plot(grid, g_d, type = "l", xlim = c(0,2)) lines(grid, result$estiEllCop$g_d_norm, col = "red") print(result$corMatrix) # Adding missing observations n_NA = 2 X_NA = X for (i in 1:n_NA){ X_NA[sample.int(n,1), sample.int(3,1)] = NA } resultNA = TEllDistrEst(X_NA, h = 0.1, grid = grid, verbose = 1) lines(grid, resultNA$estiEllCopGen, col = "blue")
This code implements a vectorized version of the Faa di Bruno formula, relying internally on the Bell polynomials from the package kStatistics, via the function kStatistics::eBellPol.
vectorized_Faa_di_Bruno(f, g, x, k, args_f, args_g)vectorized_Faa_di_Bruno(f, g, x, k, args_f, args_g)
f, g
|
two functions that take in argument
|
x |
vector of (one-dimensional) values
at which the |
k |
the order of the derivative |
args_f, args_g
|
the list of additional parameters to be passed on
to |
a vector of size length(x) for which
the i-th component is
Alexis Derumigny, Victor Ryan
compute_matrix_alpha which also uses the Bell polynomials
in a similar way.
g <- function(x, k, a){ if (k == 0){ return ( exp(x) + a) } else { return (exp(x)) } } args_g = list(a = 2) f <- function(x, k, a){ if (k == 0){ return ( x^2 + a) } else if (k == 1) { return ( 2 * x) } else if (k == 2) { return ( 2 ) } else { return ( 0 ) } } args_f = list(a = 5) x = 1:5 vectorized_Faa_di_Bruno(f = f, g = g, x = x, k = 1, args_f = args_f, args_g = args_g) # Derivative of ( exp(x) + 2 )^2 + 5 # which explicit expression is: 2 * exp(x) * ( exp(x) + 2 )g <- function(x, k, a){ if (k == 0){ return ( exp(x) + a) } else { return (exp(x)) } } args_g = list(a = 2) f <- function(x, k, a){ if (k == 0){ return ( x^2 + a) } else if (k == 1) { return ( 2 * x) } else if (k == 2) { return ( 2 ) } else { return ( 0 ) } } args_f = list(a = 5) x = 1:5 vectorized_Faa_di_Bruno(f = f, g = g, x = x, k = 1, args_f = args_f, args_g = args_g) # Derivative of ( exp(x) + 2 )^2 + 5 # which explicit expression is: 2 * exp(x) * ( exp(x) + 2 )