Title: | Estimation and Inference for Conditional Copula Models |
---|---|
Description: | Provides functions for the estimation of conditional copulas models, various estimators of conditional Kendall's tau (proposed in Derumigny and Fermanian (2019a, 2019b, 2020) <doi:10.1515/demo-2019-0016>, <doi:10.1016/j.csda.2019.01.013>, <doi:10.1016/j.jmva.2020.104610>), and test procedures for the simplifying assumption (proposed in Derumigny and Fermanian (2017) <doi:10.1515/demo-2017-0011> and Derumigny, Fermanian and Min (2022) <doi:10.1002/cjs.11742>). |
Authors: | Alexis Derumigny [aut, cre] , Jean-David Fermanian [ctb, ths] , Aleksey Min [ctb] , Rutger van der Spek [ctb] |
Maintainer: | Alexis Derumigny <[email protected]> |
License: | GPL-3 |
Version: | 0.1.4.1 |
Built: | 2024-11-05 05:14:00 UTC |
Source: | https://github.com/alexisderumigny/condcopulas |
By Sklar's theorem, any conditional distribution function can be written as
where is an event and
is a copula depending on the event
.
In this function, we assume that we have a partition
of the probability space, and that for each
,
the conditional copula is parametric according to the following model
for some parameter depending on the realized event
.
This function uses canonical maximum likelihood to estimate
and the corresponding copulas
.
bCond.estParamCopula(U1, U2, family, partition)
bCond.estParamCopula(U1, U2, family, partition)
U1 |
vector of |
U2 |
vector of |
family |
the family of conditional copulas
used for each conditioning event |
partition |
matrix of size |
a list of size p
containing the p
conditional copulas
Derumigny, A., & Fermanian, J. D. (2017). About tests of the “simplifying” assumption for conditional copulas. Dependence Modeling, 5(1), 154-197. doi:10.1515/demo-2017-0011
Derumigny, A., & Fermanian, J. D. (2022) Conditional empirical copula processes and generalized dependence measures Electronic Journal of Statistics, 16(2), 5692-5719. doi:10.1214/22-EJS2075
bCond.pobs
for the computation
of (conditional) pseudo-observations in this framework.
bCond.simpA.param
for a test of the simplifying assumption
that all these conditional copulas are equal
(assuming they all belong to the same parametric family).
bCond.simpA.CKT
for a test of the simplifying assumption
that all these conditional copulas are equal,
based on the equality of conditional Kendall's tau.
n = 800 Z = stats::runif(n = n) CKT = 0.2 * as.numeric(Z <= 0.3) + 0.5 * as.numeric(Z > 0.3 & Z <= 0.5) + - 0.8 * as.numeric(Z > 0.5) simCopula = VineCopula::BiCopSim(N = n, par = VineCopula::BiCopTau2Par(CKT, family = 1), family = 1) X1 = simCopula[,1] X2 = simCopula[,2] partition = cbind(Z <= 0.3, Z > 0.3 & Z <= 0.5, Z > 0.5) condPseudoObs = bCond.pobs(X = cbind(X1, X2), partition = partition) estimatedCondCopulas = bCond.estParamCopula( U1 = condPseudoObs[,1], U2 = condPseudoObs[,2], family = 1, partition = partition) print(estimatedCondCopulas) # Comparison with the true conditional parameters: 0.2, 0.5, -0.8.
n = 800 Z = stats::runif(n = n) CKT = 0.2 * as.numeric(Z <= 0.3) + 0.5 * as.numeric(Z > 0.3 & Z <= 0.5) + - 0.8 * as.numeric(Z > 0.5) simCopula = VineCopula::BiCopSim(N = n, par = VineCopula::BiCopTau2Par(CKT, family = 1), family = 1) X1 = simCopula[,1] X2 = simCopula[,2] partition = cbind(Z <= 0.3, Z > 0.3 & Z <= 0.5, Z > 0.5) condPseudoObs = bCond.pobs(X = cbind(X1, X2), partition = partition) estimatedCondCopulas = bCond.estParamCopula( U1 = condPseudoObs[,1], U2 = condPseudoObs[,2], family = 1, partition = partition) print(estimatedCondCopulas) # Comparison with the true conditional parameters: 0.2, 0.5, -0.8.
Let be
events forming a partition of
a probability space and
be
random variables.
Assume that we observe
i.i.d. replications of
,
and that for each
,
we also know which of the was realized.
This function computes the pseudo-observations
where
is such that the event
is realized for the
-th observation.
bCond.pobs(X, partition)
bCond.pobs(X, partition)
X |
matrix of size |
partition |
matrix of size |
a matrix of size n * d
containing the conditional pseudo-observations .
Derumigny, A., & Fermanian, J. D. (2017). About tests of the “simplifying” assumption for conditional copulas. Dependence Modeling, 5(1), 154-197. doi:10.1515/demo-2017-0011
Derumigny, A., & Fermanian, J. D. (2022) Conditional empirical copula processes and generalized dependence measures Electronic Journal of Statistics, 16(2), 5692-5719. doi:10.1214/22-EJS2075
bCond.estParamCopula
for the estimation
of a (conditional) parametric copula model in this framework.
bCond.treeCKT
that provides a binary tree
based on conditional Kendall's tau
and that can be used to derive relevant conditioning events.
n = 800 Z = stats::runif(n = n) CKT = 0.2 * as.numeric(Z <= 0.3) + 0.5 * as.numeric(Z > 0.3 & Z <= 0.5) + - 0.8 * as.numeric(Z > 0.5) simCopula = VineCopula::BiCopSim(N = n, par = VineCopula::BiCopTau2Par(CKT, family = 1), family = 1) X1 = simCopula[,1] X2 = simCopula[,2] partition = cbind(Z <= 0.3, Z > 0.3 & Z <= 0.5, Z > 0.5) condPseudoObs = bCond.pobs(X = cbind(X1, X2), partition = partition)
n = 800 Z = stats::runif(n = n) CKT = 0.2 * as.numeric(Z <= 0.3) + 0.5 * as.numeric(Z > 0.3 & Z <= 0.5) + - 0.8 * as.numeric(Z > 0.5) simCopula = VineCopula::BiCopSim(N = n, par = VineCopula::BiCopTau2Par(CKT, family = 1), family = 1) X1 = simCopula[,1] X2 = simCopula[,2] partition = cbind(Z <= 0.3, Z > 0.3 & Z <= 0.5, Z > 0.5) condPseudoObs = bCond.pobs(X = cbind(X1, X2), partition = partition)
This function takes in parameter the matrix of (observations) of the
conditioned variables and either matrixInd
, a matrix of indicator variables
describing which events occur for which observations
bCond.simpA.CKT( XI, XJ = NULL, matrixInd = NULL, minCut = 0, minProb = 0.01, minSize = minProb * nrow(XI), nPoints_xJ = 10, type.quantile = 7, verbose = 2, methodTree = "doSplit", propTree = 0.5, methodPvalue = "bootNP", nBootstrap = 100 )
bCond.simpA.CKT( XI, XJ = NULL, matrixInd = NULL, minCut = 0, minProb = 0.01, minSize = minProb * nrow(XI), nPoints_xJ = 10, type.quantile = 7, verbose = 2, methodTree = "doSplit", propTree = 0.5, methodPvalue = "bootNP", nBootstrap = 100 )
XI |
matrix of size n*p of observations of the conditioned variables. |
XJ |
matrix of size n*(d-p) containing observations of the conditioning vector. |
matrixInd |
a matrix of indexes of size (n, N.boxes) describing for each observation i to which box ( = event) it belongs. If it is |
minCut |
minimum difference in probabilities that is necessary to cut. |
minProb |
minimum probability of being in one of the node. |
minSize |
minimum number of observations in each node. This is an alternative to minProb and has priority over it. |
nPoints_xJ |
number of points in the grid that are considered when choosing the point for splitting the tree. |
type.quantile |
way of computing the quantiles,
see |
verbose |
control the text output of the procedure.
If |
methodTree |
method for constructing the tree
Only used if |
propTree |
share of observations used to build the tree
(the rest of the observations are used for the computation of the p-value).
Only used if |
methodPvalue |
method for computing the p-value
|
nBootstrap |
number of bootstrap replications
(Only used if |
a list with the following components
p.value
the estimated p-value.
stat
the test statistic.
treeCKT
the estimated tree if matrixInd
is not provided.
vec_statB
the vector of bootstrapped statistics
if methodPvalue
is not covMatrix
.
Alexis Derumigny, Jean-David Fermanian and Aleksey Min
Derumigny, A., Fermanian, J. D., & Min, A. (2022). Testing for equality between conditional copulas given discretized conditioning events. Canadian Journal of Statistics. doi:10.1002/cjs.11742
Derumigny, A., & Fermanian, J. D. (2022) Conditional empirical copula processes and generalized dependence measures Electronic Journal of Statistics, 16(2), 5692-5719. doi:10.1214/22-EJS2075
bCond.simpA.param
for a test of this simplifying assumption
in a parametric framework.
bCond.treeCKT
provides the binary tree that is used in this function
(if matrixInd
is not provided).
Tests of the simplifying assumption for conditional copulas with a continuous conditioning variable:
simpA.NP
in a nonparametric setting
simpA.param
in a (semi)parametric setting,
where the conditional copula belongs to a parametric family,
but the conditional margins are estimated arbitrarily through
kernel smoothing
simpA.kendallReg
: test based on the constancy of
conditional Kendall's tau
set.seed(1) n = 200 XJ = MASS::mvrnorm(n = n, mu = c(3,3), Sigma = rbind(c(1, 0.2), c(0.2, 1))) XI = matrix(nrow = n, ncol = 2) high_XJ1 = which(XJ[,1] > 4) XI[high_XJ1, ] = MASS::mvrnorm(n = length(high_XJ1), mu = c(10,10), Sigma = rbind(c(1, 0.8), c(0.8, 1))) XI[-high_XJ1, ] = MASS::mvrnorm(n = n - length(high_XJ1), mu = c(8,8), Sigma = rbind(c(1, -0.2), c(-0.2, 1))) result = bCond.simpA.CKT(XI = XI, XJ = XJ, minSize = 10, verbose = 2, methodTree = "doSplit", nBootstrap = 4) print(result$p.value) result2 = bCond.simpA.CKT(XI = XI, XJ = XJ, minSize = 10, verbose = 2, methodTree = "noSplit", nBootstrap = 4) print(result2$p.value)
set.seed(1) n = 200 XJ = MASS::mvrnorm(n = n, mu = c(3,3), Sigma = rbind(c(1, 0.2), c(0.2, 1))) XI = matrix(nrow = n, ncol = 2) high_XJ1 = which(XJ[,1] > 4) XI[high_XJ1, ] = MASS::mvrnorm(n = length(high_XJ1), mu = c(10,10), Sigma = rbind(c(1, 0.8), c(0.8, 1))) XI[-high_XJ1, ] = MASS::mvrnorm(n = n - length(high_XJ1), mu = c(8,8), Sigma = rbind(c(1, -0.2), c(-0.2, 1))) result = bCond.simpA.CKT(XI = XI, XJ = XJ, minSize = 10, verbose = 2, methodTree = "doSplit", nBootstrap = 4) print(result$p.value) result2 = bCond.simpA.CKT(XI = XI, XJ = XJ, minSize = 10, verbose = 2, methodTree = "noSplit", nBootstrap = 4) print(result2$p.value)
Test of the assumption that a conditional copulas does not vary through a list of discrete conditioning events
bCond.simpA.param( X1, X2, partition, family, testStat = "T2c_tau", typeBoot = "boot.NP", nBootstrap = 100 )
bCond.simpA.param( X1, X2, partition, family, testStat = "T2c_tau", typeBoot = "boot.NP", nBootstrap = 100 )
X1 |
vector of |
X2 |
vector of |
partition |
matrix of size |
family |
family of parametric copulas used |
testStat |
test statistic used. Possible choices are
|
typeBoot |
type of bootstrap used |
nBootstrap |
number of bootstrap replications |
a list containing
true_stat
:
the value of the test statistic computed on the whole sample
vect_statB
:
a vector of length nBootstrap
containing the bootstrapped
test statistics.
p_val
: the p-value of the test.
Derumigny, A., & Fermanian, J. D. (2017). About tests of the “simplifying” assumption for conditional copulas. Dependence Modeling, 5(1), 154-197. doi:10.1515/demo-2017-0011
Derumigny, A., & Fermanian, J. D. (2022) Conditional empirical copula processes and generalized dependence measures Electronic Journal of Statistics, 16(2), 5692-5719. doi:10.1214/22-EJS2075
bCond.estParamCopula
for the estimation
of a (conditional) parametric copula model in this framework.
bCond.simpA.CKT
for a test of the simplifying assumption
that all these conditional copulas are equal,
based on the equality of conditional Kendall's tau
(i.e. without any parametric assumption).
Tests of the simplifying assumption for conditional copulas with a continuous conditioning variable:
simpA.NP
in a nonparametric setting
simpA.param
in a (semi)parametric setting,
where the conditional copula belongs to a parametric family,
but the conditional margins are estimated arbitrarily through
kernel smoothing
simpA.kendallReg
: test based on the constancy of
conditional Kendall's tau
n = 800 Z = stats::runif(n = n) CKT = 0.2 * as.numeric(Z <= 0.3) + 0.5 * as.numeric(Z > 0.3 & Z <= 0.5) + + 0.3 * as.numeric(Z > 0.5) family = 3 simCopula = VineCopula::BiCopSim(N = n, par = VineCopula::BiCopTau2Par(CKT, family = family), family = family) X1 = simCopula[,1] X2 = simCopula[,2] partition = cbind(Z <= 0.3, Z > 0.3 & Z <= 0.5, Z > 0.5) result = bCond.simpA.param(X1 = X1, X2 = X2, testStat = "T2c_tau", partition = partition, family = family, typeBoot = "boot.paramInd") print(result$p_val) n = 800 Z = stats::runif(n = n) CKT = 0.1 family = 3 simCopula = VineCopula::BiCopSim(N = n, par = VineCopula::BiCopTau2Par(CKT, family = family), family = family) X1 = simCopula[,1] X2 = simCopula[,2] partition = cbind(Z <= 0.3, Z > 0.3 & Z <= 0.5, Z > 0.5) result = bCond.simpA.param(X1 = X1, X2 = X2, partition = partition, family = family, typeBoot = "boot.NP") print(result$p_val)
n = 800 Z = stats::runif(n = n) CKT = 0.2 * as.numeric(Z <= 0.3) + 0.5 * as.numeric(Z > 0.3 & Z <= 0.5) + + 0.3 * as.numeric(Z > 0.5) family = 3 simCopula = VineCopula::BiCopSim(N = n, par = VineCopula::BiCopTau2Par(CKT, family = family), family = family) X1 = simCopula[,1] X2 = simCopula[,2] partition = cbind(Z <= 0.3, Z > 0.3 & Z <= 0.5, Z > 0.5) result = bCond.simpA.param(X1 = X1, X2 = X2, testStat = "T2c_tau", partition = partition, family = family, typeBoot = "boot.paramInd") print(result$p_val) n = 800 Z = stats::runif(n = n) CKT = 0.1 family = 3 simCopula = VineCopula::BiCopSim(N = n, par = VineCopula::BiCopTau2Par(CKT, family = family), family = family) X1 = simCopula[,1] X2 = simCopula[,2] partition = cbind(Z <= 0.3, Z > 0.3 & Z <= 0.5, Z > 0.5) result = bCond.simpA.param(X1 = X1, X2 = X2, partition = partition, family = family, typeBoot = "boot.NP") print(result$p_val)
This function takes in parameter two matrices of observations:
the first one contains the observations of XI
(the conditioned variables)
and the second on contains the observations of XJ
(the conditioning variables).
The goal of this procedure is to find which of the variables in XJ
have important influence on the dependence between the components of XI
,
(measured by the Kendall's tau).
bCond.treeCKT( XI, XJ, minCut = 0, minProb = 0.01, minSize = minProb * nrow(XI), nPoints_xJ = 10, type.quantile = 7, verbose = 2 )
bCond.treeCKT( XI, XJ, minCut = 0, minProb = 0.01, minSize = minProb * nrow(XI), nPoints_xJ = 10, type.quantile = 7, verbose = 2 )
XI |
matrix of size n*p of observations of the conditioned variables. |
XJ |
matrix of size n*(d-p) containing observations of the conditioning vector. |
minCut |
minimum difference in probabilities that is necessary to cut. |
minProb |
minimum probability of being in one of the node. |
minSize |
minimum number of observations in each node. This is an alternative to minProb and has priority over it. |
nPoints_xJ |
number of points in the grid that are considered when choosing the point for splitting the tree. |
type.quantile |
way of computing the quantiles,
see |
verbose |
control the text output of the procedure.
If |
The object return by this function is a binary tree. Each leaf of this tree
correspond to one event (or, equivalently, one subset of ),
and the conditional Kendall's tau conditionally to it.
the estimated tree using the data 'XI, XJ'.
Derumigny, A., Fermanian, J. D., & Min, A. (2022). Testing for equality between conditional copulas given discretized conditioning events. Canadian Journal of Statistics. doi:10.1002/cjs.11742
bCond.simpA.CKT
for a test of the simplifying assumption
that all these conditional Kendall's tau are equal.
treeCKT2matrixInd
for converting this tree to a matrix of indicators
of each event. matrixInd2matrixCKT
for getting the matrix of estimated
conditional Kendall's taus for each event.
CKT.estimate
for the estimation of
pointwise conditional Kendall's tau,
i.e. assuming a continuous conditioning variable .
set.seed(1) n = 400 XJ = MASS::mvrnorm(n = n, mu = c(3,3), Sigma = rbind(c(1, 0.2), c(0.2, 1))) XI = matrix(nrow = n, ncol = 2) high_XJ1 = which(XJ[,1] > 4) XI[high_XJ1, ] = MASS::mvrnorm(n = length(high_XJ1), mu = c(10,10), Sigma = rbind(c(1, 0.8), c(0.8, 1))) XI[-high_XJ1, ] = MASS::mvrnorm(n = n - length(high_XJ1), mu = c(8,8), Sigma = rbind(c(1, -0.2), c(-0.2, 1))) result = bCond.treeCKT(XI = XI, XJ = XJ, minSize = 50, verbose = 2) # Plotting the corresponding tree using the "DiagrammeR" package if (requireNamespace("DiagrammeR", quietly = TRUE)){ plot(result) } # Number of observations in the first two children print(length(data.tree::GetAttribute(result$children[[1]], "condObs"))) print(length(data.tree::GetAttribute(result$children[[2]], "condObs")))
set.seed(1) n = 400 XJ = MASS::mvrnorm(n = n, mu = c(3,3), Sigma = rbind(c(1, 0.2), c(0.2, 1))) XI = matrix(nrow = n, ncol = 2) high_XJ1 = which(XJ[,1] > 4) XI[high_XJ1, ] = MASS::mvrnorm(n = length(high_XJ1), mu = c(10,10), Sigma = rbind(c(1, 0.8), c(0.8, 1))) XI[-high_XJ1, ] = MASS::mvrnorm(n = n - length(high_XJ1), mu = c(8,8), Sigma = rbind(c(1, -0.2), c(-0.2, 1))) result = bCond.treeCKT(XI = XI, XJ = XJ, minSize = 50, verbose = 2) # Plotting the corresponding tree using the "DiagrammeR" package if (requireNamespace("DiagrammeR", quietly = TRUE)){ plot(result) } # Number of observations in the first two children print(length(data.tree::GetAttribute(result$children[[1]], "condObs"))) print(length(data.tree::GetAttribute(result$children[[2]], "condObs")))
Let and
be two random variables.
The goal of this function is to estimate the conditional Kendall's tau
(a dependence measure) between
and
given
for a conditioning variable
.
Conditional Kendall's tau between
and
given
is defined as:
where and
are two independent and identically distributed copies of
.
In other words, conditional Kendall's tau is the difference
between the probabilities of observing concordant and discordant pairs
from the conditional law of
This function can use different estimators for conditional Kendall's tau,
see the description of the parameter methodEstimation
for a complete list of possibilities.
CKT.estimate( X1 = NULL, X2 = NULL, Z = NULL, newZ = Z, methodEstimation, h, listPhi = if(methodEstimation == "kendallReg") {list( function(x){return(x)} , function(x){return(x^2)} , function(x){return(x^3)} ) } else {list(identity)} , ... , observedX1 = NULL, observedX2 = NULL, observedZ = NULL )
CKT.estimate( X1 = NULL, X2 = NULL, Z = NULL, newZ = Z, methodEstimation, h, listPhi = if(methodEstimation == "kendallReg") {list( function(x){return(x)} , function(x){return(x^2)} , function(x){return(x^3)} ) } else {list(identity)} , ... , observedX1 = NULL, observedX2 = NULL, observedZ = NULL )
X1 |
a vector of |
X2 |
a vector of |
Z |
a vector of |
newZ |
the new values for the conditioning variable
|
methodEstimation |
method for estimating the conditional Kendall's tau. Possible estimation methods are:
|
h |
the bandwidth |
listPhi |
the list of transformations to be applied
to the conditioning variable |
... |
other parameters passed to the estimating functions
|
observedX1 , observedX2 , observedZ
|
old parameter names for |
the vector of estimated conditional Kendall's tau
at each of the observations of newZ
.
Derumigny, A., & Fermanian, J. D. (2019a). A classification point-of-view about conditional Kendall’s tau. Computational Statistics & Data Analysis, 135, 70-94. doi:10.1016/j.csda.2019.01.013
Derumigny, A., & Fermanian, J. D. (2019b). On kernel-based estimation of conditional Kendall’s tau: finite-distance bounds and asymptotic behavior. Dependence Modeling, 7(1), 292-321. doi:10.1515/demo-2019-0016
Derumigny, A., & Fermanian, J. D. (2020). On Kendall’s regression. Journal of Multivariate Analysis, 178, 104610. doi:10.1016/j.jmva.2020.104610
the specialized functions for estimating
conditional Kendall's tau for each method:
CKT.fit.tree
, CKT.fit.randomForest
,
CKT.fit.GLM
, CKT.fit.nNets
,
CKT.predict.kNN
, CKT.fit.randomForest
,
CKT.kernel
and CKT.kendallReg.fit
.
See also the nonparametric estimator of conditional copula models
estimateNPCondCopula
,
and the parametric estimators of conditional copula models
estimateParCondCopula
.
In the case where is discrete
or in the case of discrete conditioning events, see
bCond.treeCKT
.
# We simulate from a conditional copula set.seed(1) N = 300 Z = rnorm(n = N, mean = 5, sd = 2) conditionalTau = -0.9 + 1.8 * pnorm(Z, mean = 5, sd = 2) simCopula = VineCopula::BiCopSim(N=N , family = 1, par = VineCopula::BiCopTau2Par(1 , conditionalTau )) X1 = qnorm(simCopula[,1]) X2 = qnorm(simCopula[,2]) newZ = seq(2,10,by = 0.1) h = 0.1 estimatedCKT_tree <- CKT.estimate( X1 = X1, X2 = X2, Z = Z, newZ = newZ, methodEstimation = "tree", h = h) estimatedCKT_rf <- CKT.estimate( X1 = X1, X2 = X2, Z = Z, newZ = newZ, methodEstimation = "randomForest", h = h) estimatedCKT_GLM <- CKT.estimate( X1 = X1, X2 = X2, Z = Z, newZ = newZ, methodEstimation = "logit", h = h, listPhi = list(function(x){return(x)}, function(x){return(x^2)}, function(x){return(x^3)}) ) estimatedCKT_kNN <- CKT.estimate( X1 = X1, X2 = X2, Z = Z, newZ = newZ, methodEstimation = "nearestNeighbors", h = h, number_nn = c(50,80, 100, 120,200), partition = 4 ) estimatedCKT_nNet <- CKT.estimate( X1 = X1, X2 = X2, Z = Z, newZ = newZ, methodEstimation = "neuralNetwork", h = h, ) estimatedCKT_kernel <- CKT.estimate( X1 = X1, X2 = X2, Z = Z, newZ = newZ, methodEstimation = "kernel", h = h, ) estimatedCKT_kendallReg <- CKT.estimate( X1 = X1, X2 = X2, Z = Z, newZ = newZ, methodEstimation = "kendallReg", h = h) # Comparison between true Kendall's tau (in black) # and estimated Kendall's tau (in other colors) trueConditionalTau = -0.9 + 1.8 * pnorm(newZ, mean = 5, sd = 2) plot(newZ, trueConditionalTau , col="black", type = "l", ylim = c(-1, 1)) lines(newZ, estimatedCKT_tree, col = "red") lines(newZ, estimatedCKT_rf, col = "blue") lines(newZ, estimatedCKT_GLM, col = "green") lines(newZ, estimatedCKT_kNN, col = "purple") lines(newZ, estimatedCKT_nNet, col = "coral") lines(newZ, estimatedCKT_kernel, col = "skyblue") lines(newZ, estimatedCKT_kendallReg, col = "darkgreen")
# We simulate from a conditional copula set.seed(1) N = 300 Z = rnorm(n = N, mean = 5, sd = 2) conditionalTau = -0.9 + 1.8 * pnorm(Z, mean = 5, sd = 2) simCopula = VineCopula::BiCopSim(N=N , family = 1, par = VineCopula::BiCopTau2Par(1 , conditionalTau )) X1 = qnorm(simCopula[,1]) X2 = qnorm(simCopula[,2]) newZ = seq(2,10,by = 0.1) h = 0.1 estimatedCKT_tree <- CKT.estimate( X1 = X1, X2 = X2, Z = Z, newZ = newZ, methodEstimation = "tree", h = h) estimatedCKT_rf <- CKT.estimate( X1 = X1, X2 = X2, Z = Z, newZ = newZ, methodEstimation = "randomForest", h = h) estimatedCKT_GLM <- CKT.estimate( X1 = X1, X2 = X2, Z = Z, newZ = newZ, methodEstimation = "logit", h = h, listPhi = list(function(x){return(x)}, function(x){return(x^2)}, function(x){return(x^3)}) ) estimatedCKT_kNN <- CKT.estimate( X1 = X1, X2 = X2, Z = Z, newZ = newZ, methodEstimation = "nearestNeighbors", h = h, number_nn = c(50,80, 100, 120,200), partition = 4 ) estimatedCKT_nNet <- CKT.estimate( X1 = X1, X2 = X2, Z = Z, newZ = newZ, methodEstimation = "neuralNetwork", h = h, ) estimatedCKT_kernel <- CKT.estimate( X1 = X1, X2 = X2, Z = Z, newZ = newZ, methodEstimation = "kernel", h = h, ) estimatedCKT_kendallReg <- CKT.estimate( X1 = X1, X2 = X2, Z = Z, newZ = newZ, methodEstimation = "kendallReg", h = h) # Comparison between true Kendall's tau (in black) # and estimated Kendall's tau (in other colors) trueConditionalTau = -0.9 + 1.8 * pnorm(newZ, mean = 5, sd = 2) plot(newZ, trueConditionalTau , col="black", type = "l", ylim = c(-1, 1)) lines(newZ, estimatedCKT_tree, col = "red") lines(newZ, estimatedCKT_rf, col = "blue") lines(newZ, estimatedCKT_GLM, col = "green") lines(newZ, estimatedCKT_kNN, col = "purple") lines(newZ, estimatedCKT_nNet, col = "coral") lines(newZ, estimatedCKT_kernel, col = "skyblue") lines(newZ, estimatedCKT_kendallReg, col = "darkgreen")
The function CKT.fit.GLM
fits a regression model for the
conditional Kendall's tau
between two variables
and
conditionally to some predictors
.
More precisely, this function fits the model
for a link function ,
and
real-valued functions
.
The function
CKT.predict.GLM
predicts the values of
conditional Kendall's tau for some values of the conditioning variable .
CKT.fit.GLM( datasetPairs, designMatrix = datasetPairs[, 2:(ncol(datasetPairs) - 3), drop = FALSE], link = "logit", ... ) CKT.predict.GLM(fit, newZ)
CKT.fit.GLM( datasetPairs, designMatrix = datasetPairs[, 2:(ncol(datasetPairs) - 3), drop = FALSE], link = "logit", ... ) CKT.predict.GLM(fit, newZ)
datasetPairs |
the matrix of pairs and corresponding values of the kernel
as provided by |
designMatrix |
the matrix of predictor to be used for the fitting of the model.
It should have the same number of rows as the |
link |
link function, can be one of
|
... |
other parameters passed to
|
fit |
result of a call to |
newZ |
new matrix of observations of the conditioning vector |
CKT.fit.GLM
returns the fitted GLM,
an object with S3 class ordinalNet
.
CKT.predict.GLM
returns
a vector of (predicted) conditional Kendall's taus of the same size
as the number of rows of the matrix newZ
.
Derumigny, A., & Fermanian, J. D. (2019). A classification point-of-view about conditional Kendall’s tau. Computational Statistics & Data Analysis, 135, 70-94. (Algorithm 2) doi:10.1016/j.csda.2019.01.013
See also other estimators of conditional Kendall's tau:
CKT.fit.tree
, CKT.fit.randomForest
,
CKT.fit.nNets
, CKT.predict.kNN
,
CKT.kernel
, CKT.kendallReg.fit
,
and the more general wrapper CKT.estimate
.
# We simulate from a conditional copula set.seed(1) N = 400 Z = rnorm(n = N, mean = 5, sd = 2) conditionalTau = 2*plogis(-1 + 0.8*Z - 0.1*Z^2) - 1 simCopula = VineCopula::BiCopSim(N=N , family = 1, par = VineCopula::BiCopTau2Par(1 , conditionalTau )) X1 = qnorm(simCopula[,1]) X2 = qnorm(simCopula[,2]) datasetP = datasetPairs(X1 = X1, X2 = X2, Z = Z, h = 0.07, cut = 0.9) designMatrix = cbind(datasetP[,2], datasetP[,2]^2) fitCKT_GLM <- CKT.fit.GLM( datasetPairs = datasetP, designMatrix = designMatrix, maxiterOut = 10, maxiterIn = 5) print(coef(fitCKT_GLM)) # These are rather close to the true coefficients -1, 0.8, -0.1 # used to generate the data above. newZ = seq(2,10,by = 0.1) estimatedCKT_GLM = CKT.predict.GLM( fit = fitCKT_GLM, newZ = cbind(newZ, newZ^2)) # Comparison between true Kendall's tau (in red) # and estimated Kendall's tau (in black) trueConditionalTau = 2*plogis(-1 + 0.8*newZ - 0.1*newZ^2) - 1 plot(newZ, trueConditionalTau , col="red", type = "l", ylim = c(-1, 1)) lines(newZ, estimatedCKT_GLM)
# We simulate from a conditional copula set.seed(1) N = 400 Z = rnorm(n = N, mean = 5, sd = 2) conditionalTau = 2*plogis(-1 + 0.8*Z - 0.1*Z^2) - 1 simCopula = VineCopula::BiCopSim(N=N , family = 1, par = VineCopula::BiCopTau2Par(1 , conditionalTau )) X1 = qnorm(simCopula[,1]) X2 = qnorm(simCopula[,2]) datasetP = datasetPairs(X1 = X1, X2 = X2, Z = Z, h = 0.07, cut = 0.9) designMatrix = cbind(datasetP[,2], datasetP[,2]^2) fitCKT_GLM <- CKT.fit.GLM( datasetPairs = datasetP, designMatrix = designMatrix, maxiterOut = 10, maxiterIn = 5) print(coef(fitCKT_GLM)) # These are rather close to the true coefficients -1, 0.8, -0.1 # used to generate the data above. newZ = seq(2,10,by = 0.1) estimatedCKT_GLM = CKT.predict.GLM( fit = fitCKT_GLM, newZ = cbind(newZ, newZ^2)) # Comparison between true Kendall's tau (in red) # and estimated Kendall's tau (in black) trueConditionalTau = 2*plogis(-1 + 0.8*newZ - 0.1*newZ^2) - 1 plot(newZ, trueConditionalTau , col="red", type = "l", ylim = c(-1, 1)) lines(newZ, estimatedCKT_GLM)
Let and
be two random variables.
The goal of this function is to estimate the conditional Kendall's tau
(a dependence measure) between
and
given
for a conditioning variable
.
Conditional Kendall's tau between
and
given
is defined as:
where and
are two independent and identically distributed copies of
.
In other words, conditional Kendall's tau is the difference
between the probabilities of observing concordant and discordant pairs
from the conditional law of
This function estimates conditional Kendall's tau using neural networks. This is possible by the relationship between estimation of conditional Kendall's tau and classification problems (see Derumigny and Fermanian (2019)): estimation of conditional Kendall's tau is equivalent to the prediction of concordance in the space of pairs of observations.
CKT.fit.nNets( datasetPairs, designMatrix = datasetPairs[, 2:(ncol(datasetPairs) - 3), drop = FALSE], vecSize = rep(3, times = 10), nObs_per_NN = 0.9 * nrow(designMatrix), verbose = 1 )
CKT.fit.nNets( datasetPairs, designMatrix = datasetPairs[, 2:(ncol(datasetPairs) - 3), drop = FALSE], vecSize = rep(3, times = 10), nObs_per_NN = 0.9 * nrow(designMatrix), verbose = 1 )
datasetPairs |
the matrix of pairs and corresponding values of the kernel
as provided by |
designMatrix |
the matrix of predictor to be used for the fitting of the tree |
vecSize |
vector with the number of neurons for each network |
nObs_per_NN |
number of observations used for each neural network. |
verbose |
a number indicated what to print
|
CKT.fit.nNets
returns a list of the fitted neural networks
Derumigny, A., & Fermanian, J. D. (2019). A classification point-of-view about conditional Kendall’s tau. Computational Statistics & Data Analysis, 135, 70-94. (Algorithm 7) doi:10.1016/j.csda.2019.01.013
See also other estimators of conditional Kendall's tau:
CKT.fit.tree
, CKT.fit.randomForest
,
CKT.fit.GLM
, CKT.predict.kNN
,
CKT.kernel
, CKT.kendallReg.fit
,
and the more general wrapper CKT.estimate
.
# We simulate from a conditional copula set.seed(1) N = 800 Z = rnorm(n = N, mean = 5, sd = 2) conditionalTau = -0.9 + 1.8 * pnorm(Z, mean = 5, sd = 2) simCopula = VineCopula::BiCopSim(N=N , family = 1, par = VineCopula::BiCopTau2Par(1 , conditionalTau )) X1 = qnorm(simCopula[,1]) X2 = qnorm(simCopula[,2]) newZ = seq(2,10,by = 0.1) datasetP = datasetPairs(X1 = X1, X2 = X2, Z = Z, h = 0.07, cut = 0.9) fitCKT_nets <- CKT.fit.nNets(datasetPairs = datasetP) estimatedCKT_nNets <- CKT.predict.nNets( fit = fitCKT_nets, newZ = matrix(newZ, ncol = 1)) # Comparison between true Kendall's tau (in black) # and estimated Kendall's tau (in red) trueConditionalTau = -0.9 + 1.8 * pnorm(newZ, mean = 5, sd = 2) plot(newZ, trueConditionalTau , col="black", type = "l", ylim = c(-1, 1)) lines(newZ, estimatedCKT_nNets, col = "red")
# We simulate from a conditional copula set.seed(1) N = 800 Z = rnorm(n = N, mean = 5, sd = 2) conditionalTau = -0.9 + 1.8 * pnorm(Z, mean = 5, sd = 2) simCopula = VineCopula::BiCopSim(N=N , family = 1, par = VineCopula::BiCopTau2Par(1 , conditionalTau )) X1 = qnorm(simCopula[,1]) X2 = qnorm(simCopula[,2]) newZ = seq(2,10,by = 0.1) datasetP = datasetPairs(X1 = X1, X2 = X2, Z = Z, h = 0.07, cut = 0.9) fitCKT_nets <- CKT.fit.nNets(datasetPairs = datasetP) estimatedCKT_nNets <- CKT.predict.nNets( fit = fitCKT_nets, newZ = matrix(newZ, ncol = 1)) # Comparison between true Kendall's tau (in black) # and estimated Kendall's tau (in red) trueConditionalTau = -0.9 + 1.8 * pnorm(newZ, mean = 5, sd = 2) plot(newZ, trueConditionalTau , col="black", type = "l", ylim = c(-1, 1)) lines(newZ, estimatedCKT_nNets, col = "red")
Let and
be two random variables.
The goal of this function is to estimate the conditional Kendall's tau
(a dependence measure) between
and
given
for a conditioning variable
.
Conditional Kendall's tau between
and
given
is defined as:
where and
are two independent and identically distributed copies of
.
In other words, conditional Kendall's tau is the difference
between the probabilities of observing concordant and discordant pairs
from the conditional law of
These functions estimate and predict conditional Kendall's tau using a random forest. This is possible by the relationship between estimation of conditional Kendall's tau and classification problems (see Derumigny and Fermanian (2019)): estimation of conditional Kendall's tau is equivalent to the prediction of concordance in the space of pairs of observations.
CKT.fit.randomForest( datasetPairs, designMatrix = data.frame(x = datasetPairs[, 2:(ncol(datasetPairs) - 3)]), n, nTree = 10, mindev = 0.008, mincut = 0, nObs_per_Tree = ceiling(0.8 * n), nVar_per_Tree = ceiling(0.8 * (ncol(datasetPairs) - 4)), verbose = FALSE, nMaxDepthAllowed = 10 ) CKT.predict.randomForest(fit, newZ)
CKT.fit.randomForest( datasetPairs, designMatrix = data.frame(x = datasetPairs[, 2:(ncol(datasetPairs) - 3)]), n, nTree = 10, mindev = 0.008, mincut = 0, nObs_per_Tree = ceiling(0.8 * n), nVar_per_Tree = ceiling(0.8 * (ncol(datasetPairs) - 4)), verbose = FALSE, nMaxDepthAllowed = 10 ) CKT.predict.randomForest(fit, newZ)
datasetPairs |
the matrix of pairs and corresponding values of the kernel
as provided by |
designMatrix |
the matrix of predictor to be used for the fitting of the tree |
n |
the original sample size of the dataset |
nTree |
number of trees of the Random Forest. |
mindev |
a factor giving the minimum deviation for a node to be splitted.
See |
mincut |
the minimum number of observations (of pairs) in a node
See |
nObs_per_Tree |
number of observations kept in each tree. |
nVar_per_Tree |
number of variables kept in each tree. |
verbose |
if |
nMaxDepthAllowed |
the maximum number of errors of type "the tree cannot be fitted" or "is too deep" before stopping the procedure. |
fit |
result of a call to |
newZ |
new matrix of observations, with the same number of variables.
and same names as the |
a list with two components
list_tree
a list of size nTree
composed of all the fitted trees.
list_variables
a list of size nTree
composed of the (predictor) variables for each tree.
CKT.predict.randomForest
returns
a vector of (predicted) conditional Kendall's taus of the same size
as the number of rows of the newZ.
Derumigny, A., & Fermanian, J. D. (2019). A classification point-of-view about conditional Kendall’s tau. Computational Statistics & Data Analysis, 135, 70-94. (Algorithm 4) doi:10.1016/j.csda.2019.01.013
# We simulate from a conditional copula set.seed(1) N = 800 Z = rnorm(n = N, mean = 5, sd = 2) conditionalTau = -0.9 + 1.8 * pnorm(Z, mean = 5, sd = 2) simCopula = VineCopula::BiCopSim(N=N , family = 1, par = VineCopula::BiCopTau2Par(1 , conditionalTau )) X1 = qnorm(simCopula[,1]) X2 = qnorm(simCopula[,2]) datasetP = datasetPairs(X1 = X1, X2 = X2, Z = Z, h = 0.07, cut = 0.9) est_RF = CKT.fit.randomForest(datasetPairs = datasetP, n = N, mindev = 0.008) newZ = seq(1,10,by = 0.1) prediction = CKT.predict.randomForest(fit = est_RF, newZ = data.frame(x=newZ)) # Comparison between true Kendall's tau (in red) # and estimated Kendall's tau (in black) plot(newZ, prediction, type = "l", ylim = c(-1,1)) lines(newZ, -0.9 + 1.8 * pnorm(newZ, mean = 5, sd = 2), col="red")
# We simulate from a conditional copula set.seed(1) N = 800 Z = rnorm(n = N, mean = 5, sd = 2) conditionalTau = -0.9 + 1.8 * pnorm(Z, mean = 5, sd = 2) simCopula = VineCopula::BiCopSim(N=N , family = 1, par = VineCopula::BiCopTau2Par(1 , conditionalTau )) X1 = qnorm(simCopula[,1]) X2 = qnorm(simCopula[,2]) datasetP = datasetPairs(X1 = X1, X2 = X2, Z = Z, h = 0.07, cut = 0.9) est_RF = CKT.fit.randomForest(datasetPairs = datasetP, n = N, mindev = 0.008) newZ = seq(1,10,by = 0.1) prediction = CKT.predict.randomForest(fit = est_RF, newZ = data.frame(x=newZ)) # Comparison between true Kendall's tau (in red) # and estimated Kendall's tau (in black) plot(newZ, prediction, type = "l", ylim = c(-1,1)) lines(newZ, -0.9 + 1.8 * pnorm(newZ, mean = 5, sd = 2), col="red")
Let and
be two random variables.
The goal of this function is to estimate the conditional Kendall's tau
(a dependence measure) between
and
given
for a conditioning variable
.
Conditional Kendall's tau between
and
given
is defined as:
where and
are two independent and identically distributed copies of
.
In other words, conditional Kendall's tau is the difference
between the probabilities of observing concordant and discordant pairs
from the conditional law of
These functions estimate and predict conditional Kendall's tau using a classification tree. This is possible by the relationship between estimation of conditional Kendall's tau and classification problems (see Derumigny and Fermanian (2019)): estimation of conditional Kendall's tau is equivalent to the prediction of concordance in the space of pairs of observations.
CKT.fit.tree(datasetPairs, mindev = 0.008, mincut = 0) CKT.predict.tree(fit, newZ)
CKT.fit.tree(datasetPairs, mindev = 0.008, mincut = 0) CKT.predict.tree(fit, newZ)
datasetPairs |
the matrix of pairs and corresponding values of the kernel
as provided by |
mindev |
a factor giving the minimum deviation for a node to be splitted.
See |
mincut |
the minimum number of observations (of pairs) in a node
See |
fit |
result of a call to |
newZ |
new matrix of observations, with the same number of variables.
and same names as the |
CKT.fit.tree
returns the fitted tree.
CKT.predict.tree
returns
a vector of (predicted) conditional Kendall's taus of the same size
as the number of rows of newZ
.
Derumigny, A., & Fermanian, J. D. (2019). A classification point-of-view about conditional Kendall’s tau. Computational Statistics & Data Analysis, 135, 70-94. (Section 3.2) doi:10.1016/j.csda.2019.01.013
See also other estimators of conditional Kendall's tau:
CKT.fit.nNets
, CKT.fit.randomForest
,
CKT.fit.GLM
, CKT.predict.kNN
,
CKT.kernel
, CKT.kendallReg.fit
,
and the more general wrapper CKT.estimate
.
# We simulate from a conditional copula set.seed(1) N = 800 Z = rnorm(n = N, mean = 5, sd = 2) conditionalTau = -0.9 + 1.8 * pnorm(Z, mean = 5, sd = 2) simCopula = VineCopula::BiCopSim(N=N , family = 1, par = VineCopula::BiCopTau2Par(1 , conditionalTau )) X1 = qnorm(simCopula[,1]) X2 = qnorm(simCopula[,2]) datasetP = datasetPairs(X1 = X1, X2 = X2, Z = Z, h = 0.07, cut = 0.9) est_Tree = CKT.fit.tree(datasetPairs = datasetP, mindev = 0.008) print(est_Tree) newZ = seq(1,10,by = 0.1) prediction = CKT.predict.tree(fit = est_Tree, newZ = data.frame(x=newZ)) # Comparison between true Kendall's tau (in red) # and estimated Kendall's tau (in black) plot(newZ, prediction, type = "l", ylim = c(-1,1)) lines(newZ, -0.9 + 1.8 * pnorm(newZ, mean = 5, sd = 2), col="red")
# We simulate from a conditional copula set.seed(1) N = 800 Z = rnorm(n = N, mean = 5, sd = 2) conditionalTau = -0.9 + 1.8 * pnorm(Z, mean = 5, sd = 2) simCopula = VineCopula::BiCopSim(N=N , family = 1, par = VineCopula::BiCopTau2Par(1 , conditionalTau )) X1 = qnorm(simCopula[,1]) X2 = qnorm(simCopula[,2]) datasetP = datasetPairs(X1 = X1, X2 = X2, Z = Z, h = 0.07, cut = 0.9) est_Tree = CKT.fit.tree(datasetPairs = datasetP, mindev = 0.008) print(est_Tree) newZ = seq(1,10,by = 0.1) prediction = CKT.predict.tree(fit = est_Tree, newZ = data.frame(x=newZ)) # Comparison between true Kendall's tau (in red) # and estimated Kendall's tau (in black) plot(newZ, prediction, type = "l", ylim = c(-1,1)) lines(newZ, -0.9 + 1.8 * pnorm(newZ, mean = 5, sd = 2), col="red")
Let and
be two random variables.
The goal here is to estimate the conditional Kendall's tau
(a dependence measure) between
and
given
for a conditioning variable
.
Conditional Kendall's tau between
and
given
is defined as:
where and
are two independent and identically distributed copies of
.
For this, a kernel-based estimator is used, as described in
(Derumigny & Fermanian (2019)).
These functions aims at finding the best bandwidth
h
among a given
range_h
by cross-validation. They use either:
leave-one-out cross-validation:
function CKT.hCV.l1out
or K-folds cross-validation:
function CKT.hCV.Kfolds
CKT.hCV.l1out( X1 = NULL, X2 = NULL, Z = NULL, range_h, matrixSignsPairs = NULL, nPairs = 10 * length(X1), typeEstCKT = "wdm", kernel.name = "Epa", progressBar = TRUE, verbose = FALSE, observedX1 = NULL, observedX2 = NULL, observedZ = NULL ) CKT.hCV.Kfolds( X1, X2, Z, ZToEstimate, range_h, matrixSignsPairs = NULL, typeEstCKT = "wdm", kernel.name = "Epa", Kfolds = 5, progressBar = TRUE, verbose = FALSE, observedX1 = NULL, observedX2 = NULL, observedZ = NULL )
CKT.hCV.l1out( X1 = NULL, X2 = NULL, Z = NULL, range_h, matrixSignsPairs = NULL, nPairs = 10 * length(X1), typeEstCKT = "wdm", kernel.name = "Epa", progressBar = TRUE, verbose = FALSE, observedX1 = NULL, observedX2 = NULL, observedZ = NULL ) CKT.hCV.Kfolds( X1, X2, Z, ZToEstimate, range_h, matrixSignsPairs = NULL, typeEstCKT = "wdm", kernel.name = "Epa", Kfolds = 5, progressBar = TRUE, verbose = FALSE, observedX1 = NULL, observedX2 = NULL, observedZ = NULL )
X1 |
a vector of |
X2 |
a vector of |
Z |
vector of observed values of Z. If Z is multivariate, then this is a matrix whose rows correspond to the observations of Z |
range_h |
vector containing possible values for the bandwidth. |
matrixSignsPairs |
square matrix of signs of all pairs,
produced by |
nPairs |
number of pairs used in the cross-validation criteria. |
typeEstCKT |
type of estimation of the conditional Kendall's tau. |
kernel.name |
name of the kernel used for smoothing.
Possible choices are |
progressBar |
if |
verbose |
if |
observedX1 , observedX2 , observedZ
|
old parameter names for |
ZToEstimate |
vector of fixed conditioning values at which the difference between the two conditional Kendall's tau should be computed. Can also be a matrix whose lines are the conditioning vectors at which the difference between the two conditional Kendall's tau should be computed. |
Kfolds |
number of subsamples used. |
Both functions return a list with two components:
hCV
: the chosen bandwidth
scores
: vector of the same length as range_h giving the
value of the CV criteria for each of the h tested.
Lower score indicates a better fit.
Derumigny, A., & Fermanian, J. D. (2019). On kernel-based estimation of conditional Kendall’s tau: finite-distance bounds and asymptotic behavior. Dependence Modeling, 7(1), 292-321. Page 296, Equation (4). doi:10.1515/demo-2019-0016
CKT.kernel
for the corresponding
estimator of conditional Kendall's tau by kernel smoothing.
# We simulate from a conditional copula set.seed(1) N = 200 Z = rnorm(n = N, mean = 5, sd = 2) conditionalTau = -0.9 + 1.8 * pnorm(Z, mean = 5, sd = 2) simCopula = VineCopula::BiCopSim(N=N , family = 1, par = VineCopula::BiCopTau2Par(1 , conditionalTau )) X1 = qnorm(simCopula[,1]) X2 = qnorm(simCopula[,2]) newZ = seq(2,10,by = 0.1) range_h = 3:10 resultCV <- CKT.hCV.l1out(X1 = X1, X2 = X2, Z = Z, range_h = range_h, nPairs = 100) resultCV <- CKT.hCV.Kfolds(X1 = X1, X2 = X2, Z = Z, range_h = range_h, ZToEstimate = newZ) plot(range_h, resultCV$scores, type = "b")
# We simulate from a conditional copula set.seed(1) N = 200 Z = rnorm(n = N, mean = 5, sd = 2) conditionalTau = -0.9 + 1.8 * pnorm(Z, mean = 5, sd = 2) simCopula = VineCopula::BiCopSim(N=N , family = 1, par = VineCopula::BiCopTau2Par(1 , conditionalTau )) X1 = qnorm(simCopula[,1]) X2 = qnorm(simCopula[,2]) newZ = seq(2,10,by = 0.1) range_h = 3:10 resultCV <- CKT.hCV.l1out(X1 = X1, X2 = X2, Z = Z, range_h = range_h, nPairs = 100) resultCV <- CKT.hCV.Kfolds(X1 = X1, X2 = X2, Z = Z, range_h = range_h, ZToEstimate = newZ) plot(range_h, resultCV$scores, type = "b")
The function CKT.kendallReg.fit
fits a regression-type model for the
conditional Kendall's tau between two variables and
conditionally to some predictors Z.
More precisely, it fits the model
where is the conditional Kendall's tau
between
and
conditionally to
,
is a function from
to
,
are unknown coefficients to be estimated
and
are a dictionary of functions.
To estimate
, we used the penalized estimator which is defined
as the minimizer of the following criteria
where the are a second sample (here denoted by
ZToEstimate
).
The function CKT.kendallReg.predict
predicts
the conditional Kendall's tau between two variables
and
given
for some new
values of
.
CKT.kendallReg.fit( X1 = NULL, X2 = NULL, Z = NULL, ZToEstimate, designMatrixZ = cbind(ZToEstimate, ZToEstimate^2, ZToEstimate^3), newZ = designMatrixZ, h_kernel, Lambda = identity, Lambda_inv = identity, lambda = NULL, Kfolds_lambda = 10, l_norm = 1, h_lambda = h_kernel, ..., observedX1 = NULL, observedX2 = NULL, observedZ = NULL ) CKT.kendallReg.predict(fit, newZ, lambda = NULL, Lambda_inv = identity)
CKT.kendallReg.fit( X1 = NULL, X2 = NULL, Z = NULL, ZToEstimate, designMatrixZ = cbind(ZToEstimate, ZToEstimate^2, ZToEstimate^3), newZ = designMatrixZ, h_kernel, Lambda = identity, Lambda_inv = identity, lambda = NULL, Kfolds_lambda = 10, l_norm = 1, h_lambda = h_kernel, ..., observedX1 = NULL, observedX2 = NULL, observedZ = NULL ) CKT.kendallReg.predict(fit, newZ, lambda = NULL, Lambda_inv = identity)
X1 |
a vector of |
X2 |
a vector of |
Z |
a vector of |
ZToEstimate |
the intermediary dataset of observations of |
designMatrixZ |
the transformation of the |
newZ |
the new observations of the conditioning variable. |
h_kernel |
bandwidth used for the first step of kernel smoothing. |
Lambda |
the function to be applied on conditional Kendall's tau. By default, the identity function is used. |
Lambda_inv |
the functional inverse of |
lambda |
the regularization parameter. If |
Kfolds_lambda |
the number of folds used in the cross-validation
procedure to choose |
l_norm |
type of norm used for selection of the optimal lambda by cross-validation.
|
h_lambda |
the smoothing bandwidth used in the cross-validation
procedure to choose |
... |
other arguments to be passed to |
observedX1 , observedX2 , observedZ
|
old parameter names for |
fit |
the fitted model, obtained by a call
to |
The function CKT.kendallReg.fit
returns
a list with the following components:
estimatedCKT
: the estimated CKT at the new data points newZ
.
fit
: the fitted model, of S3 class glmnet
(see glmnet::glmnet
for more details).
lambda
: the value of the penalized parameter used.
(i.e. either the one supplied by the user or
the one determined by cross-validation)
CKT.kendallReg.predict
returns
the predicted values of conditional Kendall's tau.
Derumigny, A., & Fermanian, J. D. (2020). On Kendall’s regression. Journal of Multivariate Analysis, 178, 104610. doi:10.1016/j.jmva.2020.104610
See also other estimators of conditional Kendall's tau:
CKT.fit.tree
, CKT.fit.randomForest
,
CKT.fit.nNets
, CKT.predict.kNN
,
CKT.kernel
, CKT.fit.GLM
,
and the more general wrapper CKT.estimate
.
See also the test of the simplifying assumption that a
conditional copula does not depend on the value of the
conditioning variable using the nullity of Kendall's regression
coefficients: simpA.kendallReg
.
# We simulate from a conditional copula set.seed(1) N = 400 Z = rnorm(n = N, mean = 5, sd = 2) conditionalTau = -0.9 + 1.8 * pnorm(Z, mean = 5, sd = 2) simCopula = VineCopula::BiCopSim(N=N , family = 1, par = VineCopula::BiCopTau2Par(1 , conditionalTau )) X1 = qnorm(simCopula[,1]) X2 = qnorm(simCopula[,2]) newZ = seq(2, 10, by = 0.1) estimatedCKT_kendallReg <- CKT.kendallReg.fit( X1 = X1, X2 = X2, Z = Z, ZToEstimate = newZ, h_kernel = 0.07) coef(estimatedCKT_kendallReg$fit, s = estimatedCKT_kendallReg$lambda) # Comparison between true Kendall's tau (in black) # and estimated Kendall's tau (in red) trueConditionalTau = -0.9 + 1.8 * pnorm(newZ, mean = 5, sd = 2) plot(newZ, trueConditionalTau , col="black", type = "l", ylim = c(-1, 1)) lines(newZ, estimatedCKT_kendallReg$estimatedCKT, col = "red")
# We simulate from a conditional copula set.seed(1) N = 400 Z = rnorm(n = N, mean = 5, sd = 2) conditionalTau = -0.9 + 1.8 * pnorm(Z, mean = 5, sd = 2) simCopula = VineCopula::BiCopSim(N=N , family = 1, par = VineCopula::BiCopTau2Par(1 , conditionalTau )) X1 = qnorm(simCopula[,1]) X2 = qnorm(simCopula[,2]) newZ = seq(2, 10, by = 0.1) estimatedCKT_kendallReg <- CKT.kendallReg.fit( X1 = X1, X2 = X2, Z = Z, ZToEstimate = newZ, h_kernel = 0.07) coef(estimatedCKT_kendallReg$fit, s = estimatedCKT_kendallReg$lambda) # Comparison between true Kendall's tau (in black) # and estimated Kendall's tau (in red) trueConditionalTau = -0.9 + 1.8 * pnorm(newZ, mean = 5, sd = 2) plot(newZ, trueConditionalTau , col="black", type = "l", ylim = c(-1, 1)) lines(newZ, estimatedCKT_kendallReg$estimatedCKT, col = "red")
In this model, three variables ,
and
are observed.
We try to model the conditional Kendall's tau between
and
conditionally
to
, as follows:
where is the conditional Kendall's tau
between
and
conditionally to
,
is a function from
to
,
are unknown coefficients to be estimated
and
are a dictionary of functions.
To estimate
, we used the penalized estimator which is defined
as the minimizer of the following criteria
This function chooses the penalization parameter
by cross-validation.
CKT.KendallReg.LambdaCV( X1 = NULL, X2 = NULL, Z = NULL, ZToEstimate, designMatrixZ = cbind(ZToEstimate, ZToEstimate^2, ZToEstimate^3), typeEstCKT = 4, h_lambda, Lambda = identity, kernel.name = "Epa", Kfolds_lambda = 10, l_norm = 1, matrixSignsPairs = NULL, progressBars = "global", observedX1 = NULL, observedX2 = NULL, observedZ = NULL )
CKT.KendallReg.LambdaCV( X1 = NULL, X2 = NULL, Z = NULL, ZToEstimate, designMatrixZ = cbind(ZToEstimate, ZToEstimate^2, ZToEstimate^3), typeEstCKT = 4, h_lambda, Lambda = identity, kernel.name = "Epa", Kfolds_lambda = 10, l_norm = 1, matrixSignsPairs = NULL, progressBars = "global", observedX1 = NULL, observedX2 = NULL, observedZ = NULL )
X1 |
a vector of n observations of the first variable |
X2 |
a vector of n observations of the second variable |
Z |
a vector of n observations of the conditioning variable,
or a matrix with n rows of observations of the conditioning vector
(if |
ZToEstimate |
the new data of observations of Z at which the conditional Kendall's tau should be estimated. |
designMatrixZ |
the transformation of the ZToEstimate that will be used as predictors. By default, no transformation is applied. |
typeEstCKT |
type of estimation of the conditional Kendall's tau. |
h_lambda |
the smoothing bandwidth used in the cross-validation
procedure to choose |
Lambda |
the function to be applied on conditional Kendall's tau. By default, the identity function is used. |
kernel.name |
name of the kernel. Possible choices are "Gaussian" (Gaussian kernel) and "Epa" (Epanechnikov kernel). |
Kfolds_lambda |
the number of folds used in the cross-validation
procedure to choose |
l_norm |
type of norm used for selection of the optimal lambda. l_norm=1 corresponds to the sum of absolute values of differences between predicted and estimated conditional Kendall's tau while l_norm=2 corresponds to the sum of squares of differences. |
matrixSignsPairs |
the results of a call to
|
progressBars |
should progress bars be displayed? Possible values are
|
observedX1 , observedX2 , observedZ
|
old parameter names for |
A list with the following components
lambdaCV
: the chosen value of the
penalization parameters lambda
.
vectorLambda
: a vector containing the values of
lambda
that have been compared.
vectorMSEMean
: the estimated MSE for each value of
lambda
in vectorLambda
vectorMSESD
: the estimated standard deviation of the
MSE for each lambda
. It can be used to construct confidence
intervals for estimates of the MSE given by vectorMSEMean
.
Derumigny, A., & Fermanian, J. D. (2020). On Kendall’s regression. Journal of Multivariate Analysis, 178, 104610.
the main fitting function CKT.kendallReg.fit
.
# We simulate from a conditional copula set.seed(1) N = 400 Z = rnorm(n = N, mean = 5, sd = 2) conditionalTau = -0.9 + 1.8 * pnorm(Z, mean = 5, sd = 2) simCopula = VineCopula::BiCopSim(N=N , family = 1, par = VineCopula::BiCopTau2Par(1 , conditionalTau )) X1 = qnorm(simCopula[,1]) X2 = qnorm(simCopula[,2]) newZ = seq(2, 10, by = 0.1) result <- CKT.KendallReg.LambdaCV(X1 = X1, X2 = X2, Z = Z, ZToEstimate = newZ, h_lambda = 2) plot(x = result$vectorLambda, y = result$vectorMSEMean, type = "l", log = "x")
# We simulate from a conditional copula set.seed(1) N = 400 Z = rnorm(n = N, mean = 5, sd = 2) conditionalTau = -0.9 + 1.8 * pnorm(Z, mean = 5, sd = 2) simCopula = VineCopula::BiCopSim(N=N , family = 1, par = VineCopula::BiCopTau2Par(1 , conditionalTau )) X1 = qnorm(simCopula[,1]) X2 = qnorm(simCopula[,2]) newZ = seq(2, 10, by = 0.1) result <- CKT.KendallReg.LambdaCV(X1 = X1, X2 = X2, Z = Z, ZToEstimate = newZ, h_lambda = 2) plot(x = result$vectorLambda, y = result$vectorMSEMean, type = "l", log = "x")
Let and
be two random variables.
The goal of this function is to estimate the conditional Kendall's tau
(a dependence measure) between
and
given
for a conditioning variable
.
Conditional Kendall's tau between
and
given
is defined as:
where and
are two independent and identically distributed copies of
.
For this, a kernel-based estimator is used, as described in
(Derumigny, & Fermanian (2019)).
CKT.kernel( X1 = NULL, X2 = NULL, Z = NULL, newZ, h, kernel.name = "Epa", methodCV = "Kfolds", Kfolds = 5, nPairs = 10 * length(observedX1), typeEstCKT = "wdm", progressBar = TRUE, observedX1 = NULL, observedX2 = NULL, observedZ = NULL )
CKT.kernel( X1 = NULL, X2 = NULL, Z = NULL, newZ, h, kernel.name = "Epa", methodCV = "Kfolds", Kfolds = 5, nPairs = 10 * length(observedX1), typeEstCKT = "wdm", progressBar = TRUE, observedX1 = NULL, observedX2 = NULL, observedZ = NULL )
X1 |
a vector of n observations of the first variable (or a 1-column matrix) |
X2 |
a vector of n observations of the second variable (or a 1-column matrix) |
Z |
a vector of n observations of the conditioning variable, or a matrix with n rows of observations of the conditioning vector |
newZ |
the new data of observations of Z at which the conditional Kendall's tau should be estimated. |
h |
the bandwidth used for kernel smoothing.
If this is a vector, then cross-validation is used following the method
given by argument |
kernel.name |
name of the kernel used for smoothing.
Possible choices are |
methodCV |
method used for the cross-validation.
Possible choices are |
Kfolds |
number of subsamples used,
if |
nPairs |
number of pairs used in the cross-validation criteria,
if |
typeEstCKT |
type of estimation of the conditional Kendall's tau. Possible choices are
|
progressBar |
control the display of progress bars. Possible choices are:
|
observedX1 , observedX2 , observedZ
|
old parameter names for |
Choice of the bandwidth h
.
The choice of the bandwidth must be done carefully.
In the univariate case, the default kernel (Epanechnikov kernel) has a support
on , so for a bandwidth
h
, estimation of conditional Kendall's
tau at will only use points for which
.
As usual in nonparametric estimation,
h
should not be too small
(to avoid having a too large variance) and should not be large
(to avoid having a too large bias).
We recommend that for each for which the conditional Kendall's tau
is estimated, the set
should contain at least 20 points and not more than 30% of the points of
the whole dataset.
Note that for a consistent estimation, as the sample size
tends
to the infinity,
h
should tend to while the size of the set
should also tend to the infinity.
Indeed the conditioning points should be closer and closer to the point of interest
(small
h
) and more and more numerous (h
tending to 0 slowly enough).
In the multivariate case, similar recommendations can be made. Because of the curse of dimensionality, a larger sample will be necessary to reach the same level of precision as in the univariate case.
a list with two components
estimatedCKT
the vector of size NROW(newZ)
containing the values of the estimated conditional Kendall's tau.
finalh
the bandwidth h
that was finally used
for kernel smoothing (either the one specified by the user
or the one chosen by cross-validation if multiple bandwidths were given.)
Derumigny, A., & Fermanian, J. D. (2019). On kernel-based estimation of conditional Kendall’s tau: finite-distance bounds and asymptotic behavior. Dependence Modeling, 7(1), 292-321. doi:10.1515/demo-2019-0016
CKT.estimate
for other estimators
of conditional Kendall's tau.
CKTmatrix.kernel
for a generalization of this function
when the conditioned vector is of dimension d
instead of dimension 2
here.
See CKT.hCV.l1out
for manual selection of the bandwidth h
by leave-one-out or K-folds cross-validation.
# We simulate from a conditional copula set.seed(1) N = 800 Z = rnorm(n = N, mean = 5, sd = 2) conditionalTau = -0.9 + 1.8 * pnorm(Z, mean = 5, sd = 2) simCopula = VineCopula::BiCopSim(N=N , family = 1, par = VineCopula::BiCopTau2Par(1 , conditionalTau )) X1 = qnorm(simCopula[,1]) X2 = qnorm(simCopula[,2]) newZ = seq(2,10,by = 0.1) estimatedCKT_kernel <- CKT.kernel( X1 = X1, X2 = X2, Z = Z, newZ = newZ, h = 0.1, kernel.name = "Epa")$estimatedCKT # Comparison between true Kendall's tau (in black) # and estimated Kendall's tau (in red) trueConditionalTau = -0.9 + 1.8 * pnorm(newZ, mean = 5, sd = 2) plot(newZ, trueConditionalTau , col = "black", type = "l", ylim = c(-1, 1)) lines(newZ, estimatedCKT_kernel, col = "red")
# We simulate from a conditional copula set.seed(1) N = 800 Z = rnorm(n = N, mean = 5, sd = 2) conditionalTau = -0.9 + 1.8 * pnorm(Z, mean = 5, sd = 2) simCopula = VineCopula::BiCopSim(N=N , family = 1, par = VineCopula::BiCopTau2Par(1 , conditionalTau )) X1 = qnorm(simCopula[,1]) X2 = qnorm(simCopula[,2]) newZ = seq(2,10,by = 0.1) estimatedCKT_kernel <- CKT.kernel( X1 = X1, X2 = X2, Z = Z, newZ = newZ, h = 0.1, kernel.name = "Epa")$estimatedCKT # Comparison between true Kendall's tau (in black) # and estimated Kendall's tau (in red) trueConditionalTau = -0.9 + 1.8 * pnorm(newZ, mean = 5, sd = 2) plot(newZ, trueConditionalTau , col = "black", type = "l", ylim = c(-1, 1)) lines(newZ, estimatedCKT_kernel, col = "red")
Let and
be two random variables.
The goal of this function is to estimate the conditional Kendall's tau
(a dependence measure) between
and
given
for a conditioning variable
.
Conditional Kendall's tau between
and
given
is defined as:
where and
are two independent and identically distributed copies of
.
In other words, conditional Kendall's tau is the difference
between the probabilities of observing concordant and discordant pairs
from the conditional law of
This function estimates conditional Kendall's tau using a nearest neighbors. This is possible by the relationship between estimation of conditional Kendall's tau and classification problems (see Derumigny and Fermanian (2019)): estimation of conditional Kendall's tau is equivalent to the prediction of concordance in the space of pairs of observations.
CKT.predict.kNN( datasetPairs, designMatrix = datasetPairs[, 2:(ncol(datasetPairs) - 3), drop = FALSE], newZ, number_nn, weightsVariables = 1, normLp = 2, constantA = 1, partition = NULL, verbose = 1, lengthVerbose = 100, methodSort = "partial.sort" )
CKT.predict.kNN( datasetPairs, designMatrix = datasetPairs[, 2:(ncol(datasetPairs) - 3), drop = FALSE], newZ, number_nn, weightsVariables = 1, normLp = 2, constantA = 1, partition = NULL, verbose = 1, lengthVerbose = 100, methodSort = "partial.sort" )
datasetPairs |
the matrix of pairs and corresponding values of the kernel
as provided by |
designMatrix |
the matrix of predictors.
They must have the same number of variables as |
newZ |
the matrix of predictors for which we want to estimate the conditional Kendall's taus at these values. |
number_nn |
vector of numbers of nearest neighbors to use.
If several number of neighbors are given (local) aggregation is performed
using Lepski's method on the subset determined by the |
weightsVariables |
optional argument to give
different weights |
normLp |
the p in the weighted p-norm |
constantA |
a tuning parameter that controls the adaptation. The higher, the smoother it is; while the smaller, the least smooth it is. |
partition |
used only if |
verbose |
if TRUE, this print information each |
lengthVerbose |
number of iterations at each time for which progress is printed. |
methodSort |
is the sorting method used to find the nearest neighbors.
Possible choices are
|
a list with two components
estimatedCKT
the estimated conditional Kendall's tau, a vector of
the same size as the number of rows in newZ
;
vect_k_chosen
the locally selected number of nearest neighbors,
a vector of the same size as the number of rows in newZ
.
Derumigny, A., & Fermanian, J. D. (2019). A classification point-of-view about conditional Kendall’s tau. Computational Statistics & Data Analysis, 135, 70-94. (Algorithm 5) doi:10.1016/j.csda.2019.01.013
See also other estimators of conditional Kendall's tau:
CKT.fit.tree
, CKT.fit.randomForest
,
CKT.fit.nNets
, CKT.fit.randomForest
,
CKT.fit.GLM
, CKT.kernel
,
CKT.kendallReg.fit
,
and the more general wrapper CKT.estimate
.
# We simulate from a conditional copula set.seed(1) N = 800 Z = rnorm(n = N, mean = 5, sd = 2) conditionalTau = -0.9 + 1.8 * pnorm(Z, mean = 5, sd = 2) simCopula = VineCopula::BiCopSim(N=N , family = 1, par = VineCopula::BiCopTau2Par(1 , conditionalTau )) X1 = qnorm(simCopula[,1]) X2 = qnorm(simCopula[,2]) newZ = seq(2,10,by = 0.1) datasetP = datasetPairs(X1 = X1, X2 = X2, Z = Z, h = 0.07, cut = 0.9) estimatedCKT_knn <- CKT.predict.kNN( datasetPairs = datasetP, newZ = matrix(newZ,ncol = 1), number_nn = c(50,80, 100, 120,200), partition = 8) # Comparison between true Kendall's tau (in black) # and estimated Kendall's tau (in red) trueConditionalTau = -0.9 + 1.8 * pnorm(newZ, mean = 5, sd = 2) plot(newZ, trueConditionalTau , col="black", type = "l", ylim = c(-1, 1)) lines(newZ, estimatedCKT_knn$estimatedCKT, col = "red")
# We simulate from a conditional copula set.seed(1) N = 800 Z = rnorm(n = N, mean = 5, sd = 2) conditionalTau = -0.9 + 1.8 * pnorm(Z, mean = 5, sd = 2) simCopula = VineCopula::BiCopSim(N=N , family = 1, par = VineCopula::BiCopTau2Par(1 , conditionalTau )) X1 = qnorm(simCopula[,1]) X2 = qnorm(simCopula[,2]) newZ = seq(2,10,by = 0.1) datasetP = datasetPairs(X1 = X1, X2 = X2, Z = Z, h = 0.07, cut = 0.9) estimatedCKT_knn <- CKT.predict.kNN( datasetPairs = datasetP, newZ = matrix(newZ,ncol = 1), number_nn = c(50,80, 100, 120,200), partition = 8) # Comparison between true Kendall's tau (in black) # and estimated Kendall's tau (in red) trueConditionalTau = -0.9 + 1.8 * pnorm(newZ, mean = 5, sd = 2) plot(newZ, trueConditionalTau , col="black", type = "l", ylim = c(-1, 1)) lines(newZ, estimatedCKT_knn$estimatedCKT, col = "red")
Predict the values of conditional Kendall's tau using Model Averaging of Neural Networks
CKT.predict.nNets(fit, newZ, aggregationMethod = "mean")
CKT.predict.nNets(fit, newZ, aggregationMethod = "mean")
fit |
result of a call to |
newZ |
new matrix of observations, with the same number of variables.
and same names as the |
aggregationMethod |
the method to be used to aggregate all the predictions
together. Can be |
CKT.predict.nNets
returns
a vector of (predicted) conditional Kendall's taus of the same size
as the number of rows of the matrix newZ
.
Assume that we are interested in a random vector ,
where
is of dimension
and
is of dimension
.
We want to estimate the dependence across the elements of the conditioned vector
given
.
This function takes in parameter observations of
and returns kernel-based estimators of
which is the conditional Kendall's tau between and
given to
, for every conditioning point
in
gridZ
.
If the conditional Kendall's tau matrix has a block structure,
then improved estimation is possible by averaging over the kernel-based estimators of
pairwise conditional Kendall's taus.
Groups of variables composing the same blocks can be defined
using the parameter blockStructure
, and the averaging can be set on using
the parameter averaging=all
, or averaging=diag
for faster estimation by averaging only over diagonal elements of each block.
CKTmatrix.kernel( dataMatrix, observedZ, gridZ, averaging = "no", blockStructure = NULL, h, kernel.name = "Epa", typeEstCKT = "wdm" )
CKTmatrix.kernel( dataMatrix, observedZ, gridZ, averaging = "no", blockStructure = NULL, h, kernel.name = "Epa", typeEstCKT = "wdm" )
dataMatrix |
a matrix of size |
observedZ |
vector of observed points of a conditioning variable |
gridZ |
points at which the conditional Kendall's tau is computed. |
averaging |
type of averaging used for fast estimation. Possible choices are
|
blockStructure |
list of vectors.
Each vector corresponds to one group of variables
and contains the indexes of the variables that belongs to this group.
|
h |
bandwidth. It can be a real, in this case the same |
kernel.name |
name of the kernel used for smoothing.
Possible choices are: |
typeEstCKT |
type of estimation of the conditional Kendall's tau. |
array with dimensions depending on averaging
:
If averaging = "no"
:
it returns an array of dimensions (n, n, length(gridZ))
,
containing the estimated conditional Kendall's tau matrix given .
Here,
n
is the number of rows in dataMatrix
.
If averaging = "all"
or "diag"
:
it returns an array of dimensions
(length(blockStructure), length(blockStructure), length(gridZ))
,
containing the block estimates of the conditional Kendall's tau given
with ones on the diagonal.
Rutger van der Spek, Alexis Derumigny
van der Spek, R., & Derumigny, A. (2022). Fast estimation of Kendall's Tau and conditional Kendall's Tau matrices under structural assumptions. arxiv:2204.03285.
CKT.kernel
for kernel-based estimation of conditional Kendall's tau
between two variables (i.e. the equivalent of this function
when is bivariate and
d=2
).
# Data simulation n = 100 Z = runif(n) d = 5 CKT_11 = 0.8 CKT_22 = 0.9 CKT_12 = 0.1 + 0.5 * cos(pi * Z) data_X = matrix(nrow = n, ncol = d) for (i in 1:n){ CKT_matrix = matrix(data = c( 1 , CKT_11 , CKT_11 , CKT_12[i], CKT_12[i] , CKT_11 , 1 , CKT_11 , CKT_12[i], CKT_12[i] , CKT_11 , CKT_11 , 1 , CKT_12[i], CKT_12[i] , CKT_12[i], CKT_12[i], CKT_12[i], 1 , CKT_22 , CKT_12[i], CKT_12[i], CKT_12[i], CKT_22 , 1 ) , nrow = 5, ncol = 5) sigma = sin(pi * CKT_matrix/2) data_X[i, ] = mvtnorm::rmvnorm(n = 1, sigma = sigma) } plot(as.data.frame.matrix(data_X)) # Estimation of CKT matrix h = 1.06 * sd(Z) * n^{-1/5} gridZ = c(0.2, 0.8) estMatrixAll <- CKTmatrix.kernel( dataMatrix = data_X, observedZ = Z, gridZ = gridZ, h = h) # Averaging estimator estMatrixAve <- CKTmatrix.kernel( dataMatrix = data_X, observedZ = Z, gridZ = gridZ, averaging = "diag", blockStructure = list(1:3,4:5), h = h) # The estimated CKT matrix conditionally to Z=0.2 is: estMatrixAll[ , , 1] # Using the averaging estimator, # the estimated CKT between the first group (variables 1 to 3) # and the second group (variables 4 and 5) is estMatrixAve[1, 2, 1] # True value (of CKT between variables in block 1 and 2 given Z = 0.2): 0.1 + 0.5 * cos(pi * 0.2)
# Data simulation n = 100 Z = runif(n) d = 5 CKT_11 = 0.8 CKT_22 = 0.9 CKT_12 = 0.1 + 0.5 * cos(pi * Z) data_X = matrix(nrow = n, ncol = d) for (i in 1:n){ CKT_matrix = matrix(data = c( 1 , CKT_11 , CKT_11 , CKT_12[i], CKT_12[i] , CKT_11 , 1 , CKT_11 , CKT_12[i], CKT_12[i] , CKT_11 , CKT_11 , 1 , CKT_12[i], CKT_12[i] , CKT_12[i], CKT_12[i], CKT_12[i], 1 , CKT_22 , CKT_12[i], CKT_12[i], CKT_12[i], CKT_22 , 1 ) , nrow = 5, ncol = 5) sigma = sin(pi * CKT_matrix/2) data_X[i, ] = mvtnorm::rmvnorm(n = 1, sigma = sigma) } plot(as.data.frame.matrix(data_X)) # Estimation of CKT matrix h = 1.06 * sd(Z) * n^{-1/5} gridZ = c(0.2, 0.8) estMatrixAll <- CKTmatrix.kernel( dataMatrix = data_X, observedZ = Z, gridZ = gridZ, h = h) # Averaging estimator estMatrixAve <- CKTmatrix.kernel( dataMatrix = data_X, observedZ = Z, gridZ = gridZ, averaging = "diag", blockStructure = list(1:3,4:5), h = h) # The estimated CKT matrix conditionally to Z=0.2 is: estMatrixAll[ , , 1] # Using the averaging estimator, # the estimated CKT between the first group (variables 1 to 3) # and the second group (variables 4 and 5) is estMatrixAve[1, 2, 1] # True value (of CKT between variables in block 1 and 2 given Z = 0.2): 0.1 + 0.5 * cos(pi * 0.2)
This function computes a matrix of dimensions (length(observedX3), length(newX3))
,
whose element at coordinate (i,j)
is
observedX3
newX3
,
where
and
is the
kernel
.
computeKernelMatrix(observedX, newX, kernel, h)
computeKernelMatrix(observedX, newX, kernel, h)
observedX |
a numeric vector of observations of X3.
on the interval |
newX |
a numeric vector of points of X3. |
kernel |
a character string describing the kernel to be used.
Possible choices are |
h |
the bandwidth |
a numeric matrix of dimensions (length(observedX), length(newX))
estimateCondCDF_matrix
, estimateCondCDF_vec
,
Y = MASS::mvrnorm(n = 100, mu = c(0,0), Sigma = cbind(c(1, 0.9), c(0.9, 1))) matrixK = computeKernelMatrix(observedX = Y[,2], newX = c(0, 1, 2.5), kernel = "Gaussian", h = 0.8) # To have an estimator of the conditional expectation of Y1 given Y2 = 0, 1, 2.5 Y[,1] * matrixK[,1] / sum(matrixK[,1]) Y[,1] * matrixK[,2] / sum(matrixK[,2]) Y[,1] * matrixK[,3] / sum(matrixK[,3])
Y = MASS::mvrnorm(n = 100, mu = c(0,0), Sigma = cbind(c(1, 0.9), c(0.9, 1))) matrixK = computeKernelMatrix(observedX = Y[,2], newX = c(0, 1, 2.5), kernel = "Gaussian", h = 0.8) # To have an estimator of the conditional expectation of Y1 given Y2 = 0, 1, 2.5 Y[,1] * matrixK[,1] / sum(matrixK[,1]) Y[,1] * matrixK[,2] / sum(matrixK[,2]) Y[,1] * matrixK[,3] / sum(matrixK[,3])
Compute a matrix giving the concordance or discordance of each pair of observations.
computeMatrixSignPairs(vectorX1, vectorX2, typeEstCKT = 4)
computeMatrixSignPairs(vectorX1, vectorX2, typeEstCKT = 4)
vectorX1 |
vector of observed data (first coordinate) |
vectorX2 |
vector of observed data (second coordinate) |
typeEstCKT |
if typeEstCKT = 2 or 4, compute the matrix whose term (i,j) is :
where For |
an n * n
matrix with the signs of each pair
of observations.
# We simulate from a conditional copula N = 500 Z = rnorm(n = N, mean = 5, sd = 2) conditionalTau = 0.9 * pnorm(Z, mean = 5, sd = 2) simCopula = VineCopula::BiCopSim(N = N , family = 3, par = VineCopula::BiCopTau2Par(1 , conditionalTau) ) matrixPairs = computeMatrixSignPairs(vectorX1 = simCopula[,1], vectorX2 = simCopula[,2])
# We simulate from a conditional copula N = 500 Z = rnorm(n = N, mean = 5, sd = 2) conditionalTau = 0.9 * pnorm(Z, mean = 5, sd = 2) simCopula = VineCopula::BiCopSim(N = N , family = 3, par = VineCopula::BiCopTau2Par(1 , conditionalTau) ) matrixPairs = computeMatrixSignPairs(vectorX1 = simCopula[,1], vectorX2 = simCopula[,2])
The function treeCKT2matrixInd
takes as input a binary tree that has been returned
by the function bCond.treeCKT
.
Since this tree describes a partition of the conditioning space,
it can be interesting to get, for a given dataset, the matrix
where each corresponds to a conditioning subset.
This is the so-called
matrixInd
.
Finally, it can be interesting to get the matrix of
treeCKT2matrixInd(estimatedTree, newDataXJ = NULL) matrixInd2matrixCKT(matrixInd, newDataXI) treeCKT2matrixCKT(estimatedTree, newDataXI = NULL, newDataXJ = NULL)
treeCKT2matrixInd(estimatedTree, newDataXJ = NULL) matrixInd2matrixCKT(matrixInd, newDataXI) treeCKT2matrixCKT(estimatedTree, newDataXI = NULL, newDataXJ = NULL)
estimatedTree |
the tree that has been estimated before,
for example by |
newDataXJ |
this is a matrix of size |
matrixInd |
a matrix of indexes of size (n, N.boxes) describing for each observation i to which box ( = event) it belongs. |
newDataXI |
this is a matrix of size |
The function treeCKT2matrixInd
returns
a matrix of size N * m
which component [i,j]
is
.
The function matrixInd2matrixCKT
and treeCKT2matrixCKT
return
a matrix of size |I| * (|I|-1) * m
where each component corresponds
to a conditional Kendall's tau between a pair of conditional variables
conditionally to the conditioned variables in one of the boxes
bCond.treeCKT
for the construction of such a binary tree.
set.seed(1) n = 200 XJ = MASS::mvrnorm(n = n, mu = c(3,3), Sigma = rbind(c(1, 0.2), c(0.2, 1))) XI = matrix(nrow = n, ncol = 2) high_XJ1 = which(XJ[,1] > 4) XI[high_XJ1, ] = MASS::mvrnorm(n = length(high_XJ1), mu = c(10,10), Sigma = rbind(c(1, 0.8), c(0.8, 1))) XI[-high_XJ1, ] = MASS::mvrnorm(n = n - length(high_XJ1), mu = c(8,8), Sigma = rbind(c(1, -0.2), c(-0.2, 1))) result = bCond.treeCKT(XI = XI, XJ = XJ, minSize = 10, verbose = 2) treeCKT2matrixInd(result) matrixInd2matrixCKT(treeCKT2matrixInd(result), newDataXI = XI) treeCKT2matrixCKT(result)
set.seed(1) n = 200 XJ = MASS::mvrnorm(n = n, mu = c(3,3), Sigma = rbind(c(1, 0.2), c(0.2, 1))) XI = matrix(nrow = n, ncol = 2) high_XJ1 = which(XJ[,1] > 4) XI[high_XJ1, ] = MASS::mvrnorm(n = length(high_XJ1), mu = c(10,10), Sigma = rbind(c(1, 0.8), c(0.8, 1))) XI[-high_XJ1, ] = MASS::mvrnorm(n = n - length(high_XJ1), mu = c(8,8), Sigma = rbind(c(1, -0.2), c(-0.2, 1))) result = bCond.treeCKT(XI = XI, XJ = XJ, minSize = 10, verbose = 2) treeCKT2matrixInd(result) matrixInd2matrixCKT(treeCKT2matrixInd(result), newDataXI = XI) treeCKT2matrixCKT(result)
In (Derumigny, & Fermanian (2019)), it is described how the problem
of estimating conditional Kendall's tau can be rewritten as a
classification task for a dataset of pairs (of observations).
This function computes such a dataset, that can be then used to
estimate conditional Kendall's tau using one of the following
functions:
CKT.fit.tree
, CKT.fit.randomForest
,
CKT.fit.GLM
, CKT.fit.nNets
,
CKT.predict.kNN
.
datasetPairs( X1, X2, Z, h, cut = 0.9, onlyConsecutivePairs = FALSE, nPairs = NULL )
datasetPairs( X1, X2, Z, h, cut = 0.9, onlyConsecutivePairs = FALSE, nPairs = NULL )
X1 |
vector of observations of the first conditioned variable. |
X2 |
vector of observations of the second conditioned variable. |
Z |
vector or matrix of observations of the conditioning variable(s),
of dimension |
h |
the bandwidth. Can be a vector; in this case,
the components of |
cut |
the cutting level to keep a given pair or not.
Used only if no |
onlyConsecutivePairs |
if |
nPairs |
number of most relevant pairs to keep in the final datasets.
If this is different than the default |
A matrix with (4+dimZ)
columns and n*(n-1)/2
rows
if onlyConsecutivePairs=FALSE
and else (n/2)
rows.
It is structured in the following way:
column 1
contains the information
about the concordance of the pair (i,j) ;
columns 2
to 1+dimZ
contain
the mean value of Z (the conditioning variables) ;
column 2+dimZ
contains the value of the kernel K_h(Z_j - Z_i) ;
column 3+dimZ
and 4+dimZ
contain the corresponding values of i and j.
Derumigny, A., & Fermanian, J. D. (2019). A classification point-of-view about conditional Kendall’s tau. Computational Statistics & Data Analysis, 135, 70-94. (Algorithm 1 for all pairs and Algorithm 8 for the case of only consecutive pairs) doi:10.1016/j.csda.2019.01.013
the functions that require such a dataset of pairs
to do the estimation of conditional Kendall's tau:
CKT.fit.tree
, CKT.fit.randomForest
,
CKT.fit.GLM
, CKT.fit.nNets
,
CKT.predict.kNN
, and CKT.fit.randomForest
.
# We simulate from a conditional copula N = 500 Z = rnorm(n = N, mean = 5, sd = 2) conditionalTau = 0.9 * pnorm(Z, mean = 5, sd = 2) simCopula = VineCopula::BiCopSim(N = N , family = 3, par = VineCopula::BiCopTau2Par(1 , conditionalTau) ) X1 = qnorm(simCopula[,1]) X2 = qnorm(simCopula[,2]) datasetP = datasetPairs( X1 = X1, X2 = X2, Z = Z, h = 0.07, cut = 0.9)
# We simulate from a conditional copula N = 500 Z = rnorm(n = N, mean = 5, sd = 2) conditionalTau = 0.9 * pnorm(Z, mean = 5, sd = 2) simCopula = VineCopula::BiCopSim(N = N , family = 3, par = VineCopula::BiCopTau2Par(1 , conditionalTau) ) X1 = qnorm(simCopula[,1]) X2 = qnorm(simCopula[,2]) datasetP = datasetPairs( X1 = X1, X2 = X2, Z = Z, h = 0.07, cut = 0.9)
This function computes an estimate of the conditional (marginal) cdf of X1 given a conditioning variable X3.
estimateCondCDF_matrix(observedX1, newX1, matrixK3)
estimateCondCDF_matrix(observedX1, newX1, matrixK3)
observedX1 |
a sample of observations of X1 of size |
newX1 |
a sample of new points for the variable X1, of size |
matrixK3 |
a matrix of kernel values of dimension |
This function is supposed to be used with computeKernelMatrix
.
Assume that we observe a sample .
We want to estimate the conditional cdf of
given
at point
using the following kernel-based estimator
for every in
newX1
and every in
newX3
.
The matrixK3
should be a matrix of the values
such as the one produced by
computeKernelMatrix(observedX3, newX3, kernel, h)
.
A matrix of dimensions (p1 = length(newX), p3 = length(matrixK3[,1]))
of estimators for every possible choices
of
.
Y = MASS::mvrnorm(n = 100, mu = c(0,0), Sigma = cbind(c(1, 0.9), c(0.9, 1))) newY1 = seq(-1, 1, by = 0.5) newY2 = c(0, 1, 2) matrixK = computeKernelMatrix(observedX = Y[,2], newX = newY2, kernel = "Gaussian", h = 0.8) # In this matrix, there are the estimated conditionl cdf at points given by newY1 # conditionally to the points given by newY2. matrixCondCDF = estimateCondCDF_matrix(observedX1 = Y[,1], newX1 = newY1, matrixK) matrixCondCDF
Y = MASS::mvrnorm(n = 100, mu = c(0,0), Sigma = cbind(c(1, 0.9), c(0.9, 1))) newY1 = seq(-1, 1, by = 0.5) newY2 = c(0, 1, 2) matrixK = computeKernelMatrix(observedX = Y[,2], newX = newY2, kernel = "Gaussian", h = 0.8) # In this matrix, there are the estimated conditionl cdf at points given by newY1 # conditionally to the points given by newY2. matrixCondCDF = estimateCondCDF_matrix(observedX1 = Y[,1], newX1 = newY1, matrixK) matrixCondCDF
This function computes an estimate of the conditional (marginal) cdf
of X1 given a conditioning variable X3.
This function is supposed to be used with computeKernelMatrix
.
Assume that we observe a sample .
We want to estimate the conditional cdf of
given
at point
using the following kernel-based estimator
for every couple where
in
newX1
and in
newX3
.
The matrixK3
should be a matrix of the values
such as the one produced by
computeKernelMatrix(observedX3, newX3, kernel, h)
.
estimateCondCDF_vec(observedX1, newX1, matrixK3)
estimateCondCDF_vec(observedX1, newX1, matrixK3)
observedX1 |
a sample of observations of X1 of size n |
newX1 |
a sample of new points for the variable X1, of size p1 |
matrixK3 |
a matrix of kernel values of dimension (p2 , n)
|
It returns a vector of length newX1
of estimators
for every couple
.
Y = MASS::mvrnorm(n = 100, mu = c(0,0), Sigma = cbind(c(1, 0.9), c(0.9, 1))) newY1 = seq(-1, 1, by = 0.5) newY2 = newY1 matrixK = computeKernelMatrix(observedX = Y[,2], newX = newY2, kernel = "Gaussian", h = 0.8) vecCondCDF = estimateCondCDF_vec(observedX1 = Y[,1], newX1 = newY1, matrixK) vecCondCDF
Y = MASS::mvrnorm(n = 100, mu = c(0,0), Sigma = cbind(c(1, 0.9), c(0.9, 1))) newY1 = seq(-1, 1, by = 0.5) newY2 = newY1 matrixK = computeKernelMatrix(observedX = Y[,2], newX = newY2, kernel = "Gaussian", h = 0.8) vecCondCDF = estimateCondCDF_vec(observedX1 = Y[,1], newX1 = newY1, matrixK) vecCondCDF
This function is supposed to be used with computeKernelMatrix
.
Assume that we observe a sample .
We want to estimate the conditional quantiles of
given
at point
using the following kernel-based estimator
where
for every in
probsX1
and every in
newX3
.
The matrixK3
should be a matrix of the values
such as the one produced by
computeKernelMatrix(observedX3, newX3, kernel, h)
.
estimateCondQuantiles(observedX1, probsX1, matrixK3)
estimateCondQuantiles(observedX1, probsX1, matrixK3)
observedX1 |
a sample of observations of X1 of size n |
probsX1 |
a sample of probabilities at which we want to compute the quantiles for the variable X1, of size p1 |
matrixK3 |
a matrix of kernel values of dimension (p2 , n)
|
A matrix of dimensions (p1,p2)
whose (i,j) entry is
with
=
probsX1[i]
and =
newX3[j]
,
where newX3[j]
is the vector that was used to construct matrixK3
.
Y = MASS::mvrnorm(n = 100, mu = c(0,0), Sigma = cbind(c(1, 0.9), c(0.9, 1))) matrixK = computeKernelMatrix(observedX = Y[,2] , newX = c(0, 1, 2.5), kernel = "Gaussian", h = 0.8) matrixnp = estimateCondQuantiles(observedX1 = Y[,2], probsX1 = c(0.3, 0.5) , matrixK3 = matrixK) matrixnp
Y = MASS::mvrnorm(n = 100, mu = c(0,0), Sigma = cbind(c(1, 0.9), c(0.9, 1))) matrixK = computeKernelMatrix(observedX = Y[,2] , newX = c(0, 1, 2.5), kernel = "Gaussian", h = 0.8) matrixnp = estimateCondQuantiles(observedX1 = Y[,2], probsX1 = c(0.3, 0.5) , matrixK3 = matrixK) matrixnp
Assuming that we observe a sample ,
this function returns a array
for each choice of (u_1, u_2, x_3).
estimateNPCondCopula( X1 = NULL, X2 = NULL, X3 = NULL, U1_, U2_, newX3, kernel, h, observedX1 = NULL, observedX2 = NULL, observedX3 = NULL )
estimateNPCondCopula( X1 = NULL, X2 = NULL, X3 = NULL, U1_, U2_, newX3, kernel, h, observedX1 = NULL, observedX2 = NULL, observedX3 = NULL )
X1 , X2 , X3
|
vectors of observations of size |
U1_ |
a vector of numbers in [0, 1] |
U2_ |
a vector of numbers in [0, 1] |
newX3 |
a vector of new values for the conditioning variable |
kernel |
a character string describing the kernel to be used.
Possible choices are |
h |
the bandwidth to use in the estimation. |
observedX1 , observedX2 , observedX3
|
old parameter names for |
An array of dimension (length(U1_, U2_, newX3))
whose element in position (i, j, k) is
where
= U1_[i],
= U2_[j] and
= newX3[k]
Derumigny, A., & Fermanian, J. D. (2017). About tests of the “simplifying” assumption for conditional copulas. Dependence Modeling, 5(1), 154-197. doi:10.1515/demo-2017-0011
estimateParCondCopula
for estimating a conditional
copula in a parametric setting ( = where the conditional copula is assumed to
belong to a parametric class).
simpA.NP
for a test that this conditional copula is
constant with respect to the value of the conditioning variable.
# We simulate from a conditional copula N = 500 X3 = rnorm(n = N, mean = 5, sd = 2) conditionalTau = 0.9 * pnorm(X3, mean = 5, sd = 2) simCopula = VineCopula::BiCopSim(N=N , family = 3, par = VineCopula::BiCopTau2Par(1 , conditionalTau )) X1 = qnorm(simCopula[,1]) X2 = qnorm(simCopula[,2]) # We do the estimation grid = c(0.2, 0.4, 0.6, 0.8) arrayEst = estimateNPCondCopula( X1 = X1, X2 = X2, X3 = X3, U1_ = grid, U2_ = grid, newX3 = c(2, 5, 7), kernel = "Gaussian", h = 0.8) arrayEst
# We simulate from a conditional copula N = 500 X3 = rnorm(n = N, mean = 5, sd = 2) conditionalTau = 0.9 * pnorm(X3, mean = 5, sd = 2) simCopula = VineCopula::BiCopSim(N=N , family = 3, par = VineCopula::BiCopTau2Par(1 , conditionalTau )) X1 = qnorm(simCopula[,1]) X2 = qnorm(simCopula[,2]) # We do the estimation grid = c(0.2, 0.4, 0.6, 0.8) arrayEst = estimateNPCondCopula( X1 = X1, X2 = X2, X3 = X3, U1_ = grid, U2_ = grid, newX3 = c(2, 5, 7), kernel = "Gaussian", h = 0.8) arrayEst
The function estimateParCondCopula
computes an estimate
of the conditional parameters in a conditional parametric copula model, i.e.
for some parametric family , some conditional
parameter
, and a three-dimensional random
vector
. Remember that
denotes the conditional copula of
and
given
.
The function estimateParCondCopula_ZIJ
is an auxiliary function
that is called when conditional pseudos-observations are already
available when one wants to estimate a parametric conditional copula.
estimateParCondCopula( X1 = NULL, X2 = NULL, X3 = NULL, newX3, family, method = "mle", h, observedX1 = NULL, observedX2 = NULL, observedX3 = NULL ) estimateParCondCopula_ZIJ(Z1_J, Z2_J, observedX3, newX3, family, method, h)
estimateParCondCopula( X1 = NULL, X2 = NULL, X3 = NULL, newX3, family, method = "mle", h, observedX1 = NULL, observedX2 = NULL, observedX3 = NULL ) estimateParCondCopula_ZIJ(Z1_J, Z2_J, observedX3, newX3, family, method, h)
X1 |
a vector of |
X2 |
a vector of |
X3 |
a vector of |
newX3 |
a vector of new observations of |
family |
an integer indicating the parametric family of copulas to be used, following the conventions of the VineCopula package, see e.g. VineCopula::BiCop. |
method |
the method of estimation of the conditional parameters.
Can be |
h |
bandwidth to be chosen |
observedX1 , observedX2 , observedX3
|
old parameter names for |
Z1_J |
the conditional pseudos-observations of the first variable,
i.e. |
Z2_J |
the conditional pseudos-observations of the second variable,
i.e. |
a vector of size length(newX3)
containing
the estimated conditional copula parameters for each value of newX3
.
Derumigny, A., & Fermanian, J. D. (2017). About tests of the “simplifying” assumption for conditional copulas. Dependence Modeling, 5(1), 154-197. doi:10.1515/demo-2017-0011
estimateNPCondCopula
for estimating a conditional
copula in a nonparametric setting ( = without parametric assumption on the
conditional copula).
simpA.param
for a test that this conditional copula is
constant with respect to the value of the conditioning variable.
# We simulate from a conditional copula N = 500 X3 = rnorm(n = N, mean = 5, sd = 2) conditionalTau = 0.9 * pnorm(X3, mean = 5, sd = 2) simCopula = VineCopula::BiCopSim( N=N , family = 1, par = VineCopula::BiCopTau2Par(1 , conditionalTau )) X1 = qnorm(simCopula[,1]) X2 = qnorm(simCopula[,2]) gridnewX3 = seq(2, 8, by = 1) conditionalTauNewX3 = 0.9 * pnorm(gridnewX3, mean = 5, sd = 2) vecEstimatedThetas = estimateParCondCopula( X1 = X1, X2 = X2, X3 = X3, newX3 = gridnewX3, family = 1, h = 0.1) # Estimated conditional parameters vecEstimatedThetas # True conditional parameters VineCopula::BiCopTau2Par(1 , conditionalTauNewX3 ) # Estimated conditional Kendall's tau VineCopula::BiCopPar2Tau(1 , vecEstimatedThetas ) # True conditional Kendall's tau conditionalTauNewX3
# We simulate from a conditional copula N = 500 X3 = rnorm(n = N, mean = 5, sd = 2) conditionalTau = 0.9 * pnorm(X3, mean = 5, sd = 2) simCopula = VineCopula::BiCopSim( N=N , family = 1, par = VineCopula::BiCopTau2Par(1 , conditionalTau )) X1 = qnorm(simCopula[,1]) X2 = qnorm(simCopula[,2]) gridnewX3 = seq(2, 8, by = 1) conditionalTauNewX3 = 0.9 * pnorm(gridnewX3, mean = 5, sd = 2) vecEstimatedThetas = estimateParCondCopula( X1 = X1, X2 = X2, X3 = X3, newX3 = gridnewX3, family = 1, h = 0.1) # Estimated conditional parameters vecEstimatedThetas # True conditional parameters VineCopula::BiCopTau2Par(1 , conditionalTauNewX3 ) # Estimated conditional Kendall's tau VineCopula::BiCopPar2Tau(1 , vecEstimatedThetas ) # True conditional Kendall's tau conditionalTauNewX3
This function computes Kendall's regression, a regression-like model for conditional Kendall's tau. More precisely, it fits the model
where is the conditional Kendall's tau
between
and
conditionally to
,
is a function from
to
,
are unknown coefficients to be estimated
and
are a dictionary of functions.
Then, this function tests the assumption
where the coefficient corresponding to the intercept is removed.
simpA.kendallReg( X1, X2, Z, vectorZToEstimate = NULL, listPhi = list(z = function(z) { return(z) }), typeEstCKT = 4, h_kernel, Lambda = function(x) { return(x) }, Lambda_deriv = function(x) { return(1) }, Lambda_inv = function(x) { return(x) }, lambda = NULL, h_lambda = h_kernel, Kfolds_lambda = 5, l_norm = 1 ) ## S3 method for class 'simpA_kendallReg_test' coef(object, ...) ## S3 method for class 'simpA_kendallReg_test' vcov(object, ...) ## S3 method for class 'simpA_kendallReg_test' print(x, ...) ## S3 method for class 'simpA_kendallReg_test' plot(x, ylim = c(-1.5, 1.5), ...)
simpA.kendallReg( X1, X2, Z, vectorZToEstimate = NULL, listPhi = list(z = function(z) { return(z) }), typeEstCKT = 4, h_kernel, Lambda = function(x) { return(x) }, Lambda_deriv = function(x) { return(1) }, Lambda_inv = function(x) { return(x) }, lambda = NULL, h_lambda = h_kernel, Kfolds_lambda = 5, l_norm = 1 ) ## S3 method for class 'simpA_kendallReg_test' coef(object, ...) ## S3 method for class 'simpA_kendallReg_test' vcov(object, ...) ## S3 method for class 'simpA_kendallReg_test' print(x, ...) ## S3 method for class 'simpA_kendallReg_test' plot(x, ylim = c(-1.5, 1.5), ...)
X1 |
vector of observations of the first conditioned variable |
X2 |
vector of observations of the second conditioned variable |
Z |
vector of observations of the conditioning variable |
vectorZToEstimate |
vector containing the points |
listPhi |
the list of transformations |
typeEstCKT |
the type of estimation of the kernel-based estimation of conditional Kendall's tau. |
h_kernel |
the bandwidth used for the kernel-based estimations. |
Lambda |
the function to be applied on conditional Kendall's tau. By default, the identity function is used. |
Lambda_deriv |
the derivative of the function |
Lambda_inv |
the inverse function of |
lambda |
the penalization parameter used for Kendall's regression.
By default, cross-validation is used to find the best value of |
h_lambda |
bandwidth used for the smooth cross-validation
in order to get a value for |
Kfolds_lambda |
the number of subsets used for the cross-validation
in order to get a value for |
l_norm |
type of norm used for selection of the optimal lambda
by cross-validation. |
object , x
|
an |
... |
other arguments, unused |
ylim |
graphical parameter, see plot |
simpA.kendallReg
returns an S3
object of
class simpA_kendallReg_test
, containing
statWn
: the value of the test statistic.
p_val
: the p-value of the test.
plot.simpA_kendallReg_test
returns (invisibly) a matrix with columns
z
, est_CKT_NP
, asympt_se_np
, est_CKT_NP_q025
,
est_CKT_NP_q975
, est_CKT_reg
, asympt_se_reg
,
est_CKT_reg_q025
, est_CKT_reg_q975
.
The first column correspond to the grid of values of z. The next 4 columns
are the NP (kernel-based) estimator of conditional Kendall's tau, with its
standard error, and lower/upper confidence bands. The last 4 columns are the
equivalents for the estimator based on Kendall's regression.
plot.simpA_kendallReg_test
plots the kernel-based estimator and its
confidence band (in red), and the estimator based on Kendall's regression
and its confidence band (in blue).
Usually the confidence band for Kendall's regression is much tighter than the pure non-parametric counterpart. This is because the parametric model is sparser and the corresponding estimator converges faster (even without penalization).
print.simpA_kendallReg_test
has no return values and is only called
for its side effects.
Function coef.simpA_kendallReg_test
returns the matrix of coefficients
with standard errors, z values and p-values.
Function vcov.simpA_kendallReg_test
returns the (estimated)
variance-covariance matrix of the estimated coefficients.
Derumigny, A., & Fermanian, J. D. (2020). On Kendall’s regression. Journal of Multivariate Analysis, 178, 104610. (page 7) doi:10.1016/j.jmva.2020.104610
The function to fit Kendall's regression:
CKT.kendallReg.fit
.
Other tests of the simplifying assumption:
simpA.NP
in a nonparametric setting
simpA.param
in a (semi)parametric setting,
where the conditional copula belongs to a parametric family,
but the conditional margins are estimated arbitrarily through
kernel smoothing
the counterparts of these tests in the discrete conditioning setting:
bCond.simpA.CKT
(test based on conditional Kendall's tau)
bCond.simpA.param
(test assuming a parametric form for the conditional copula)
# We simulate from a non-simplified conditional copula set.seed(1) N = 300 Z = runif(n = N, min = 0, max = 1) conditionalTau = -0.9 + 1.8 * Z simCopula = VineCopula::BiCopSim(N=N , family = 1, par = VineCopula::BiCopTau2Par(1 , conditionalTau )) X1 = qnorm(simCopula[,1]) X2 = qnorm(simCopula[,2]) result = simpA.kendallReg( X1, X2, Z, h_kernel = 0.03, listPhi = list(z = function(z){return(z)} ) ) print(result) plot(result) # Obtain matrix of coefficients, std err, z values and p values coef(result) # Obtain variance-covariance matrix of the coefficients vcov(result) result_morePhi = simpA.kendallReg( X1, X2, Z, h_kernel = 0.03, listPhi = list( z = function(z){return(z)}, cos10z = function(z){return(cos(10 * z))}, sin10z = function(z){return(sin(10 * z))}, `1(z <= 0.4)` = function(z){return(as.numeric(z <= 0.4))}, `1(z <= 0.6)` = function(z){return(as.numeric(z <= 0.6))}) ) print(result_morePhi) plot(result_morePhi) # We simulate from a simplified conditional copula set.seed(1) N = 300 Z = runif(n = N, min = 0, max = 1) conditionalTau = -0.3 simCopula = VineCopula::BiCopSim(N=N , family = 1, par = VineCopula::BiCopTau2Par(1 , conditionalTau )) X1 = qnorm(simCopula[,1]) X2 = qnorm(simCopula[,2]) result = simpA.kendallReg( X1, X2, Z, h_kernel = 0.03, listPhi = list( z = function(z){return(z)}, cos10z = function(z){return(cos(10 * z))}, sin10z = function(z){return(sin(10 * z))}, `1(z <= 0.4)` = function(z){return(as.numeric(z <= 0.4))}, `1(z <= 0.6)` = function(z){return(as.numeric(z <= 0.6))}) ) print(result) plot(result)
# We simulate from a non-simplified conditional copula set.seed(1) N = 300 Z = runif(n = N, min = 0, max = 1) conditionalTau = -0.9 + 1.8 * Z simCopula = VineCopula::BiCopSim(N=N , family = 1, par = VineCopula::BiCopTau2Par(1 , conditionalTau )) X1 = qnorm(simCopula[,1]) X2 = qnorm(simCopula[,2]) result = simpA.kendallReg( X1, X2, Z, h_kernel = 0.03, listPhi = list(z = function(z){return(z)} ) ) print(result) plot(result) # Obtain matrix of coefficients, std err, z values and p values coef(result) # Obtain variance-covariance matrix of the coefficients vcov(result) result_morePhi = simpA.kendallReg( X1, X2, Z, h_kernel = 0.03, listPhi = list( z = function(z){return(z)}, cos10z = function(z){return(cos(10 * z))}, sin10z = function(z){return(sin(10 * z))}, `1(z <= 0.4)` = function(z){return(as.numeric(z <= 0.4))}, `1(z <= 0.6)` = function(z){return(as.numeric(z <= 0.6))}) ) print(result_morePhi) plot(result_morePhi) # We simulate from a simplified conditional copula set.seed(1) N = 300 Z = runif(n = N, min = 0, max = 1) conditionalTau = -0.3 simCopula = VineCopula::BiCopSim(N=N , family = 1, par = VineCopula::BiCopTau2Par(1 , conditionalTau )) X1 = qnorm(simCopula[,1]) X2 = qnorm(simCopula[,2]) result = simpA.kendallReg( X1, X2, Z, h_kernel = 0.03, listPhi = list( z = function(z){return(z)}, cos10z = function(z){return(cos(10 * z))}, sin10z = function(z){return(sin(10 * z))}, `1(z <= 0.4)` = function(z){return(as.numeric(z <= 0.4))}, `1(z <= 0.6)` = function(z){return(as.numeric(z <= 0.6))}) ) print(result) plot(result)
This function tests the “simplifying assumption” that a conditional copula
does not depend on the
value of the conditioning variable in a nonparametric setting,
where the conditional copula is estimated by kernel smoothing.
simpA.NP( X1, X2, X3, testStat, typeBoot = "bootNP", h, nBootstrap = 100, kernel.name = "Epanechnikov", truncVal = h, numericalInt = list(kind = "legendre", nGrid = 10) )
simpA.NP( X1, X2, X3, testStat, typeBoot = "bootNP", h, nBootstrap = 100, kernel.name = "Epanechnikov", truncVal = h, numericalInt = list(kind = "legendre", nGrid = 10) )
X1 |
vector of |
X2 |
vector of |
X3 |
vector of |
testStat |
name of the test statistic to be used. Possible values are
|
typeBoot |
the type of bootstrap to be used (see Derumigny and Fermanian, 2017, p.165). Possible values are
|
h |
the bandwidth used for kernel smoothing |
nBootstrap |
number of bootstrap replications |
kernel.name |
the name of the kernel |
truncVal |
the value of truncation for the integral,
i.e. the integrals are computed from |
numericalInt |
parameters to be given to
|
a list containing
true_stat
: the value of the test statistic
computed on the whole sample
vect_statB
: a vector of length nBootstrap
containing the bootstrapped test statistics.
p_val
: the p-value of the test.
Derumigny, A., & Fermanian, J. D. (2017). About tests of the “simplifying” assumption for conditional copulas. Dependence Modeling, 5(1), 154-197. doi:10.1515/demo-2017-0011
Other tests of the simplifying assumption:
simpA.param
in a (semi)parametric setting,
where the conditional copula belongs to a parametric family,
but the conditional margins are estimated arbitrarily through
kernel smoothing
simpA.kendallReg
: test based on the constancy of
conditional Kendall's tau
the counterparts of these tests in the discrete conditioning setting:
bCond.simpA.CKT
(test based on conditional Kendall's tau)
bCond.simpA.param
(test assuming a parametric form for the conditional copula)
# We simulate from a conditional copula set.seed(1) N = 500 Z = rnorm(n = N, mean = 5, sd = 2) conditionalTau = -0.9 + 1.8 * pnorm(Z, mean = 5, sd = 2) simCopula = VineCopula::BiCopSim(N=N , family = 1, par = VineCopula::BiCopTau2Par(1 , conditionalTau )) X1 = qnorm(simCopula[,1], mean = Z) X2 = qnorm(simCopula[,2], mean = - Z) result <- simpA.NP( X1 = X1, X2 = X2, X3 = Z, testStat = "I_chi", typeBoot = "boot.pseudoInd", h = 0.03, kernel.name = "Epanechnikov", nBootstrap = 10) # In practice, it is recommended to use at least nBootstrap = 100 # with nBootstrap = 200 being a good choice. print(result$p_val) set.seed(1) N = 500 Z = rnorm(n = N, mean = 5, sd = 2) conditionalTau = 0.8 simCopula = VineCopula::BiCopSim(N=N , family = 1, par = VineCopula::BiCopTau2Par(1 , conditionalTau )) X1 = qnorm(simCopula[,1], mean = Z) X2 = qnorm(simCopula[,2], mean = - Z) result <- simpA.NP( X1 = X1, X2 = X2, X3 = Z, testStat = "I_chi", typeBoot = "boot.pseudoInd", h = 0.08, kernel.name = "Epanechnikov", nBootstrap = 10) print(result$p_val)
# We simulate from a conditional copula set.seed(1) N = 500 Z = rnorm(n = N, mean = 5, sd = 2) conditionalTau = -0.9 + 1.8 * pnorm(Z, mean = 5, sd = 2) simCopula = VineCopula::BiCopSim(N=N , family = 1, par = VineCopula::BiCopTau2Par(1 , conditionalTau )) X1 = qnorm(simCopula[,1], mean = Z) X2 = qnorm(simCopula[,2], mean = - Z) result <- simpA.NP( X1 = X1, X2 = X2, X3 = Z, testStat = "I_chi", typeBoot = "boot.pseudoInd", h = 0.03, kernel.name = "Epanechnikov", nBootstrap = 10) # In practice, it is recommended to use at least nBootstrap = 100 # with nBootstrap = 200 being a good choice. print(result$p_val) set.seed(1) N = 500 Z = rnorm(n = N, mean = 5, sd = 2) conditionalTau = 0.8 simCopula = VineCopula::BiCopSim(N=N , family = 1, par = VineCopula::BiCopTau2Par(1 , conditionalTau )) X1 = qnorm(simCopula[,1], mean = Z) X2 = qnorm(simCopula[,2], mean = - Z) result <- simpA.NP( X1 = X1, X2 = X2, X3 = Z, testStat = "I_chi", typeBoot = "boot.pseudoInd", h = 0.08, kernel.name = "Epanechnikov", nBootstrap = 10) print(result$p_val)
This function tests the “simplifying assumption” that a conditional copula
does not depend on the
value of the conditioning variable in a semiparametric setting,
where the conditional copula is of the form
for all and all
.
Here,
is a known family of copula and
is an unknown conditional dependence parameter.
In this setting, the simplifying assumption can be rewritten as
“
does not depend on
, i.e. is a constant
function of
”.
simpA.param( X1, X2, X3, family, testStat = "T2c", typeBoot = "boot.NP", h, nBootstrap = 100, kernel.name = "Epanechnikov", truncVal = h, numericalInt = list(kind = "legendre", nGrid = 10) )
simpA.param( X1, X2, X3, family, testStat = "T2c", typeBoot = "boot.NP", h, nBootstrap = 100, kernel.name = "Epanechnikov", truncVal = h, numericalInt = list(kind = "legendre", nGrid = 10) )
X1 |
vector of |
X2 |
vector of |
X3 |
vector of |
family |
the chosen family of copulas
(see the documentation of the class |
testStat |
name of the test statistic to be used.
The only choice implemented yet is |
typeBoot |
the type of bootstrap to be used. (see Derumigny and Fermanian, 2017, p.165). Possible values are
|
h |
the bandwidth used for kernel smoothing |
nBootstrap |
number of bootstrap replications |
kernel.name |
the name of the kernel |
truncVal |
the value of truncation for the integral,
i.e. the integrals are computed from |
numericalInt |
parameters to be given to
|
a list containing
true_stat
: the value of the test statistic
computed on the whole sample
vect_statB
: a vector of length nBootstrap
containing the bootstrapped test statistics.
p_val
: the p-value of the test.
Derumigny, A., & Fermanian, J. D. (2017). About tests of the “simplifying” assumption for conditional copulas. Dependence Modeling, 5(1), 154-197. doi:10.1515/demo-2017-0011
Other tests of the simplifying assumption:
simpA.NP
in a nonparametric setting
simpA.kendallReg
: test based on the constancy of
conditional Kendall's tau
the counterparts of these tests in the discrete conditioning setting:
bCond.simpA.CKT
(test based on conditional Kendall's tau)
bCond.simpA.param
(test assuming a parametric form for the conditional copula)
# We simulate from a conditional copula set.seed(1) N = 500 Z = rnorm(n = N, mean = 5, sd = 2) conditionalTau = -0.9 + 1.8 * pnorm(Z, mean = 5, sd = 2) simCopula = VineCopula::BiCopSim(N=N , family = 1, par = VineCopula::BiCopTau2Par(1 , conditionalTau )) X1 = qnorm(simCopula[,1], mean = Z) X2 = qnorm(simCopula[,2], mean = - Z) result <- simpA.param( X1 = X1, X2 = X2, X3 = Z, family = 1, h = 0.03, kernel.name = "Epanechnikov", nBootstrap = 5) print(result$p_val) # In practice, it is recommended to use at least nBootstrap = 100 # with nBootstrap = 200 being a good choice. set.seed(1) N = 500 Z = rnorm(n = N, mean = 5, sd = 2) conditionalTau = 0.8 simCopula = VineCopula::BiCopSim(N=N , family = 1, par = VineCopula::BiCopTau2Par(1 , conditionalTau )) X1 = qnorm(simCopula[,1], mean = Z) X2 = qnorm(simCopula[,2], mean = - Z) result <- simpA.param( X1 = X1, X2 = X2, X3 = Z, family = 1, h = 0.08, kernel.name = "Epanechnikov", nBootstrap = 5) print(result$p_val)
# We simulate from a conditional copula set.seed(1) N = 500 Z = rnorm(n = N, mean = 5, sd = 2) conditionalTau = -0.9 + 1.8 * pnorm(Z, mean = 5, sd = 2) simCopula = VineCopula::BiCopSim(N=N , family = 1, par = VineCopula::BiCopTau2Par(1 , conditionalTau )) X1 = qnorm(simCopula[,1], mean = Z) X2 = qnorm(simCopula[,2], mean = - Z) result <- simpA.param( X1 = X1, X2 = X2, X3 = Z, family = 1, h = 0.03, kernel.name = "Epanechnikov", nBootstrap = 5) print(result$p_val) # In practice, it is recommended to use at least nBootstrap = 100 # with nBootstrap = 200 being a good choice. set.seed(1) N = 500 Z = rnorm(n = N, mean = 5, sd = 2) conditionalTau = 0.8 simCopula = VineCopula::BiCopSim(N=N , family = 1, par = VineCopula::BiCopTau2Par(1 , conditionalTau )) X1 = qnorm(simCopula[,1], mean = Z) X2 = qnorm(simCopula[,2], mean = - Z) result <- simpA.param( X1 = X1, X2 = X2, X3 = Z, family = 1, h = 0.08, kernel.name = "Epanechnikov", nBootstrap = 5) print(result$p_val)