Title: | Bound on the Error of the First-Order Edgeworth Expansion |
---|---|
Description: | Computes uniform bounds on the distance between the cumulative distribution function of a standardized sum of random variables and its first-order Edgeworth expansion, following the article Derumigny, Girard, Guyonvarch (2023) <doi:10.1007/s13171-023-00320-y>. |
Authors: | Alexis Derumigny [aut, cre], Lucas Girard [aut], Yannick Guyonvarch [aut] |
Maintainer: | Alexis Derumigny <[email protected]> |
License: | GPL-3 |
Version: | 0.1.2.2 |
Built: | 2024-11-03 05:36:39 UTC |
Source: | https://github.com/alexisderumigny/boundedgeworth |
This function returns a valid value \(\delta_n\) for the bound \[\sup_{x \in \mathbb{R}} \left| \textrm{Prob}(S_n \leq x) - \Phi(x) \right| \leq \delta_n, \]
Bound_BE(
  setup = list(continuity = FALSE, iid = FALSE, no_skewness = FALSE),
  n,
  K4 = 9,
  K3 = NULL,
  lambda3 = NULL,
  K3tilde = NULL,
  regularity = list(C0 = 1, p = 2),
  eps = 0.1
)
setup |
logical vector of size 3 made up of the following components:
|
n |
sample size ( = number of random variables that appear in the sum). |
K4 |
bound on the 4th normalized moment of the random variables. We advise using K4 = 9 as a general-purpose value that covers most “usual” distributions. |
K3 |
bound on the 3rd normalized moment.
If not given, an upper bound on |
lambda3 |
(average) skewness of the variables.
If not given, an upper bound on \(abs(lambda3)\)
will be derived from the value of |
K3tilde |
value of
\[
K_{3,n} + \frac{1}{n}\sum_{i=1}^n
\mathbb{E}|X_i| \sigma_{X_i}^2 / \overline{B}_n^3\]
where \(\overline{B}_n := \sqrt{(1/n) \sum_{i=1}^n E[X_i^2]}\).
If not given, an upper bound on |
regularity |
list of length up to 3
(only used in the
|
eps |
a value between 0 and 1/3 on which several terms depend.
Any value of |
where \(X_1, \dots, X_n\) are \(n\) independent centered variables and \(S_n\) is their normalized sum, in the sense that \(S_n := \sum_{i=1}^n X_i / \textrm{sd}(\sum_{i=1}^n X_i)\). This bound follows from the triangle inequality and the bound on the difference between a cdf and its first-order Edgeworth expansion.
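To make the bounded quantity concrete, the following base-R sketch computes the exact value of \(\sup_{x} |\textrm{Prob}(S_n \leq x) - \Phi(x)|\) for a sum of i.i.d. Bernoulli variables, a case where the cdf of \(S_n\) is available in closed form. The helper name `kolmogorov_distance_bernoulli` and the choice of Bernoulli variables are illustrative assumptions and are not part of the package.

```r
# Exact Kolmogorov distance between the cdf of a standardized sum of
# n i.i.d. Bernoulli(p) variables and the standard Gaussian cdf.
# (Hypothetical helper, written for illustration only.)
kolmogorov_distance_bernoulli <- function(n, p = 0.5) {
  k <- 0:n                                   # possible values of the raw sum
  z <- (k - n * p) / sqrt(n * p * (1 - p))   # corresponding values of S_n
  cdf_at_jump <- pbinom(k, n, p)             # Prob(S_n <= z_k)
  cdf_before_jump <- pbinom(k - 1, n, p)     # Prob(S_n <  z_k)
  # The cdf of S_n is piecewise constant and pnorm is increasing, so the
  # supremum is reached at a jump point z_k, either at it or just before it.
  max(abs(cdf_at_jump - pnorm(z)), abs(cdf_before_jump - pnorm(z)))
}

distances <- sapply(c(150, 2000), kolmogorov_distance_bernoulli)
print(distances)
```

A bound \(\delta_n\) that is valid over a class of distributions containing this Bernoulli case cannot be smaller than the distance computed here, which gives a simple sanity check on the order of magnitude to expect.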
Note that the variables \(X_1, \dots, X_n\) must be independent but may have different distributions (if setup$iid = FALSE).
A vector of the same size as n with values \(\delta_n\) such that
\[\sup_{x \in \mathbb{R}}
\left| \textrm{Prob}(S_n \leq x) - \Phi(x) \right|
\leq \delta_n.
\]
Derumigny A., Girard L., and Guyonvarch Y. (2023). Explicit non-asymptotic bounds for the distance to the first-order Edgeworth expansion, Sankhya A. doi:10.1007/s13171-023-00320-y arxiv:2101.05780.
Bound_EE1() for a bound on the distance to the first-order Edgeworth expansion.
setup = list(continuity = FALSE, iid = FALSE, no_skewness = FALSE)
regularity = list(C0 = 1, p = 2, kappa = 0.99)

computedBound_EE1 <- Bound_EE1(
  setup = setup, n = 150, K4 = 9,
  regularity = regularity, eps = 0.1
)

computedBound_BE <- Bound_BE(
  setup = setup, n = 150, K4 = 9,
  regularity = regularity, eps = 0.1
)

print(c(computedBound_EE1, computedBound_BE))
This function computes a non-asymptotic uniform bound on the difference between the cdf of a normalized sum of random variables and its first-order Edgeworth expansion. It returns a valid value \(\delta_n\) such that \[ \sup_{x \in \mathbb{R}} \left| \textrm{Prob}(S_n \leq x) - \Phi(x) - \frac{\lambda_{3,n}}{6\sqrt{n}}(1-x^2) \varphi(x) \right| \leq \delta_n,\] where \(X_1, \dots, X_n\) are \(n\) independent centered variables and \(S_n\) is their normalized sum, in the sense that \(S_n := \sum_{i=1}^n X_i / \textrm{sd}(\sum_{i=1}^n X_i)\). Here \(\lambda_{3,n}\) denotes the average skewness of the variables \(X_1, \dots, X_n\).
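As an illustration of why the Edgeworth correction term matters, the sketch below compares the Gaussian approximation and the first-order Edgeworth expansion in a case where the exact cdf is known: a sum of i.i.d. Exponential(1) variables, whose raw sum follows a Gamma distribution and whose skewness equals 2. The choice of distribution, the sample size and the evaluation grid are assumptions made for this example, and the supremum is only approximated on that grid.

```r
# Standardized sum of n i.i.d. Exponential(1) variables: the raw sum is
# Gamma(n, 1), with mean n and variance n; each term has skewness 2.
n <- 50
lambda3 <- 2
x <- seq(-6, 6, by = 0.001)   # grid used to approximate the supremum

exact_cdf <- pgamma(n + x * sqrt(n), shape = n, rate = 1)
edgeworth_term <- lambda3 / (6 * sqrt(n)) * (1 - x^2) * dnorm(x)

# Distance to the Gaussian cdf alone, and to the first-order expansion.
error_gaussian <- max(abs(exact_cdf - pnorm(x)))
error_edgeworth <- max(abs(exact_cdf - pnorm(x) - edgeworth_term))
print(c(gaussian = error_gaussian, edgeworth = error_edgeworth))
```

In this skewed example the distance to the Edgeworth expansion is smaller than the distance to the Gaussian cdf, which is the quantity that Bound_EE1() controls uniformly over a class of distributions rather than for one specific distribution.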
Bound_EE1(
  setup = list(continuity = FALSE, iid = FALSE, no_skewness = FALSE),
  n,
  K4 = 9,
  K3 = NULL,
  lambda3 = NULL,
  K3tilde = NULL,
  regularity = list(C0 = 1, p = 2),
  eps = 0.1,
  verbose = 0
)
setup |
logical vector of size 3 made up of the following components:
|
n |
sample size ( = number of random variables that appear in the sum). |
K4 |
bound on the 4th normalized moment of the random variables. We advise using K4 = 9 as a general-purpose value that covers most “usual” distributions. |
K3 |
bound on the 3rd normalized moment.
If not given, an upper bound on |
lambda3 |
(average) skewness of the variables.
If not given, an upper bound on \(abs(lambda3)\)
will be derived from the value of |
K3tilde |
value of
\[
K_{3,n} + \frac{1}{n}\sum_{i=1}^n
\mathbb{E}|X_i| \sigma_{X_i}^2 / \overline{B}_n^3\]
where \(\overline{B}_n := \sqrt{(1/n) \sum_{i=1}^n E[X_i^2]}\).
If not given, an upper bound on |
regularity |
list of length up to 3
(only used in the
|
eps |
a value between 0 and 1/3 on which several terms depend.
Any value of |
verbose |
if it is |
Note that the variables \(X_1, \dots, X_n\) must be independent but may have different distributions (if setup$iid = FALSE).
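The recommended default K4 = 9 can be put in perspective by computing the normalized fourth moment \(E[(X - \mu)^4] / \sigma^4\) of a few common distributions. The sketch below does this by numerical integration in base R; the helper `normalized_fourth_moment` is a hypothetical name introduced for this example and is not part of the package.

```r
# Normalized 4th moment E[(X - mu)^4] / sigma^4 of a distribution given by
# its density, computed by numerical integration.
# (Hypothetical helper, written for illustration only.)
normalized_fourth_moment <- function(density, lower, upper) {
  central_moment <- function(k, center) {
    integrate(function(x) (x - center)^k * density(x), lower, upper)$value
  }
  mu <- central_moment(1, center = 0)
  sigma2 <- central_moment(2, center = mu)
  central_moment(4, center = mu) / sigma2^2
}

k4_values <- c(
  gaussian    = normalized_fourth_moment(dnorm, -Inf, Inf),
  uniform     = normalized_fourth_moment(dunif, 0, 1),
  laplace     = normalized_fourth_moment(function(x) 0.5 * exp(-abs(x)), -Inf, Inf),
  exponential = normalized_fourth_moment(dexp, 0, Inf)
)
print(round(k4_values, 3))
```

The theoretical values for these four distributions are 3, 1.8, 6 and 9 respectively, so all of them satisfy the bound K4 = 9, with the exponential distribution sitting exactly at it.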
A vector of the same size as n with values \(\delta_n\) such that
\[
\sup_{x \in \mathbb{R}}
\left| \textrm{Prob}(S_n \leq x) - \Phi(x)
- \frac{\lambda_{3,n}}{6\sqrt{n}}(1-x^2) \varphi(x) \right|
\leq \delta_n.\]
Derumigny A., Girard L., and Guyonvarch Y. (2023). Explicit non-asymptotic bounds for the distance to the first-order Edgeworth expansion, Sankhya A. doi:10.1007/s13171-023-00320-y arxiv:2101.05780.
Bound_BE() for a Berry-Esseen bound.
Gauss_test_powerAnalysis() for a power analysis of the classical Gauss test that is uniformly valid based on this bound on the Edgeworth expansion.
setup = list(continuity = TRUE, iid = FALSE, no_skewness = TRUE)
regularity = list(C0 = 1, p = 2)
computedBound <- Bound_EE1(
  setup = setup, n = c(150, 2000), K4 = 9,
  regularity = regularity, eps = 0.1
)

setup = list(continuity = TRUE, iid = TRUE, no_skewness = TRUE)
regularity = list(kappa = 0.99)
computedBound2 <- Bound_EE1(
  setup = setup, n = c(150, 2000), K4 = 9,
  regularity = regularity, eps = 0.1
)

setup = list(continuity = FALSE, iid = FALSE, no_skewness = TRUE)
computedBound3 <- Bound_EE1(
  setup = setup, n = c(150, 2000), K4 = 9, eps = 0.1
)

setup = list(continuity = FALSE, iid = TRUE, no_skewness = TRUE)
computedBound4 <- Bound_EE1(
  setup = setup, n = c(150, 2000), K4 = 9, eps = 0.1
)

print(computedBound)
print(computedBound2)
print(computedBound3)
print(computedBound4)
Let \(X_1, \dots, X_n\) be \(n\) i.i.d. variables
with mean \(\mu\), variance \(\sigma^2\).
Assume that we want to test the hypothesis
\(H_0: \mu \leq \mu_0\) against the alternative \(H_1: \mu > \mu_0\).
For this, we want to use the classical Gauss test, which rejects the null hypothesis
if \(\sqrt{n}(\bar{X}_n - \mu_0) / \sigma\) is larger than the quantile of the standard Gaussian
distribution at level \(1 - \alpha\).
Let \(\eta := (\mu - \mu_0) / \sigma\) be the effect size,
i.e. the distance between the null and the alternative hypotheses,
measured in terms of standard deviations.
Let beta be the uniform power of this test:
\[
beta = \inf_{H_1} \textrm{Prob}(\textrm{Rejection}),\]
where the infimum is taken over all distributions under the alternative hypothesis, i.e. those that have mean \(\mu = \mu_0 + \eta \sigma\), kurtosis bounded by K4, and that satisfy the regularity condition kappa described below.
This means that this power beta is uniformly valid over a large (infinite-dimensional) class of alternative distributions, well beyond the Gaussian family, even though the test is based on the Gaussian quantile.
There is a relation between the sample size n, the effect size eta and the uniform power beta of this test.
This function takes as input two of these three quantities (the sample size n, the effect size eta, and the uniform power beta) and returns the third one.
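For comparison, the relation between n, eta and beta has a closed form when the data are exactly Gaussian: under the alternative, the test statistic is distributed as \(N(\eta \sqrt{n}, 1)\), so the power is \(1 - \Phi(q_{1-\alpha} - \eta \sqrt{n})\). The sketch below implements this Gaussian benchmark in base R. The three function names are hypothetical, and these formulas are not what Gauss_test_powerAnalysis() computes: the package returns quantities that hold uniformly over a much larger class of distributions and are therefore more conservative.

```r
# Gaussian benchmark for the one-sided Gauss test at level alpha.
# NOTE: these formulas assume Gaussian data; they are NOT the uniform
# quantities returned by Gauss_test_powerAnalysis().
gaussian_power <- function(eta, n, alpha = 0.05) {
  1 - pnorm(qnorm(1 - alpha) - eta * sqrt(n))
}

# Smallest n reaching power beta against effect size eta.
gaussian_sample_size <- function(eta, beta, alpha = 0.05) {
  ceiling(((qnorm(1 - alpha) + qnorm(beta)) / eta)^2)
}

# Smallest effect size detected with power beta for sample size n.
gaussian_effect_size <- function(n, beta, alpha = 0.05) {
  (qnorm(1 - alpha) + qnorm(beta)) / sqrt(n)
}

n_needed <- gaussian_sample_size(eta = 0.5, beta = 0.8)
print(n_needed)
print(gaussian_power(eta = 0.5, n = n_needed))
print(gaussian_effect_size(n = 800, beta = 0.8))
```

Each function solves the same equation for a different unknown, which mirrors the "two inputs, one output" design of the package function.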
Gauss_test_powerAnalysis(
  eta = NULL,
  n = NULL,
  beta = NULL,
  alpha = 0.05,
  K4 = 9,
  kappa = 0.99
)
eta |
the effect size \(\eta\) that characterizes the alternative hypothesis |
n |
sample size |
beta |
the power of detecting the effect |
alpha |
the level of the test |
K4 |
the kurtosis of the \(X_i\) |
kappa |
Regularity parameter of the distribution of the \(X_i\).
It corresponds to a bound on the modulus of the characteristic function
\(f_{X_n / \sigma_n}(t)\) of the standardized \(X_n\).
More precisely, |
This function can be used to plan experiments, for example to know what would be a sufficient sample size to attain a fixed power against a given effect size that the researcher would like to detect.
Note that the results given by this function are formally valid only for the Gauss test (i.e., when the variance of the distribution is assumed to be known).
The computed value of either the sufficient sample size n, or the minimum effect size eta, or the power beta.
Derumigny A., Girard L., and Guyonvarch Y. (2023). Explicit non-asymptotic bounds for the distance to the first-order Edgeworth expansion, Sankhya A. doi:10.1007/s13171-023-00320-y arxiv:2101.05780.
# Sufficient sample size to detect an effect of 0.5 standard deviation with probability 80%
Gauss_test_powerAnalysis(eta = 0.5, beta = 0.8)
# We can detect an effect of 0.5 standard deviations with probability 80% for n >= 548

# Power of an experiment to detect an effect of 0.5 with a sample size of n = 800
Gauss_test_powerAnalysis(eta = 0.5, n = 800)
# We can detect an effect of 0.5 standard deviations with probability 85.1% for n = 800

# Smallest effect size that can be detected with a probability of 80% for a sample size of n = 800
Gauss_test_powerAnalysis(n = 800, beta = 0.8)
# We can detect an effect of 0.114 standard deviations with probability 80% for n = 800