| Title: | Bootstrap-Based Hypothesis Testing using Different Resampling Schemes |
|---|---|
| Description: | Perform bootstrap-based hypothesis testing procedures on three statistical problems. In particular, it covers independence testing, testing the slope in a linear regression setting, and goodness-of-fit testing, following (Derumigny, Galanis, Schipper and Van der Vaart, 2025) <doi:10.48550/arXiv.2512.10546>. |
| Authors: | Alexis Derumigny [aut] (ORCID: <https://orcid.org/0000-0002-6163-8097>), Miltiadis Galanis [aut], Wieger Schipper [aut, cre] (ORCID: <https://orcid.org/0009-0004-5661-4949>), Aad van der Vaart [aut] (ORCID: <https://orcid.org/0000-0002-8074-2375>) |
| Maintainer: | Wieger Schipper <[email protected]> |
| License: | GPL-3 |
| Version: | 0.1.0 |
| Built: | 2026-06-04 07:36:42 UTC |
| Source: | https://github.com/alexisderumigny/bootstraptests |
This function performs a bootstrap goodness-of-fit hypothesis test for a
specific univariate parametric family. Under the null hypothesis, the
sample comes from the specified parametric family; under the alternative
hypothesis, it does not. Both a parametric bootstrap and a non-parametric
bootstrap are implemented. The test statistic is the Kolmogorov-Smirnov test
statistic. The parameters of the parametric family are estimated either by a
minimum distance (MD) estimator or by the MLE (the sample mean and variance).
On the bootstrap sample, a centered MD estimator is also implemented,
as in the paper. For now, only a test of normality is implemented. This function
gives the corresponding p-values, the true test statistic and the
bootstrap-version test statistics. The default (and valid) method implemented
in this function is the parametric bootstrap, together with the equivalent test statistic
and the MLE parameter estimator. Via the bootstrapOptions
argument, the user can specify other bootstrap resampling schemes,
test statistics, and parameter estimators.
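The default combination described above (parametric bootstrap, Kolmogorov-Smirnov statistic, MLE) can be sketched in base R as follows. This is an illustrative sketch only, not the package's internal code: the helper names `ks_stat_normal` and `param_boot_gof` are hypothetical, and the statistic is left unscaled, which does not affect the p-value.

```r
# Illustrative sketch in base R; helper names are hypothetical and this is
# not the implementation used by perform_GoF_test().
set.seed(1)

# KS distance between the empirical CDF of x and the normal CDF fitted by MLE.
ks_stat_normal <- function(x) {
  n <- length(x)
  mu_hat <- mean(x)
  sigma_hat <- sqrt(mean((x - mu_hat)^2))  # MLE of the sd (divides by n)
  cdf_fit <- pnorm(sort(x), mean = mu_hat, sd = sigma_hat)
  # The supremum is attained at a jump of the empirical CDF, so compare the
  # fitted CDF with the ECDF just after and just before each jump.
  max(pmax(abs((1:n) / n - cdf_fit), abs((0:(n - 1)) / n - cdf_fit)))
}

param_boot_gof <- function(x, n_boot = 200) {
  n <- length(x)
  mu_hat <- mean(x)
  sigma_hat <- sqrt(mean((x - mu_hat)^2))
  true_stat <- ks_stat_normal(x)
  # Parametric bootstrap: resample from the fitted normal distribution,
  # re-estimate the parameters and recompute the statistic each time.
  boot_stats <- replicate(n_boot,
                          ks_stat_normal(rnorm(n, mu_hat, sigma_hat)))
  list(true_stat = true_stat,
       boot_stats = boot_stats,
       p_value = mean(boot_stats >= true_stat))
}

res_h1 <- param_boot_gof(rgamma(100, 2, 3))  # data not from a normal family
res_h0 <- param_boot_gof(rnorm(100))         # data from a normal family
```

The p-value is the fraction of bootstrap statistics at least as large as the observed one; because each bootstrap sample is drawn from a fitted normal distribution, the bootstrap distribution mimics the null hypothesis.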
perform_GoF_test(
  X_data,
  parametric_fam = "normal",
  nBootstrap = 100,
  mygrid = NULL,
  show_progress = TRUE,
  bootstrapOptions = NULL,
  verbose = 0
)
X_data |
numerical input vector. Perform a GoF test whether or not this
sample comes from |
parametric_fam |
name of the parametric family. For the moment, only
|
nBootstrap |
numeric value of the number of bootstrap resamples. Defaults to 100. |
mygrid |
description of the grid used to compute the CDFs on. This must be one of
|
show_progress |
logical value indicating whether to show a progress bar |
bootstrapOptions |
This can be one of
A warning is raised if the given combination of |
verbose |
If |
An object of class bootstrapTest with components
pvals_df a dataframe of p-values and bootstrapped test statistics:
These are the p-values for the combinations of bootstrap resampling schemes, test statistics (centered and equivalent), and different parameter estimators.
It also contains the vectors of bootstrap test statistics for each of these combinations.
true_stat a named vector of size 2 containing the true test
statistics. The first entry is the Kolmogorov-Smirnov test statistic for
the Minimum Distance estimator, and the second entry is the Kolmogorov-Smirnov
test statistic for the MLE parameter estimator.
nBootstrap number of bootstrap repetitions.
nameMethod string for the name of the method used.
Derumigny, A., Galanis, M., Schipper, W., & van der Vaart, A. (2025). Bootstrapping not under the null? ArXiv preprint, doi:10.48550/arXiv.2512.10546
perform_regression_test, perform_independence_test.
The print and plot methods, such as plot.bootstrapTest.
n <- 100

# Under H1
X_data <- rgamma(n, 2, 3)
result <- perform_GoF_test(X_data,
                           nBootstrap = 100,
                           bootstrapOptions = list(type_boot = "param",
                                                   type_stat = "eq",
                                                   type_estimator_bootstrap = "MLE")
)
print(result)
plot(result)

# Under H0
X_data <- rnorm(n)
result <- perform_GoF_test(X_data, nBootstrap = 100)
print(result)
plot(result)
Perform a hypothesis test of statistical independence by means of bootstrapping.
The null hypothesis is that of independence between the two random variables,
versus the alternative of dependence between them.
This procedure gives a total of 8 combinations of bootstrap resampling schemes
(nonparametric and independent), test statistics (centered and equivalent),
and norms for the test statistic (Kolmogorov-Smirnov or L2). This function
gives the corresponding p-values, the true test statistic and the
bootstrap-version test statistics. The default (and valid) method implemented
in this function is the null bootstrap, together with the equivalent test
statistic and the Kolmogorov-Smirnov norm.
Via the bootstrapOptions argument, the user can specify other
bootstrap resampling schemes and test statistics.
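As a rough illustration of an independence (null) bootstrap with a Kolmogorov-Smirnov-type statistic, the following base R sketch resamples the two variables separately, so that the bootstrap samples are independent by construction. The helper names `indep_ks_stat` and `indep_boot_test` are hypothetical, the grid is assumed to be the set of observed points, and the statistic is left unscaled; the package's own implementation may differ on all of these.

```r
# Illustrative sketch in base R; helper names are hypothetical and this is
# not the implementation used by perform_independence_test().
set.seed(1)

# Largest absolute difference, over the grid of observed points, between the
# joint empirical CDF and the product of the marginal empirical CDFs.
indep_ks_stat <- function(x1, x2) {
  n <- length(x1)
  ind1 <- outer(x1, sort(x1), "<=") * 1  # ind1[i, j] = 1{x1_i <= s_j}
  ind2 <- outer(x2, sort(x2), "<=") * 1  # ind2[i, k] = 1{x2_i <= t_k}
  joint <- crossprod(ind1, ind2) / n                  # F_n(s_j, t_k)
  prod_marg <- outer(colMeans(ind1), colMeans(ind2))  # F_n1(s_j) * F_n2(t_k)
  max(abs(joint - prod_marg))
}

indep_boot_test <- function(x1, x2, n_boot = 100) {
  true_stat <- indep_ks_stat(x1, x2)
  # Independence bootstrap: draw each coordinate with replacement on its own,
  # which breaks any dependence between the two variables.
  boot_stats <- replicate(n_boot,
                          indep_ks_stat(sample(x1, replace = TRUE),
                                        sample(x2, replace = TRUE)))
  list(true_stat = true_stat,
       boot_stats = boot_stats,
       p_value = mean(boot_stats >= true_stat))
}

x1 <- rnorm(100)
res_dep <- indep_boot_test(x1, x1 + rnorm(100))      # dependent pair
res_ind <- indep_boot_test(rnorm(100), rnorm(100))   # independent pair
```

For a perfectly dependent sample such as `indep_ks_stat(1:100, 1:100)`, the statistic equals 0.25, the maximum of F(1 - F) over the grid.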
perform_independence_test(
  X1,
  X2,
  my_grid = NULL,
  nBootstrap = 100,
  show_progress = TRUE,
  bootstrapOptions = NULL
)
X1, X2
|
numerical vectors of the same size. The independence test tests
whether |
my_grid |
the grid on which the CDFs are estimated. This must be one of
|
nBootstrap |
number of bootstrap repetitions. |
show_progress |
logical value indicating whether to show a progress bar |
bootstrapOptions |
This can be one of
A warning is raised if the given combination of |
An object of class bootstrapTest with components
pvals_df: a dataframe of p-values and bootstrapped test statistics:
These are the p-values for the 8 combinations of bootstrap resampling schemes (nonparametric and independent), test statistics (centered and equivalent), and Kolmogorov-Smirnov or L2-type of true test statistic.
It also contains the vectors of bootstrap test statistics for each of the combinations.
true_stats a named vector of size 2 containing the true test
statistics for the L2 and KS distances.
nBootstrap Number of bootstrap repetitions.
nameMethod string for the name of the method used.
Derumigny, A., Galanis, M., Schipper, W., & van der Vaart, A. (2025). Bootstrapping not under the null? ArXiv preprint, doi:10.48550/arXiv.2512.10546
perform_GoF_test, perform_regression_test.
The print and plot methods, such as plot.bootstrapTest.
n <- 100

# Under H1
X1 <- rnorm(n)
X2 <- X1 + rnorm(n)
result <- perform_independence_test(
  X1, X2, nBootstrap = 50,
  bootstrapOptions = list(type_boot = "indep",
                          type_stat = "eq",
                          type_norm = "KS")
)
print(result)
plot(result)

# Under H0
X1 <- rnorm(n)
X2 <- rnorm(n)
result <- perform_independence_test(X1, X2, nBootstrap = 50)
print(result)
plot(result)
This function performs a bootstrap regression test for given data X, Y.
The null hypothesis corresponds to a slope coefficient of zero, versus the
alternative hypothesis of a non-zero slope coefficient.
The available bootstrap resampling schemes are an independence/null
bootstrap "indep", a non-parametric bootstrap "NP",
a residual bootstrap "res_bs", a fixed design bootstrap "fixed_design_bs",
a fixed design null bootstrap "fixed_design_bs_Hnull", and a hybrid null
bootstrap "hybrid_null_bs". This function gives the corresponding p-values,
the true test statistic and the bootstrap-version test statistics. Furthermore,
it also gives the estimated slope. The default (and valid) method implemented
in this function is the null bootstrap, together with the equivalent test
statistic. Via the bootstrapOptions argument, the user can specify other
bootstrap resampling schemes and test statistics.
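A minimal base R sketch of a slope test with an independence (null) bootstrap is given below. It is an illustration under stated assumptions, not the package's implementation: the helper names `slope_hat` and `slope_boot_test` are hypothetical, and the statistic is assumed to be sqrt(n) times the absolute estimated slope.

```r
# Illustrative sketch in base R; helper names are hypothetical and this is
# not the implementation used by perform_regression_test().
set.seed(1)

# Least-squares slope of the simple linear regression of y on x.
slope_hat <- function(x, y) cov(x, y) / var(x)

slope_boot_test <- function(x, y, n_boot = 200) {
  n <- length(x)
  beta_hat <- slope_hat(x, y)
  true_stat <- sqrt(n) * abs(beta_hat)  # assumed form of the test statistic
  # Independence bootstrap: resample x and y separately, so the slope is
  # zero in the bootstrap world, as under the null hypothesis.
  boot_stats <- replicate(n_boot, {
    x_star <- sample(x, replace = TRUE)
    y_star <- sample(y, replace = TRUE)
    sqrt(n) * abs(slope_hat(x_star, y_star))
  })
  list(beta = beta_hat,
       true_stat = true_stat,
       boot_stats = boot_stats,
       p_value = mean(boot_stats >= true_stat))
}

x <- rnorm(100)
res_h1 <- slope_boot_test(x, x + rnorm(100))      # true slope is 1
res_h0 <- slope_boot_test(x, 0 * x + rnorm(100))  # true slope is 0
```

Because the bootstrap samples satisfy the null hypothesis by construction, the bootstrap statistic is computed in the same way as the observed one, without centering at the estimated slope.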
perform_regression_test(
  X,
  Y,
  nBootstrap = 100,
  show_progress = TRUE,
  bootstrapOptions = NULL
)
X |
numeric univariate input vector representing the independent variable |
Y |
numeric univariate input vector representing the dependent variable |
nBootstrap |
numeric value of the number of bootstrap resamples |
show_progress |
logical value indicating whether to show a progress bar |
bootstrapOptions |
This can be one of
A warning is raised if the given combination of |
An object of class bootstrapTest with components
pvals_df a dataframe of p-values and bootstrapped test statistics:
These are the p-values for the combinations of bootstrap resampling schemes, test statistics (centered and equivalent).
It also contains the vectors of bootstrap test statistics for each of the combinations.
true_stat a named vector of size 1 containing the true test
statistic.
nBootstrap Number of bootstrap repetitions.
data named list of the used input data, i.e. X and Y.
nameMethod string for the name of the method used.
beta numeric value of the estimated slope of the regression model.
Derumigny, A., Galanis, M., Schipper, W., & van der Vaart, A. (2025). Bootstrapping not under the null? ArXiv preprint, doi:10.48550/arXiv.2512.10546
perform_GoF_test, perform_independence_test.
The print and plot methods, such as plot.bootstrapTest.
n <- 100

# Under H1
X_data <- rnorm(n)
Y_data <- X_data + rnorm(n)  # Y = X + epsilon
result <- perform_regression_test(X_data, Y_data, nBootstrap = 100,
                                  bootstrapOptions = list(type_boot = "indep",
                                                          type_stat = "eq"))
print(result)
plot(result)

# Under H0
X_data <- rnorm(n)
Y_data <- 0 * X_data + rnorm(n)  # (as b = 0 under H0)
result <- perform_regression_test(X_data, Y_data, nBootstrap = 100)
print(result)
plot(result)
The plot and print methods work for objects of class bootstrapTest.
The print method prints the summary of the bootstrap test results.
The plot method plots the distribution of bootstrapped test statistics
as a histogram, with the true test statistic and the 95
bootstrapped test statistics highlighted. In the regression test case, the
estimated regression line is plotted as well.
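A plot of this kind can be approximated in base R as sketched below. The values are simulated placeholders, not output of the package, and the reference line is assumed to be the empirical 95% quantile of the bootstrapped statistics; colors, legend text and layout are likewise assumptions made for illustration.

```r
# Illustrative sketch in base R with placeholder values; this is not the
# code of plot.bootstrapTest().
set.seed(1)
boot_stats <- abs(rnorm(500))  # placeholder bootstrapped test statistics
true_stat <- 2.1               # placeholder observed test statistic

# Assumed reference line: empirical 95% quantile of the bootstrap statistics.
crit <- unname(quantile(boot_stats, 0.95))

hist(boot_stats, breaks = 30,
     xlim = range(c(boot_stats, true_stat)),
     main = "Bootstrapped test statistics",
     xlab = "Test statistic")
abline(v = true_stat, col = "red", lwd = 2)   # observed statistic
abline(v = crit, col = "blue", lty = 2)       # assumed 95% quantile
legend("topright",
       legend = c("true test statistic", "95% bootstrap quantile"),
       col = c("red", "blue"), lty = c(1, 2), lwd = c(2, 1))
```

When the observed statistic lies to the right of the quantile line, the corresponding bootstrap p-value is below 0.05.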
## S3 method for class 'bootstrapTest'
plot(
  x,
  xlim = NULL,
  breaks = NULL,
  legend.x = NULL,
  legend.y = NULL,
  ask = interactive(),
  plot_estimated_line = NULL,
  ...
)

## S3 method for class 'bootstrapTest'
print(x, ...)
x |
an object of class |
xlim |
limits for the x-axis of the histogram |
breaks |
breaks for the histogram |
legend.x |
position of the legend on the x-axis |
legend.y |
position of the legend on the y-axis |
ask |
if |
plot_estimated_line |
Boolean describing whether to plot the estimated
regression line in case |
... |
additional arguments passed to the |
These functions have no return value and are called solely for their side effects.
Derumigny, A., Galanis, M., Schipper, W., & van der Vaart, A. (2025). Bootstrapping not under the null? ArXiv preprint, doi:10.48550/arXiv.2512.10546
perform_independence_test, perform_GoF_test,
perform_regression_test,
which are the functions that generate such an object x.