Title: | Distribution-Free Goodness-of-Fit Testing for Regression |
---|---|
Description: | Implements distribution-free goodness-of-fit regression testing for the mean structure of parametric models introduced in Khmaladze (2021) <doi:10.1007/s10463-021-00786-3>. |
Authors: | Jesse Miller [aut, cre] |
Maintainer: | Jesse Miller <[email protected]> |
License: | GPL-3 |
Version: | 1.0.1 |
Built: | 2025-02-17 03:17:07 UTC |
Source: | https://github.com/cran/distfreereg |
Implements distribution-free goodness-of-fit regression testing for the mean structure of parametric models introduced in Khmaladze (2021) <doi:10.1007/s10463-021-00786-3>.
The DESCRIPTION file:
Package: | distfreereg |
Type: | Package |
Title: | Distribution-Free Goodness-of-Fit Testing for Regression |
Version: | 1.0.1 |
Depends: | R (>= 4.4) |
Imports: | clue, methods, numDeriv |
Suggests: | knitr, Matrix, mvtnorm, rmarkdown |
VignetteBuilder: | knitr |
Date: | 2024-11-12 |
Authors@R: | person(given = "Jesse", family = "Miller", role = c("aut", "cre"), email = "[email protected]") |
Maintainer: | Jesse Miller <[email protected]> |
Description: | Implements distribution-free goodness-of-fit regression testing for the mean structure of parametric models introduced in Khmaladze (2021) <doi:10.1007/s10463-021-00786-3>. |
License: | GPL-3 |
NeedsCompilation: | no |
Packaged: | 2024-11-14 22:40:13 UTC; Jesse |
Author: | Jesse Miller [aut, cre] |
Date/Publication: | 2024-11-18 23:50:16 UTC |
Repository: | https://jm-umn.r-universe.dev |
RemoteUrl: | https://github.com/cran/distfreereg |
RemoteRef: | HEAD |
RemoteSha: | 11b2169aa72c3b8b308b172f42a16bc7226382e1 |
Index of help topics:
coef.distfreereg Extract Estimated Parameters from 'distfreereg' Objects compare Compare the simulated statistic distribution with the observed statistic distribution used in distribution-free parametric regression testing confint.distfreereg Calculate Confidence Intervals with a 'distfreereg' Object distfreereg Distribution-Free Parametric Regression Testing distfreereg-package Distribution-Free Goodness-of-Fit Testing for Regression fitted.distfreereg Extract Fitted Values from 'distfreereg' Objects formula.distfreereg Extract Formulas from 'distfreereg' Objects ks.test.compare Formally Compare Observed and Simulated Statistics plot.compare Summary and Diagnostic Plots for 'compare' Objects plot.distfreereg Summary and Diagnostic Plots for 'distfreereg' Objects predict.distfreereg Generate Predicted Values from 'distfreereg' Objects print.distfreereg Printing 'distfreereg' Objects rejection Compute Rejection Rates of a Distribution-Free Test residuals.distfreereg Extract Residuals from 'distfreereg' Objects update.distfreereg Update 'distfreereg' Objects vcov.distfreereg Calculate Covariance Matrices from 'distfreereg' Objects
Further information is available in the following vignettes:
v1_introduction |
An Introduction to the `distfreereg` Package (source, pdf) |
v2_compare |
Comparing Distributions with the `distfreereg` Package (source, pdf) |
v3_plotting |
Plotting with the `distfreereg` Package (source, pdf) |
v4_parameter-estimation |
Parameter Estimation (source, pdf) |
v5_advanced-options |
Advanced Options for `distfreereg` Package (source, pdf) |
Jesse Miller [aut, cre]
Maintainer: Jesse Miller <[email protected]>
distfreereg
Objects
This is a coef
method for objects of class distfreereg
. It extracts the estimated parameters from a model in a distfreereg
object.
## S3 method for class 'distfreereg' coef(object, ...)
## S3 method for class 'distfreereg' coef(object, ...)
object |
Object of class |
... |
Additional parameters passed to or from other methods. Currently ignored. |
Numeric vector of estimated model parameters.
Jesse Miller
distfreereg
, vcov.distfreereg
, confint.distfreereg
Simulate response data repeatedly with true_mean
as the mean and true_covariance
as the covariance structure, each time running distfreereg
on the simulated data. The observed statistics and p-values are saved, as are the simulated statistics from the first replication.
See the Comparing Distributions with the distfreereg
Package vignette for an introduction.
compare(true_mean, true_method = NULL, true_method_args = NULL, true_covariance, true_X = NULL, true_data = NULL, theta, n = NULL, reps = 1e3, prog = reps/10, err_dist_fun = rmvnorm, err_dist_args = NULL, keep = NULL, manual = NULL, update_args = NULL, global_override = NULL, ...)
compare(true_mean, true_method = NULL, true_method_args = NULL, true_covariance, true_X = NULL, true_data = NULL, theta, n = NULL, reps = 1e3, prog = reps/10, err_dist_fun = rmvnorm, err_dist_args = NULL, keep = NULL, manual = NULL, update_args = NULL, global_override = NULL, ...)
true_mean |
Object specifying the mean structure of the true model. It is used to generate the true values of |
true_method |
Character vector of length one; specifies the method to use to create a model. Valid options are " |
true_method_args |
Optional list; values are passed to the function specified by |
true_covariance |
Named list; specifies the covariance structures of the true error distribution in the format described in the documentation for the |
true_X , true_data
|
Optional numeric matrix or data frame, respectively; specifies the covariate values for the true model. |
theta |
Numeric vector; used as the (true) parameter values for the model with mean function |
n |
Optional integer; indicates how long each simulated data vector should be. Required only when no covariate values are specified for either the true or test mean. Silently converted to integer if numeric. |
reps |
Integer; specifies number of replications. Silently converted to integer if numeric. |
prog |
Integer or |
err_dist_fun |
Function; specifies the function to be used to simulate errors. See details. |
err_dist_args |
Optional list; specifies additional named arguments to pass to |
keep |
A vector of integers, or the character string " |
manual |
Optional function; applied to the |
update_args |
Optional named list; specifies arguments to pass to |
global_override |
Optional named list; specifies arguments to pass to the |
... |
Additional arguments passed to |
This function allows the user to explore the asymptotic behavior of the distributions involved in the test conducted by distfreereg
. If the sample size is sufficiently large, and assuming that the true covariance matrix of the errors is known, then the observed and simulated statistics have nearly the same distribution. How large the sample size must be depends on the details of the situation. This function can be used to determine how large the sample size must be to obtain approximately equal distributions, and to estimate the power of the test against a specific alternative.
The user specifies a particular true model, comprising a mean function true_mean
and an error generating function err_dist_fun
, to generate the data. The user also specifies a test model, comprising a mean function test_mean
and a covariance structure specified by covariance
, to pass to distfreereg
to test. For each repetition, compare
simulates data using true_mean
as the mean function and err_dist_fun
to generate the errors. The covariance matrix of the errors is specified using true_covariance
. (See below for more details.) This simulated data is passed as Y
(or as part of data
) to distfreereg
.
The true_covariance
argument specifies the covariance structure that is available to err_dist_fun
for generating errors. The needs of err_dist_fun
can vary (for example, the default function uses SqrtSigma
to generate multivariate normal errors), so any one of the matrices Sigma
, SqrtSigma
, P
, and Q
(defined in the documentation of distfreereg
) can be specified. Any matrix needed by err_dist_fun
is calculated automatically if not supplied.
The value of err_dist_fun
must be a function whose output is a numeric matrix with n
rows and reps
columns. Each column is used as the vector of errors in one repetition. The error function's arguments can include the special values n
, reps
, Sigma
, SqrtSigma
, P
, and Q
. These arguments are automatically assigned their corresponding values from the values passed to compare
. For example, the default value rmvnorm
uses SqrtSigma
to generate multivariate normal values with mean 0 and covariance Sigma
.
The argument keep
is useful for diagnosing problems, but caution should be used lest a very large object be created. It is often sufficient to save the distfreereg
objects from only the first few replications.
For more specialized needs, the manual
argument allows the calculation and saving of objects during each repetition. For example, using manual = function(x) residuals(x)
will save the (raw) residuals from each repetition.
The first repetition creates a distfreereg
object. During each subsequent repetition, this object is passed to update.distfreereg
to create a new object. The update_args
argument can be used to modify this call.
If necessary, global_override
can be used to pass an override
argument to distfreereg
in each repetition. For example, using gobal_override = list(theta_hat = theta)
forces the estimated parameter vector used in the test in each call to be the true parameter vector theta
.
An object of class compare
with the following components:
call |
The matched call. |
Y_mean |
The vector of mean values for the simulated responses. |
errors |
The matrix whose columns contain the errors used for the corresponding repetitions. |
theta |
Supplied vector of parameter values. |
true_mean |
Supplied object specifying the true mean function. |
true_covariance |
List containing element(s) that specify the true covariance structure. |
true_X |
Supplied matrix of true covariate values. |
true_data |
Supplied data frame of true covariate values. |
test_mean |
Supplied object specifying the mean function being tested. |
covariance |
List containing element(s) that specify the test covariance structure. |
X |
Supplied matrix of test covariate values. |
data |
Supplied data frame of test covariate values. |
observed_stats |
The observed statistics collected in each repetition. |
mcsim_stats |
The simulated statistics from the first repetition. (They are the same for each repetition, because |
p |
The p-values for the observed statistics. |
dfrs |
A list containing the outputs of |
manual |
A list containing the results of the function specified by the argument |
Some of the processing of the elements of true_covariance
is analogous to the processing of covariance
by distfreereg
. Any values of solve_tol
and symmetric
specified in distfreereg
's control
argument are used by compare
to similar effect in processing true_covariance
.
The presence of call
in the value allows a compare
object to be passed to update
.
Jesse Miller
distfreereg
, rejection
, plot.compare
, ks.test.compare
set.seed(20240201) n <- 100 func <- function(X, theta) theta[1] + theta[2]*X Sig <- rWishart(1, df = n, Sigma = diag(n))[,,1] theta <- c(2,5) X <- matrix(rexp(n, rate = 1)) # In practice, 'reps' should be much larger cdfr <- compare(true_mean = func, true_X = X, true_covariance = list(Sigma = Sig), test_mean = func, X = X, covariance = list(Sigma = Sig), reps = 10, prog = Inf, theta = theta, theta_init = rep(1, length(theta))) cdfr$p
set.seed(20240201) n <- 100 func <- function(X, theta) theta[1] + theta[2]*X Sig <- rWishart(1, df = n, Sigma = diag(n))[,,1] theta <- c(2,5) X <- matrix(rexp(n, rate = 1)) # In practice, 'reps' should be much larger cdfr <- compare(true_mean = func, true_X = X, true_covariance = list(Sigma = Sig), test_mean = func, X = X, covariance = list(Sigma = Sig), reps = 10, prog = Inf, theta = theta, theta_init = rep(1, length(theta))) cdfr$p
distfreereg
Object
This is a confint
method for objects of class distfreereg
. It calculates confidence intervals for the estimated parameters of a model in a distfreereg
object.
## S3 method for class 'distfreereg' confint(object, parm, level = 0.95, ..., digits = 3)
## S3 method for class 'distfreereg' confint(object, parm, level = 0.95, ..., digits = 3)
object |
Object of class |
parm |
Numeric or character vector; specifies which parameters are to be given confidence intervals. If missing, all parameters are considered. |
level |
Numeric vector of length one; specifies the confidence level. |
... |
Additional parameters passed to other methods. Currently ignored. |
digits |
Numeric vector of length one; used to format percentage labels. Silently converted to an integer. |
This is a slight reworking of confint.default
. The primary difference is that when object$test_mean
is a function, the return value from vcov.distfreereg
is included in the output, since its calculation can be computationally expensive and this prevents users from needing to call vcov
separately for its output.
If object$test_mean
is not a function, then the output is a named numeric matrix each row of which gives the endpoints of the requested confidence interval of its corresponding parameter. If object$test_mean
is a function, then the output is a named list with the previously defined matrix as its first element and the output of vcov(object)
as its second.
If object
was created by calling distfreereg.default
directly, there is no estimated parameter vector, and therefore confint.distfreereg
does not apply.
Jesse Miller
Conduct distribution-free parametric regression testing using the process introduced in Khmaladze (2021). A parametric model for the conditional mean (specified by test_mean
) is checked against the data by fitting the model, transforming the resulting residuals, and then calculating a statistic on the empirical partial sum process of the transformed residuals. The statistic's null distribution can be simulated in a straight-forward way, thereby producing a p-value.
Using to denote the mean function being tested, the specific test has the following null and alternative hypotheses:
See the An Introduction to the distfreereg
Package vignette for an introduction.
distfreereg(test_mean, ordering = "simplex", group = FALSE, stat = c("KS", "CvM"), B = 1e4, control = NULL, override = NULL, verbose = TRUE, ...) ## Default S3 method: distfreereg(test_mean = NULL, ordering = "simplex", group = FALSE, stat = c("KS", "CvM"), B = 1e4, control = NULL, override = NULL, verbose = TRUE, ..., Y, X = NULL, covariance, J, fitted_values) ## S3 method for class 'formula' distfreereg(test_mean, ordering = "simplex", group = FALSE, stat = c("KS", "CvM"), B = 1e4, control = NULL, override = NULL, verbose = TRUE, ..., data, covariance = NULL, method = "lm", theta_init = NULL) ## S3 method for class 'function' distfreereg(test_mean, ordering = "simplex", group = FALSE, stat = c("KS", "CvM"), B = 1e4, control = NULL, override = NULL, verbose = TRUE, ..., Y, X = NULL, covariance, theta_init) ## S3 method for class 'lm' distfreereg(test_mean, ordering = "simplex", group = FALSE, stat = c("KS", "CvM"), B = 1e4, control = NULL, override = NULL, verbose = TRUE, ...) ## S3 method for class 'nls' distfreereg(test_mean, ordering = "simplex", group = FALSE, stat = c("KS", "CvM"), B = 1e4, control = NULL, override = NULL, verbose = TRUE, ...)
distfreereg(test_mean, ordering = "simplex", group = FALSE, stat = c("KS", "CvM"), B = 1e4, control = NULL, override = NULL, verbose = TRUE, ...) ## Default S3 method: distfreereg(test_mean = NULL, ordering = "simplex", group = FALSE, stat = c("KS", "CvM"), B = 1e4, control = NULL, override = NULL, verbose = TRUE, ..., Y, X = NULL, covariance, J, fitted_values) ## S3 method for class 'formula' distfreereg(test_mean, ordering = "simplex", group = FALSE, stat = c("KS", "CvM"), B = 1e4, control = NULL, override = NULL, verbose = TRUE, ..., data, covariance = NULL, method = "lm", theta_init = NULL) ## S3 method for class 'function' distfreereg(test_mean, ordering = "simplex", group = FALSE, stat = c("KS", "CvM"), B = 1e4, control = NULL, override = NULL, verbose = TRUE, ..., Y, X = NULL, covariance, theta_init) ## S3 method for class 'lm' distfreereg(test_mean, ordering = "simplex", group = FALSE, stat = c("KS", "CvM"), B = 1e4, control = NULL, override = NULL, verbose = TRUE, ...) ## S3 method for class 'nls' distfreereg(test_mean, ordering = "simplex", group = FALSE, stat = c("KS", "CvM"), B = 1e4, control = NULL, override = NULL, verbose = TRUE, ...)
test_mean |
A specification of the mean function to be tested. Methods exist for objects of classes |
covariance |
Named list; specifies the covariance structure of the model's error distribution. Valid element names are "
See details. |
ordering |
A character string or a list; specifies how to order the residuals to form the empirical partial sum process. Valid character strings are:
If |
group |
Logical; if |
J |
Numeric matrix; specifies the Jacobian of the function evaluated at the covariates and the estimated parameters. |
fitted_values |
Numeric vector; specifies the model's fitted values. |
stat |
Character vector; specifies the names of the functions used to calculate the desired statistics. By default, a Kolmogorov–Smirnov statistic and a Cramer–von Mises-like statistic are calculated:
where |
B |
Numeric vector of length one; specifies the Monte Carlo sample size used when simulating statistics. Silently converted to integer. |
control |
Optional named list of elements that control the details of the algorithm's computations. The following elements are accepted for all methods:
The following named elements, all but the first of which control the process of calculating the generalized least squares estimation of the parameter vector, are accepted for the
Finally, the following element is available for the |
override |
Optional named list of arguments that override internally calculated values. Used primarily by
|
verbose |
Logical; if |
... |
Additional arguments to pass to various methods; should be empty except in a call to the generic function. |
Y |
Numeric vector of observations. A matrix value is silently converted to a vector. |
X |
Optional numeric matrix of covariates. A vector value is converted to a single-column matrix with a warning. |
method |
Character vector; specifies the function to use for fitting the model when |
theta_init |
Numeric vector; specifies the starting parameter values passed to the optimizing function to be used to estimate the parameter vector. Must be |
data |
Optional data frame of covariate values; required for |
This function implements distribution-free parametric regression testing. The model is specified by a mean structure and a covariance structure.
The mean structure is specified by the argument test_mean
. This can be a function, formula, lm
object, nls
object, or NULL
.
If test_mean
is a function, then it must have one or two arguments: either theta
only, or theta
and either X
(uppercase) or x
(lowercase). An uppercase X
is interpreted in the function definition as a matrix, while a lowercase x
is interpreted as a vector. (See examples and this vignette.) The primary reason to use a lowercase x
is to allow for a function definition using an R
function that is not vectorized. In general, an uppercase X
should be preferred for speed.
If test_mean
is an lm
or nls
object, then the covariance structure is obtained from the supplied model.
If test_mean
is a formula, then it must be a formula that can be passed to lm
or nls
, and the data
argument must be specified. The appropriate model will be created, and then sent back to distfreereg()
for method dispatch.
The function
method estimates parameter values, and then uses those to evaluate the Jacobian of the mean function and to calculate fitted values. It then calls the default method, which does not use test_mean
. The default method also allows the user to implement the algorithm even when the mean structure is not specified in R
. (This is useful if a particularly complicated function is defined in another language and cannot easily be copied into R
.) It requires specifying the vector of fitted values and the Jacobian matrix of the mean function evaluated at the estimated parameter values.
The covariance structure for must be specified using the
covariance
argument for the function
and default methods. It is optional for the formula
method; when present in that case, it must specify a diagonal matrix which is converted internally into a vector of weights. For the lm
and nls
methods, the covariance is determined using the supplied object.
Any element of covariance
can be a numeric matrix, or a numeric vector. If it is a vector, its length must be either 1 or the sample size. This option is mathematically equivalent to setting a covariance list element to a diagonal matrix with the specified value(s) along the diagonal. Using vectors, when possible, is more efficient than using the corresponding matrix.
Internally, distfreereg()
only needs Q
, so some efficiency can be gained by supplying that directly when available. When Q
is not specified, it is calculated using whichever element is specified. When more than one of the other elements are specified, Q
is calculated using the least expensive path, with no warning given if the specified elements are incompatible. (For example, if both Sigma
and SqrtSigma
elements are supplied to covariance
, then Q
is calculated using SqrtSigma
with no attempt to verify that SqrtSigma
is the matrix square root ofSigma
.)
The override
argument is used primarily by update.distfreereg
to avoid unnecessary and potentially computationally expensive recomputation. This update
method imports appropriate values automatically from a previously created object of class distfreereg
, and therefore validation is not always done. Use manually with caution.
An object of class distfreereg
with the following components:
call |
The matched call. |
data |
A list containing |
test_mean |
The value supplied to the argument |
model |
The model built when using the |
covariance |
The list of covariance matrices, containing at least |
theta_hat |
The estimated parameter vector. |
optimization_output |
The output of |
fitted_values |
The vector of fitted values, |
J |
The Jacobian matrix. |
mu |
The mu matrix. |
r |
The matrix of transformation anchor vectors. |
r_tilde |
The matrix of modified transformation anchor vectors. |
residuals |
A named list of three vectors containing raw, sphered, and transformed residuals. |
res_order |
A numeric vector indicating the ordering of the residuals used to form the empirical partial sum process, in a format analogous to the output of |
epsp |
The empirical partial sum process formed by calculating the scaled
partial sums of the transformed residuals ordered according to |
observed_stat |
A named list of the observed statistic(s) corresponding to the transformed residuals. |
mcsim_stats |
A named list, each element of which contains the values of a simulated statistic. |
p |
A named list with two elements: |
Consistency between test_mean
and theta_init
is verified only
indirectly. Uninformative errors can occur when, for example, theta_init
does not have the correct length. The two most common error messages that arise
in this case are "f_out cannot have NA values
", indicating that
theta_init
is too short, and "Unable to invert square root of J^tJ
",
indicating that theta_init
is too long. (Both of these errors might occur
for other reasons, as well.) To be safe, always define test_mean
to use
every element of theta
.
No verification of consistency is done when multiple elements of coviariance
are specified. For example, if P
and Sigma
are both specified, then the code will use only one of these, and will not verify that P
is the inverse of Sigma
.
When using the control
argument element optimization_fun
to
specify an optimization function other than optim
, the
verification that theta_hat_name
actually matches the name of an element
of the optimization function's output is done only after the optimization has
been done. If this optimization will likely take a long time, it is important to
verify the value of theta_hat_name
before running distfreereg()
.
Jesse Miller
Khmaladze, Estate V. Distribution-free testing in linear and parametric regression, 2021-03, Annals of the Institute of Statistical Mathematics, Vol. 73, No. 6, p. 1063–1087. doi:10.1007/s10463-021-00786-3
coef.distfreereg
, confint.distfreereg
, fitted.distfreereg
, formula.distfreereg
, plot.distfreereg
, predict.distfreereg
, print.distfreereg
, residuals.distfreereg
, update.distfreereg
, vcov.distfreereg
set.seed(20240218) n <- 1e2 func <- function(X, theta) X[,1]^theta[1] + theta[2]*X[,2] Sig <- runif(n, min = 1, max = 3) theta <- c(2,5) X <- matrix(runif(2*n, min = 1, max = 5), nrow = n) Y <- X[,1]^theta[1] + theta[2]*X[,2] + rnorm(n, sd = sqrt(Sig)) (dfr <- distfreereg(Y = Y, X = X, test_mean = func, covariance = list(Sigma = Sig), theta_init = c(1,1))) func_lower <- function(x, theta) x[1]^theta[1] + theta[2]*x[2] (dfr_lower <- distfreereg(Y = Y, X = X, test_mean = func_lower, covariance = list(Sigma = Sig), theta_init = c(1,1))) identical(dfr$observed_stats, dfr_lower$observed_stats)
set.seed(20240218) n <- 1e2 func <- function(X, theta) X[,1]^theta[1] + theta[2]*X[,2] Sig <- runif(n, min = 1, max = 3) theta <- c(2,5) X <- matrix(runif(2*n, min = 1, max = 5), nrow = n) Y <- X[,1]^theta[1] + theta[2]*X[,2] + rnorm(n, sd = sqrt(Sig)) (dfr <- distfreereg(Y = Y, X = X, test_mean = func, covariance = list(Sigma = Sig), theta_init = c(1,1))) func_lower <- function(x, theta) x[1]^theta[1] + theta[2]*x[2] (dfr_lower <- distfreereg(Y = Y, X = X, test_mean = func_lower, covariance = list(Sigma = Sig), theta_init = c(1,1))) identical(dfr$observed_stats, dfr_lower$observed_stats)
distfreereg
Objects
This is a fitted
method for objects of class distfreereg
.
## S3 method for class 'distfreereg' fitted(object, ...)
## S3 method for class 'distfreereg' fitted(object, ...)
object |
Object of class |
... |
Additional parameters passed to or from other methods. Currently ignored. |
Numeric vector of fitted values.
Jesse Miller
distfreereg
Objects
This is a formula
method for objects of class distfreereg
. It extracts the formula from a model in a distfreereg
object.
## S3 method for class 'distfreereg' formula(x, ...)
## S3 method for class 'distfreereg' formula(x, ...)
x |
Object of class |
... |
Additional parameters passed to or from other methods. Currently ignored. |
Formula extracted from x$test_mean
, or NULL
if such a formula cannot be extracted.
Jesse Miller
This is a ks.test
method for objects of class compare
. It performs a two-sample Kolmogorov–Smirnov test to compare the observed and simulated statistics in an object of class compare
.
## S3 method for class 'compare' ks.test(x, ..., stat = NULL)
## S3 method for class 'compare' ks.test(x, ..., stat = NULL)
x |
Object of class |
... |
Additional parameters passed to |
stat |
Character string specifying the statistic on which to run the test. |
When stat
is NULL
, the default value is the first statistic appearing in the observed_stats
element of object
.
A list of the form specified in ks.test
.
Jesse Miller
# In practice, set "reps" larger than 200. set.seed(20240201) n <- 100 func <- function(X, theta) theta[1] + theta[2]*X Sig <- rWishart(1, df = n, Sigma = diag(n))[,,1] theta <- c(2,5) X <- matrix(rexp(n, rate = 1)) cdfr <- compare(true_mean = func, true_X = X, true_covariance = list(Sigma = Sig), test_mean = func, X = X, covariance = list(Sigma = Sig), reps = 200, prog = Inf, theta = theta, theta_init = rep(1, length(theta))) ks.test(cdfr) ks.test(cdfr, stat = "CvM")
# In practice, set "reps" larger than 200. set.seed(20240201) n <- 100 func <- function(X, theta) theta[1] + theta[2]*X Sig <- rWishart(1, df = n, Sigma = diag(n))[,,1] theta <- c(2,5) X <- matrix(rexp(n, rate = 1)) cdfr <- compare(true_mean = func, true_X = X, true_covariance = list(Sigma = Sig), test_mean = func, X = X, covariance = list(Sigma = Sig), reps = 200, prog = Inf, theta = theta, theta_init = rep(1, length(theta))) ks.test(cdfr) ks.test(cdfr, stat = "CvM")
compare
Objects
This is a plot
method for objects of class compare
. It automates the creation of four summary and diagnostic plots for compare
objects.
## S3 method for class 'compare' plot(x, y, ..., which = "cdf", stat = NULL, hlines = NULL, curve_args = NULL, confband_args = FALSE, density_args = NULL, poly = NULL, legend = NULL, qqline = NULL)
## S3 method for class 'compare' plot(x, y, ..., which = "cdf", stat = NULL, hlines = NULL, curve_args = NULL, confband_args = FALSE, density_args = NULL, poly = NULL, legend = NULL, qqline = NULL)
x |
Object of class |
y |
Optional object of class |
... |
Additional parameters passed to a plotting function depending on the
value of |
which |
Character string. Acceptable values are "
|
stat |
Character string, specifies the statistic to plot. |
hlines |
An optional list of arguments to pass to |
curve_args |
An optional list used to pass arguments to |
confband_args |
An optional list of values that control the calculation and plotting of confidence bands when
Setting equal to |
density_args |
An optional list of arguments passed to |
poly |
An optional list of arguments passed to |
legend |
An optional list of arguments passed to |
qqline |
An optional list of arguments passed to |
This function produces a plot of a type specified by which
. The values
plotted depend on whether or not y
is present and the value of
which
. When y
is present, the plots compare the observed
statistics in x
and the observed statistics in y
. When y
is missing, the plots compare the observed and simulated statistics in
x
. (The exception is when which
is "qqp
", which is only
available when y
is missing.)
When which
is "cdf
" or "dens
", the plotting region and
associated labels, tick marks, etc., are created by an initial call to
plot
. The curves themselves are drawn with lines
.
The arguments specified in ...
are passed to the initial call to
plot
.
The values used to create the curves (or points, in the case of a Q–Q plot) are returned invisibly. The details depend on the value of which
:
cdf
: A list with two or four elements, all lists. The first two sub-lists contain the - and
-values cdf curves. If confidence bands are plotted, then two additional elements are included with output from the confidence band calculations, including elements
w
, cb_lower
, and cb_upper
, which contain, respectively, the -coordinates for both the upper and lower bounds of the band, the
-coordinates for the lower band, and the
-coordinates for the upper band.
dens
: A list with two or four elements, all lists. The first two sub-lists contain - and
-values for the density curves. If confidence bands are plotted, then two additional sub-lists are supplied, with contents identical to what is described for "
cdf
".
qq
, qqp
: The output of qqplot
.
For "cdf
" and "dens
", the names of the elements of the returned list depend on whether or not a value for the argument y
was supplied.
Jesse Miller
Flegal, James M. et al. Simultaneous confidence bands for (Markov chain) Monte Carlo simulations, forthcoming.
distfreereg
Objects
This is a plot
method for objects of class distfreereg
. It automates the creation of three summary and diagnostic plots for distfreereg
objects.
## S3 method for class 'distfreereg' plot(x, which = "dens", stat = NULL, density_args = NULL, polygon_args = NULL, confband_args = NULL, abline_args = NULL, shade_col = rgb(1,0,0,0.5), text_args = NULL, ...)
## S3 method for class 'distfreereg' plot(x, which = "dens", stat = NULL, density_args = NULL, polygon_args = NULL, confband_args = NULL, abline_args = NULL, shade_col = rgb(1,0,0,0.5), text_args = NULL, ...)
x |
Object of class |
which |
Character string. Acceptable values are "
|
stat |
Character vector of length one specifying the name of the statistic to plot
when |
density_args |
An optional list of arguments to pass to |
polygon_args |
An optional list of arguments to pass to |
confband_args |
An optional list of values that control the calculation and plotting of confidence bands. Any of the following named elements are allowed.
Setting equal to |
abline_args |
An optional list of arguments to pass to |
shade_col |
Character string or other value specifying the color to use to shade the upper tail of the distribution. Default value is red with 50% transparency. This is a convenience argument, and the same functionality is available by defining a |
text_args |
An optional list of arguments to pass to |
... |
Additional arguments to pass to |
This function produces one of three specified plots, depending on the value of which
.
When which
is "dens
", a plot of the estimated density of the simulated statistics is produced, including a vertical line at the value of the observed test statistic with the p-value displayed.
The default placement of the p-value text is on the left side of the line indicating the statistic value. Specifically, the default values of x
and y
passed to text
are the statistic value itself and the midpoint between zero and the maximum value of the density curve. The default value passed to adj
is c(1,0.5)
, meaning that the text is aligned to the left of the value and centered vertically on it. (The default value for the text itself, which can be modified via the
label
argument of text
, includes a space on the left and the right for padding so the text does not overlap the vertical line itself.) To align the text so it appears on the right side (for example, to avoid overlapping the density curve), use text_args = list(adj = c(0,0.5))
. See documentation for text
for details on this and other arguments.
When which
is "residuals
", a time-series-like plot is produced showing transformed residuals in the order given by x$res_order
. In the case that the null hypothesis is rejected, this plot can help determine where (in terms of the linearly ordered covariates) a discrepancy between the model and the data occurs.
When which
is "epsp
", a plot of the empirical partial sum process is produced; that is, the -values are
where is the
th transformed residual in the order given by
x$res_order
. Similar to the case when which
is "residuals
", this plot can help determine where (in terms of the linearly ordered covariates) a discrepancy between the model and the data occurs.
When which
is "dens
", the values used to create the density plot are returned invisibly in a list with two named elements, x
and y
. If the confidence band is plotted, then it is included as an element named confband
.
For other values of which
, nothing is returned.
Jesse Miller
Flegal, James M. et al. Simultaneous confidence bands for (Markov chain) Monte Carlo simulations, forthcoming.
distfreereg
Objects
This is a predict
method for objects of class distfreereg
.
## S3 method for class 'distfreereg' predict(object, ..., newdata)
## S3 method for class 'distfreereg' predict(object, ..., newdata)
object |
Object of class |
... |
Additional parameters affecting the predictions produced. Currently ignored. |
newdata |
Optional matrix or data frame of new covariate values. If missing, the fitted values are returned. |
When object$test_mean
is of class "lm
" or "nls
", object$test_mean
is sent to predict
for method dispatch. When object$test_mean
is of class "formula
", object$model
is sent to predict
.
Numeric vector of predicted values.
Jesse Miller
distfreereg
Objects
This is a print
method for objects of class distfreereg
.
## S3 method for class 'distfreereg' print(x, ..., digits = 3, col_sep = 2)
## S3 method for class 'distfreereg' print(x, ..., digits = 3, col_sep = 2)
x |
Object of class |
... |
Additional parameters, currently ignored. |
digits |
Integer; passed to |
col_sep |
Integer; specifies the padding (in units of spaces) between columns in the printed table of statistics. |
This function prints a useful summary of the distfreereg
object.
No return value (NULL
).
Jesse Miller
Compute the rejection rates of the tests simulated in a compare
object. Specifically, this function estimates the rejection rates of the tests conducted with specified statistics of the hypothesis that the mean function is test_mean
when the true mean function is true_mean
.
rejection(object, alpha = 0.05, stat = names(object[["observed_stats"]]), ...)
rejection(object, alpha = 0.05, stat = names(object[["observed_stats"]]), ...)
object |
Object of class |
alpha |
Numeric vector; specifies the |
stat |
Character vector; specifies the names of the statistics to use. The default value computes the rejection rate associated with every statistic in |
... |
Additional arguments to pass to |
Data frame containing estimated rejection rates and associated Monte Carlo standard errors, with one row for each combination of stat
and alpha
elements.
The reported Monte Carlo standard error does not account for the uncertainty of the estimation of the quantiles of the distribution of simulated statistics. The number of Monte Carlo simulations should be large enough to make this estimate sufficiently accurate that it can be considered known for practical purposes. The standard errors of estimated quantiles can be calculated using the mcmcse package.
Jesse Miller
# In practice, set "reps" larger than 200. set.seed(20240201) n <- 100 func <- function(X, theta) theta[1] + theta[2]*X Sig <- rWishart(1, df = n, Sigma = diag(n))[,,1] theta <- c(2,5) X <- matrix(rexp(n, rate = 1)) cdfr <- compare(true_mean = func, true_X = X, true_covariance = list(Sigma = Sig), test_mean = func, X = X, covariance = list(Sigma = Sig), reps = 200, prog = Inf, theta = theta, theta_init = rep(1, length(theta))) rejection(cdfr) rejection(cdfr, stat = "CvM") rejection(cdfr, alpha = c(0.1, 0.2))
# In practice, set "reps" larger than 200. set.seed(20240201) n <- 100 func <- function(X, theta) theta[1] + theta[2]*X Sig <- rWishart(1, df = n, Sigma = diag(n))[,,1] theta <- c(2,5) X <- matrix(rexp(n, rate = 1)) cdfr <- compare(true_mean = func, true_X = X, true_covariance = list(Sigma = Sig), test_mean = func, X = X, covariance = list(Sigma = Sig), reps = 200, prog = Inf, theta = theta, theta_init = rep(1, length(theta))) rejection(cdfr) rejection(cdfr, stat = "CvM") rejection(cdfr, alpha = c(0.1, 0.2))
distfreereg
Objects
This is a residuals
method for objects of class distfreereg
. It can extract any of the three available types of residuals.
## S3 method for class 'distfreereg' residuals(object, ..., type = "raw")
## S3 method for class 'distfreereg' residuals(object, ..., type = "raw")
object |
Object of class |
... |
Additional parameters passed to or from other methods. Currently ignored. |
type |
Character string specifying the type of residuals to return. Must be one of
" |
Numeric vector of residuals.
Jesse Miller
distfreereg
Objects
This is an update
method for objects of class distfreereg
. The method takes advantage of the override
argument of distfreereg
to prevent unnecessary recalculation of potentially computationally expensive objects.
## S3 method for class 'distfreereg' update(object, ..., smart = TRUE, envir = parent.frame())
## S3 method for class 'distfreereg' update(object, ..., smart = TRUE, envir = parent.frame())
object |
Object of class |
... |
Additional named parameters to pass to |
smart |
Logical. If |
envir |
Environment passed to |
This function updates an object of class distfreereg
. By default, it does so "intelligently" in the sense that it does not unnecessarily recompute elements that are already saved in object
. For example, if a new value for covariance
is not included in ...
, then the value of covariance
saved in object
is automatically passed to the new call, preventing recalculating Q
. If a new value of covariance
is specified, then all objects dependent on that (e.g., ) are recomputed.
In particular, the simulated samples depend on the data and function only through the number of observations, the covariates (if any), and the dimension of the parameter space of the function. If none of these change, then the updated object reuses the simulated samples from the supplied object
.
The price we pay for this efficiency is a potentially "large" value of call
in the updated object.
An updated object of class distfreereg
.
The default behavior of update
is to create an updated call and then evaluate that call. This means, among other things, that a call to update
made after one of the arguments in the original call is modified will use the modified version of that argument. This is not always true for update.distfreereg
. Values for the override arguments are drawn from the distfreereg
object itself, not from any objects used as values for the original call to distfreereg
.
In general, an object created by update.distfreereg
will therefore not be identical
to the object created by distfreereg
using corresponding arguments, because the call
values will differ.
Jesse Miller
set.seed(20240218) n <- 1e2 func <- function(X, theta) X[,1]^theta[1] + theta[2]*X[,2] Sig <- runif(n, min = 1, max = 3) theta <- c(2,5) X <- matrix(runif(2*n, min = 1, max = 5), nrow = n) Y <- X[,1]^theta[1] + theta[2]*X[,2] + rnorm(n, sd = sqrt(Sig)) dfr_1 <- distfreereg(Y = Y, X = X, test_mean = func, covariance = list(Sigma = Sig), theta_init = c(1,1)) func_updated <- function(X, theta) X[,1]^theta[1] + theta[2]*X[,2]^2 dfr_2 <- update(dfr_1, test_mean = func_updated) # The following are identical, since the Monte Carlo simulation did not need to # be rerun for dfr_2, and was therefore copied from dfr_1. identical(dfr_1$mcsim_stats, dfr_2$mcsim_stats)
set.seed(20240218) n <- 1e2 func <- function(X, theta) X[,1]^theta[1] + theta[2]*X[,2] Sig <- runif(n, min = 1, max = 3) theta <- c(2,5) X <- matrix(runif(2*n, min = 1, max = 5), nrow = n) Y <- X[,1]^theta[1] + theta[2]*X[,2] + rnorm(n, sd = sqrt(Sig)) dfr_1 <- distfreereg(Y = Y, X = X, test_mean = func, covariance = list(Sigma = Sig), theta_init = c(1,1)) func_updated <- function(X, theta) X[,1]^theta[1] + theta[2]*X[,2]^2 dfr_2 <- update(dfr_1, test_mean = func_updated) # The following are identical, since the Monte Carlo simulation did not need to # be rerun for dfr_2, and was therefore copied from dfr_1. identical(dfr_1$mcsim_stats, dfr_2$mcsim_stats)
distfreereg
Objects
This is a vcov
method for objects of class distfreereg
. It calculates an estimated covariance matrix of the estimated parameters in a model from a distfreereg
object.
## S3 method for class 'distfreereg' vcov(object, ..., jacobian_args, hessian_args)
## S3 method for class 'distfreereg' vcov(object, ..., jacobian_args, hessian_args)
object |
Object of class |
... |
Additional parameters passed to other methods when |
jacobian_args , hessian_args
|
Lists of additional arguments to pass to |
When the test_mean
element of object
is of class function
, the covariance matrix is estimated using the method described in section 5.3 of Van der Vaart (1998). Otherwise, test_mean
is of a class that has its own method for vcov
, which is used to calculate the output.
Named numeric matrix equal to the estimated covariance matrix of the parameter estimates from object
.
This calculation can be computationally intensive when the sample size is large and object$test_mean
is a function.
If object
was created by calling distfreereg.default
directly, there is no estimated parameter vector, and therefore vcov.distfreereg
does not apply.
Jesse Miller
Vaart, A. W. Asymptotic statistics, 2007, Cambridge series on statistical and probabilistic mathematics, Cambridge University Press.
distfreereg
, confint.distfreereg