@@ -2,17 +2,27 @@ | |||
% Please edit documentation in R/convenience_functions.R | |||
\name{chisq.test} | |||
\alias{chisq.test} | |||
\title{Chi-square test} | |||
\alias{chisq.test.default} | |||
\alias{chisq.test.formula} | |||
\title{Pearson's Chi-squared Test for Count Data} | |||
\source{ | |||
The code for Monte Carlo simulation is a C translation of the Fortran algorithm of Patefield (1981). | |||
} | |||
\usage{ | |||
chisq.test(x, y = NULL, correct = TRUE, p = rep(1/length(x), length(x)), | |||
rescale.p = FALSE, simulate.p.value = FALSE, B = 2000) | |||
chisq.test(x, y, correct, p, rescale.p, simulate.p.value, B) | |||
\method{chisq.test}{default}(x, y = NULL, correct = TRUE, | |||
p = rep(1/length(x), length(x)), rescale.p = FALSE, | |||
simulate.p.value = FALSE, B = 2000) | |||
\method{chisq.test}{formula}(x, y = NULL, correct = T, | |||
p = rep(1/length(x), length(x)), rescale.p = F, simulate.p.value = F, | |||
B = 2000) | |||
} | |||
\arguments{ | |||
\item{x}{a numeric vector or matrix. \code{x} and \code{y} can also | |||
both be factors.} | |||
\item{x}{a numeric vector, a matrix, or a formula of the form \code{lhs ~ rhs} where \code{lhs} and \code{rhs} are factors. \code{x} and \code{y} can also both be factors.} | |||
\item{y}{a numeric vector; ignored if \code{x} is a matrix. If | |||
\code{x} is a factor, \code{y} should be a factor of the same length.} | |||
\item{y}{a numeric vector; ignored if \code{x} is a matrix or a formula. If \code{x} is a factor, \code{y} should be a factor of the same length.} | |||
\item{correct}{a logical indicating whether to apply continuity | |||
correction when computing the test statistic for 2 by 2 tables: one | |||
@@ -33,9 +43,129 @@ chisq.test(x, y = NULL, correct = TRUE, p = rep(1/length(x), length(x)), | |||
\item{B}{an integer specifying the number of replicates used in the | |||
Monte Carlo test.} | |||
} | |||
\value{ | |||
A list with class \code{"htest"} containing the following components: | |||
\item{statistic}{the value of the chi-squared test statistic.} | |||
\item{parameter}{the degrees of freedom of the approximate chi-squared | |||
distribution of the test statistic, \code{NA} if the p-value is | |||
computed by Monte Carlo simulation.} | |||
\item{p.value}{the p-value for the test.} | |||
\item{method}{a character string indicating the type of test performed, and | |||
whether Monte Carlo simulation or continuity correction was | |||
used.} | |||
\item{data.name}{a character string giving the name(s) of the data.} | |||
\item{observed}{the observed counts.} | |||
\item{expected}{the expected counts under the null hypothesis.} | |||
\item{residuals}{the Pearson residuals, | |||
\code{(observed - expected) / sqrt(expected)}.} | |||
\item{stdres}{standardized residuals, \code{(observed - expected) / sqrt(V)}, | |||
where \code{V} is the residual cell variance (Agresti, 2007, | |||
section 2.4.5 for the case where \code{x} is a matrix, | |||
\code{n * p * (1 - p)} otherwise).} | |||
} | |||
\description{ | |||
Chi-square test | |||
\code{chisq.test} performs chi-squared contingency table tests and goodness-of-fit tests. | |||
} | |||
\details{ | |||
If \code{x} is a matrix with one row or column, or if \code{x} is a vector | |||
and \code{y} is not given, then a \emph{goodness-of-fit test} is performed | |||
(\code{x} is treated as a one-dimensional contingency table). The | |||
entries of \code{x} must be non-negative integers. In this case, the | |||
hypothesis tested is whether the population probabilities equal | |||
those in \code{p}, or are all equal if \code{p} is not given. | |||
If \code{x} is a matrix with at least two rows and columns, it is taken | |||
as a two-dimensional contingency table: the entries of \code{x} must be | |||
non-negative integers. Otherwise, \code{x} and \code{y} must be vectors or | |||
factors of the same length; cases with missing values are removed, | |||
the objects are coerced to factors, and the contingency table is | |||
computed from these. Then Pearson's chi-squared test is performed | |||
of the null hypothesis that the joint distribution of the cell | |||
counts in a two-dimensional contingency table is the product of the | |||
row and column marginals. | |||
If \code{simulate.p.value} is \code{FALSE}, the p-value is computed from the | |||
asymptotic chi-squared distribution of the test statistic; | |||
continuity correction is only used in the 2-by-2 case (if | |||
\code{correct} is \code{TRUE}, the default). Otherwise the p-value is | |||
computed for a Monte Carlo test (Hope, 1968) with \code{B} replicates. | |||
In the contingency table case simulation is done by random | |||
sampling from the set of all contingency tables with given | |||
marginals, and works only if the marginals are strictly positive. | |||
Continuity correction is never used, and the statistic is quoted | |||
without it. Note that this is not the usual sampling situation | |||
assumed for the chi-squared test but rather that for Fisher's | |||
exact test. | |||
In the goodness-of-fit case simulation is done by random sampling | |||
from the discrete distribution specified by \code{p}, each sample being | |||
of size \code{n = sum(x)}. This simulation is done in R and may be | |||
slow. | |||
} | |||
\examples{ | |||
\dontrun{ | |||
## From Agresti (2007) p.39 | |||
M <- as.table(rbind(c(762, 327, 468), c(484, 239, 477))) | |||
dimnames(M) <- list(gender = c("F", "M"), | |||
party = c("Democrat","Independent", "Republican")) | |||
(Xsq <- chisq.test(M)) # Prints test summary | |||
Xsq$observed # observed counts (same as M) | |||
Xsq$expected # expected counts under the null | |||
Xsq$residuals # Pearson residuals | |||
Xsq$stdres # standardized residuals | |||
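## Sketch (not part of the original examples): recompute the Pearson
## residuals by hand from the formula given in the Value section.
## This assumes stats::chisq.test is called directly; 'tab', 'fit' and
## 'by_hand' are illustrative names only.
tab <- as.table(rbind(c(762, 327, 468), c(484, 239, 477)))
fit <- stats::chisq.test(tab)
by_hand <- (fit$observed - fit$expected) / sqrt(fit$expected)
all.equal(as.vector(by_hand), as.vector(fit$residuals))
## For a table that is not 2 by 2 no continuity correction applies, so
## the statistic is the sum of the squared Pearson residuals.
all.equal(sum(by_hand^2), unname(fit$statistic))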
## Effect of simulating p-values | |||
x <- matrix(c(12, 5, 7, 7), ncol = 2) | |||
chisq.test(x)$p.value # 0.4233 | |||
chisq.test(x, simulate.p.value = TRUE, B = 10000)$p.value | |||
# around 0.29! | |||
## Testing for population probabilities | |||
## Case A. Tabulated data | |||
x <- c(A = 20, B = 15, C = 25) | |||
chisq.test(x) | |||
chisq.test(as.table(x)) # the same | |||
x <- c(89,37,30,28,2) | |||
p <- c(40,20,20,15,5) | |||
try( | |||
chisq.test(x, p = p) # gives an error | |||
) | |||
chisq.test(x, p = p, rescale.p = TRUE) | |||
# works | |||
p <- c(0.40,0.20,0.20,0.19,0.01) | |||
# Expected count in category 5 | |||
# is 1.86 < 5 ==> chi square approx. | |||
chisq.test(x, p = p) # maybe doubtful, but is ok! | |||
chisq.test(x, p = p, simulate.p.value = TRUE) | |||
## Case B. Raw data | |||
x <- trunc(5 * runif(100)) | |||
chisq.test(table(x)) # NOT 'chisq.test(x)'! | |||
### | |||
} | |||
} | |||
\references{ | |||
Hope, A. C. A. (1968) A simplified Monte Carlo significance test | |||
procedure. \emph{J. Roy. Statist. Soc. B} \bold{30}, 582-598. | |||
Patefield, W. M. (1981) Algorithm AS159. An efficient method of | |||
generating r x c tables with given row and column totals. | |||
\emph{Applied Statistics} \bold{30}, 91-97. | |||
Agresti, A. (2007) \emph{An Introduction to Categorical Data Analysis, 2nd ed.}, | |||
New York: John Wiley & Sons. Page 38. | |||
} | |||
\seealso{ | |||
\code{\link[stats]{chisq.test}} | |||
For goodness-of-fit testing, notably of continuous distributions, \code{\link[stats]{ks.test}}. | |||
} |
@@ -26,7 +26,8 @@ datatable(data, ...) | |||
fillContainer = getOption("DT.fillContainer", NULL), | |||
autoHideNavigation = getOption("DT.autoHideNavigation", NULL), | |||
selection = c("multiple", "single", "none"), extensions = c("FixedHeader", | |||
"FixedColumns", "Buttons"), plugins = NULL, digits = 2, ...) | |||
"FixedColumns", "Buttons"), plugins = NULL, rownames = F, digits = 2, | |||
...) | |||
} | |||
\arguments{ | |||
\item{data}{A desctable} | |||
@@ -99,6 +100,10 @@ extensions (\url{http://datatables.net/extensions/index})} | |||
\item{plugins}{a character vector of the names of DataTables plug-ins | |||
(\url{http://rstudio.github.io/DT/plugins.html})} | |||
\item{rownames}{\code{TRUE} (show row names) or \code{FALSE} (hide row names) | |||
or a character vector of row names; by default, the row names are displayed | |||
in the first column of the table if they exist (not \code{NULL})} | |||
\item{digits}{the desired number of digits after the decimal | |||
point (\code{format = "f"}) or \emph{significant} digits | |||
(\code{format = "g"}, \code{= "e"} or \code{= "fg"}). | |||
@@ -45,7 +45,6 @@ If data is a grouped dataframe (using group_by), subtables are created and stati | |||
The output is a desctable object, which is a list of named dataframes that can be further manipulated. Methods for printing and for use with pander and DT::datatable are provided. Printing reduces the object to a dataframe. | |||
} | |||
\examples{ | |||
\dontrun{ | |||
iris \%>\% | |||
desctable | |||
@@ -77,7 +76,6 @@ iris \%>\% | |||
group_by(Petal.Length > 5) \%>\% | |||
desctable(tests = list(.auto = tests_auto, Species = ~chisq.test)) | |||
} | |||
} | |||
\seealso{ | |||
\code{\link{stats_auto}} | |||
@@ -2,17 +2,26 @@ | |||
% Please edit documentation in R/convenience_functions.R | |||
\name{fisher.test} | |||
\alias{fisher.test} | |||
\title{Fisher test} | |||
\alias{fisher.test.default} | |||
\alias{fisher.test.formula} | |||
\title{Fisher's Exact Test for Count Data} | |||
\usage{ | |||
fisher.test(x, y = NULL, workspace = 2e+05, hybrid = FALSE, | |||
control = list(), or = 1, alternative = "two.sided", conf.int = TRUE, | |||
conf.level = 0.95, simulate.p.value = FALSE, B = 2000) | |||
fisher.test(x, y, workspace, hybrid, control, or, alternative, conf.int, | |||
conf.level, simulate.p.value, B) | |||
\method{fisher.test}{default}(x, y = NULL, workspace = 2e+05, | |||
hybrid = FALSE, control = list(), or = 1, alternative = "two.sided", | |||
conf.int = TRUE, conf.level = 0.95, simulate.p.value = FALSE, | |||
B = 2000) | |||
\method{fisher.test}{formula}(x, y = NULL, workspace = 2e+05, hybrid = F, | |||
control = list(), or = 1, alternative = "two.sided", conf.int = T, | |||
conf.level = 0.95, simulate.p.value = F, B = 2000) | |||
} | |||
\arguments{ | |||
\item{x}{either a two-dimensional contingency table in matrix form, | |||
or a factor object.} | |||
\item{x}{either a two-dimensional contingency table in matrix form, a factor object, or a formula of the form \code{lhs ~ rhs} where \code{lhs} and \code{rhs} are factors.} | |||
\item{y}{a factor object; ignored if \code{x} is a matrix.} | |||
\item{y}{a factor object; ignored if \code{x} is a matrix or a formula.} | |||
\item{workspace}{an integer specifying the size of the workspace | |||
used in the network algorithm. In units of 4 bytes. Only used for | |||
@@ -53,9 +62,149 @@ fisher.test(x, y = NULL, workspace = 2e+05, hybrid = FALSE, | |||
\item{B}{an integer specifying the number of replicates used in the | |||
Monte Carlo test.} | |||
} | |||
\value{ | |||
A list with class \code{"htest"} containing the following components: | |||
\item{p.value}{the p-value of the test.} | |||
\item{conf.int}{a confidence interval for the odds ratio. Only present in | |||
the 2 by 2 case and if argument \code{conf.int = TRUE}.} | |||
\item{estimate}{an estimate of the odds ratio. Note that the \emph{conditional} | |||
Maximum Likelihood Estimate (MLE) rather than the | |||
unconditional MLE (the sample odds ratio) is used. Only | |||
present in the 2 by 2 case.} | |||
\item{null.value}{the odds ratio under the null, \code{or}. Only present in the 2 | |||
by 2 case.} | |||
\item{alternative}{a character string describing the alternative hypothesis.} | |||
\item{method}{the character string \code{"Fisher's Exact Test for Count Data"}.} | |||
\item{data.name}{a character string giving the names of the data.} | |||
} | |||
\description{ | |||
Fisher test | |||
Performs Fisher's exact test for testing the null of independence | |||
of rows and columns in a contingency table with fixed marginals. | |||
} | |||
\details{ | |||
If \code{x} is a matrix, it is taken as a two-dimensional contingency | |||
table, and hence its entries should be nonnegative integers. | |||
Otherwise, both \code{x} and \code{y} must be vectors of the same length. | |||
Incomplete cases are removed, the vectors are coerced into factor | |||
objects, and the contingency table is computed from these. | |||
For 2 by 2 cases, p-values are obtained directly using the | |||
(central or non-central) hypergeometric distribution. Otherwise, | |||
computations are based on a C version of the FORTRAN subroutine | |||
FEXACT which implements the network algorithm developed by Mehta and Patel | |||
(1986) and improved by Clarkson, Fan and Joe (1993). The FORTRAN | |||
code can be obtained from \url{http://www.netlib.org/toms/643}. | |||
Note this fails (with an error message) when the entries of the | |||
table are too large. (It transposes the table if necessary so it | |||
has no more rows than columns. One constraint is that the product | |||
of the row marginals be less than 2^31 - 1.) | |||
For 2 by 2 tables, the null of conditional independence is | |||
equivalent to the hypothesis that the odds ratio equals one. | |||
\sQuote{Exact} inference can be based on observing that in general, given | |||
all marginal totals fixed, the first element of the contingency | |||
table has a non-central hypergeometric distribution with | |||
non-centrality parameter given by the odds ratio (Fisher, 1935). | |||
The alternative for a one-sided test is based on the odds ratio, | |||
so \code{alternative = "greater"} is a test of the odds ratio being | |||
bigger than \code{or}. | |||
Two-sided tests are based on the probabilities of the tables, and | |||
take as \sQuote{more extreme} all tables with probabilities less than or | |||
equal to that of the observed table, the p-value being the sum of | |||
such probabilities. | |||
For larger than 2 by 2 tables and \code{hybrid = TRUE}, asymptotic | |||
chi-squared probabilities are only used if the \sQuote{Cochran conditions} | |||
are satisfied, that is if no cell has count zero, and | |||
more than 80\% of the cells have counts at least 5: otherwise the | |||
exact calculation is used. | |||
Simulation is done conditional on the row and column marginals, | |||
and works only if the marginals are strictly positive. (A C | |||
translation of the algorithm of Patefield (1981) is used.) | |||
} | |||
\examples{ | |||
\dontrun{ | |||
## Agresti (1990, p. 61f; 2002, p. 91) Fisher's Tea Drinker | |||
## A British woman claimed to be able to distinguish whether milk or | |||
## tea was added to the cup first. To test, she was given 8 cups of | |||
## tea, in four of which milk was added first. The null hypothesis | |||
## is that there is no association between the true order of pouring | |||
## and the woman's guess, the alternative that there is a positive | |||
## association (that the odds ratio is greater than 1). | |||
TeaTasting <- | |||
matrix(c(3, 1, 1, 3), | |||
nrow = 2, | |||
dimnames = list(Guess = c("Milk", "Tea"), | |||
Truth = c("Milk", "Tea"))) | |||
fisher.test(TeaTasting, alternative = "greater") | |||
## => p = 0.2429, association could not be established | |||
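## Sketch (not part of the original examples): for a 2 by 2 table with
## or = 1, the one-sided p-value is an upper tail of the central
## hypergeometric distribution. 'tea' and 'p_by_hand' are illustrative
## names only; stats::fisher.test is called directly.
tea <- matrix(c(3, 1, 1, 3), nrow = 2)
## P(X >= 3), where X counts the cups guessed "milk first" among those
## that truly were: 4 milk-first cups, 4 tea-first cups, 4 guessed "milk".
p_by_hand <- phyper(tea[1, 1] - 1, m = sum(tea[, 1]), n = sum(tea[, 2]),
                    k = sum(tea[1, ]), lower.tail = FALSE)
p_by_hand  # 17/70, about 0.2429
all.equal(p_by_hand, stats::fisher.test(tea, alternative = "greater")$p.value)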
## Fisher (1962, 1970), Criminal convictions of like-sex twins | |||
Convictions <- | |||
matrix(c(2, 10, 15, 3), | |||
nrow = 2, | |||
dimnames = | |||
list(c("Dizygotic", "Monozygotic"), | |||
c("Convicted", "Not convicted"))) | |||
Convictions | |||
fisher.test(Convictions, alternative = "less") | |||
fisher.test(Convictions, conf.int = FALSE) | |||
fisher.test(Convictions, conf.level = 0.95)$conf.int | |||
fisher.test(Convictions, conf.level = 0.99)$conf.int | |||
## An r x c table, Agresti (2002, p. 57), Job Satisfaction | |||
Job <- matrix(c(1,2,1,0, 3,3,6,1, 10,10,14,9, 6,7,12,11), 4, 4, | |||
dimnames = list(income = c("< 15k", "15-25k", "25-40k", "> 40k"), | |||
satisfaction = c("VeryD", "LittleD", "ModerateS", "VeryS"))) | |||
fisher.test(Job) | |||
fisher.test(Job, simulate.p.value = TRUE, B = 1e5) | |||
### | |||
} | |||
} | |||
\references{ | |||
Agresti, A. (1990) \emph{Categorical data analysis}. New York: Wiley. | |||
Pages 59-66. | |||
Agresti, A. (2002) \emph{Categorical data analysis}. Second edition. | |||
New York: Wiley. Pages 91-101. | |||
Fisher, R. A. (1935) The logic of inductive inference. | |||
\emph{Journal of the Royal Statistical Society Series A} \bold{98}, 39-54. | |||
Fisher, R. A. (1962) Confidence limits for a cross-product ratio. | |||
\emph{Australian Journal of Statistics} \bold{4}, 41. | |||
Fisher, R. A. (1970) \emph{Statistical Methods for Research Workers.} | |||
Oliver & Boyd. | |||
Mehta, C. R. and Patel, N. R. (1986) Algorithm 643. FEXACT: A | |||
Fortran subroutine for Fisher's exact test on unordered r*c | |||
contingency tables. \emph{ACM Transactions on Mathematical Software}, | |||
\bold{12}, 154-161. | |||
Clarkson, D. B., Fan, Y. and Joe, H. (1993) A Remark on Algorithm | |||
643: FEXACT: An Algorithm for Performing Fisher's Exact Test in r | |||
x c Contingency Tables. \emph{ACM Transactions on Mathematical Software}, | |||
\bold{19}, 484-488. | |||
Patefield, W. M. (1981) Algorithm AS159. An efficient method of | |||
generating r x c tables with given row and column totals. | |||
\emph{Applied Statistics} \bold{30}, 91-97. | |||
} | |||
\seealso{ | |||
\code{\link[stats]{fisher.test}} | |||
\code{\link{chisq.test}} | |||
\code{fisher.exact} in package \pkg{exact2x2} for alternative | |||
interpretations of two-sided tests and confidence intervals for 2 | |||
by 2 tables. | |||
} |