% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/convenience_functions.R
\name{fisher.test}
\alias{fisher.test}
\alias{fisher.test.default}
\alias{fisher.test.formula}
\title{Fisher's Exact Test for Count Data}
\usage{
fisher.test(
  x,
  y,
  workspace,
  hybrid,
  control,
  or,
  alternative,
  conf.int,
  conf.level,
  simulate.p.value,
  B
)

\method{fisher.test}{default}(x, ...)

\method{fisher.test}{formula}(
  x,
  y = NULL,
  workspace = 200000,
  hybrid = F,
  control = list(),
  or = 1,
  alternative = "two.sided",
  conf.int = T,
  conf.level = 0.95,
  simulate.p.value = F,
  B = 2000
)
}
\arguments{
\item{x}{either a two-dimensional contingency table in matrix form, a factor object, or a formula of the form \code{lhs ~ rhs} where \code{lhs} and \code{rhs} are factors.}

\item{y}{a factor object; ignored if \code{x} is a matrix or a formula.}

\item{workspace}{an integer specifying the size of the workspace
used in the network algorithm, in units of 4 bytes. Only used for
non-simulated p-values in tables larger than \eqn{2 \times 2}{2 by 2}.
Since \R version 3.5.0, this also increases the internal stack size,
which allows larger problems to be solved, although these sometimes
take hours. In such cases, \code{simulate.p.value = TRUE} may be more
reasonable.}

\item{hybrid}{a logical. Only used for tables larger than
\eqn{2 \times 2}{2 by 2}, in which case it indicates whether the exact
probabilities (default) or a hybrid approximation thereof should be
computed.}

\item{control}{a list with named components for low-level algorithm
control. At present the only one used is \code{"mult"}, a positive
integer \eqn{\ge 2}{>= 2} with default 30, used only for tables larger
than \eqn{2 \times 2}{2 by 2}. This says how many times as much
space should be allocated to paths as to keys: see file
\file{fexact.c} in the \R sources of package \pkg{stats}.}

\item{or}{the hypothesized odds ratio. Only used in the
\eqn{2 \times 2}{2 by 2} case.}

\item{alternative}{indicates the alternative hypothesis and must be
one of \code{"two.sided"}, \code{"greater"} or \code{"less"}.
You can specify just the initial letter. Only used in the
\eqn{2 \times 2}{2 by 2} case.}

\item{conf.int}{logical indicating if a confidence interval for the
odds ratio in a \eqn{2 \times 2}{2 by 2} table should be
computed (and returned).}

\item{conf.level}{confidence level for the returned confidence
interval. Only used in the \eqn{2 \times 2}{2 by 2} case and if
\code{conf.int = TRUE}.}

\item{simulate.p.value}{a logical indicating whether to compute
p-values by Monte Carlo simulation in tables larger than
\eqn{2 \times 2}{2 by 2}.}

\item{B}{an integer specifying the number of replicates used in the
Monte Carlo test.}

\item{...}{additional arguments passed on to the original
\code{\link[stats]{fisher.test}}.}
}
\value{
A list with class \code{"htest"} containing the following components:
\item{p.value}{the p-value of the test.}
\item{conf.int}{a confidence interval for the odds ratio. Only present in
  the 2 by 2 case and if argument \code{conf.int = TRUE}.}
\item{estimate}{an estimate of the odds ratio. Note that the \emph{conditional}
  Maximum Likelihood Estimate (MLE) rather than the
  unconditional MLE (the sample odds ratio) is used. Only
  present in the 2 by 2 case.}
\item{null.value}{the odds ratio under the null, \code{or}. Only present
  in the 2 by 2 case.}
\item{alternative}{a character string describing the alternative hypothesis.}
\item{method}{the character string \code{"Fisher's Exact Test for Count Data"}.}
\item{data.name}{a character string giving the names of the data.}
}
\description{
Performs Fisher's exact test of the null hypothesis of independence
of rows and columns in a contingency table with fixed marginals. The
data can be supplied as a matrix, as two factors, or as a formula.
}
\details{
If \code{x} is a matrix, it is taken as a two-dimensional contingency
table, and hence its entries should be nonnegative integers.
Otherwise, both \code{x} and \code{y} must be vectors of the same length.
Incomplete cases are removed, the vectors are coerced into factor
objects, and the contingency table is computed from these.

For 2 by 2 cases, p-values are obtained directly using the
(central or non-central) hypergeometric distribution. Otherwise,
computations are based on a C version of the FORTRAN subroutine
FEXACT, which implements the network algorithm developed by Mehta and
Patel (1986) and improved by Clarkson, Fan and Joe (1993). The FORTRAN
code can be obtained from \url{http://www.netlib.org/toms/643}.
Note that this fails (with an error message) when the entries of the
table are too large. (It transposes the table if necessary so that it
has no more rows than columns. One constraint is that the product
of the row marginals be less than \eqn{2^{31} - 1}{2^31 - 1}.)

For 2 by 2 tables, the null of conditional independence is
equivalent to the hypothesis that the odds ratio equals one.
Exact inference can be based on observing that, in general, given
all marginal totals fixed, the first element of the contingency
table has a non-central hypergeometric distribution with
non-centrality parameter given by the odds ratio (Fisher, 1935).
The alternative for a one-sided test is based on the odds ratio,
so \code{alternative = "greater"} is a test of the odds ratio being
bigger than \code{or}.
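
In standard notation (the symbols \eqn{m}, \eqn{n}, \eqn{k} and
\eqn{\psi}{psi} are introduced here for illustration and are not
arguments of the function), let \eqn{m} and \eqn{n} be the two column
totals, \eqn{k} the first row total and \eqn{\psi}{psi} the odds ratio.
Given these margins, the first cell \eqn{X} has the probability function
\deqn{P(X = x) = \frac{{m \choose x}{n \choose k-x}\,\psi^{x}}{\sum_{u}{m \choose u}{n \choose k-u}\,\psi^{u}}, \qquad \max(0, k-n) \le x \le \min(k, m),}{P(X = x) = choose(m, x) choose(n, k - x) psi^x / sum_u choose(m, u) choose(n, k - u) psi^u,  max(0, k - n) <= x <= min(k, m),}
where the sum in the denominator runs over the same range as \eqn{x}.
For \eqn{\psi = 1}{psi = 1} this is the central hypergeometric
distribution.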

Two-sided tests are based on the probabilities of the tables, and
take as \sQuote{more extreme} all tables with probabilities less than or
equal to that of the observed table, the p-value being the sum of
such probabilities.
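
Written out as a sketch (the symbols are introduced here for
illustration), a table \eqn{T} with entries \eqn{t_{ij}}, the observed
row totals \eqn{r_i}, column totals \eqn{c_j} and grand total \eqn{N}
has probability under independence
\deqn{P(T) = \frac{\prod_i r_i!\,\prod_j c_j!}{N!\,\prod_{i,j} t_{ij}!},}{P(T) = prod_i r_i! prod_j c_j! / (N! prod_ij t_ij!),}
and the two-sided p-value is
\deqn{p = \sum_{T:\,P(T) \le P(T_{\mathrm{obs}})} P(T),}{p = sum of P(T) over all tables T with P(T) <= P(T_obs),}
where the sum runs over all tables with the observed margins. In the
2 by 2 case with \code{or} different from 1, the non-central
hypergeometric probabilities take the place of \eqn{P(T)}.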

For larger than 2 by 2 tables and \code{hybrid = TRUE}, asymptotic
chi-squared probabilities are only used if the \sQuote{Cochran conditions}
are satisfied, that is, if no cell has count zero and more than 80\% of
the cells have counts of at least 5; otherwise the exact calculation
is used.

Simulation is done conditional on the row and column marginals,
and works only if the marginals are strictly positive. (A C
translation of the algorithm of Patefield (1981) is used.)
}
\examples{
\dontrun{
## Agresti (1990, p. 61f; 2002, p. 91) Fisher's Tea Drinker
## A British woman claimed to be able to distinguish whether milk or
## tea was added to the cup first. To test, she was given 8 cups of
## tea, in four of which milk was added first. The null hypothesis
## is that there is no association between the true order of pouring
## and the woman's guess, the alternative that there is a positive
## association (that the odds ratio is greater than 1).
TeaTasting <-
  matrix(c(3, 1, 1, 3),
         nrow = 2,
         dimnames = list(Guess = c("Milk", "Tea"),
                         Truth = c("Milk", "Tea")))
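
## Sketch, not part of the original examples: the same data supplied as
## two factors instead of a matrix. As described in Details, the vectors
## are cross-tabulated into a contingency table, so both forms should give
## the same result. stats::fisher.test() is called directly so that this
## snippet runs without this package loaded.
guess <- factor(c("Milk", "Milk", "Milk", "Tea", "Milk", "Tea", "Tea", "Tea"))
truth <- factor(rep(c("Milk", "Tea"), each = 4))
tea_tab <- table(Guess = guess, Truth = truth)
tea_tab  # same counts as TeaTasting
p_factors <- stats::fisher.test(guess, truth, alternative = "greater")$p.value
p_factors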
fisher.test(TeaTasting, alternative = "greater")
## => p = 0.2429, association could not be established
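
## Sketch, not part of the original examples: reproducing the p-values by
## hand from the hypergeometric distribution of the [1, 1] cell under an
## odds ratio of 1. m and n are the column totals and k the first row
## total, matching the argument order of dhyper() and phyper().
tea <- matrix(c(3, 1, 1, 3), nrow = 2)  # same counts as TeaTasting
x11 <- tea[1, 1]
m <- sum(tea[, 1]); n <- sum(tea[, 2]); k <- sum(tea[1, ])
## one-sided, alternative = "greater": P(X >= x11)
p_greater <- phyper(x11 - 1, m, n, k, lower.tail = FALSE)
## two-sided: add up the probabilities of all tables that are no more
## probable than the observed one; the factor 1 + 1e-7 is an assumed
## tolerance against rounding error
support <- max(0, k - n):min(k, m)
d <- dhyper(support, m, n, k)
p_two <- sum(d[d <= dhyper(x11, m, n, k) * (1 + 1e-7)])
c(greater = p_greater, two.sided = p_two)  # 17/70 and 34/70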

## Fisher (1962, 1970), Criminal convictions of like-sex twins
Convictions <-
  matrix(c(2, 10, 15, 3),
         nrow = 2,
         dimnames =
           list(c("Dizygotic", "Monozygotic"),
                c("Convicted", "Not convicted")))
Convictions
fisher.test(Convictions, alternative = "less")
fisher.test(Convictions, conf.int = FALSE)
fisher.test(Convictions, conf.level = 0.95)$conf.int
fisher.test(Convictions, conf.level = 0.99)$conf.int
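
## Sketch, not part of the original examples: the reported estimate is
## the conditional MLE of the odds ratio, which in general differs from
## the sample odds ratio (a * d) / (b * c) computed from the cell counts.
conv <- matrix(c(2, 10, 15, 3), nrow = 2)  # same counts as Convictions
sample_or <- (conv[1, 1] * conv[2, 2]) / (conv[1, 2] * conv[2, 1])
cond_mle <- unname(stats::fisher.test(conv)$estimate)
c(sample = sample_or, conditional.mle = cond_mle)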

## An r x c table: Agresti (2002, p. 57) Job Satisfaction
Job <- matrix(c(1,2,1,0, 3,3,6,1, 10,10,14,9, 6,7,12,11), 4, 4,
              dimnames = list(income = c("< 15k", "15-25k", "25-40k", "> 40k"),
                              satisfaction = c("VeryD", "LittleD", "ModerateS", "VeryS")))
fisher.test(Job)
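
## Sketch, not part of the original examples: a hypothetical helper that
## checks the 'Cochran conditions' exactly as they are worded in Details
## (no zero cell, and more than 80 percent of cells with count >= 5).
## It is an illustration only; the check used internally when
## hybrid = TRUE may differ.
cochran_ok <- function(tab) all(tab > 0) && mean(tab >= 5) > 0.8
job <- matrix(c(1,2,1,0, 3,3,6,1, 10,10,14,9, 6,7,12,11), 4, 4)  # same counts as Job
cochran_ok(job)  # FALSE: the table contains a zero cell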
fisher.test(Job, simulate.p.value = TRUE, B = 1e5)
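
## Sketch, not part of the original examples: roughly what
## simulate.p.value = TRUE computes, written with base R's r2dtable(),
## which draws random tables with given margins using Patefield's (1981)
## algorithm. Under independence a table's probability is proportional to
## 1 / prod(t_ij!), so tables are ranked by -sum(lfactorial(t)). The
## tie tolerance 1e-7 is an assumption of this sketch.
set.seed(42)
job <- matrix(c(1,2,1,0, 3,3,6,1, 10,10,14,9, 6,7,12,11), 4, 4)  # same counts as Job
B <- 2000
log_prob <- function(tab) -sum(lfactorial(tab))  # log-probability up to a constant
sims <- vapply(r2dtable(B, rowSums(job), colSums(job)), log_prob, numeric(1))
## count simulated tables no more probable than the observed one
p_sim <- (1 + sum(sims <= log_prob(job) + 1e-7)) / (B + 1)
p_sim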
}
}
\references{
Agresti, A. (1990) \emph{Categorical data analysis}. New York: Wiley.
Pages 59-66.

Agresti, A. (2002) \emph{Categorical data analysis}. Second edition.
New York: Wiley. Pages 91-101.

Fisher, R. A. (1935) The logic of inductive inference.
\emph{Journal of the Royal Statistical Society Series A}, \bold{98}, 39-54.

Fisher, R. A. (1962) Confidence limits for a cross-product ratio.
\emph{Australian Journal of Statistics}, \bold{4}, 41.

Fisher, R. A. (1970) \emph{Statistical Methods for Research Workers}.
Oliver & Boyd.

Mehta, C. R. and Patel, N. R. (1986) Algorithm 643. FEXACT: A
Fortran subroutine for Fisher's exact test on unordered r*c
contingency tables. \emph{ACM Transactions on Mathematical Software},
\bold{12}, 154-161.

Clarkson, D. B., Fan, Y. and Joe, H. (1993) A Remark on Algorithm
643: FEXACT: An Algorithm for Performing Fisher's Exact Test in r
x c Contingency Tables. \emph{ACM Transactions on Mathematical
Software}, \bold{19}, 484-488.

Patefield, W. M. (1981) Algorithm AS159. An efficient method of
generating r x c tables with given row and column totals.
\emph{Applied Statistics}, \bold{30}, 91-97.
}
\seealso{
\code{\link{chisq.test}}

\code{fisher.exact} in package \pkg{exact2x2} for alternative
interpretations of two-sided tests and confidence intervals for 2
by 2 tables.
}