Updated documentation

7 years ago · bcfd86f41a
--- a/man/chisq.test.Rd
+++ b/man/chisq.test.Rd
@@ -2,17 +2,27 @@
 % Please edit documentation in R/convenience_functions.R
 \name{chisq.test}
 \alias{chisq.test}
 \title{Chi-square test}
 \alias{chisq.test.default}
 \alias{chisq.test.formula}
 \title{Pearson's Chi-squared Test for Count Data}
 \source{
 The code for Monte Carlo simulation is a C translation of the Fortran algorithm of Patefield (1981).
 }
 \usage{
 chisq.test(x, y = NULL, correct = TRUE, p = rep(1/length(x), length(x)),
  rescale.p = FALSE, simulate.p.value = FALSE, B = 2000)
 chisq.test(x, y, correct, p, rescale.p, simulate.p.value, B)

 \method{chisq.test}{default}(x, y = NULL, correct = TRUE,
  p = rep(1/length(x), length(x)), rescale.p = FALSE,
  simulate.p.value = FALSE, B = 2000)

 \method{chisq.test}{formula}(x, y = NULL, correct = T,
  p = rep(1/length(x), length(x)), rescale.p = F, simulate.p.value = F,
  B = 2000)
 }
 \arguments{
 \item{x}{a numeric vector or matrix. \code{x} and \code{y} can also
    both be factors.}
 \item{x}{a numeric vector, a matrix, or a formula of the form \code{lhs ~ rhs} where \code{lhs} and \code{rhs} are factors. \code{x} and \code{y} can also both be factors.}

 \item{y}{a numeric vector; ignored if \code{x} is a matrix.  If
    \code{x} is a factor, \code{y} should be a factor of the same length.}
 \item{y}{a numeric vector; ignored if \code{x} is a matrix or a formula. If \code{x} is a factor, \code{y} should be a factor of the same length.}

 \item{correct}{a logical indicating whether to apply continuity
    correction when computing the test statistic for 2 by 2 tables: one
@@ -33,9 +43,129 @@ chisq.test(x, y = NULL, correct = TRUE, p = rep(1/length(x), length(x)),
 \item{B}{an integer specifying the number of replicates used in the
    Monte Carlo test.}
 }
 \value{
 A list with class ‘"htest"’ containing the following components:
 statistic: the value of the chi-squared test statistic.

 parameter: the degrees of freedom of the approximate chi-squared
          distribution of the test statistic, ‘NA’ if the p-value is
          computed by Monte Carlo simulation.

 p.value: the p-value for the test.

  method: a character string indicating the type of test performed, and
          whether Monte Carlo simulation or continuity correction was
          used.

 data.name: a character string giving the name(s) of the data.

 observed: the observed counts.

 expected: the expected counts under the null hypothesis.

 residuals: the Pearson residuals, ‘(observed - expected) /
          sqrt(expected)’.

  stdres: standardized residuals, ‘(observed - expected) / sqrt(V)’,
          where ‘V’ is the residual cell variance (Agresti, 2007,
          section 2.4.5 for the case where ‘x’ is a matrix, ‘n * p * (1
          - p)’ otherwise).
 }
 \description{
 Chi-square test
 ‘chisq.test’ performs chi-squared contingency table tests and goodness-of-fit tests.
 }
 \details{
 If ‘x’ is a matrix with one row or column, or if ‘x’ is a vector
 and ‘y’ is not given, then a _goodness-of-fit test_ is performed
 (‘x’ is treated as a one-dimensional contingency table).  The
 entries of ‘x’ must be non-negative integers.  In this case, the
 hypothesis tested is whether the population probabilities equal
 those in ‘p’, or are all equal if ‘p’ is not given.

 If ‘x’ is a matrix with at least two rows and columns, it is taken
 as a two-dimensional contingency table: the entries of ‘x’ must be
 non-negative integers.  Otherwise, ‘x’ and ‘y’ must be vectors or
 factors of the same length; cases with missing values are removed,
 the objects are coerced to factors, and the contingency table is
 computed from these.  Then Pearson's chi-squared test is performed
 of the null hypothesis that the joint distribution of the cell
 counts in a 2-dimensional contingency table is the product of the
 row and column marginals.

 If ‘simulate.p.value’ is ‘FALSE’, the p-value is computed from the
 asymptotic chi-squared distribution of the test statistic;
 continuity correction is only used in the 2-by-2 case (if
 ‘correct’ is ‘TRUE’, the default).  Otherwise the p-value is
 computed for a Monte Carlo test (Hope, 1968) with ‘B’ replicates.

 In the contingency table case, simulation is done by random
 sampling from the set of all contingency tables with given
 marginals, and works only if the marginals are strictly positive.
 Continuity correction is never used, and the statistic is quoted
 without it.  Note that this is not the usual sampling situation
 assumed for the chi-squared test but rather that for Fisher's
 exact test.

 In the goodness-of-fit case, simulation is done by random sampling
 from the discrete distribution specified by ‘p’, each sample being
 of size ‘n = sum(x)’.  This simulation is done in R and may be
 slow.
 }
 \examples{
 \dontrun{
 ## From Agresti(2007) p.39
 M <- as.table(rbind(c(762, 327, 468), c(484, 239, 477)))
 dimnames(M) <- list(gender = c("F", "M"),
                    party = c("Democrat","Independent", "Republican"))
 (Xsq <- chisq.test(M))  # Prints test summary
 Xsq$observed   # observed counts (same as M)
 Xsq$expected   # expected counts under the null
 Xsq$residuals  # Pearson residuals
 Xsq$stdres     # standardized residuals
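
 ## Illustrative check (not in the original examples; 'E' and 'res' are
 ## names introduced here).  Under independence the expected counts are
 ## row total * column total / n, and the Pearson residuals are
 ## (observed - expected) / sqrt(expected), as described under Value.
 E <- outer(rowSums(M), colSums(M)) / sum(M)
 all.equal(as.vector(E), as.vector(Xsq$expected))
 res <- (M - E) / sqrt(E)
 all.equal(as.vector(res), as.vector(Xsq$residuals))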


 ## Effect of simulating p-values
 x <- matrix(c(12, 5, 7, 7), ncol = 2)
 chisq.test(x)$p.value           # 0.4233
 chisq.test(x, simulate.p.value = TRUE, B = 10000)$p.value
                                # around 0.29!
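
 ## Illustrative addition (not in the original examples; 'p1' and 'p2'
 ## are names introduced here).  A simulated p-value varies from run to
 ## run; fixing the RNG seed before each call makes it reproducible.
 set.seed(1)
 p1 <- chisq.test(x, simulate.p.value = TRUE, B = 10000)$p.value
 set.seed(1)
 p2 <- chisq.test(x, simulate.p.value = TRUE, B = 10000)$p.value
 identical(p1, p2)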

 ## Testing for population probabilities
 ## Case A. Tabulated data
 x <- c(A = 20, B = 15, C = 25)
 chisq.test(x)
 chisq.test(as.table(x))             # the same
 x <- c(89,37,30,28,2)
 p <- c(40,20,20,15,5)
 try(
 chisq.test(x, p = p)                # gives an error
 )
 chisq.test(x, p = p, rescale.p = TRUE)
                                # works
 p <- c(0.40,0.20,0.20,0.19,0.01)
                                # Expected count in category 5
                                # is 1.86 < 5 ==> chi square approx.
 chisq.test(x, p = p)            #               maybe doubtful, but is ok!
 chisq.test(x, p = p, simulate.p.value = TRUE)

 ## Case B. Raw data
 x <- trunc(5 * runif(100))
 chisq.test(table(x))            # NOT 'chisq.test(x)'!

 ###
 }
 }
 \references{
 Hope, A. C. A. (1968) A simplified Monte Carlo significance test
 procedure.  _J. Roy. Statist. Soc. B_ *30*, 582-598.

 Patefield, W. M. (1981) Algorithm AS159.  An efficient method of
 generating r x c tables with given row and column totals.
 _Applied Statistics_ *30*, 91-97.

 Agresti, A. (2007) _An Introduction to Categorical Data Analysis,
 2nd ed._, New York: John Wiley & Sons.  Page 38.
 }
 \seealso{
 stats::chisq.test
 For goodness-of-fit testing, notably of continuous distributions, ‘ks.test’.
 }
--- a/man/datatable.Rd
+++ b/man/datatable.Rd
@@ -26,7 +26,8 @@ datatable(data, ...)
  fillContainer = getOption("DT.fillContainer", NULL),
  autoHideNavigation = getOption("DT.autoHideNavigation", NULL),
  selection = c("multiple", "single", "none"), extensions = c("FixedHeader",
  "FixedColumns", "Buttons"), plugins = NULL, digits = 2, ...)
  "FixedColumns", "Buttons"), plugins = NULL, rownames = F, digits = 2,
  ...)
 }
 \arguments{
 \item{data}{A desctable}
@@ -99,6 +100,10 @@ extensions (\url{http://datatables.net/extensions/index})}
 \item{plugins}{a character vector of the names of DataTables plug-ins
 (\url{http://rstudio.github.io/DT/plugins.html})}

 \item{rownames}{\code{TRUE} (show row names) or \code{FALSE} (hide row names)
 or a character vector of row names; by default, the row names are displayed
 in the first column of the table if they exist (not \code{NULL})}

 \item{digits}{the desired number of digits after the decimal
    point (\code{format = "f"}) or \emph{significant} digits
    (\code{format = "g"}, \code{= "e"} or \code{= "fg"}).
--- a/man/desctable.Rd
+++ b/man/desctable.Rd
@@ -45,7 +45,6 @@ If data is a grouped dataframe (using group_by), subtables are created and stati
 The output is a desctable object, which is a list of named dataframes that can be further manipulated. Methods for printing and for use with pander and DT::datatable are provided. Printing reduces the object to a dataframe.
 }
 \examples{
 \dontrun{
 iris \%>\%
  desctable

@@ -77,7 +76,6 @@ iris \%>\%
  group_by(Petal.Length > 5) \%>\%
  desctable(tests = list(.auto = tests_auto, Species = ~chisq.test))
 }
 }
 \seealso{
 \code{\link{stats_auto}}

--- a/man/fisher.test.Rd
+++ b/man/fisher.test.Rd
@@ -2,17 +2,26 @@
 % Please edit documentation in R/convenience_functions.R
 \name{fisher.test}
 \alias{fisher.test}
 \title{Fisher test}
 \alias{fisher.test.default}
 \alias{fisher.test.formula}
 \title{Fisher's Exact Test for Count Data}
 \usage{
 fisher.test(x, y = NULL, workspace = 2e+05, hybrid = FALSE,
  control = list(), or = 1, alternative = "two.sided", conf.int = TRUE,
  conf.level = 0.95, simulate.p.value = FALSE, B = 2000)
 fisher.test(x, y, workspace, hybrid, control, or, alternative, conf.int,
  conf.level, simulate.p.value, B)

 \method{fisher.test}{default}(x, y = NULL, workspace = 2e+05,
  hybrid = FALSE, control = list(), or = 1, alternative = "two.sided",
  conf.int = TRUE, conf.level = 0.95, simulate.p.value = FALSE,
  B = 2000)

 \method{fisher.test}{formula}(x, y = NULL, workspace = 2e+05, hybrid = F,
  control = list(), or = 1, alternative = "two.sided", conf.int = T,
  conf.level = 0.95, simulate.p.value = F, B = 2000)
 }
 \arguments{
 \item{x}{either a two-dimensional contingency table in matrix form,
    or a factor object.}
 \item{x}{either a two-dimensional contingency table in matrix form, a factor object, or a formula of the form \code{lhs ~ rhs} where \code{lhs} and \code{rhs} are factors.}

 \item{y}{a factor object; ignored if \code{x} is a matrix.}
 \item{y}{a factor object; ignored if \code{x} is a matrix or a formula.}

 \item{workspace}{an integer specifying the size of the workspace
    used in the network algorithm.  In units of 4 bytes.  Only used for
@@ -53,9 +62,149 @@ fisher.test(x, y = NULL, workspace = 2e+05, hybrid = FALSE,
 \item{B}{an integer specifying the number of replicates used in the
    Monte Carlo test.}
 }
 \value{
 A list with class ‘"htest"’ containing the following components:

 p.value: the p-value of the test.

 conf.int: a confidence interval for the odds ratio.  Only present in
          the 2 by 2 case and if argument ‘conf.int = TRUE’.

 estimate: an estimate of the odds ratio.  Note that the _conditional_
          Maximum Likelihood Estimate (MLE) rather than the
          unconditional MLE (the sample odds ratio) is used.  Only
          present in the 2 by 2 case.

 null.value: the odds ratio under the null, ‘or’.  Only present in the 2
          by 2 case.

 alternative: a character string describing the alternative hypothesis.

 method: the character string ‘"Fisher's Exact Test for Count Data"’.

 data.name: a character string giving the names of the data.
 }
 \description{
 Fisher test
 Performs Fisher's exact test for testing the null of independence
 of rows and columns in a contingency table with fixed marginals.
 }
 \details{
 If ‘x’ is a matrix, it is taken as a two-dimensional contingency
 table, and hence its entries should be nonnegative integers.
 Otherwise, both ‘x’ and ‘y’ must be vectors of the same length.
 Incomplete cases are removed, the vectors are coerced into factor
 objects, and the contingency table is computed from these.

 For 2 by 2 cases, p-values are obtained directly using the
 (central or non-central) hypergeometric distribution. Otherwise,
 computations are based on a C version of the FORTRAN subroutine
 FEXACT which implements the network algorithm developed by Mehta and Patel
 (1986) and improved by Clarkson, Fan and Joe (1993).  The FORTRAN
 code can be obtained from \url{http://www.netlib.org/toms/643}.
 Note this fails (with an error message) when the entries of the
 table are too large.  (It transposes the table if necessary so it
 has no more rows than columns.  One constraint is that the product
 of the row marginals be less than 2^31 - 1.)

 For 2 by 2 tables, the null of conditional independence is
 equivalent to the hypothesis that the odds ratio equals one.
 ‘Exact’ inference can be based on observing that in general, given
 all marginal totals fixed, the first element of the contingency
 table has a non-central hypergeometric distribution with
 non-centrality parameter given by the odds ratio (Fisher, 1935).
 The alternative for a one-sided test is based on the odds ratio,
 so ‘alternative = "greater"’ is a test of the odds ratio being
 bigger than ‘or’.

 Two-sided tests are based on the probabilities of the tables, and
 take as ‘more extreme’ all tables with probabilities less than or
 equal to that of the observed table, the p-value being the sum of
 such probabilities.

 For larger than 2 by 2 tables and ‘hybrid = TRUE’, asymptotic
 chi-squared probabilities are only used if the ‘Cochran
 conditions’ are satisfied, that is if no cell has count zero, and
 more than 80\% of the cells have counts at least 5: otherwise the
 exact calculation is used.

 Simulation is done conditional on the row and column marginals,
 and works only if the marginals are strictly positive.  (A C
 translation of the algorithm of Patefield (1981) is used.)
 }
 \examples{
 \dontrun{
 ## Agresti (1990, p. 61f; 2002, p. 91) Fisher's Tea Drinker
 ## A British woman claimed to be able to distinguish whether milk or
 ##  tea was added to the cup first.  To test, she was given 8 cups of
 ##  tea, in four of which milk was added first.  The null hypothesis
 ##  is that there is no association between the true order of pouring
 ##  and the woman's guess, the alternative that there is a positive
 ##  association (that the odds ratio is greater than 1).
 TeaTasting <-
 matrix(c(3, 1, 1, 3),
       nrow = 2,
       dimnames = list(Guess = c("Milk", "Tea"),
                       Truth = c("Milk", "Tea")))
 fisher.test(TeaTasting, alternative = "greater")
 ## => p = 0.2429, association could not be established
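
 ## Illustrative check (not in the original examples).  With every
 ## margin fixed at 4, the one-sided p-value is the hypergeometric
 ## probability of 3 or more correct "Milk" guesses: (16 + 1) / 70.
 phyper(2, m = 4, n = 4, k = 4, lower.tail = FALSE)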

 ## Fisher (1962, 1970), Criminal convictions of like-sex twins
 Convictions <-
 matrix(c(2, 10, 15, 3),
       nrow = 2,
       dimnames =
       list(c("Dizygotic", "Monozygotic"),
            c("Convicted", "Not convicted")))
 Convictions
 fisher.test(Convictions, alternative = "less")
 fisher.test(Convictions, conf.int = FALSE)
 fisher.test(Convictions, conf.level = 0.95)$conf.int
 fisher.test(Convictions, conf.level = 0.99)$conf.int
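
 ## Illustrative comparison (not in the original examples).  The sample
 ## odds ratio (unconditional MLE) is computed by hand; 'estimate' holds
 ## the conditional MLE, so the two values generally differ.
 (Convictions[1, 1] * Convictions[2, 2]) /
   (Convictions[1, 2] * Convictions[2, 1])
 fisher.test(Convictions)$estimate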

 ## An r x c table  Agresti (2002, p. 57) Job Satisfaction
 Job <- matrix(c(1,2,1,0, 3,3,6,1, 10,10,14,9, 6,7,12,11), 4, 4,
 dimnames = list(income = c("< 15k", "15-25k", "25-40k", "> 40k"),
                satisfaction = c("VeryD", "LittleD", "ModerateS", "VeryS")))
 fisher.test(Job)
 fisher.test(Job, simulate.p.value = TRUE, B = 1e5)

 ###
 }
 }
 \references{
 Agresti, A. (1990) _Categorical data analysis_.  New York: Wiley.
 Pages 59-66.

 Agresti, A. (2002) _Categorical data analysis_. Second edition.
 New York: Wiley.  Pages 91-101.

 Fisher, R. A. (1935) The logic of inductive inference.  _Journal
 of the Royal Statistical Society Series A_ *98*, 39-54.

 Fisher, R. A. (1962) Confidence limits for a cross-product ratio.
 _Australian Journal of Statistics_ *4*, 41.

 Fisher, R. A. (1970) _Statistical Methods for Research Workers._
 Oliver & Boyd.

 Mehta, C. R. and Patel, N. R. (1986) Algorithm 643. FEXACT: A
 Fortran subroutine for Fisher's exact test on unordered r*c
 contingency tables.  _ACM Transactions on Mathematical Software_,
 *12*, 154-161.

 Clarkson, D. B., Fan, Y. and Joe, H. (1993) A Remark on Algorithm
 643: FEXACT: An Algorithm for Performing Fisher's Exact Test in r
 x c Contingency Tables.  _ACM Transactions on Mathematical
 Software_, *19*, 484-488.

 Patefield, W. M. (1981) Algorithm AS159.  An efficient method of
 generating r x c tables with given row and column totals.
 _Applied Statistics_ *30*, 91-97.
 }
 \seealso{
 stats::fisher.test
 ‘chisq.test’

 ‘fisher.exact’ in package ‘exact2x2’ for alternative
 interpretations of two-sided tests and confidence intervals for 2
 by 2 tables.
 }