@@ -74,22 +74,37 @@ varColumn <- function(data, labels = NULL) | |||
#' Generate a statistics table | |||
#' | |||
#' Generate a statistics table with variable names/labels and levels | |||
#' Generate a statistics table with the chosen statistical functions, and with statistical tests if given a \code{"grouped"} dataframe. | |||
#' | |||
#' @section Labels: | |||
#' labels is an optional named character vector used to make the table prettier. | |||
#' | |||
#' If given, the variable names for which there is a label will be replaced by their corresponding label. | |||
#' | |||
#' Not all variables need to have a label, and labels for non-existing variables are ignored. | |||
#' | |||
#' labels must be given in the form \code{c(unquoted_variable_name = "label")} | |||
#' | |||
#' @section Stats: | |||
#' The stats can be a function which takes a dataframe and returns a list of statistical functions to use. | |||
#' stats can also be a named list of statistical functions, or formulas. The names will be used as column names in the resulting table. If an element of the list is a function, it will be used as-is for the stats. If an element of the list is a formula, it can be used to conditionally use stats depending on the variable. The general form is `condition ~ T | F`, and can be nested, such as `is.factor ~ percent | (is.normal ~ mean | median)`, for example. | |||
#' | |||
#' stats can also be a named list of statistical functions, or formulas. | |||
#' | |||
#' The names will be used as column names in the resulting table. If an element of the list is a function, it will be used as-is for the stats. If an element of the list is a formula, it can be used to conditionally use stats depending on the variable. | |||
#' | |||
#' The general form is \code{condition ~ T | F}, and can be nested, such as \code{is.factor ~ percent | (is.normal ~ mean | median)}, for example. | |||
#' | |||
#' @section Tests: | |||
#' The tests can be a function which takes a variable and a grouping variable, and returns an appropriate statistical test to use in that case. | |||
#' tests can also be a named list of statistical test functions, associating the name of a variable in the data, and a test to use specifically for that variable. That test name must be expressed as a single-term formula (e.g. ~t.test). You don't have to specify tests for all the variables: a default test for all other variables can be defined with the name .default, and an automatic test can be defined with the name .auto. | |||
#' | |||
#' If data is a grouped dataframe (using group_by), subtables are created and statistic tests are performed over each sub-group. | |||
#' tests can also be a named list of statistical test functions, associating the name of a variable in the data with a test to use specifically for that variable. | |||
#' | |||
#' That test name must be expressed as a single-term formula (e.g. \code{~t.test}). You don't have to specify tests for all the variables: a default test for all other variables can be defined with the name \code{.default}, and an automatic test can be defined with the name \code{.auto}. | |||
#' | |||
#' If data is a grouped dataframe (using \code{group_by}), subtables are created and statistical tests are performed over each sub-group. | |||
#' | |||
#' The output is a desctable object, which is a list of named dataframes that can be further manipulated. Methods for printing, using in pander and DT::datatable are present. Printing reduces the object to a dataframe. | |||
#' @section Output: | |||
#' The output is a desctable object, which is a list of named dataframes that can be further manipulated. Methods are provided for printing and for use with \pkg{pander} and \pkg{DT}. Printing reduces the object to a dataframe. | |||
#' | |||
#' @param data The dataframe to analyze | |||
#' @param stats A list of named statistics to apply to each element of the dataframe, or a function returning a list of named statistics | |||
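As a rough illustration of the Labels, Stats and Tests sections above, the three arguments might be built as follows. This is only a sketch: it assumes the `desctable` and `dplyr` packages are installed in the version this diff targets, the `my_*` names are made up for the example, and the call itself is skipped when either package is missing.

```r
# Hypothetical inputs, following the forms described in the documentation above.

# Labels: a named character vector; names that match no variable are ignored.
my_labels <- c(mpg = "Miles per gallon", hp = "Horsepower", not_a_column = "ignored")

# Stats: a named list of functions or conditional formulas (condition ~ T | F).
# The formula is not evaluated here, so percent and is.normal are only looked up
# when the table is actually built.
my_stats <- list("N" = length,
                 "Mean/%" = is.factor ~ percent | (is.normal ~ mean | median))

# Tests: single-term formulas keyed by variable name, with a .default fallback.
my_tests <- list(.default = ~wilcox.test, mpg = ~t.test)

# Assumption: desctable() accepts these arguments as documented in this diff.
if (requireNamespace("desctable", quietly = TRUE) &&
    requireNamespace("dplyr", quietly = TRUE)) {
  library(desctable)
  library(dplyr)

  mtcars %>%
    mutate(am = factor(am, labels = c("Automatic", "Manual"))) %>%
    group_by(am) %>%
    desctable(stats = my_stats, tests = my_tests, labels = my_labels) %>%
    print()
}
```

Because the data are grouped by `am`, the tests are applied to each remaining variable across the two groups.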
@@ -43,11 +43,11 @@ is.normal <- function(x) | |||
#' Fisher's Exact Test for Count Data | |||
#' | |||
#' Performs Fisher's exact test for testing the null of independence | |||
#' of rows and columns in a contingency table with fixed marginals. | |||
#' of rows and columns in a contingency table with fixed marginals. The data can also be supplied as a formula. | |||
#' | |||
#' If ‘x’ is a matrix, it is taken as a two-dimensional contingency | |||
#' If \code{x} is a matrix, it is taken as a two-dimensional contingency | |||
#' table, and hence its entries should be nonnegative integers. | |||
#' Otherwise, both ‘x’ and ‘y’ must be vectors of the same length. | |||
#' Otherwise, both \code{x} and \code{y} must be vectors of the same length. | |||
#' Incomplete cases are removed, the vectors are coerced into factor | |||
#' objects, and the contingency table is computed from these. | |||
#' | |||
@@ -56,7 +56,7 @@ is.normal <- function(x) | |||
#' computations are based on a C version of the FORTRAN subroutine | |||
#' FEXACT which implements the network developed by Mehta and Patel | |||
#' (1986) and improved by Clarkson, Fan and Joe (1993). The FORTRAN | |||
#' code can be obtained from |http://www.netlib.org/toms/643|. | |||
#' code can be obtained from \url{http://www.netlib.org/toms/643}. | |||
#' Note this fails (with an error message) when the entries of the | |||
#' table are too large. (It transposes the table if necessary so it | |||
#' has no more rows than columns. One constraint is that the product | |||
@@ -64,20 +64,20 @@ is.normal <- function(x) | |||
#' | |||
#' For 2 by 2 tables, the null of conditional independence is | |||
#' equivalent to the hypothesis that the odds ratio equals one. | |||
#' ‘Exact’ inference can be based on observing that in general, given | |||
#' \sQuote{Exact} inference can be based on observing that in general, given | |||
#' all marginal totals fixed, the first element of the contingency | |||
#' table has a non-central hypergeometric distribution with | |||
#' non-centrality parameter given by the odds ratio (Fisher, 1935). | |||
#' The alternative for a one-sided test is based on the odds ratio, | |||
#' so ‘alternative = "greater"’ is a test of the odds ratio being | |||
#' bigger than ‘or’. | |||
#' so \code{alternative = "greater"} is a test of the odds ratio being | |||
#' bigger than \code{or}. | |||
#' | |||
#' Two-sided tests are based on the probabilities of the tables, and | |||
#' take as ‘more extreme’ all tables with probabilities less than or | |||
#' take as \sQuote{more extreme} all tables with probabilities less than or | |||
#' equal to that of the observed table, the p-value being the sum of | |||
#' such probabilities. | |||
#' | |||
#' For larger than 2 by 2 tables and ‘hybrid = TRUE’, asymptotic | |||
#' For larger than 2 by 2 tables and \code{hybrid = TRUE}, asymptotic | |||
#' chi-squared probabilities are only used if the ‘Cochran | |||
#' conditions’ are satisfied, that is if no cell has count zero, and | |||
#' more than 80% of the cells have counts at least 5: otherwise the | |||
@@ -89,24 +89,24 @@ is.normal <- function(x) | |||
#' @param x either a two-dimensional contingency table in matrix form, a factor object, or a formula of the form \code{lhs ~ rhs} where \code{lhs} and \code{rhs} are factors. | |||
#' @param y a factor object; ignored if \code{x} is a matrix or a formula. | |||
#' @inheritParams stats::fisher.test | |||
#' @return A list with class ‘"htest"’ containing the following components: | |||
#' @return A list with class \code{"htest"} containing the following components: | |||
#' | |||
#' p.value: the p-value of the test. | |||
#' | |||
#' conf.int: a confidence interval for the odds ratio. Only present in | |||
#' the 2 by 2 case and if argument ‘conf.int = TRUE’. | |||
#' the 2 by 2 case and if argument \code{conf.int = TRUE}. | |||
#' | |||
#' estimate: an estimate of the odds ratio. Note that the _conditional_ | |||
#' Maximum Likelihood Estimate (MLE) rather than the | |||
#' unconditional MLE (the sample odds ratio) is used. Only | |||
#' present in the 2 by 2 case. | |||
#' | |||
#' null.value: the odds ratio under the null, ‘or’. Only present in the 2 | |||
#' null.value: the odds ratio under the null, \code{or}. Only present in the 2 | |||
#' by 2 case. | |||
#' | |||
#' alternative: a character string describing the alternative hypothesis. | |||
#' | |||
#' method: the character string ‘"Fisher's Exact Test for Count Data"’. | |||
#' method: the character string \code{"Fisher's Exact Test for Count Data"}. | |||
#' | |||
#' data.name: a character string giving the names of the data. | |||
#' @references | |||
@@ -139,9 +139,9 @@ is.normal <- function(x) | |||
#' generating r x c tables with given row and column totals. | |||
#' _Applied Statistics_ *30*, 91-97. | |||
#' @seealso | |||
#' ‘chisq.test’ | |||
#' \code{\link{chisq.test}} | |||
#' | |||
#' ‘fisher.exact’ in package ‘exact2x2’ for alternative | |||
#' \code{fisher.exact} in package \pkg{exact2x2} for alternative | |||
#' interpretations of two-sided tests and confidence intervals for 2 | |||
#' by 2 tables. | |||
#' @examples | |||
@@ -220,18 +220,18 @@ fisher.test.formula <- function(x, | |||
#' Pearson's Chi-squared Test for Count Data | |||
#' | |||
#' ‘chisq.test’ performs chi-squared contingency table tests and goodness-of-fit tests. | |||
#' \code{chisq.test} performs chi-squared contingency table tests and goodness-of-fit tests, with an added method for formulas. | |||
#' | |||
#' If ‘x’ is a matrix with one row or column, or if ‘x’ is a vector | |||
#' and ‘y’ is not given, then a _goodness-of-fit test_ is performed | |||
#' (‘x’ is treated as a one-dimensional contingency table). The | |||
#' entries of ‘x’ must be non-negative integers. In this case, the | |||
#' If \code{x} is a matrix with one row or column, or if \code{x} is a vector | |||
#' and \code{y} is not given, then a \emph{goodness-of-fit test} is performed | |||
#' (\code{x} is treated as a one-dimensional contingency table). The | |||
#' entries of \code{x} must be non-negative integers. In this case, the | |||
#' hypothesis tested is whether the population probabilities equal | |||
#' those in ‘p’, or are all equal if ‘p’ is not given. | |||
#' those in \code{p}, or are all equal if \code{p} is not given. | |||
#' | |||
#' If ‘x’ is a matrix with at least two rows and columns, it is taken | |||
#' as a two-dimensional contingency table: the entries of ‘x’ must be | |||
#' non-negative integers. Otherwise, ‘x’ and ‘y’ must be vectors or | |||
#' If \code{x} is a matrix with at least two rows and columns, it is taken | |||
#' as a two-dimensional contingency table: the entries of \code{x} must be | |||
#' non-negative integers. Otherwise, \code{x} and \code{y} must be vectors or | |||
#' factors of the same length; cases with missing values are removed, | |||
#' the objects are coerced to factors, and the contingency table is | |||
#' computed from these. Then Pearson's chi-squared test is performed | |||
@@ -239,11 +239,11 @@ fisher.test.formula <- function(x, | |||
#' counts in a 2-dimensional contingency table is the product of the | |||
#' row and column marginals. | |||
#' | |||
#' If ‘simulate.p.value’ is ‘FALSE’, the p-value is computed from the | |||
#' If \code{simulate.p.value} is \code{FALSE}, the p-value is computed from the | |||
#' asymptotic chi-squared distribution of the test statistic; | |||
#' continuity correction is only used in the 2-by-2 case (if | |||
#' ‘correct’ is ‘TRUE’, the default). Otherwise the p-value is | |||
#' computed for a Monte Carlo test (Hope, 1968) with ‘B’ replicates. | |||
#' \code{correct} is \code{TRUE}, the default). Otherwise the p-value is | |||
#' computed for a Monte Carlo test (Hope, 1968) with \code{B} replicates. | |||
#' | |||
#' In the contingency table case simulation is done by random | |||
#' sampling from the set of all contingency tables with given | |||
@@ -254,17 +254,17 @@ fisher.test.formula <- function(x, | |||
#' exact test. | |||
#' | |||
#' In the goodness-of-fit case simulation is done by random sampling | |||
#' from the discrete distribution specified by ‘p’, each sample being | |||
#' of size ‘n = sum(x)’. This simulation is done in R and may be | |||
#' from the discrete distribution specified by \code{p}, each sample being | |||
#' of size \code{n = sum(x)}. This simulation is done in R and may be | |||
#' slow. | |||
#' @param x a numeric vector, or matrix, or formula of the form \code{lhs ~ rhs} where \code{lhs} and \code{rhs} are factors. ‘x’ and ‘y’ can also both be factors. | |||
#' @param y a numeric vector; ignored if ‘x’ is a matrix or a formula. If ‘x’ is a factor, ‘y’ should be a factor of the same length. | |||
#' @param x a numeric vector, or matrix, or formula of the form \code{lhs ~ rhs} where \code{lhs} and \code{rhs} are factors. \code{x} and \code{y} can also both be factors. | |||
#' @param y a numeric vector; ignored if \code{x} is a matrix or a formula. If \code{x} is a factor, \code{y} should be a factor of the same length. | |||
#' @inheritParams stats::chisq.test | |||
#' @return A list with class ‘"htest"’ containing the following components: | |||
#' @return A list with class \code{"htest"} containing the following components: | |||
#' statistic: the value of the chi-squared test statistic. | |||
#' | |||
#' parameter: the degrees of freedom of the approximate chi-squared | |||
#' distribution of the test statistic, ‘NA’ if the p-value is | |||
#' distribution of the test statistic, \code{NA} if the p-value is | |||
#' computed by Monte Carlo simulation. | |||
#' | |||
#' p.value: the p-value for the test. | |||
@@ -282,9 +282,9 @@ fisher.test.formula <- function(x, | |||
#' residuals: the Pearson residuals, \code{(observed - expected) / | |||
#' sqrt(expected)}. | |||
#' | |||
#' stdres: standardized residuals, ‘(observed - expected) / sqrt(V)’, | |||
#' where ‘V’ is the residual cell variance (Agresti, 2007, | |||
#' section 2.4.5 for the case where ‘x’ is a matrix, ‘n * p * (1 | |||
#' stdres: standardized residuals, \code{(observed - expected) / sqrt(V)}, | |||
#' where \code{V} is the residual cell variance (Agresti, 2007, | |||
#' section 2.4.5 for the case where \code{x} is a matrix, \code{n * p * (1 | |||
#' - p)} otherwise). | |||
#' @source The code for Monte Carlo simulation is a C translation of the Fortran algorithm of Patefield (1981). | |||
#' @references | |||
@@ -297,7 +297,7 @@ fisher.test.formula <- function(x, | |||
#' | |||
#' Agresti, A. (2007) _An Introduction to Categorical Data Analysis, | |||
#' 2nd ed._, New York: John Wiley & Sons. Page 38. | |||
#' @seealso For goodness-of-fit testing, notably of continuous distributions, ‘ks.test’. | |||
#' @seealso For goodness-of-fit testing, notably of continuous distributions, \code{\link{ks.test}}. | |||
#' @examples | |||
#' \dontrun{ | |||
#' ## From Agresti(2007) p.39 | |||
@@ -368,10 +368,10 @@ chisq.test.formula <- function(x, | |||
B = B) | |||
} | |||
#' Wrapper for summary(aov) | |||
#' Wrapper for oneway.test(var.equal = T) | |||
#' | |||
#' @param formula An anova formula (variable ~ grouping variable) | |||
#' @seealso stats::aov | |||
#' @param formula An anova formula (\code{variable ~ grouping variable}) | |||
#' @seealso \code{\link{oneway.test}} | |||
#' @export | |||
ANOVA <- function(formula) | |||
{ | |||
@@ -30,8 +30,13 @@ as.data.frame.desctable <- function(x, ...) | |||
#' Pander method for desctable | |||
#' | |||
#' Pander method to output a desctable | |||
#' | |||
#' Uses \code{pandoc.table}, with some default parameters (\code{digits = 2}, \code{justify = "left"}, \code{missing = ""}, \code{keep.line.breaks = T}, \code{split.tables = Inf}, and \code{emphasize.rownames = F}), that you can override if needed. | |||
#' | |||
#' @param x A desctable | |||
#' @inheritParams pander::pandoc.table | |||
#' @seealso \code{\link{pandoc.table}} | |||
#' @export | |||
pander.desctable <- function(x = NULL, | |||
digits = 2, | |||
@@ -62,8 +67,41 @@ pander.desctable <- function(x = NULL, | |||
...) | |||
} | |||
#' Datatable | |||
#' Create an HTML table widget using the DataTables library | |||
#' | |||
#' This function creates an HTML widget to display rectangular data (a matrix or data frame) using the JavaScript library DataTables, with a method for \code{desctable} objects. | |||
#' | |||
#' @note | |||
#' It is recommended to escape the table content for security reasons (e.g. XSS attacks) when using this function in Shiny or any other dynamic web application. | |||
#' @references | |||
#' See \url{http://rstudio.github.io/DT} for the full documentation. | |||
#' @examples | |||
#' library(DT) | |||
#' | |||
#' # see the package vignette for examples and the link to website | |||
#' vignette('DT', package = 'DT') | |||
#' | |||
#' # some boring edge cases for testing purposes | |||
#' m = matrix(nrow = 0, ncol = 5, dimnames = list(NULL, letters[1:5])) | |||
#' datatable(m) # zero rows | |||
#' datatable(as.data.frame(m)) | |||
#' | |||
#' m = matrix(1, dimnames = list(NULL, 'a')) | |||
#' datatable(m) # one row and one column | |||
#' datatable(as.data.frame(m)) | |||
#' | |||
#' m = data.frame(a = 1, b = 2, c = 3) | |||
#' datatable(m) | |||
#' datatable(as.matrix(m)) | |||
#' | |||
#' # dates | |||
#' datatable(data.frame( | |||
#' date = seq(as.Date("2015-01-01"), by = "day", length.out = 5), x = 1:5 | |||
#' )) | |||
#' datatable(data.frame(x = Sys.Date())) | |||
#' datatable(data.frame(x = Sys.time())) | |||
#' | |||
#' ### | |||
#' @inheritParams DT::datatable | |||
#' @export | |||
datatable <- function(data, ...) | |||
@@ -93,10 +131,6 @@ datatable.default <- function(data, | |||
DT::datatable(data, options = options, class = class, callback = callback, caption = caption, filter = filter, escape = escape, style = style, width = width, height = height, elementId = elementId, fillContainer = fillContainer, autoHideNavigation = autoHideNavigation, selection = selection, extensions = extensions, plugins = plugins, ...) | |||
} | |||
#' datatable method for desctable | |||
#' | |||
#' @param data A desctable | |||
#' @param ... Additional datatable parameters | |||
#' @rdname datatable | |||
#' @inheritParams base::prettyNum | |||
#' @export | |||
@@ -1,10 +1,14 @@ | |||
#' Transform any function into a valid stat function for the table | |||
#' | |||
#' Transform a function into a valid stat function for the table | |||
#' | |||
#' NA values are removed from the data | |||
#' | |||
#' Applying the function on a numerical vector should return one value | |||
#' | |||
#' Applying the function on a factor should return nlevels + 1 values, or one value per factor level | |||
#' See `parse_formula` for the usage for formulas. | |||
#' | |||
#' See \code{parse_formula} for the usage for formulas. | |||
#' @param f The function to try to apply, or a formula combining two functions | |||
#' @param x A vector | |||
#' @export | |||
@@ -67,10 +71,12 @@ statify.formula <- function(x, f) | |||
#' These functions take a dataframe as argument and return a list of statistics in the form accepted by desctable. | |||
#' | |||
#' Already defined are | |||
#' - stats_default with length, mean/%, sd, med and IQR | |||
#' - stats_normal with length, mean/% and sd | |||
#' - stats_nonnormal with length, median/% and IQR | |||
#' - stats_auto, which picks stats depending on the data | |||
#' \enumerate{ | |||
#' \item stats_default with length, mean/\%, sd, med and IQR | |||
#' \item stats_normal with length, mean/\% and sd | |||
#' \item stats_nonnormal with length, median/\% and IQR | |||
#' \item stats_auto, which picks stats depending on the data | |||
#' } | |||
#' | |||
#' You can define your own automatic functions, as long as they take a dataframe as argument and return a list of functions or formulas defining conditions to use a stat function. | |||
#' | |||
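A custom stats function of the kind described above could look like the sketch below. The name `stats_minimal` is hypothetical; the only requirements taken from the documentation are that the function accepts a dataframe and returns a named list of functions or conditional formulas. `percent` is assumed to be the helper already referenced in the desctable documentation, and it is not evaluated until the table is built.

```r
# Hypothetical custom stats function: takes a dataframe and returns a named list
# of plain functions and/or conditional formulas (condition ~ T | F).
stats_minimal <- function(data) {
  list("N"      = length,
       "Mean/%" = is.factor ~ percent | mean)
}

# The returned names become the column names of the resulting table.
names(stats_minimal(iris))
```

Such a function would then be passed as the `stats` argument in place of `stats_auto`.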
@@ -27,7 +27,7 @@ testify <- function(x, f, group) | |||
#' | |||
#' These functions take a variable and a grouping variable as arguments, and return a statistical test to use, expressed as a single-term formula. | |||
#' | |||
#' Currently, only tests_auto is defined. It picks between the t test, Wilcoxon, ANOVA, Kruskal-Wallis and Fisher tests depending on the number of groups, the type of the variable, and the normality and homoskedasticity of the distributions. | |||
#' Currently, only \code{tests_auto} is defined. It picks between the t test, Wilcoxon, ANOVA, Kruskal-Wallis and Fisher tests depending on the number of groups, the type of the variable, and the normality and homoskedasticity of the distributions. | |||
#' | |||
#' @param var The variable to test | |||
#' @param grp The variable for the groups | |||
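To make the contract above concrete, here is a minimal sketch of a user-defined tests function. The name `tests_simple` and its decision rule are assumptions made for illustration; it is not how `tests_auto` is implemented, and it skips the normality and homoskedasticity checks mentioned above.

```r
# Hypothetical replacement for tests_auto: inspects the variable and the groups,
# then returns the chosen test as a single-term formula.
tests_simple <- function(var, grp) {
  n_groups <- nlevels(factor(grp))
  if (is.factor(var)) {
    ~fisher.test
  } else if (n_groups == 2) {
    ~wilcox.test
  } else {
    ~kruskal.test
  }
}

tests_simple(iris$Sepal.Length, iris$Species)  # numeric variable, three groups
tests_simple(mtcars$mpg, mtcars$am)            # numeric variable, two groups
```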
@@ -2,16 +2,16 @@ | |||
% Please edit documentation in R/convenience_functions.R | |||
\name{ANOVA} | |||
\alias{ANOVA} | |||
\title{Wrapper for summary(aov)} | |||
\title{Wrapper for oneway.test(var.equal = T)} | |||
\usage{ | |||
ANOVA(formula) | |||
} | |||
\arguments{ | |||
\item{formula}{An anova formula (variable ~ grouping variable)} | |||
\item{formula}{An anova formula (\code{variable ~ grouping variable})} | |||
} | |||
\description{ | |||
Wrapper for summary(aov) | |||
Wrapper for oneway.test(var.equal = T) | |||
} | |||
\seealso{ | |||
stats::aov | |||
\code{\link{oneway.test}} | |||
} |
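The retitling from `summary(aov)` to `oneway.test(var.equal = T)` should not change the reported test: with equal variances assumed, both give the classical one-way ANOVA F test. The base R sketch below checks this on `iris`; it does not call `ANOVA()` itself, whose body is not shown in this diff.

```r
# One-way ANOVA computed two ways with base R only.
fit_oneway <- oneway.test(Sepal.Length ~ Species, data = iris, var.equal = TRUE)
fit_aov    <- summary(aov(Sepal.Length ~ Species, data = iris))

# F statistic and p-value from each approach.
f_oneway <- unname(fit_oneway$statistic)
f_aov    <- fit_aov[[1]][["F value"]][1]
p_aov    <- fit_aov[[1]][["Pr(>F)"]][1]

all.equal(f_oneway, f_aov)
all.equal(fit_oneway$p.value, p_aov)
```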
@@ -20,9 +20,9 @@ chisq.test(x, y, correct, p, rescale.p, simulate.p.value, B) | |||
B = 2000) | |||
} | |||
\arguments{ | |||
\item{x}{a numeric vector, or matrix, or formula of the form \code{lhs ~ rhs} where \code{lhs} and \code{rhs} are factors. ‘x’ and ‘y’ can also both be factors.} | |||
\item{x}{a numeric vector, or matrix, or formula of the form \code{lhs ~ rhs} where \code{lhs} and \code{rhs} are factors. \code{x} and \code{y} can also both be factors.} | |||
\item{y}{a numeric vector; ignored if ‘x’ is a matrix or a formula. If ‘x’ is a factor, ‘y’ should be a factor of the same length.} | |||
\item{y}{a numeric vector; ignored if \code{x} is a matrix or a formula. If \code{x} is a factor, \code{y} should be a factor of the same length.} | |||
\item{correct}{a logical indicating whether to apply continuity | |||
correction when computing the test statistic for 2 by 2 tables: one | |||
@@ -44,11 +44,11 @@ chisq.test(x, y, correct, p, rescale.p, simulate.p.value, B) | |||
Monte Carlo test.} | |||
} | |||
\value{ | |||
A list with class ‘"htest"’ containing the following components: | |||
A list with class \code{"htest"} containing the following components: | |||
statistic: the value of the chi-squared test statistic. | |||
parameter: the degrees of freedom of the approximate chi-squared | |||
distribution of the test statistic, ‘NA’ if the p-value is | |||
distribution of the test statistic, \code{NA} if the p-value is | |||
computed by Monte Carlo simulation. | |||
p.value: the p-value for the test. | |||
@@ -66,25 +66,25 @@ expected: the expected counts under the null hypothesis. | |||
residuals: the Pearson residuals, \code{(observed - expected) / | |||
sqrt(expected)}. | |||
stdres: standardized residuals, ‘(observed - expected) / sqrt(V)’, | |||
where ‘V’ is the residual cell variance (Agresti, 2007, | |||
section 2.4.5 for the case where ‘x’ is a matrix, ‘n * p * (1 | |||
stdres: standardized residuals, \code{(observed - expected) / sqrt(V)}, | |||
where \code{V} is the residual cell variance (Agresti, 2007, | |||
section 2.4.5 for the case where \code{x} is a matrix, \code{n * p * (1 | |||
- p)} otherwise). | |||
} | |||
\description{ | |||
‘chisq.test’ performs chi-squared contingency table tests and goodness-of-fit tests. | |||
\code{chisq.test} performs chi-squared contingency table tests and goodness-of-fit tests, with an added method for formulas. | |||
} | |||
\details{ | |||
If ‘x’ is a matrix with one row or column, or if ‘x’ is a vector | |||
and ‘y’ is not given, then a _goodness-of-fit test_ is performed | |||
(‘x’ is treated as a one-dimensional contingency table). The | |||
entries of ‘x’ must be non-negative integers. In this case, the | |||
If \code{x} is a matrix with one row or column, or if \code{x} is a vector | |||
and \code{y} is not given, then a \emph{goodness-of-fit test} is performed | |||
(\code{x} is treated as a one-dimensional contingency table). The | |||
entries of \code{x} must be non-negative integers. In this case, the | |||
hypothesis tested is whether the population probabilities equal | |||
those in ‘p’, or are all equal if ‘p’ is not given. | |||
those in \code{p}, or are all equal if \code{p} is not given. | |||
If ‘x’ is a matrix with at least two rows and columns, it is taken | |||
as a two-dimensional contingency table: the entries of ‘x’ must be | |||
non-negative integers. Otherwise, ‘x’ and ‘y’ must be vectors or | |||
If \code{x} is a matrix with at least two rows and columns, it is taken | |||
as a two-dimensional contingency table: the entries of \code{x} must be | |||
non-negative integers. Otherwise, \code{x} and \code{y} must be vectors or | |||
factors of the same length; cases with missing values are removed, | |||
the objects are coerced to factors, and the contingency table is | |||
computed from these. Then Pearson's chi-squared test is performed | |||
@@ -92,11 +92,11 @@ of the null hypothesis that the joint distribution of the cell | |||
counts in a 2-dimensional contingency table is the product of the | |||
row and column marginals. | |||
If ‘simulate.p.value’ is ‘FALSE’, the p-value is computed from the | |||
If \code{simulate.p.value} is \code{FALSE}, the p-value is computed from the | |||
asymptotic chi-squared distribution of the test statistic; | |||
continuity correction is only used in the 2-by-2 case (if | |||
‘correct’ is ‘TRUE’, the default). Otherwise the p-value is | |||
computed for a Monte Carlo test (Hope, 1968) with ‘B’ replicates. | |||
\code{correct} is \code{TRUE}, the default). Otherwise the p-value is | |||
computed for a Monte Carlo test (Hope, 1968) with \code{B} replicates. | |||
In the contingency table case simulation is done by random | |||
sampling from the set of all contingency tables with given | |||
@@ -107,8 +107,8 @@ assumed for the chi-squared test but rather that for Fisher's | |||
exact test. | |||
In the goodness-of-fit case simulation is done by random sampling | |||
from the discrete distribution specified by ‘p’, each sample being | |||
of size ‘n = sum(x)’. This simulation is done in R and may be | |||
from the discrete distribution specified by \code{p}, each sample being | |||
of size \code{n = sum(x)}. This simulation is done in R and may be | |||
slow. | |||
} | |||
\examples{ | |||
@@ -167,5 +167,5 @@ Agresti, A. (2007) _An Introduction to Categorical Data Analysis, | |||
2nd ed._, New York: John Wiley & Sons. Page 38. | |||
} | |||
\seealso{ | |||
For goodness-of-fit testing, notably of continuous distributions, ‘ks.test’. | |||
For goodness-of-fit testing, notably of continuous distributions, \code{\link{ks.test}}. | |||
} |
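The `expected` and `residuals` components described above can be reproduced by hand, which may help when reading the formulas. The sketch below uses a made-up 2 x 3 table of counts and base R's `chisq.test`; the formula method added by this package is not exercised.

```r
# Small 2 x 3 table of counts, made up for the example.
counts <- matrix(c(12, 5, 7, 9, 8, 14), nrow = 2,
                 dimnames = list(group = c("a", "b"), outcome = c("x", "y", "z")))
res <- chisq.test(counts)

# Expected counts under independence: product of the marginals over the total.
manual_expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Pearson residuals: (observed - expected) / sqrt(expected).
manual_residuals <- (res$observed - res$expected) / sqrt(res$expected)

all.equal(manual_expected, res$expected, check.attributes = FALSE)
all.equal(manual_residuals, res$residuals, check.attributes = FALSE)

# Outside the 2-by-2 case no continuity correction is applied, so the statistic
# is the sum of the squared Pearson residuals.
all.equal(unname(res$statistic), sum(res$residuals^2))
```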
@@ -4,7 +4,7 @@ | |||
\alias{datatable} | |||
\alias{datatable.default} | |||
\alias{datatable.desctable} | |||
\title{Datatable} | |||
\title{Create an HTML table widget using the DataTables library} | |||
\usage{ | |||
datatable(data, ...) | |||
@@ -30,9 +30,9 @@ datatable(data, ...) | |||
...) | |||
} | |||
\arguments{ | |||
\item{data}{A desctable} | |||
\item{data}{a data object (either a matrix or a data frame)} | |||
\item{...}{Additional datatable parameters} | |||
\item{...}{arguments passed to \code{format}.} | |||
\item{options}{a list of initialization options (see | |||
\url{http://datatables.net/reference/option/}); the character options | |||
@@ -117,7 +117,39 @@ in the first column of the table if exist (not \code{NULL})} | |||
} | |||
} | |||
\description{ | |||
Datatable | |||
This function creates an HTML widget to display rectangular data (a matrix or data frame) using the JavaScript library DataTables, with a method for \code{desctable} objects. | |||
} | |||
\note{ | |||
It is recommended to escape the table content for security reasons (e.g. XSS attacks) when using this function in Shiny or any other dynamic web application. | |||
} | |||
\examples{ | |||
library(DT) | |||
# see the package vignette for examples and the link to website | |||
vignette('DT', package = 'DT') | |||
# some boring edge cases for testing purposes | |||
m = matrix(nrow = 0, ncol = 5, dimnames = list(NULL, letters[1:5])) | |||
datatable(m) # zero rows | |||
datatable(as.data.frame(m)) | |||
datatable method for desctable | |||
m = matrix(1, dimnames = list(NULL, 'a')) | |||
datatable(m) # one row and one column | |||
datatable(as.data.frame(m)) | |||
m = data.frame(a = 1, b = 2, c = 3) | |||
datatable(m) | |||
datatable(as.matrix(m)) | |||
# dates | |||
datatable(data.frame( | |||
date = seq(as.Date("2015-01-01"), by = "day", length.out = 5), x = 1:5 | |||
)) | |||
datatable(data.frame(x = Sys.Date())) | |||
datatable(data.frame(x = Sys.time())) | |||
### | |||
} | |||
\references{ | |||
See \url{http://rstudio.github.io/DT} for the full documentation. | |||
} |
@@ -26,24 +26,46 @@ desctable(data, stats, tests, labels) | |||
A desctable object, which prints to a table of statistics for all variables | |||
} | |||
\description{ | |||
Generate a statistics table with variable names/labels and levels | |||
Generate a statistics table with the chosen statistical functions, and with statistical tests if given a \code{"grouped"} dataframe. | |||
} | |||
\details{ | |||
\section{Labels}{ | |||
labels is an optional named character vector used to make the table prettier. | |||
If given, the variable names for which there is a label will be replaced by their corresponding label. | |||
Not all variables need to have a label, and labels for non-existing variables are ignored. | |||
labels must be given in the form \code{c(unquoted_variable_name = "label")} | |||
} | |||
\section{Stats}{ | |||
The stats can be a function which takes a dataframe and returns a list of statistical functions to use. | |||
stats can also be a named list of statistical functions, or formulas. The names will be used as column names in the resulting table. If an element of the list is a function, it will be used as-is for the stats. If an element of the list is a formula, it can be used to conditionally use stats depending on the variable. The general form is `condition ~ T | F`, and can be nested, such as `is.factor ~ percent | (is.normal ~ mean | median)`, for example. | |||
stats can also be a named list of statistical functions, or formulas. | |||
The names will be used as column names in the resulting table. If an element of the list is a function, it will be used as-is for the stats. If an element of the list is a formula, it can be used to conditionally use stats depending on the variable. | |||
The general form is \code{condition ~ T | F}, and can be nested, such as \code{is.factor ~ percent | (is.normal ~ mean | median)}, for example. | |||
} | |||
\section{Tests}{ | |||
The tests can be a function which takes a variable and a grouping variable, and returns an appropriate statistical test to use in that case. | |||
tests can also be a named list of statistical test functions, associating the name of a variable in the data, and a test to use specifically for that variable. That test name must be expressed as a single-term formula (e.g. ~t.test). You don't have to specify tests for all the variables: a default test for all other variables can be defined with the name .default, and an automatic test can be defined with the name .auto. | |||
If data is a grouped dataframe (using group_by), subtables are created and statistic tests are performed over each sub-group. | |||
tests can also be a named list of statistical test functions, associating the name of a variable in the data with a test to use specifically for that variable. | |||
That test name must be expressed as a single-term formula (e.g. \code{~t.test}). You don't have to specify tests for all the variables: a default test for all other variables can be defined with the name \code{.default}, and an automatic test can be defined with the name \code{.auto}. | |||
The output is a desctable object, which is a list of named dataframes that can be further manipulated. Methods for printing, using in pander and DT::datatable are present. Printing reduces the object to a dataframe. | |||
If data is a grouped dataframe (using \code{group_by}), subtables are created and statistical tests are performed over each sub-group. | |||
} | |||
\section{Output}{ | |||
The output is a desctable object, which is a list of named dataframes that can be further manipulated. Methods are provided for printing and for use with \pkg{pander} and \pkg{DT}. Printing reduces the object to a dataframe. | |||
} | |||
\examples{ | |||
iris \%>\% | |||
desctable | |||
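# A hedged sketch, not part of the original example, of the stats and tests options described in the sections above. | |||
# Assumes dplyr is attached (for group_by) and that percent, is.normal and tests_auto are the helpers this package documents. | |||
# "N" and "Center" are hypothetical column names chosen for illustration. | |||
iris \%>\% | |||
  desctable(stats = list("N" = length, "Center" = is.factor ~ percent | (is.normal ~ mean | median))) | |||
# On a grouped dataframe, a per-variable test can be given as a single-term formula, with .auto covering the other variables. | |||
iris \%>\% | |||
  group_by(Species) \%>\% | |||
  desctable(tests = list(.auto = tests_auto, Sepal.Length = ~kruskal.test)) | |||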
@@ -63,35 +63,35 @@ fisher.test(x, y, workspace, hybrid, control, or, alternative, conf.int, | |||
Monte Carlo test.} | |||
} | |||
\value{ | |||
A list with class ‘"htest"’ containing the following components: | |||
A list with class \code{"htest"} containing the following components: | |||
p.value: the p-value of the test. | |||
conf.int: a confidence interval for the odds ratio. Only present in | |||
the 2 by 2 case and if argument ‘conf.int = TRUE’. | |||
the 2 by 2 case and if argument \code{conf.int = TRUE}. | |||
estimate: an estimate of the odds ratio. Note that the _conditional_ | |||
Maximum Likelihood Estimate (MLE) rather than the | |||
unconditional MLE (the sample odds ratio) is used. Only | |||
present in the 2 by 2 case. | |||
null.value: the odds ratio under the null, ‘or’. Only present in the 2 | |||
null.value: the odds ratio under the null, \code{or}. Only present in the 2 | |||
by 2 case. | |||
alternative: a character string describing the alternative hypothesis. | |||
method: the character string ‘"Fisher's Exact Test for Count Data"’. | |||
method: the character string \code{"Fisher's Exact Test for Count Data"}. | |||
data.name: a character string giving the names of the data. | |||
} | |||
\description{ | |||
Performs Fisher's exact test for testing the null of independence | |||
of rows and columns in a contingency table with fixed marginals. | |||
of rows and columns in a contingency table with fixed marginals, or with a formula expression. | |||
} | |||
\details{ | |||
If ‘x’ is a matrix, it is taken as a two-dimensional contingency | |||
If \code{x} is a matrix, it is taken as a two-dimensional contingency | |||
table, and hence its entries should be nonnegative integers. | |||
Otherwise, both ‘x’ and ‘y’ must be vectors of the same length. | |||
Otherwise, both \code{x} and \code{y} must be vectors of the same length. | |||
Incomplete cases are removed, the vectors are coerced into factor | |||
objects, and the contingency table is computed from these. | |||
@@ -100,7 +100,7 @@ For 2 by 2 cases, p-values are obtained directly using the | |||
computations are based on a C version of the FORTRAN subroutine | |||
FEXACT which implements the network developed by Mehta and Patel | |||
(1986) and improved by Clarkson, Fan and Joe (1993). The FORTRAN | |||
code can be obtained from |http://www.netlib.org/toms/643|. | |||
code can be obtained from \url{http://www.netlib.org/toms/643}. | |||
Note this fails (with an error message) when the entries of the | |||
table are too large. (It transposes the table if necessary so it | |||
has no more rows than columns. One constraint is that the product | |||
@@ -108,20 +108,20 @@ of the row marginals be less than 2^31 - 1.) | |||
For 2 by 2 tables, the null of conditional independence is | |||
equivalent to the hypothesis that the odds ratio equals one. | |||
‘Exact’ inference can be based on observing that in general, given | |||
\sQuote{Exact} inference can be based on observing that in general, given | |||
all marginal totals fixed, the first element of the contingency | |||
table has a non-central hypergeometric distribution with | |||
non-centrality parameter given by the odds ratio (Fisher, 1935). | |||
The alternative for a one-sided test is based on the odds ratio, | |||
so ‘alternative = "greater"’ is a test of the odds ratio being | |||
bigger than ‘or’. | |||
so \code{alternative = "greater"} is a test of the odds ratio being | |||
bigger than \code{or}. | |||
Two-sided tests are based on the probabilities of the tables, and | |||
take as ‘more extreme’ all tables with probabilities less than or | |||
take as \sQuote{more extreme} all tables with probabilities less than or | |||
equal to that of the observed table, the p-value being the sum of | |||
such probabilities. | |||
For larger than 2 by 2 tables and ‘hybrid = TRUE’, asymptotic | |||
For larger than 2 by 2 tables and \code{hybrid = TRUE}, asymptotic | |||
chi-squared probabilities are only used if the ‘Cochran | |||
conditions’ are satisfied, that is if no cell has count zero, and | |||
more than 80% of the cells have counts at least 5: otherwise the | |||
@@ -202,9 +202,9 @@ generating r x c tables with given row and column totals. | |||
_Applied Statistics_ *30*, 91-97. | |||
} | |||
\seealso{ | |||
‘chisq.test’ | |||
\code{\link{chisq.test}} | |||
‘fisher.exact’ in package ‘exact2x2’ for alternative | |||
\code{fisher.exact} in package \pkg{exact2x2} for alternative | |||
interpretations of two-sided tests and confidence intervals for 2 | |||
by 2 tables. | |||
} |
@@ -25,5 +25,11 @@ pander.desctable(x = NULL, digits = 2, justify = "left", missing = "", | |||
\item{...}{unsupported extra arguments directly placed into \code{/dev/null}} | |||
} | |||
\description{ | |||
Pander method for desctable | |||
Pander method to output a desctable | |||
} | |||
\details{ | |||
Uses \code{pandoc.table} with some default parameters (\code{digits = 2}, \code{justify = "left"}, \code{missing = ""}, \code{keep.line.breaks = T}, \code{split.tables = Inf}, and \code{emphasize.rownames = F}), which you can override if needed. | |||
} | |||
\seealso{ | |||
\code{\link{pandoc.table}} | |||
} |
@@ -22,8 +22,13 @@ The results for the function applied on the vector, compatible with the format o | |||
} | |||
\description{ | |||
Transform a function into a valid stat function for the table | |||
} | |||
\details{ | |||
NA values are removed from the data. | |||
Applying the function to a numerical vector should return one value. | |||
Applying the function to a factor should return either nlevels + 1 values or one value per factor level. | |||
See `parse_formula` for the usage for formulaes. | |||
See \code{parse_formula} for how to use formulas. | |||
} |
@@ -26,10 +26,12 @@ These functions take a dataframe as argument and return a list of statistcs in t | |||
} | |||
\details{ | |||
Already defined are | |||
- stats_default with length, mean/%, sd, med and IQR | |||
- stats_normal with length, mean/% and sd | |||
- stats_nonnormal with length, median/% and IQR | |||
- stats_auto, which picks stats depending of the data | |||
\enumerate{ | |||
\item stats_default with length, mean/\%, sd, med and IQR | |||
\item stats_normal with length, mean/\% and sd | |||
\item stats_nonnormal with length, median/\% and IQR | |||
\item stats_auto, which picks stats depending on the data | |||
} | |||
You can define your own automatic functions, as long as they take a dataframe as argument and return a list of functions or formulas defining conditions to use a stat function. | |||
} |
@@ -18,5 +18,5 @@ A statistical test function | |||
These functions take a variable and a grouping variable as arguments, and return a statistical test to use, expressed as a single-term formula. | |||
} | |||
\details{ | |||
Currently, only tests_auto is defined, and picks between t test, wilcoxon, anova, kruskal-wallis and fisher depending on the number of groups, the type of the variable, the normality and homoskedasticity of the distributions. | |||
Currently, only \code{tests_auto} is defined. It picks between the t test, Wilcoxon, ANOVA, Kruskal-Wallis and Fisher tests depending on the number of groups, the type of the variable, and the normality and homoskedasticity of the distributions. | |||
} |