
Updated documentation

tags/0.1.0
Maxime Wack 7 years ago
parent
commit
56db2283a0
14 changed files with 237 additions and 115 deletions
  1. R/build.R (+20 -5)
  2. R/convenience_functions.R (+40 -40)
  3. R/output.R (+39 -5)
  4. R/stats.R (+11 -5)
  5. R/tests.R (+1 -1)
  6. man/ANOVA.Rd (+4 -4)
  7. man/chisq.test.Rd (+22 -22)
  8. man/datatable.Rd (+37 -5)
  9. man/desctable.Rd (+28 -6)
  10. man/fisher.test.Rd (+15 -15)
  11. man/pander.desctable.Rd (+7 -1)
  12. man/statify.Rd (+6 -1)
  13. man/stats_default.Rd (+6 -4)
  14. man/tests_auto.Rd (+1 -1)

R/build.R (+20 -5)

@@ -74,22 +74,37 @@ varColumn <- function(data, labels = NULL)

#' Generate a statistics table
#'
#' Generate a statistics table with variable names/labels and levels
#' Generate a statistics table with the chosen statistical functions, and tests if given a \code{"grouped"} dataframe.
#'
#' @section Labels:
#' labels is an option named character vector used to make the table prettier.
#'
#' If given, the variable names for which there is a label will be replaced by their corresponding label.
#'
#' Not all variables need to have a label, and labels for non-existing variables are ignored.
#'
#' labels must be given in the form c(unquoted_variable_name = "label")
#'
#' @section Stats:
#' The stats can be a function which takes a dataframe and returns a list of statistical functions to use.
#' stats can also be a named list of statistical functions, or formulas. The names will be used as column names in the resulting table. If an element of the list is a function, it will be used as-is for the stats. If an element of the list is a formula, it can be used to conditionally use stats depending on the variable. The general form is `condition ~ T | F`, and can be nested, such as `is.factor ~ percent | (is.normal ~ mean | median)`, for example.
#'
#' stats can also be a named list of statistical functions, or formulas.
#'
#' The names will be used as column names in the resulting table. If an element of the list is a function, it will be used as-is for the stats. If an element of the list is a formula, it can be used to conditionally use stats depending on the variable.
#'
#' The general form is \code{condition ~ T | F}, and can be nested, such as \code{is.factor ~ percent | (is.normal ~ mean | median)}, for example.
#'
#' @section Tests:
#' The tests can be a function which takes a variable and a grouping variable, and returns an appropriate statistical test to use in that case.
#' tests can also be a named list of statistical test functions, associating the name of a variable in the data, and a test to use specifically for that variable. That test name must be expressed as a single-term formula (e.g. ~t.test). You don't have to specify tests for all the variables: a default test for all other variables can be defined with the name .default, and an automatic test can be defined with the name .auto.
#'
#' If data is a grouped dataframe (using group_by), subtables are created and statistic tests are performed over each sub-group.
#' tests can also be a named list of statistical test functions, associating the name of a variable in the data, and a test to use specifically for that variable.
#'
#' That test name must be expressed as a single-term formula (e.g. \code{~t.test}). You don't have to specify tests for all the variables: a default test for all other variables can be defined with the name \code{.default}, and an automatic test can be defined with the name \code{.auto}.
#'
#' If data is a grouped dataframe (using \code{group_by}), subtables are created and statistic tests are performed over each sub-group.
#'
#' The output is a desctable object, which is a list of named dataframes that can be further manipulated. Methods for printing, using in pander and DT::datatable are present. Printing reduces the object to a dataframe.
#' @section Output:
#' The output is a desctable object, which is a list of named dataframes that can be further manipulated. Methods for printing, using in \pkg{pander} and \pkg{DT} are present. Printing reduces the object to a dataframe.
#'
#' @param data The dataframe to analyze
#' @param stats A list of named statistics to apply to each element of the dataframe, or a function returning a list of named statistics
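
To illustrate the `condition ~ T | F` form described in the Stats section, here is a minimal base-R sketch of how such a formula could be resolved for a single variable. `resolve_stat` is a hypothetical helper written for this example and is not desctable's actual parser (which this diff does not show). It relies only on the fact that R parses `~` with lower precedence than `|`, so the right-hand side of the formula arrives as one `T | F` call.

```r
# Hypothetical helper (not part of desctable): pick the stat function that a
# conditional formula of the form `condition ~ T | F` selects for vector x.
resolve_stat <- function(expr, x, env = parent.frame()) {
  # Drop the parentheses that wrap a nested formula
  while (is.call(expr) && identical(expr[[1]], as.name("("))) {
    expr <- expr[[2]]
  }
  if (is.call(expr) && identical(expr[[1]], as.name("~"))) {
    cond     <- eval(expr[[2]], env)  # predicate, e.g. is.factor
    branches <- expr[[3]]             # the call `T | F`
    chosen   <- if (isTRUE(cond(x))) branches[[2]] else branches[[3]]
    return(resolve_stat(chosen, x, env))
  }
  eval(expr, env)                     # a plain function name, e.g. mean
}

# Assumed example formula: count levels for factors, otherwise mean or median
f <- is.factor ~ nlevels | (is.numeric ~ mean | median)
resolve_stat(f, iris$Species)(iris$Species)            # number of levels
resolve_stat(f, iris$Sepal.Length)(iris$Sepal.Length)  # mean of the column
```

The recursion is what lets the nested form from the documentation, `is.factor ~ percent | (is.normal ~ mean | median)`, work to arbitrary depth; `percent` and `is.normal` are desctable functions and are replaced here by base R stand-ins so the sketch runs on its own.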


R/convenience_functions.R (+40 -40)

@@ -43,11 +43,11 @@ is.normal <- function(x)
#' Fisher's Exact Test for Count Data
#'
#' Performs Fisher's exact test for testing the null of independence
#' of rows and columns in a contingency table with fixed marginals.
#' of rows and columns in a contingency table with fixed marginals, or with a formula expression.
#'
#' If ‘x’ is a matrix, it is taken as a two-dimensional contingency
#' If \code{x} is a matrix, it is taken as a two-dimensional contingency
#' table, and hence its entries should be nonnegative integers.
#' Otherwise, both ‘x’ and ‘y’ must be vectors of the same length.
#' Otherwise, both \code{x} and \code{y} must be vectors of the same length.
#' Incomplete cases are removed, the vectors are coerced into factor
#' objects, and the contingency table is computed from these.
#'
@@ -56,7 +56,7 @@ is.normal <- function(x)
#' computations are based on a C version of the FORTRAN subroutine
#' FEXACT which implements the network developed by Mehta and Patel
#' (1986) and improved by Clarkson, Fan and Joe (1993). The FORTRAN
#' code can be obtained from  |http://www.netlib.org/toms/643|.
#' code can be obtained from \url{http://www.netlib.org/toms/643}.
#' Note this fails (with an error message) when the entries of the
#' table are too large. (It transposes the table if necessary so it
#' has no more rows than columns. One constraint is that the product
@@ -64,20 +64,20 @@ is.normal <- function(x)
#'
#' For 2 by 2 tables, the null of conditional independence is
#' equivalent to the hypothesis that the odds ratio equals one.
#' ‘Exact’ inference can be based on observing that in general, given
#' \code{Exact} inference can be based on observing that in general, given
#' all marginal totals fixed, the first element of the contingency
#' table has a non-central hypergeometric distribution with
#' non-centrality parameter given by the odds ratio (Fisher, 1935).
#' The alternative for a one-sided test is based on the odds ratio,
#' so ‘alternative = "greater"’ is a test of the odds ratio being
#' bigger than ‘or’.
#' so \code{alternative = "greater"} is a test of the odds ratio being
#' bigger than \code{or}.
#'
#' Two-sided tests are based on the probabilities of the tables, and
#' take as ‘more extreme’ all tables with probabilities less than or
#' take as \code{more extreme} all tables with probabilities less than or
#' equal to that of the observed table, the p-value being the sum of
#' such probabilities.
#'
#' For larger than 2 by 2 tables and ‘hybrid = TRUE’, asymptotic
#' For larger than 2 by 2 tables and \code{hybrid = TRUE}, asymptotic
#' chi-squared probabilities are only used if the ‘Cochran
#' conditions’ are satisfied, that is if no cell has count zero, and
#' more than 80% of the cells have counts at least 5: otherwise the
@@ -89,24 +89,24 @@ is.normal <- function(x)
#' @param x either a two-dimensional contingency table in matrix form, a factor object, or a formula of the form \code{lhs ~ rhs} where \code{lhs} and \code{rhs} are factors.
#' @param y a factor object; ignored if \code{x} is a matrix or a formula.
#' @inheritParams stats::fisher.test
#' @return A list with class ‘"htest"’ containing the following components:
#' @return A list with class \code{"htest"} containing the following components:
#'
#' p.value: the p-value of the test.
#'
#' conf.int: a confidence interval for the odds ratio. Only present in
#' the 2 by 2 case and if argument ‘conf.int = TRUE’.
#' the 2 by 2 case and if argument \code{conf.int = TRUE}.
#'
#' estimate: an estimate of the odds ratio. Note that the _conditional_
#' Maximum Likelihood Estimate (MLE) rather than the
#' unconditional MLE (the sample odds ratio) is used. Only
#' present in the 2 by 2 case.
#'
#' null.value: the odds ratio under the null, ‘or’. Only present in the 2
#' null.value: the odds ratio under the null, \code{or}. Only present in the 2
#' by 2 case.
#'
#' alternative: a character string describing the alternative hypothesis.
#'
#' method: the character string ‘"Fisher's Exact Test for Count Data"’.
#' method: the character string \code{"Fisher's Exact Test for Count Data"}.
#'
#' data.name: a character string giving the names of the data.
#' @references
@@ -139,9 +139,9 @@ is.normal <- function(x)
#' generating r x c tables with given row and column totals.
#' _Applied Statistics_ *30*, 91-97.
#' @seealso
#' ‘chisq.test’
#' \code{\link{chisq.test}}
#'
#' ‘fisher.exact’ in package ‘exact2x2’ for alternative
#' \code{fisher.exact} in package \pkg{kexact2x2} for alternative
#' interpretations of two-sided tests and confidence intervals for 2
#' by 2 tables.
#' @examples
@@ -220,18 +220,18 @@ fisher.test.formula <- function(x,

#' Pearson's Chi-squared Test for Count Data
#'
#' ‘chisq.test’ performs chi-squared contingency table tests and goodness-of-fit tests.
#' \code{chisq.test} performs chi-squared contingency table tests and goodness-of-fit tests, with an added method for formulas.
#'
#' If ‘x’ is a matrix with one row or column, or if ‘x’ is a vector
#' and ‘y’ is not given, then a _goodness-of-fit test_ is performed
#' (‘x’ is treated as a one-dimensional contingency table). The
#' entries of ‘x’ must be non-negative integers. In this case, the
#' If \code{x} is a matrix with one row or column, or if \code{x} is a vector
#' and \code{y} is not given, then a _goodness-of-fit test_ is performed
#' (\code{x} is treated as a one-dimensional contingency table). The
#' entries of \code{x} must be non-negative integers. In this case, the
#' hypothesis tested is whether the population probabilities equal
#' those in ‘p’, or are all equal if ‘p’ is not given.
#' those in \code{p}, or are all equal if \code{p} is not given.
#'
#' If ‘x’ is a matrix with at least two rows and columns, it is taken
#' as a two-dimensional contingency table: the entries of ‘x’ must be
#' non-negative integers. Otherwise, ‘x’ and ‘y’ must be vectors or
#' If \code{x} is a matrix with at least two rows and columns, it is taken
#' as a two-dimensional contingency table: the entries of \code{x} must be
#' non-negative integers. Otherwise, \code{x} and \code{y} must be vectors or
#' factors of the same length; cases with missing values are removed,
#' the objects are coerced to factors, and the contingency table is
#' computed from these. Then Pearson's chi-squared test is performed
@@ -239,11 +239,11 @@ fisher.test.formula <- function(x,
#' counts in a 2-dimensional contingency table is the product of the
#' row and column marginals.
#'
#' If ‘simulate.p.value’ is ‘FALSE’, the p-value is computed from the
#' If \code{simulate.p.value} is \code{FALSE}, the p-value is computed from the
#' asymptotic chi-squared distribution of the test statistic;
#' continuity correction is only used in the 2-by-2 case (if
#' ‘correct’ is ‘TRUE’, the default). Otherwise the p-value is
#' computed for a Monte Carlo test (Hope, 1968) with ‘B’ replicates.
#' \code{correct} is \code{TRUE}, the default). Otherwise the p-value is
#' computed for a Monte Carlo test (Hope, 1968) with \code{B} replicates.
#'
#' In the contingency table case simulation is done by random
#' sampling from the set of all contingency tables with given
@@ -254,17 +254,17 @@ fisher.test.formula <- function(x,
#' exact test.
#'
#' In the goodness-of-fit case simulation is done by random sampling
#' from the discrete distribution specified by ‘p’, each sample being
#' of size ‘n = sum(x)’. This simulation is done in R and may be
#' from the discrete distribution specified by \code{p}, each sample being
#' of size \code{n = sum(x)}. This simulation is done in R and may be
#' slow.
#' @param x a numeric vector, or matrix, or formula of the form \code{lhs ~ rhs} where \code{lhs} and \code{rhs} are factors. ‘x’ and ‘y’ can also both be factors.
#' @param y a numeric vector; ignored if ‘x’ is a matrix or a formula. If ‘x’ is a factor, ‘y’ should be a factor of the same length.
#' @param x a numeric vector, or matrix, or formula of the form \code{lhs ~ rhs} where \code{lhs} and \code{rhs} are factors. \code{x} and \code{y} can also both be factors.
#' @param y a numeric vector; ignored if \code{x} is a matrix or a formula. If \code{x} is a factor, \code{y} should be a factor of the same length.
#' @inheritParams stats::chisq.test
#' @return A list with class ‘"htest"’ containing the following components:
#' @return A list with class \code{"htest"} containing the following components:
#' statistic: the value the chi-squared test statistic.
#'
#' parameter: the degrees of freedom of the approximate chi-squared
#' distribution of the test statistic, ‘NA’ if the p-value is
#' distribution of the test statistic, \code{NA} if the p-value is
#' computed by Monte Carlo simulation.
#'
#' p.value: the p-value for the test.
@@ -282,9 +282,9 @@ fisher.test.formula <- function(x,
#' residuals: the Pearson residuals, ‘(observed - expected) /
#' sqrt(expected)’.
#'
#' stdres: standardized residuals, ‘(observed - expected) / sqrt(V)’,
#' where ‘V’ is the residual cell variance (Agresti, 2007,
#' section 2.4.5 for the case where ‘x’ is a matrix, ‘n * p * (1
#' stdres: standardized residuals, \code{(observed - expected) / sqrt(V)},
#' where \code{V} is the residual cell variance (Agresti, 2007,
#' section 2.4.5 for the case where \code{x} is a matrix, ‘n * p * (1
#' - p)’ otherwise).
#' @source The code for Monte Carlo simulation is a C translation of the Fortran algorithm of Patefield (1981).
#' @references
@@ -297,7 +297,7 @@ fisher.test.formula <- function(x,
#'
#' Agresti, A. (2007) _An Introduction to Categorical Data Analysis,
#' 2nd ed._, New York: John Wiley & Sons. Page 38.
#' @seealso For goodness-of-fit testing, notably of continuous distributions, ‘ks.test’.
#' @seealso For goodness-of-fit testing, notably of continuous distributions, \code{\link{ks.test}}.
#' @examples
#' \dontrun{
#' ## From Agresti(2007) p.39
@@ -368,10 +368,10 @@ chisq.test.formula <- function(x,
B = B)
}

#' Wrapper for summary(aov)
#' Wrapper for oneway.test(var.equal = T)
#'
#' @param formula An anova formula (variable ~ grouping variable)
#' @seealso stats::aov
#' @param formula An anova formula (\code{variable ~ grouping variable})
#' @seealso \code{\link{oneway.test}}
#' @export
ANOVA <- function(formula)
{


R/output.R (+39 -5)

@@ -30,8 +30,13 @@ as.data.frame.desctable <- function(x, ...)

#' Pander method for desctable
#'
#' Pander method to output a desctable
#'
#' Uses \code{pandoc.table}, with some default parameters (\code{digits = 2}, \code{justify = "left"}, \code{missing = ""}, \code{keep.line.breaks = T}, \code{split.tables = Inf}, and \code{emphasize.rownames = F}), that you can override if needed.
#'
#' @param x A desctable
#' @inheritParams pander::pandoc.table
#' @seealso \code{\link{pandoc.table}}
#' @export
pander.desctable <- function(x = NULL,
digits = 2,
@@ -62,8 +67,41 @@ pander.desctable <- function(x = NULL,
...)
}

#' Datatable
#' Create an HTML table widget using the DataTables library
#'
#' This function creates an HTML widget to display rectangular data (a matrix or data frame) using the JavaScript library DataTables, with a method for \code{desctable} objects.
#'
#' @note
#' You are recommended to escape the table content for security reasons (e.g. XSS attacks) when using this function in Shiny or any other dynamic web applications.
#' @references
#' See \url{http://rstudio.github.io/DT} for the full documentation.
#' @examples
#' library(DT)
#'
#' # see the package vignette for examples and the link to website
#' vignette('DT', package = 'DT')
#'
#' # some boring edge cases for testing purposes
#' m = matrix(nrow = 0, ncol = 5, dimnames = list(NULL, letters[1:5]))
#' datatable(m) # zero rows
#' datatable(as.data.frame(m))
#'
#' m = matrix(1, dimnames = list(NULL, 'a'))
#' datatable(m) # one row and one column
#' datatable(as.data.frame(m))
#'
#' m = data.frame(a = 1, b = 2, c = 3)
#' datatable(m)
#' datatable(as.matrix(m))
#'
#' # dates
#' datatable(data.frame(
#' date = seq(as.Date("2015-01-01"), by = "day", length.out = 5), x = 1:5
#' ))
#' datatable(data.frame(x = Sys.Date()))
#' datatable(data.frame(x = Sys.time()))
#'
#' ###
#' @inheritParams DT::datatable
#' @export
datatable <- function(data, ...)
@@ -93,10 +131,6 @@ datatable.default <- function(data,
DT::datatable(data, options = options, class = class, callback = callback, caption = caption, filter = filter, escape = escape, style = style, width = width, height = height, elementId = elementId, fillContainer = fillContainer, autoHideNavigation = autoHideNavigation, selection = selection, extensions = extensions, plugins = plugins, ...)
}

#' datatable method for desctable
#'
#' @param data A desctable
#' @param ... Additional datatable parameters
#' @rdname datatable
#' @inheritParams base::prettyNum
#' @export


R/stats.R (+11 -5)

@@ -1,10 +1,14 @@
#' Transform any function into a valid stat function for the table
#'
#' Transform a function into a valid stat function for the table
#'
#' NA values are removed from the data
#'
#' Applying the function on a numerical vector should return one value
#'
#' Applying the function on a factor should return nlevels + 1 value, or one value per factor level
#' See `parse_formula` for the usage for formulaes.
#'
#' See \code{parse_formula} for the usage for formulaes.
#' @param f The function to try to apply, or a formula combining two functions
#' @param x A vector
#' @export
@@ -67,10 +71,12 @@ statify.formula <- function(x, f)
#' These functions take a dataframe as argument and return a list of statistcs in the form accepted by desctable.
#'
#' Already defined are
#' - stats_default with length, mean/%, sd, med and IQR
#' - stats_normal with length, mean/% and sd
#' - stats_nonnormal with length, median/% and IQR
#' - stats_auto, which picks stats depending of the data
#' \enumerate{
#' \item stats_default with length, mean/\%, sd, med and IQR
#' \item stats_normal with length, mean/\% and sd
#' \item stats_nonnormal with length, median/\% and IQR
#' \item stats_auto, which picks stats depending of the data
#' }
#'
#' You can define your own automatic functions, as long as they take a dataframe as argument and return a list of functions or formulas defining conditions to use a stat function.
#'


R/tests.R (+1 -1)

@@ -27,7 +27,7 @@ testify <- function(x, f, group)
#'
#' These functions take a variable and a grouping variable as arguments, and return a statistcal test to use, expressed as a single-term formula.
#'
#' Currently, only tests_auto is defined, and picks between t test, wilcoxon, anova, kruskal-wallis and fisher depending on the number of groups, the type of the variable, the normality and homoskedasticity of the distributions.
#' Currently, only \code{tests_auto} is defined, and picks between t test, wilcoxon, anova, kruskal-wallis and fisher depending on the number of groups, the type of the variable, the normality and homoskedasticity of the distributions.
#'
#' @param var The variable to test
#' @param grp The variable for the groups


man/ANOVA.Rd (+4 -4)

@@ -2,16 +2,16 @@
% Please edit documentation in R/convenience_functions.R
\name{ANOVA}
\alias{ANOVA}
\title{Wrapper for summary(aov)}
\title{Wrapper for oneway.test(var.equal = T)}
\usage{
ANOVA(formula)
}
\arguments{
\item{formula}{An anova formula (variable ~ grouping variable)}
\item{formula}{An anova formula (\code{variable ~ grouping variable})}
}
\description{
Wrapper for summary(aov)
Wrapper for oneway.test(var.equal = T)
}
\seealso{
stats::aov
\code{\link{oneway.test}}
}
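
The title change from `summary(aov)` to `oneway.test(var.equal = T)` does not change the statistics for a one-way design: with equal variances assumed, both give the classical one-way ANOVA F test. The check below uses only base R and the built-in `iris` data. It does not call the package's `ANOVA()` wrapper, whose body is not shown in this diff.

```r
# Classical one-way ANOVA computed two ways on a built-in dataset
fit_aov    <- summary(aov(Sepal.Length ~ Species, data = iris))
fit_oneway <- oneway.test(Sepal.Length ~ Species, data = iris, var.equal = TRUE)

# summary(aov) stores its results in an ANOVA table (first list element)
f_aov <- fit_aov[[1]][["F value"]][1]
p_aov <- fit_aov[[1]][["Pr(>F)"]][1]

all.equal(f_aov, unname(fit_oneway$statistic))  # same F statistic
all.equal(p_aov, fit_oneway$p.value)            # same p-value
class(fit_oneway)                               # "htest"
```

One practical difference is the return type: `oneway.test` returns an object of class `"htest"` with a `p.value` component, the same shape as the other tests documented in this commit, whereas `summary(aov)` returns a table from which the p-value has to be extracted.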

man/chisq.test.Rd (+22 -22)

@@ -20,9 +20,9 @@ chisq.test(x, y, correct, p, rescale.p, simulate.p.value, B)
B = 2000)
}
\arguments{
\item{x}{a numeric vector, or matrix, or formula of the form \code{lhs ~ rhs} where \code{lhs} and \code{rhs} are factors. ‘x’ and ‘y’ can also both be factors.}
\item{x}{a numeric vector, or matrix, or formula of the form \code{lhs ~ rhs} where \code{lhs} and \code{rhs} are factors. \code{x} and \code{y} can also both be factors.}

\item{y}{a numeric vector; ignored if ‘x’ is a matrix or a formula. If ‘x’ is a factor, ‘y’ should be a factor of the same length.}
\item{y}{a numeric vector; ignored if \code{x} is a matrix or a formula. If \code{x} is a factor, \code{y} should be a factor of the same length.}

\item{correct}{a logical indicating whether to apply continuity
correction when computing the test statistic for 2 by 2 tables: one
@@ -44,11 +44,11 @@ chisq.test(x, y, correct, p, rescale.p, simulate.p.value, B)
Monte Carlo test.}
}
\value{
A list with class ‘"htest"’ containing the following components:
A list with class \code{"htest"} containing the following components:
statistic: the value the chi-squared test statistic.

parameter: the degrees of freedom of the approximate chi-squared
distribution of the test statistic, ‘NA’ if the p-value is
distribution of the test statistic, \code{NA} if the p-value is
computed by Monte Carlo simulation.

p.value: the p-value for the test.
@@ -66,25 +66,25 @@ expected: the expected counts under the null hypothesis.
residuals: the Pearson residuals, ‘(observed - expected) /
sqrt(expected)’.

stdres: standardized residuals, ‘(observed - expected) / sqrt(V)’,
where ‘V’ is the residual cell variance (Agresti, 2007,
section 2.4.5 for the case where ‘x’ is a matrix, ‘n * p * (1
stdres: standardized residuals, \code{(observed - expected) / sqrt(V)},
where \code{V} is the residual cell variance (Agresti, 2007,
section 2.4.5 for the case where \code{x} is a matrix, ‘n * p * (1
- p)’ otherwise).
}
\description{
‘chisq.test’ performs chi-squared contingency table tests and goodness-of-fit tests.
\code{chisq.test} performs chi-squared contingency table tests and goodness-of-fit tests, with an added method for formulas.
}
\details{
If ‘x’ is a matrix with one row or column, or if ‘x’ is a vector
and ‘y’ is not given, then a _goodness-of-fit test_ is performed
(‘x’ is treated as a one-dimensional contingency table). The
entries of ‘x’ must be non-negative integers. In this case, the
If \code{x} is a matrix with one row or column, or if \code{x} is a vector
and \code{y} is not given, then a _goodness-of-fit test_ is performed
(\code{x} is treated as a one-dimensional contingency table). The
entries of \code{x} must be non-negative integers. In this case, the
hypothesis tested is whether the population probabilities equal
those in ‘p’, or are all equal if ‘p’ is not given.
those in \code{p}, or are all equal if \code{p} is not given.

If ‘x’ is a matrix with at least two rows and columns, it is taken
as a two-dimensional contingency table: the entries of ‘x’ must be
non-negative integers. Otherwise, ‘x’ and ‘y’ must be vectors or
If \code{x} is a matrix with at least two rows and columns, it is taken
as a two-dimensional contingency table: the entries of \code{x} must be
non-negative integers. Otherwise, \code{x} and \code{y} must be vectors or
factors of the same length; cases with missing values are removed,
the objects are coerced to factors, and the contingency table is
computed from these. Then Pearson's chi-squared test is performed
@@ -92,11 +92,11 @@ of the null hypothesis that the joint distribution of the cell
counts in a 2-dimensional contingency table is the product of the
row and column marginals.

If ‘simulate.p.value’ is ‘FALSE’, the p-value is computed from the
If \code{simulate.p.value} is \code{FALSE}, the p-value is computed from the
asymptotic chi-squared distribution of the test statistic;
continuity correction is only used in the 2-by-2 case (if
‘correct’ is ‘TRUE’, the default). Otherwise the p-value is
computed for a Monte Carlo test (Hope, 1968) with ‘B’ replicates.
\code{correct} is \code{TRUE}, the default). Otherwise the p-value is
computed for a Monte Carlo test (Hope, 1968) with \code{B} replicates.

In the contingency table case simulation is done by random
sampling from the set of all contingency tables with given
@@ -107,8 +107,8 @@ assumed for the chi-squared test but rather that for Fisher's
exact test.

In the goodness-of-fit case simulation is done by random sampling
from the discrete distribution specified by ‘p’, each sample being
of size ‘n = sum(x)’. This simulation is done in R and may be
from the discrete distribution specified by \code{p}, each sample being
of size \code{n = sum(x)}. This simulation is done in R and may be
slow.
}
\examples{
@@ -167,5 +167,5 @@ Agresti, A. (2007) _An Introduction to Categorical Data Analysis,
2nd ed._, New York: John Wiley & Sons. Page 38.
}
\seealso{
For goodness-of-fit testing, notably of continuous distributions, ‘ks.test’.
For goodness-of-fit testing, notably of continuous distributions, \code{\link{ks.test}}.
}
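
The "added method for formulas" in the description is the package's own extension; `stats::chisq.test` itself takes `x` and `y`. As a rough illustration of what a `lhs ~ rhs` front end involves, the sketch below builds one with `model.frame`. The name `chisq_formula` is hypothetical, and this is not the package's `chisq.test.formula`, whose body is not shown in this diff.

```r
# Hypothetical formula front end for stats::chisq.test (illustration only)
chisq_formula <- function(formula, data, ...) {
  mf <- stats::model.frame(formula, data)  # column 1 = lhs, column 2 = rhs
  stats::chisq.test(mf[[1]], mf[[2]], ...)
}

# Both sides of the formula are factors, as the x argument requires
res <- chisq_formula(factor(am) ~ factor(vs), data = mtcars)
ref <- stats::chisq.test(factor(mtcars$am), factor(mtcars$vs))

all.equal(res$p.value, ref$p.value)  # the two calls agree
```

Extra arguments such as `correct` or `simulate.p.value` pass through `...` unchanged, so the wrapper only changes how the two factors are supplied.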

man/datatable.Rd (+37 -5)

@@ -4,7 +4,7 @@
\alias{datatable}
\alias{datatable.default}
\alias{datatable.desctable}
\title{Datatable}
\title{Create an HTML table widget using the DataTables library}
\usage{
datatable(data, ...)

@@ -30,9 +30,9 @@ datatable(data, ...)
...)
}
\arguments{
\item{data}{A desctable}
\item{data}{a data object (either a matrix or a data frame)}

\item{...}{Additional datatable parameters}
\item{...}{arguments passed to \code{format}.}

\item{options}{a list of initialization options (see
\url{http://datatables.net/reference/option/}); the character options
@@ -117,7 +117,39 @@ in the first column of the table if exist (not \code{NULL})}
}
}
\description{
Datatable
This function creates an HTML widget to display rectangular data (a matrix or data frame) using the JavaScript library DataTables, with a method for \code{desctable} objects.
}
\note{
You are recommended to escape the table content for security reasons (e.g. XSS attacks) when using this function in Shiny or any other dynamic web applications.
}
\examples{
library(DT)

# see the package vignette for examples and the link to website
vignette('DT', package = 'DT')

# some boring edge cases for testing purposes
m = matrix(nrow = 0, ncol = 5, dimnames = list(NULL, letters[1:5]))
datatable(m) # zero rows
datatable(as.data.frame(m))

datatable method for desctable
m = matrix(1, dimnames = list(NULL, 'a'))
datatable(m) # one row and one column
datatable(as.data.frame(m))

m = data.frame(a = 1, b = 2, c = 3)
datatable(m)
datatable(as.matrix(m))

# dates
datatable(data.frame(
date = seq(as.Date("2015-01-01"), by = "day", length.out = 5), x = 1:5
))
datatable(data.frame(x = Sys.Date()))
datatable(data.frame(x = Sys.time()))

###
}
\references{
See \url{http://rstudio.github.io/DT} for the full documentation.
}

man/desctable.Rd (+28 -6)

@@ -26,24 +26,46 @@ desctable(data, stats, tests, labels)
A desctable object, which prints to a table of statistics for all variables
}
\description{
Generate a statistics table with variable names/labels and levels
Generate a statistics table with the chosen statistical functions, and tests if given a \code{"grouped"} dataframe.
}
\details{
\section{Labels}{

labels is an option named character vector used to make the table prettier.

If given, the variable names for which there is a label will be replaced by their corresponding label.

Not all variables need to have a label, and labels for non-existing variables are ignored.

labels must be given in the form c(unquoted_variable_name = "label")
}

\section{Stats}{

The stats can be a function which takes a dataframe and returns a list of statistical functions to use.
stats can also be a named list of statistical functions, or formulas. The names will be used as column names in the resulting table. If an element of the list is a function, it will be used as-is for the stats. If an element of the list is a formula, it can be used to conditionally use stats depending on the variable. The general form is `condition ~ T | F`, and can be nested, such as `is.factor ~ percent | (is.normal ~ mean | median)`, for example.

stats can also be a named list of statistical functions, or formulas.

The names will be used as column names in the resulting table. If an element of the list is a function, it will be used as-is for the stats. If an element of the list is a formula, it can be used to conditionally use stats depending on the variable.

The general form is \code{condition ~ T | F}, and can be nested, such as \code{is.factor ~ percent | (is.normal ~ mean | median)}, for example.
}

\section{Tests}{

The tests can be a function which takes a variable and a grouping variable, and returns an appropriate statistical test to use in that case.
tests can also be a named list of statistical test functions, associating the name of a variable in the data, and a test to use specifically for that variable. That test name must be expressed as a single-term formula (e.g. ~t.test). You don't have to specify tests for all the variables: a default test for all other variables can be defined with the name .default, and an automatic test can be defined with the name .auto.

If data is a grouped dataframe (using group_by), subtables are created and statistic tests are performed over each sub-group.
tests can also be a named list of statistical test functions, associating the name of a variable in the data, and a test to use specifically for that variable.

That test name must be expressed as a single-term formula (e.g. \code{~t.test}). You don't have to specify tests for all the variables: a default test for all other variables can be defined with the name \code{.default}, and an automatic test can be defined with the name \code{.auto}.

The output is a desctable object, which is a list of named dataframes that can be further manipulated. Methods for printing, using in pander and DT::datatable are present. Printing reduces the object to a dataframe.
If data is a grouped dataframe (using \code{group_by}), subtables are created and statistic tests are performed over each sub-group.
}

\section{Output}{

The output is a desctable object, which is a list of named dataframes that can be further manipulated. Methods for printing, using in \pkg{pander} and \pkg{DT} are present. Printing reduces the object to a dataframe.
}

\examples{
iris \%>\%
desctable
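
The rules in the Labels section (replace a name when a label exists, keep unlabelled variables, ignore labels for variables that do not exist) can be sketched in a few lines of base R. `apply_labels` is a hypothetical helper for illustration; the package's own handling lives in functions such as `varColumn`, whose body is not part of this diff and may differ.

```r
# Hypothetical helper (not desctable's code): swap variable names for labels
apply_labels <- function(vars, labels = NULL) {
  if (is.null(labels)) return(vars)
  has_label <- vars %in% names(labels)       # variables that have a label
  vars[has_label] <- unname(labels[vars[has_label]])
  vars
}

# Labels given as c(unquoted_variable_name = "label"), as the docs require
labels <- c(Sepal.Length = "Sepal length (cm)",
            Species      = "Iris species",
            Not.A.Column = "ignored")

apply_labels(names(iris), labels)
```

Here `Sepal.Width`, `Petal.Length` and `Petal.Width` keep their names because they have no label, and the `Not.A.Column` entry has no effect.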


man/fisher.test.Rd (+15 -15)

@@ -63,35 +63,35 @@ fisher.test(x, y, workspace, hybrid, control, or, alternative, conf.int,
Monte Carlo test.}
}
\value{
A list with class ‘"htest"’ containing the following components:
A list with class \code{"htest"} containing the following components:

p.value: the p-value of the test.

conf.int: a confidence interval for the odds ratio. Only present in
the 2 by 2 case and if argument ‘conf.int = TRUE’.
the 2 by 2 case and if argument \code{conf.int = TRUE}.

estimate: an estimate of the odds ratio. Note that the _conditional_
Maximum Likelihood Estimate (MLE) rather than the
unconditional MLE (the sample odds ratio) is used. Only
present in the 2 by 2 case.

null.value: the odds ratio under the null, ‘or’. Only present in the 2
null.value: the odds ratio under the null, \code{or}. Only present in the 2
by 2 case.

alternative: a character string describing the alternative hypothesis.

method: the character string ‘"Fisher's Exact Test for Count Data"’.
method: the character string \code{"Fisher's Exact Test for Count Data"}.

data.name: a character string giving the names of the data.
}
\description{
Performs Fisher's exact test for testing the null of independence
of rows and columns in a contingency table with fixed marginals.
of rows and columns in a contingency table with fixed marginals, or with a formula expression.
}
\details{
If ‘x’ is a matrix, it is taken as a two-dimensional contingency
If \code{x} is a matrix, it is taken as a two-dimensional contingency
table, and hence its entries should be nonnegative integers.
Otherwise, both ‘x’ and ‘y’ must be vectors of the same length.
Otherwise, both \code{x} and \code{y} must be vectors of the same length.
Incomplete cases are removed, the vectors are coerced into factor
objects, and the contingency table is computed from these.

@@ -100,7 +100,7 @@ For 2 by 2 cases, p-values are obtained directly using the
computations are based on a C version of the FORTRAN subroutine
FEXACT which implements the network developed by Mehta and Patel
(1986) and improved by Clarkson, Fan and Joe (1993). The FORTRAN
code can be obtained from  |http://www.netlib.org/toms/643|.
code can be obtained from \url{http://www.netlib.org/toms/643}.
Note this fails (with an error message) when the entries of the
table are too large. (It transposes the table if necessary so it
has no more rows than columns. One constraint is that the product
@@ -108,20 +108,20 @@ of the row marginals be less than 2^31 - 1.)

For 2 by 2 tables, the null of conditional independence is
equivalent to the hypothesis that the odds ratio equals one.
‘Exact’ inference can be based on observing that in general, given
\sQuote{Exact} inference can be based on observing that in general, given
all marginal totals fixed, the first element of the contingency
table has a non-central hypergeometric distribution with
non-centrality parameter given by the odds ratio (Fisher, 1935).
The alternative for a one-sided test is based on the odds ratio,
so ‘alternative = "greater"’ is a test of the odds ratio being
bigger than ‘or’.
so \code{alternative = "greater"} is a test of the odds ratio being
bigger than \code{or}.

Two-sided tests are based on the probabilities of the tables, and
take as ‘more extreme’ all tables with probabilities less than or
take as \sQuote{more extreme} all tables with probabilities less than or
equal to that of the observed table, the p-value being the sum of
such probabilities.

For larger than 2 by 2 tables and ‘hybrid = TRUE’, asymptotic
For larger than 2 by 2 tables and \code{hybrid = TRUE}, asymptotic
chi-squared probabilities are only used if the ‘Cochran
conditions’ are satisfied, that is if no cell has count zero, and
more than 80% of the cells have counts at least 5: otherwise the
@@ -202,9 +202,9 @@ generating r x c tables with given row and column totals.
_Applied Statistics_ *30*, 91-97.
}
\seealso{
‘chisq.test’
\code{\link{chisq.test}}

‘fisher.exact’ in package ‘exact2x2’ for alternative
\code{fisher.exact} in package \pkg{exact2x2} for alternative
interpretations of two-sided tests and confidence intervals for 2
by 2 tables.
}
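
The \value and \details text above can be illustrated with a minimal base R sketch. The 2 by 2 counts below are made up for illustration and are not taken from the package; the point is only to show the components of the returned \code{"htest"} object and the difference between the conditional MLE and the sample odds ratio.

```r
# Hypothetical 2 by 2 contingency table (illustrative counts only).
tab <- matrix(c(3, 1, 1, 3), nrow = 2,
              dimnames = list(guess = c("milk", "tea"),
                              truth = c("milk", "tea")))

# Two-sided Fisher's exact test; conf.int = TRUE is the default,
# spelled out here because the confidence interval depends on it.
res <- fisher.test(tab, conf.int = TRUE)

res$p.value     # p-value of the test
res$conf.int    # confidence interval for the odds ratio (2 by 2 case only)
res$estimate    # conditional MLE of the odds ratio
res$null.value  # odds ratio under the null, i.e. the `or` argument

# Unconditional MLE (sample odds ratio), computed by hand for comparison.
sample_or <- (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])
```

Comparing \code{res$estimate} with \code{sample_or} shows that the two estimates generally differ, which is the distinction the \code{estimate} entry draws.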

+ 7
- 1
man/pander.desctable.Rd View File

@@ -25,5 +25,11 @@ pander.desctable(x = NULL, digits = 2, justify = "left", missing = "",
\item{...}{unsupported extra arguments directly placed into \code{/dev/null}}
}
\description{
Pander method for desctable
Pander method to output a desctable
}
\details{
Uses \code{pandoc.table}, with some default parameters (\code{digits = 2}, \code{justify = "left"}, \code{missing = ""}, \code{keep.line.breaks = T}, \code{split.tables = Inf}, and \code{emphasize.rownames = F}), that you can override if needed.
}
\seealso{
\code{\link{pandoc.table}}
}

+ 6
- 1
man/statify.Rd View File

@@ -22,8 +22,13 @@ The results for the function applied on the vector, compatible with the format o
}
\description{
Transform a function into a valid stat function for the table
}
\details{
NA values are removed from the data

Applying the function on a numerical vector should return one value

Applying the function on a factor should return nlevels + 1 values, or one value per factor level
See `parse_formula` for the usage for formulaes.

See \code{parse_formula} for the usage for formulas.
}
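
The contract described in the \details above can be sketched in base R. This is not the package's implementation: \code{statify_sketch} is a hypothetical name, and padding a one-value-per-level result with a leading \code{NA} is an assumption made here only so that both accepted shapes end up with nlevels + 1 values.

```r
# Hypothetical helper illustrating the documented contract (not desctable's code).
statify_sketch <- function(x, f) {
  x <- x[!is.na(x)]          # NA values are removed from the data
  res <- f(x)

  if (is.factor(x)) {
    n <- nlevels(x)
    if (length(res) == n + 1) {
      return(res)            # already nlevels + 1 values
    }
    if (length(res) == n) {
      return(c(NA, res))     # assumption: pad one-value-per-level results
    }
    stop("a stat applied to a factor should return nlevels + 1 values, or one per level")
  }

  # Numerical vectors should yield exactly one value.
  stopifnot(length(res) == 1)
  res
}

num_stat <- statify_sketch(c(1, NA, 3), mean)
fac_stat <- statify_sketch(factor(c("a", "b", NA, "a")),
                           function(v) as.vector(table(v)))
```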

+ 6
- 4
man/stats_default.Rd View File

@@ -26,10 +26,12 @@ These functions take a dataframe as argument and return a list of statistcs in t
}
\details{
Already defined are
- stats_default with length, mean/%, sd, med and IQR
- stats_normal with length, mean/% and sd
- stats_nonnormal with length, median/% and IQR
- stats_auto, which picks stats depending of the data
\enumerate{
\item stats_default with length, mean/\%, sd, med and IQR
\item stats_normal with length, mean/\% and sd
\item stats_nonnormal with length, median/\% and IQR
\item stats_auto, which picks stats depending on the data
}

You can define your own automatic functions, as long as they take a dataframe as argument and return a list of functions or formulas defining conditions to use a stat function.
}
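
A custom function of the kind described in the last paragraph of the \details above might look like the sketch below. \code{stats_custom} is a hypothetical name and the choice of statistics is arbitrary. The formula follows the \code{condition ~ T | F} form from the \code{desctable} documentation; it is only constructed here, not evaluated, so the sketch runs in base R without the package.

```r
# Hypothetical custom stats function: takes a dataframe and returns a
# named list of functions or formulas. The names become column names.
stats_custom <- function(data) {
  list(
    "N"      = length,                       # plain function, used as-is
    "Center" = is.factor ~ length | median   # condition ~ T | F
  )
}

custom <- stats_custom(iris)

# Assumed usage with the package loaded (not run here):
# desctable(iris, stats = stats_custom)
```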

+ 1
- 1
man/tests_auto.Rd View File

@@ -18,5 +18,5 @@ A statistical test function
These functions take a variable and a grouping variable as arguments, and return a statistical test to use, expressed as a single-term formula.
}
\details{
Currently, only tests_auto is defined, and picks between t test, wilcoxon, anova, kruskal-wallis and fisher depending on the number of groups, the type of the variable, the normality and homoskedasticity of the distributions.
Currently, only \code{tests_auto} is defined, and picks between t test, Wilcoxon, ANOVA, Kruskal-Wallis and Fisher depending on the number of groups, the type of the variable, the normality and homoskedasticity of the distributions.
}
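
The selection described in the \details above can be approximated with a base R sketch. This is not the code of \code{tests_auto}: the function name \code{pick_test_sketch}, the 0.1 thresholds, the use of \code{shapiro.test} and \code{bartlett.test} as the normality and homoskedasticity checks, and \code{oneway.test} as a stand-in for the ANOVA are all assumptions made for illustration.

```r
# Hypothetical sketch: pick a test from the variable type, the number of
# groups, and normality / equal-variance checks. Returns a one-sided formula.
pick_test_sketch <- function(var, grp) {
  grp <- factor(grp)

  # Factors are compared with Fisher's exact test.
  if (is.factor(var)) {
    return(~ fisher.test)
  }

  n_groups <- nlevels(grp)

  # Assumed checks: Shapiro-Wilk within each group, Bartlett across groups.
  normal <- all(tapply(var, grp, function(v) shapiro.test(v)$p.value > 0.1))
  homoskedastic <- bartlett.test(var, grp)$p.value > 0.1

  if (normal && homoskedastic) {
    if (n_groups == 2) return(~ t.test)
    return(~ oneway.test)      # stand-in for an ANOVA
  }

  if (n_groups == 2) return(~ wilcox.test)
  ~ kruskal.test
}

test_for_factor  <- pick_test_sketch(iris$Species, iris$Species)
test_for_numeric <- pick_test_sketch(iris$Sepal.Length, iris$Species)
```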
