Maxime Wack cd1af67483 | пре 4 година | |
---|---|---|
R | пре 4 година | |
man | пре 4 година | |
vignettes | пре 4 година | |
.Rbuildignore | пре 5 година | |
.gitignore | пре 5 година | |
.travis.yml | пре 5 година | |
CRAN-RELEASE | пре 4 година | |
DESCRIPTION | пре 4 година | |
LICENSE | пре 7 година | |
NAMESPACE | пре 7 година | |
NEWS | пре 4 година | |
README.Rmd | пре 4 година | |
README.md | пре 4 година | |
codecov.yml | пре 7 година | |
cran-comments.md | пре 7 година |
Desctable is a comprehensive descriptive and comparative tables generator for R.
Every person doing data analysis has to create tables for descriptive summaries of data (a.k.a. Table.1), or comparative tables.
Many packages, such as the aptly named tableone, address this issue.
However, they often include hard-coded behaviors, have outputs not
easily manipulable with standard R tools, or their syntax are
out-of-style (e.g. the argument order makes them difficult to use with
the pipe (%>%
)).
Enter desctable, a package built with the following objectives in mind:
Install from CRAN with
install.packages("desctable")
or install the development version from github with
devtools::install_github("maximewack/desctable")
# If you were to use DT, load it first
library(DT)
library(desctable)
library(pander) # pander can be loaded at any time
It is recommended to read this manual through its vignette:
vignette("desctable")
desctable uses and exports the pipe (%>%
) operator (from packages
magrittr and dplyr fame), though it is not mandatory to use it.
The single interface to the package is its eponymous desctable
function.
When used on a data.frame, it returns a descriptive table:
iris %>%
desctable()
## N % Mean sd Med IQR
## 1 Sepal.Length 150 NA NA NA 5.80 1.3
## 2 Sepal.Width 150 NA 3.057333 0.4358663 3.00 0.5
## 3 Petal.Length 150 NA NA NA 4.35 3.5
## 4 Petal.Width 150 NA NA NA 1.30 1.5
## 5 Species 150 NA NA NA NA NA
## 6 Species: setosa 50 33.33333 NA NA NA NA
## 7 Species: versicolor 50 33.33333 NA NA NA NA
## 8 Species: virginica 50 33.33333 NA NA NA NA
desctable(mtcars)
## N Mean sd Med IQR
## 1 mpg 32 20.090625 6.0269481 19.200 7.37500
## 2 cyl 32 NA NA 6.000 4.00000
## 3 disp 32 NA NA 196.300 205.17500
## 4 hp 32 NA NA 123.000 83.50000
## 5 drat 32 3.596563 0.5346787 3.695 0.84000
## 6 wt 32 NA NA 3.325 1.02875
## 7 qsec 32 17.848750 1.7869432 17.710 2.00750
## 8 vs 32 NA NA 0.000 1.00000
## 9 am 32 NA NA 0.000 1.00000
## 10 gear 32 NA NA 4.000 1.00000
## 11 carb 32 NA NA 2.000 2.00000
As you can see with these two examples, desctable
describes every
variable, with individual levels for factors. It picks statistical
functions depending on the type and distribution of the variables in the
data, and applies those statistical functions only on the relevant
variables.
The object produced by desctable
is in fact a list of data.frames,
with a “desctable” class.
Methods for reduction to a simple dataframe (as.data.frame
,
automatically used for printing), conversion to markdown (pander
), and
interactive html output with DT (datatable
) are provided:
iris %>%
desctable() %>%
pander()
 | N | % | Mean | sd | Med | IQR |
---|---|---|---|---|---|---|
Sepal.Length | 150 | 5.8 | 1.3 | |||
Sepal.Width | 150 | 3.1 | 0.44 | 3 | 0.5 | |
Petal.Length | 150 | 4.3 | 3.5 | |||
Petal.Width | 150 | 1.3 | 1.5 | |||
Species | 150 | |||||
setosa | 50 | 33 | ||||
versicolor | 50 | 33 | ||||
virginica | 50 | 33 |
You need to load these two packages first (and prior to
desctable for DT) if you want to use them.
Calls to pander
and datatable
with “regular” dataframes will not be
affected by the defaults used in the package, and you can modify these
defaults for desctable objects.
The datatable
wrapper function for desctable objects comes with some
default options and formatting such as freezing the row names and table
header, export buttons, and rounding of values. Both pander
and
datatable
wrapper take a digits argument to set the number of
decimals to show. (pander
uses the digits, justify and missing
arguments of pandoc.table
, whereas datatable
calls prettyNum
with
the digits
parameter, and removes NA
values. You can set digits = NULL
if you want the full table and format it yourself)
desctable
chooses statistical functions for you using this algorithm:
For each variable in the table, compute the relevant statistical
functions in that list (non-applicable functions will safely return
NA
).
How does it work, and how can you adapt this behavior to your needs?
desctable
takes an optional stats argument. This argument can either
be:
This is the default, using the stats_auto
function provided in the
package.
Several other “automatic statistical functions” are defined in this
package: stats_auto
, stats_default
, stats_normal
,
stats_nonnormal
.
You can also provide your own automatic function, which needs to
# Strictly equivalent to iris %>% desctable() %>% pander()
iris %>%
desctable(stats = stats_auto) %>%
pander()
 | N | % | Mean | sd | Med | IQR |
---|---|---|---|---|---|---|
Sepal.Length | 150 | 5.8 | 1.3 | |||
Sepal.Width | 150 | 3.1 | 0.44 | 3 | 0.5 | |
Petal.Length | 150 | 4.3 | 3.5 | |||
Petal.Width | 150 | 1.3 | 1.5 | |||
Species | 150 | |||||
setosa | 50 | 33 | ||||
versicolor | 50 | 33 | ||||
virginica | 50 | 33 |
Statistical functions can be any function defined in R that you want to
use, such as length
or mean
.
The only condition is that they return a single numerical value. One
exception is when they return a vector of length 1 + nlevels(x)
when
applied to factors, as is needed for the percent
function.
As mentioned above, they need to be used inside a named list, such as
mtcars %>%
desctable(stats = list("N" = length, "Mean" = mean, "SD" = sd)) %>%
pander()
 | N | Mean | SD |
---|---|---|---|
mpg | 32 | 20 | 6 |
cyl | 32 | 6.2 | 1.8 |
disp | 32 | 231 | 124 |
hp | 32 | 147 | 69 |
drat | 32 | 3.6 | 0.53 |
wt | 32 | 3.2 | 0.98 |
qsec | 32 | 18 | 1.8 |
vs | 32 | 0.44 | 0.5 |
am | 32 | 0.41 | 0.5 |
gear | 32 | 3.7 | 0.74 |
carb | 32 | 2.8 | 1.6 |
The names will be used as column headers in the resulting table, and the
functions will be applied safely on the variables (errors return NA
,
and for factors the function will be used on individual levels).
Several convenience functions are included in this package. For
statistical function we have: percent
, which prints percentages of
levels in a factor, and IQR
which re-implements stats::IQR
but works
better with NA
values.
Be aware that all functions will be used on variables stripped of
their NA
values!
This is necessary for most statistical functions to be useful, and makes
N (length
) show only the number of observations in the dataset for
each variable.
The general form of these formulas is
predicate_function ~ stat_function_if_TRUE | stat_function_if_FALSE
A predicate function is any function returning either TRUE
or FALSE
when applied on a vector, such as is.factor
, is.numeric
, and
is.logical
.
desctable provides the is.normal
function to test for normality
(it is equivalent to `length(na.omit(x)) > 30 & shapiro.test(x)$p.value
.1`).
The FALSE option can be omitted and NA
will be produced if the
condition in the predicate is not met.
These statements can be nested using parentheses.
For example:
is.factor ~ percent | (is.normal ~ mean)
will either use percent
if the variable is a factor, or mean
if and
only if the variable is normally distributed.
You can mix “bare” statistical functions and formulas in the list defining the statistics you want to use in your table.
iris %>%
desctable(stats = list("N" = length,
"%/Mean" = is.factor ~ percent | (is.normal ~ mean),
"Median" = is.normal ~ NA | median)) %>%
pander()
 | N | %/Mean | Median |
---|---|---|---|
Sepal.Length | 150 | 5.8 | |
Sepal.Width | 150 | 3.1 | |
Petal.Length | 150 | 4.3 | |
Petal.Width | 150 | 1.3 | |
Species | 150 | ||
setosa | 50 | 33 | |
versicolor | 50 | 33 | |
virginica | 50 | 33 |
For reference, here is the body of the stats_auto
function in the
package:
## function (data)
## {
## shapiro <- data %>% Filter(f = is.numeric) %>% lapply(is.normal) %>%
## unlist
## if (length(shapiro) == 0) {
## normal <- F
## nonnormal <- F
## }
## else {
## normal <- any(shapiro)
## nonnormal <- any(!shapiro)
## }
## fact <- any(data %>% lapply(is.factor) %>% unlist)
## if (fact & normal & !nonnormal)
## stats_normal(data)
## else if (fact & !normal & nonnormal)
## stats_nonnormal(data)
## else if (fact & !normal & !nonnormal)
## list(N = length, `%` = percent)
## else if (!fact & normal & nonnormal)
## list(N = length, Mean = is.normal ~ mean, sd = is.normal ~
## sd, Med = stats::median, IQR = is.factor ~ NA | IQR)
## else if (!fact & normal & !nonnormal)
## list(N = length, Mean = mean, sd = stats::sd)
## else if (!fact & !normal & nonnormal)
## list(N = length, Med = stats::median, IQR = IQR)
## else stats_default(data)
## }
## <bytecode: 0x0000000018eb1900>
## <environment: namespace:desctable>
It is often the case that variable names are not “pretty” enough to be
used as-is in a table.
Although you could still edit the variable labels in the table
afterwards using subsetting or string replacement functions, it is
possible to mention a labels argument.
The labels argument is a named character vector associating variable
names and labels.
You don’t need to provide labels for all the variables, and extra labels
will be silently discarded. This allows you to define a “global” labels
vector and use it for every table even after variable selections.
mtlabels <- c(mpg = "Miles/(US) gallon",
cyl = "Number of cylinders",
disp = "Displacement (cu.in.)",
hp = "Gross horsepower",
drat = "Rear axle ratio",
wt = "Weight (1000 lbs)",
qsec = "¼ mile time",
vs = "V/S",
am = "Transmission",
gear = "Number of forward gears",
carb = "Number of carburetors")
mtcars %>%
dplyr::mutate(am = factor(am, labels = c("Automatic", "Manual"))) %>%
desctable(labels = mtlabels) %>%
pander()
 | N | % | Mean | sd | Med | IQR |
---|---|---|---|---|---|---|
Miles/(US) gallon | 32 | 20 | 6 | 19 | 7.4 | |
Number of cylinders | 32 | 6 | 4 | |||
Displacement (cu.in.) | 32 | 196 | 205 | |||
Gross horsepower | 32 | 123 | 84 | |||
Rear axle ratio | 32 | 3.6 | 0.53 | 3.7 | 0.84 | |
Weight (1000 lbs) | 32 | 3.3 | 1 | |||
¼ mile time | 32 | 18 | 1.8 | 18 | 2 | |
V/S | 32 | 0 | 1 | |||
Transmission | 32 | |||||
Automatic | 19 | 59 | ||||
Manual | 13 | 41 | ||||
Number of forward gears | 32 | 4 | 1 | |||
Number of carburetors | 32 | 2 | 2 |
Creating a comparative table (between groups defined by a factor) using
desctable
is as easy as creating a descriptive table.
It uses the well known group_by
function from dplyr:
iris %>%
group_by(Species) %>%
desctable() -> iris_by_Species
iris_by_Species
## Species: setosa (n=50) / N Species: setosa (n=50) / Mean
## 1 Sepal.Length 50 5.006
## 2 Sepal.Width 50 3.428
## 3 Petal.Length 50 NA
## 4 Petal.Width 50 NA
## Species: setosa (n=50) / sd Species: setosa (n=50) / Med
## 1 0.3524897 5.0
## 2 0.3790644 3.4
## 3 NA 1.5
## 4 NA 0.2
## Species: setosa (n=50) / IQR Species: versicolor (n=50) / N1
## 1 0.400 50
## 2 0.475 50
## 3 0.175 50
## 4 0.100 50
## Species: versicolor (n=50) / Mean1 Species: versicolor (n=50) / sd1
## 1 5.936 0.5161711
## 2 2.770 0.3137983
## 3 4.260 0.4699110
## 4 NA NA
## Species: versicolor (n=50) / Med1 Species: versicolor (n=50) / IQR1
## 1 5.90 0.700
## 2 2.80 0.475
## 3 4.35 0.600
## 4 1.30 0.300
## Species: virginica (n=50) / N2 Species: virginica (n=50) / Mean2
## 1 50 6.588
## 2 50 2.974
## 3 50 5.552
## 4 50 NA
## Species: virginica (n=50) / sd2 Species: virginica (n=50) / Med2
## 1 0.6358796 6.50
## 2 0.3224966 3.00
## 3 0.5518947 5.55
## 4 NA 2.00
## Species: virginica (n=50) / IQR2 tests / p
## 1 0.675 1.505059e-28
## 2 0.375 4.492017e-17
## 3 0.775 4.803974e-29
## 4 0.500 3.261796e-29
## tests / test
## 1 . %>% oneway.test(var.equal = F)
## 2 . %>% oneway.test(var.equal = T)
## 3 kruskal.test
## 4 kruskal.test
The result is a table containing a descriptive subtable for each level of the grouping factor (the statistical functions rules are applied to each subtable independently), with the statistical tests performed, and their p values.
When displayed as a flat dataframe, the grouping header appears in each variable.
You can also see the grouping headers by inspecting the resulting object, which is a deep list of dataframes, each dataframe named after the grouping factor and its levels (with sample size for each).
str(iris_by_Species)
## List of 5
## $ Variables :'data.frame': 4 obs. of 1 variable:
## ..$ Variables: chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
## $ Species: setosa (n=50) :'data.frame': 4 obs. of 5 variables:
## ..$ N : int [1:4] 50 50 50 50
## ..$ Mean: num [1:4] 5.01 3.43 NA NA
## ..$ sd : num [1:4] 0.352 0.379 NA NA
## ..$ Med : num [1:4] 5 3.4 1.5 0.2
## ..$ IQR : num [1:4] 0.4 0.475 0.175 0.1
## $ Species: versicolor (n=50):'data.frame': 4 obs. of 5 variables:
## ..$ N : int [1:4] 50 50 50 50
## ..$ Mean: num [1:4] 5.94 2.77 4.26 NA
## ..$ sd : num [1:4] 0.516 0.314 0.47 NA
## ..$ Med : num [1:4] 5.9 2.8 4.35 1.3
## ..$ IQR : num [1:4] 0.7 0.475 0.6 0.3
## $ Species: virginica (n=50) :'data.frame': 4 obs. of 5 variables:
## ..$ N : int [1:4] 50 50 50 50
## ..$ Mean: num [1:4] 6.59 2.97 5.55 NA
## ..$ sd : num [1:4] 0.636 0.322 0.552 NA
## ..$ Med : num [1:4] 6.5 3 5.55 2
## ..$ IQR : num [1:4] 0.675 0.375 0.775 0.5
## $ tests :'data.frame': 4 obs. of 2 variables:
## ..$ p : num [1:4] 1.51e-28 4.49e-17 4.80e-29 3.26e-29
## ..$ test: chr [1:4] ". %>% oneway.test(var.equal = F)" ". %>% oneway.test(var.equal = T)" "kruskal.test" "kruskal.test"
## - attr(*, "class")= chr "desctable"
You can specify groups based on any variable, not only factors:
# With pander output
mtcars %>%
group_by(cyl) %>%
desctable() %>%
pander()
 | cyl: 4 (n=11) N |
Med |
IQR |
cyl: 6 (n=7) N1 |
Med1 |
IQR1 |
cyl: 8 (n=14) N2 |
Med2 |
IQR2 |
tests p |
test |
---|---|---|---|---|---|---|---|---|---|---|---|
mpg | 11 | 26 | 7.6 | 7 | 20 | 2.4 | 14 | 15 | 1.8 | 2.6e-06 | kruskal.test |
disp | 11 | 108 | 42 | 7 | 168 | 36 | 14 | 350 | 88 | 1.6e-06 | kruskal.test |
hp | 11 | 91 | 30 | 7 | 110 | 13 | 14 | 192 | 65 | 3.3e-06 | kruskal.test |
drat | 11 | 4.1 | 0.35 | 7 | 3.9 | 0.56 | 14 | 3.1 | 0.15 | 0.00075 | kruskal.test |
wt | 11 | 2.2 | 0.74 | 7 | 3.2 | 0.62 | 14 | 3.8 | 0.48 | 1.1e-05 | kruskal.test |
qsec | 11 | 19 | 1.4 | 7 | 18 | 2.4 | 14 | 17 | 1.5 | 0.0062 | kruskal.test |
vs | 11 | 1 | 0 | 7 | 1 | 1 | 14 | 0 | 0 | 3.2e-05 | kruskal.test |
am | 11 | 1 | 0.5 | 7 | 0 | 1 | 14 | 0 | 0 | 0.014 | kruskal.test |
gear | 11 | 4 | 0 | 7 | 4 | 0.5 | 14 | 3 | 0 | 0.0062 | kruskal.test |
carb | 11 | 2 | 1 | 7 | 4 | 1.5 | 14 | 3.5 | 1.8 | 0.0017 | kruskal.test |
Also with conditions:
iris %>%
group_by(Petal.Length > 5) %>%
desctable() %>%
pander()
 | Petal.Length > 5: FALSE (n=108) N |
% |
Mean |
sd |
Med |
IQR |
Petal.Length > 5: TRUE (n=42) N1 |
%1 |
Mean1 |
sd1 |
Med1 |
IQR1 |
tests p |
test |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Sepal.Length | 108 | 5.5 | 1 | 42 | 6.7 | 0.85 | 1.6e-15 | wilcox.test | ||||||
Sepal.Width | 108 | 3.1 | 0.48 | 3 | 0.6 | 42 | 3 | 0.4 | 0.69 | wilcox.test | ||||
Petal.Length | 108 | 3.5 | 3 | 42 | 5.6 | 0.67 | 2.1e-21 | wilcox.test | ||||||
Petal.Width | 108 | 1 | 1.2 | 42 | 2.1 | 0.28 | 2.1 | 0.47 | 1.6e-19 | wilcox.test | ||||
Species | 108 | 42 | 2.5e-26 | fisher.test | ||||||||||
setosa | 50 | 46 | 0 | 0 | ||||||||||
versicolor | 49 | 45 | 1 | 2.4 | ||||||||||
virginica | 9 | 8.3 | 41 | 98 |
And even on multiple nested groups:
mtcars %>%
dplyr::mutate(am = factor(am, labels = c("Automatic", "Manual"))) %>%
group_by(vs, am, cyl) %>%
desctable() %>%
pander()
 | vs: 0 (n=18) am: Automatic (n=12) cyl: 8 (n=12) N |
Med |
IQR |
tests p |
test |
am: Manual (n=6) cyl: 4 (n=1) N3 |
Med3 |
IQR3 |
cyl: 6 (n=3) N1 |
Med1 |
IQR1 |
cyl: 8 (n=2) N2 |
Med2 |
IQR2 |
tests p1 |
test1 |
vs: 1 (n=14) am: Automatic (n=7) cyl: 4 (n=3) N4 |
Med4 |
IQR4 |
cyl: 6 (n=4) N11 |
Med11 |
IQR11 |
tests p2 |
test2 |
am: Manual (n=7) cyl: 4 (n=7) N21 |
Med21 |
IQR21 |
tests p11 |
test11 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
mpg | 12 | 15 | 2.6 | no.test | 1 | 26 | 0 | 3 | 21 | 0.65 | 2 | 15 | 0.4 | 0.11 | kruskal.test | 3 | 23 | 1.5 | 4 | 19 | 1.7 | 0.057 | wilcox.test | 7 | 30 | 6.3 | no.test | ||
disp | 12 | 355 | 113 | no.test | 1 | 120 | 0 | 3 | 160 | 7.5 | 2 | 326 | 25 | 0.11 | kruskal.test | 3 | 141 | 13 | 4 | 196 | 66 | 0.05 | wilcox.test | 7 | 79 | 24 | no.test | ||
hp | 12 | 180 | 44 | no.test | 1 | 91 | 0 | 3 | 110 | 32 | 2 | 300 | 36 | 0.11 | kruskal.test | 3 | 95 | 18 | 4 | 116 | 14 | 0.05 | wilcox.test | 7 | 66 | 36 | no.test | ||
drat | 12 | 3.1 | 0.11 | no.test | 1 | 4.4 | 0 | 3 | 3.9 | 0.14 | 2 | 3.9 | 0.34 | 0.33 | kruskal.test | 3 | 3.7 | 0.11 | 4 | 3.5 | 0.92 | 0.85 | wilcox.test | 7 | 4.1 | 0.2 | no.test | ||
wt | 12 | 3.8 | 0.81 | no.test | 1 | 2.1 | 0 | 3 | 2.8 | 0.13 | 2 | 3.4 | 0.2 | 0.12 | kruskal.test | 3 | 3.1 | 0.36 | 4 | 3.4 | 0.061 | 0.05 | wilcox.test | 7 | 1.9 | 0.53 | no.test | ||
qsec | 12 | 17 | 0.67 | no.test | 1 | 17 | 0 | 3 | 16 | 0.76 | 2 | 15 | 0.05 | 0.17 | kruskal.test | 3 | 20 | 1.4 | 4 | 19 | 0.89 | 0.23 | wilcox.test | 7 | 19 | 0.62 | no.test | ||
gear | 12 | 3 | 0 | no.test | 1 | 5 | 0 | 3 | 4 | 0.5 | 2 | 5 | 0 | 0.29 | kruskal.test | 3 | 4 | 0.5 | 4 | 3.5 | 1 | 0.84 | wilcox.test | 7 | 4 | 0 | no.test | ||
carb | 12 | 3 | 2 | no.test | 1 | 2 | 0 | 3 | 4 | 1 | 2 | 6 | 2 | 0.26 | kruskal.test | 3 | 2 | 0.5 | 4 | 2.5 | 3 | 0.85 | wilcox.test | 7 | 1 | 1 | no.test |
In the case of nested groups (a.k.a. sub-group analysis), statistical tests are performed only between the groups of the deepest grouping level.
Statistical tests are automatically selected depending on the data and the grouping factor.
desctable
choses the statistical tests using the following algorithm:
fisher.test
no.test
(which does nothing)var.test
> .1) and normality of distribution in both groups,
use t.test(var.equal = T)
var.test
< .1) but normality of distribution in both groups,
use t.test(var.equal = F)
wilcox.test
bartlett.test
> .1) and normality of distribution in all
groups, use oneway.test(var.equal = T)
bartlett.test
< .1) but normality of distribution in all
groups, use oneway.test(var.equal = F)
kruskal.test
But what if you want to pick a specific test for a specific variable, or change all the tests altogether?
desctable
takes an optional tests argument. This argument can either
be
This is the default, using the tests_auto
function provided in the
package.
You can also provide your own automatic function, which needs to
This function will be used on every variable and every grouping factor to determine the appropriate test.
# Strictly equivalent to iris %>% group_by(Species) %>% desctable %>% pander
iris %>%
group_by(Species) %>%
desctable(tests = tests_auto) %>%
pander()
 | Species: setosa (n=50) N |
Mean |
sd |
Med |
IQR |
Species: versicolor (n=50) N1 |
Mean1 |
sd1 |
Med1 |
IQR1 |
Species: virginica (n=50) N2 |
Mean2 |
sd2 |
Med2 |
IQR2 |
tests p |
test |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Sepal.Length | 50 | 5 | 0.35 | 5 | 0.4 | 50 | 5.9 | 0.52 | 5.9 | 0.7 | 50 | 6.6 | 0.64 | 6.5 | 0.67 | 1.5e-28 | . %>% oneway.test(var.equal = F) |
Sepal.Width | 50 | 3.4 | 0.38 | 3.4 | 0.48 | 50 | 2.8 | 0.31 | 2.8 | 0.48 | 50 | 3 | 0.32 | 3 | 0.38 | 4.5e-17 | . %>% oneway.test(var.equal = T) |
Petal.Length | 50 | 1.5 | 0.18 | 50 | 4.3 | 0.47 | 4.3 | 0.6 | 50 | 5.6 | 0.55 | 5.5 | 0.78 | 4.8e-29 | kruskal.test | ||
Petal.Width | 50 | 0.2 | 0.1 | 50 | 1.3 | 0.3 | 50 | 2 | 0.5 | 3.3e-29 | kruskal.test |
You can provide a named list of statistical functions, but here the mechanism is a bit different from the stats argument.
The list must contain either .auto
or .default
.
.auto
needs to be an automatic function, such as tests_auto
. It
will be used by default on all variables to select a test.default
needs to be a single-term formula containing a
statistical test function that will be used on all variablesYou can also provide overrides to use specific tests for specific
variables.
This is done using list items named as the variable and containing a
single-term formula function.
iris %>%
group_by(Petal.Length > 5) %>%
desctable(tests = list(.auto = tests_auto,
Species = ~chisq.test)) %>%
pander()
 | Petal.Length > 5: FALSE (n=108) N |
% |
Mean |
sd |
Med |
IQR |
Petal.Length > 5: TRUE (n=42) N1 |
%1 |
Mean1 |
sd1 |
Med1 |
IQR1 |
tests p |
test |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Sepal.Length | 108 | 5.5 | 1 | 42 | 6.7 | 0.85 | 1.6e-15 | wilcox.test | ||||||
Sepal.Width | 108 | 3.1 | 0.48 | 3 | 0.6 | 42 | 3 | 0.4 | 0.69 | wilcox.test | ||||
Petal.Length | 108 | 3.5 | 3 | 42 | 5.6 | 0.67 | 2.1e-21 | wilcox.test | ||||||
Petal.Width | 108 | 1 | 1.2 | 42 | 2.1 | 0.28 | 2.1 | 0.47 | 1.6e-19 | wilcox.test | ||||
Species | 108 | 42 | 2.7e-24 | chisq.test | ||||||||||
setosa | 50 | 46 | 0 | 0 | ||||||||||
versicolor | 49 | 45 | 1 | 2.4 | ||||||||||
virginica | 9 | 8.3 | 41 | 98 |
mtcars %>%
dplyr::mutate(am = factor(am, labels = c("Automatic", "Manual"))) %>%
group_by(am) %>%
desctable(tests = list(.default = ~wilcox.test,
mpg = ~t.test)) %>%
pander()
 | am: Automatic (n=19) N |
Med |
IQR |
am: Manual (n=13) N1 |
Med1 |
IQR1 |
tests p |
test |
---|---|---|---|---|---|---|---|---|
mpg | 19 | 17 | 4.2 | 13 | 23 | 9.4 | 0.0014 | t.test |
cyl | 19 | 8 | 2 | 13 | 4 | 2 | 0.0039 | wilcox.test |
disp | 19 | 276 | 164 | 13 | 120 | 81 | 0.00055 | wilcox.test |
hp | 19 | 175 | 76 | 13 | 109 | 47 | 0.046 | wilcox.test |
drat | 19 | 3.1 | 0.63 | 13 | 4.1 | 0.37 | 0.00014 | wilcox.test |
wt | 19 | 3.5 | 0.41 | 13 | 2.3 | 0.84 | 4.3e-05 | wilcox.test |
qsec | 19 | 18 | 2 | 13 | 17 | 2.1 | 0.27 | wilcox.test |
vs | 19 | 0 | 1 | 13 | 1 | 1 | 0.36 | wilcox.test |
gear | 19 | 3 | 0 | 13 | 4 | 1 | 7.6e-06 | wilcox.test |
carb | 19 | 3 | 2 | 13 | 2 | 3 | 0.74 | wilcox.test |
You might wonder why the formula expression. That is needed to capture the test name, and to provide it in the resulting table.
As with statistical functions, any statistical test function defined in R can be used.
The conditions are that the function
variable ~ grouping_variable
) as a first
positional argument (as is the case with most tests, like t.test
),
andp.value
element.Several convenience function are provided: formula versions for
chisq.test
and fisher.test
using generic S3 methods (thus the
behavior of standard calls to chisq.test
and fisher.test
are not
modified), and ANOVA
, a partial application of oneway.test
with
parameter var.equal = T.
In the stats argument, you can not only feed function names, but even
arbitrary function definitions, functional sequences (a feature provided
with the pipe (%>%
)), or partial applications (with the purrr
package):
mtcars %>%
desctable(stats = list("N" = length,
"Sum of squares" = function(x) sum(x^2),
"Q1" = . %>% quantile(prob = .25),
"Q3" = purrr::partial(quantile, probs = .75))) %>%
pander()
 | N | Sum of squares | Q1 | Q3 |
---|---|---|---|---|
mpg | 32 | 14042 | 15 | 23 |
cyl | 32 | 1324 | 4 | 8 |
disp | 32 | 2179627 | 121 | 326 |
hp | 32 | 834278 | 96 | 180 |
drat | 32 | 423 | 3.1 | 3.9 |
wt | 32 | 361 | 2.6 | 3.6 |
qsec | 32 | 10293 | 17 | 19 |
vs | 32 | 14 | 0 | 1 |
am | 32 | 13 | 0 | 1 |
gear | 32 | 452 | 3 | 4 |
carb | 32 | 334 | 2 | 4 |
In the tests arguments, you can also provide function definitions, functional sequences, and partial applications in the formulas:
iris %>%
group_by(Species) %>%
desctable(tests = list(.auto = tests_auto,
Sepal.Width = ~function(f) oneway.test(f, var.equal = F),
Petal.Length = ~. %>% oneway.test(var.equal = T),
Sepal.Length = ~purrr::partial(oneway.test, var.equal = T))) %>%
pander()
 | Species: setosa (n=50) N |
Mean |
sd |
Med |
IQR |
Species: versicolor (n=50) N1 |
Mean1 |
sd1 |
Med1 |
IQR1 |
Species: virginica (n=50) N2 |
Mean2 |
sd2 |
Med2 |
IQR2 |
tests p |
test |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Sepal.Length | 50 | 5 | 0.35 | 5 | 0.4 | 50 | 5.9 | 0.52 | 5.9 | 0.7 | 50 | 6.6 | 0.64 | 6.5 | 0.67 | 1.7e-31 | purrr::partial(oneway.test, var.equal = T) |
Sepal.Width | 50 | 3.4 | 0.38 | 3.4 | 0.48 | 50 | 2.8 | 0.31 | 2.8 | 0.48 | 50 | 3 | 0.32 | 3 | 0.38 | 1.4e-14 | function(f) oneway.test(f, var.equal = F) |
Petal.Length | 50 | 1.5 | 0.18 | 50 | 4.3 | 0.47 | 4.3 | 0.6 | 50 | 5.6 | 0.55 | 5.5 | 0.78 | 2.9e-91 | . %>% oneway.test(var.equal = T) | ||
Petal.Width | 50 | 0.2 | 0.1 | 50 | 1.3 | 0.3 | 50 | 2 | 0.5 | 3.3e-29 | kruskal.test |
This allows you to modulate the behavior of desctable
in every detail,
such as using paired tests, or non htest tests.
# This is a contrived example, which would be better solved with a dedicated function
library(survival)
bladder$surv <- Surv(bladder$stop, bladder$event)
bladder %>%
group_by(rx) %>%
desctable(tests = list(.default = ~wilcox.test,
surv = ~. %>% survdiff %>% .$chisq %>% pchisq(1, lower.tail = F) %>% list(p.value = .))) %>%
pander()
 | rx: 1 (n=188) N |
Med |
IQR |
rx: 2 (n=152) N1 |
Med1 |
IQR1 |
tests p |
test |
---|---|---|---|---|---|---|---|---|
id | 188 | 24 | 24 | 152 | 66 | 19 | 1.3e-56 | wilcox.test |
number | 188 | 1 | 2 | 152 | 1 | 2 | 0.62 | wilcox.test |
size | 188 | 1 | 2 | 152 | 1 | 2 | 0.32 | wilcox.test |
stop | 188 | 23 | 20 | 152 | 25 | 28 | 0.17 | wilcox.test |
event | 188 | 0 | 1 | 152 | 0 | 1 | 0.02 | wilcox.test |
enum | 188 | 2.5 | 1.5 | 152 | 2.5 | 1.5 | 1 | wilcox.test |
surv | 188 | 152 | 0.023 | . %>% survdiff %>% .$chisq %>% pchisq(1, lower.tail = F) %>% list(p.value = .) |