diff --git a/inst/doc/desctable.html b/inst/doc/desctable.html
index 54ac210..0743ac5 100644
--- a/inst/doc/desctable.html
+++ b/inst/doc/desctable.html
@@ -4,7 +4,7 @@
-
+
@@ -109,42 +109,42 @@ code > span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Inf
desctable uses and exports the pipe (%>%
) operator (from packages magrittr and dplyr fame), though it is not mandatory to use it.
The single interface to the package is its eponymous desctable
function.
When used on a data.frame, it returns a descriptive table:
-iris %>%
+iris %>%
desctable
-## N Mean/% sd Med IQR
-## Sepal.Length 150 NA NA 5.80 1.3
-## Sepal.Width 150 3.057333 0.4358663 NA NA
-## Petal.Length 150 NA NA 4.35 3.5
-## Petal.Width 150 NA NA 1.30 1.5
-## Species 150 NA NA NA NA
-## Species: setosa 50 33.333333 NA NA NA
-## Species: versicolor 50 33.333333 NA NA NA
-## Species: virginica 50 33.333333 NA NA NA
+## N Mean/% sd Med IQR
+## 1 Sepal.Length 150 NA NA 5.80 1.3
+## 2 Sepal.Width 150 3.057333 0.4358663 NA NA
+## 3 Petal.Length 150 NA NA 4.35 3.5
+## 4 Petal.Width 150 NA NA 1.30 1.5
+## 5 Species 150 NA NA NA NA
+## 6 Species: setosa 50 33.333333 NA NA NA
+## 7 Species: versicolor 50 33.333333 NA NA NA
+## 8 Species: virginica 50 33.333333 NA NA NA
desctable(mtcars)
-## N Mean sd Med IQR
-## mpg 32 20.090625 6.0269481 NA NA
-## cyl 32 NA NA 6.000 4.00000
-## disp 32 NA NA 196.300 205.17500
-## hp 32 NA NA 123.000 83.50000
-## drat 32 3.596563 0.5346787 NA NA
-## wt 32 NA NA 3.325 1.02875
-## qsec 32 17.848750 1.7869432 NA NA
-## vs 32 NA NA 0.000 1.00000
-## am 32 NA NA 0.000 1.00000
-## gear 32 NA NA 4.000 1.00000
-## carb 32 NA NA 2.000 2.00000
+## N Mean sd Med IQR
+## 1 mpg 32 20.090625 6.0269481 NA NA
+## 2 cyl 32 NA NA 6.000 4.00000
+## 3 disp 32 NA NA 196.300 205.17500
+## 4 hp 32 NA NA 123.000 83.50000
+## 5 drat 32 3.596563 0.5346787 NA NA
+## 6 wt 32 NA NA 3.325 1.02875
+## 7 qsec 32 17.848750 1.7869432 NA NA
+## 8 vs 32 NA NA 0.000 1.00000
+## 9 am 32 NA NA 0.000 1.00000
+## 10 gear 32 NA NA 4.000 1.00000
+## 11 carb 32 NA NA 2.000 2.00000
As you can see with these two examples, desctable
describes every variable, with individual levels for factors. It picks statistical functions depending on the type and distribution of the variables in the data, and applies those statistical functions only on the relevant variables.
The object produced by desctable
is in fact a list of data.frames, with a “desctable” class.
Methods for reduction to a simple dataframe (as.data.frame
, automatically used for printing), conversion to markdown (pander
), and interactive html output with DT (datatable
) are provided:
iris %>%
- desctable %>%
+iris %>%
+ desctable %>%
pander
-
+
-
+
@@ -228,11 +228,11 @@ Methods for reduction to a simple dataframe (as.data.frame
, automat
-mtcars %>%
- desctable %>%
+mtcars %>%
+ desctable %>%
datatable
-
-
You need to load these two packages first (and prior to desctable for DT) if you want to use them.
+
+
You need to load these two packages first (and prior to desctable for DT) if you want to use them.
Calls to pander
and datatable
with “regular” dataframes will not be affected by the defaults used in the package, and you can modify these defaults for desctable objects.
Subsequent outputs in this vignette section will use DT. The datatable
wrapper function for desctable objects comes with some default options and formatting such as freezing the row names and table header, export buttons, and rounding of values. Both pander
and datatable
wrappers take a digits argument to set the number of decimals to show. (pander
uses the digits, justify and missing arguments of pandoc.table
, whereas datatable
calls prettyNum
with the digits
parameter, and removes NA
values. You can set digits = NULL
if you want the full table so that you can format it yourself.)
@@ -266,22 +266,22 @@ Methods for reduction to a simple dataframe (as.data.frame
, automat
return a named list of statistical functions to use, as defined in the subsequent paragraphs.
# Strictly equivalent to iris %>% desctable %>% datatable
-iris %>%
- desctable(stats = stats_auto) %>%
+iris %>%
+ desctable(stats = stats_auto) %>%
datatable
-
-
+
+
Statistical functions
Statistical functions can be any function defined in R that you want to use, such as length
or mean
.
The only condition is that they return a single numerical value. One exception is when they return a vector of length 1 + nlevels(x)
when applied to factors, as is needed for the percent
function.
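To illustrate the length requirement, here is a minimal base R sketch of a percent-like function. The name `percent_sketch` is hypothetical and this is not the package's own implementation; it only mirrors the shape seen in the iris table, where the factor row itself holds NA and each level holds its share.

```r
# Hypothetical stand-in for a percent-style statistic (not desctable's code).
percent_sketch <- function(x) {
  # Share of each level, in percent
  shares <- prop.table(table(x)) * 100
  # Leading NA for the factor itself, then one value per level
  c(NA, shares)
}

res <- percent_sketch(iris$Species)

# The result has one element for the factor plus one per level
length(res) == 1 + nlevels(iris$Species)
```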
As mentioned above, they need to be used inside a named list, such as
-mtcars %>%
- desctable(stats = list("N" = length, "Mean" = mean, "SD" = sd)) %>%
+mtcars %>%
+ desctable(stats = list("N" = length, "Mean" = mean, "SD" = sd)) %>%
datatable
-
-
+
+
The names will be used as column headers in the resulting table, and the functions will be applied safely on the variables (errors return NA
, and for factors the function will be used on individual levels).
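The "errors return NA" behavior can be pictured with a small base R wrapper. This is only a sketch under the assumption that a failing statistic is replaced by NA; `safe_apply` is a hypothetical helper, not a function exported by the package.

```r
# Hypothetical helper: apply f to x, returning NA if f raises an error.
safe_apply <- function(f, x) {
  tryCatch(f(x), error = function(e) NA)
}

# A function that works returns its value
ok <- safe_apply(mean, mtcars$mpg)

# A function that fails yields NA instead of stopping the whole table
failed <- safe_apply(function(x) stop("not applicable"), mtcars$mpg)
```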
Several convenience functions are included in this package. For statistical functions we have: percent
, which prints percentages of levels in a factor, and IQR
which re-implements stats::IQR
but works better with NA
values.
Be aware that all functions will be used on variables stripped of their NA
values!
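One practical consequence, sketched in base R with a made-up vector: a statistic such as length then reports the number of non-missing observations, not the number of rows. The stripping step shown here is an illustration of the stated behavior, not the package's internal code.

```r
# Made-up example vector with two missing values
x <- c(4.2, NA, 5.1, NA, 6.3)

# Strip the NA values first, as the text describes
x_clean <- x[!is.na(x)]

# Statistics are then computed on the remaining values only
n_obs <- length(x_clean)
avg <- mean(x_clean)
```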
@@ -290,7 +290,7 @@ This is necessary for most statistical functions to be useful, and makes
Conditional formulas
The general form of these formulas is
-predicate_function ~ stat_function_if_TRUE | stat_function_if_FALSE
+predicate_function ~ stat_function_if_TRUE | stat_function_if_FALSE
A predicate function is any function returning either TRUE
or FALSE
when applied on a vector, such as is.factor
, is.numeric
, and is.logical
.
desctable provides the is.normal
function to test for normality (it is equivalent to length(na.omit(x)) > 30 & shapiro.test(x)$p.value > .1
).
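The stated equivalence can be written out as a standalone base R function. The name `is_normal_sketch` is hypothetical; the body copies the expression given in the text and is meant as a reading aid, not as the package's source.

```r
# Hypothetical re-statement of the rule given in the text:
# more than 30 non-missing values and a Shapiro-Wilk p value above 0.1.
is_normal_sketch <- function(x) {
  length(na.omit(x)) > 30 & shapiro.test(x)$p.value > .1
}

# mpg is described with mean and sd in the mtcars table
mpg_normal <- is_normal_sketch(mtcars$mpg)

# Petal.Length is described with median and IQR in the iris table
petal_normal <- is_normal_sketch(iris$Petal.Length)

# Ten values can never pass the sample-size condition
small_normal <- is_normal_sketch(1:10)
```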
The FALSE option can be omitted and NA
will be produced if the condition in the predicate is not met.
@@ -299,13 +299,13 @@ For example:
is.factor ~ percent | (is.normal ~ mean)
will either use percent
if the variable is a factor, or mean
if and only if the variable is normally distributed.
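Read as ordinary control flow, that formula amounts to a nested if/else. The sketch below spells this out in base R; `pick_stat` and `is_normal_sketch` are hypothetical names, and returning the statistic's name as a string is a simplification chosen for illustration.

```r
# Hypothetical normality predicate, following the rule stated in the text
is_normal_sketch <- function(x) {
  length(na.omit(x)) > 30 & shapiro.test(x)$p.value > .1
}

# Plain if/else reading of: is.factor ~ percent | (is.normal ~ mean)
pick_stat <- function(x) {
  if (is.factor(x)) {
    "percent"
  } else if (is_normal_sketch(x)) {
    "mean"
  } else {
    NA_character_  # omitted FALSE branch: nothing is computed
  }
}

# One decision per column of iris
choices <- vapply(iris, pick_stat, character(1))
```

The factor check comes first, so the normality test is never run on a factor.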
You can mix “bare” statistical functions and formulas in the list defining the statistics you want to use in your table.
-iris %>%
+iris %>%
desctable(stats = list("N" = length,
- "%/Mean" = is.factor ~ percent | (is.normal ~ mean),
- "Median" = is.normal ~ NA | median)) %>%
+ "%/Mean" = is.factor ~ percent | (is.normal ~ mean),
+ "Median" = is.normal ~ NA | median)) %>%
datatable
-
-
+
+
For reference, here is the body of the stats_auto
function in the package:
## function (data)
## {
@@ -356,12 +356,12 @@ You don’t need to provide labels for all the variables, and extra labels will
gear = "Number of forward gears",
carb = "Number of carburetors")
-mtcars %>%
- dplyr::mutate(am = factor(am, labels = c("Automatic", "Manual"))) %>%
- desctable(labels = mtlabels) %>%
+mtcars %>%
+ dplyr::mutate(am = factor(am, labels = c("Automatic", "Manual"))) %>%
+ desctable(labels = mtlabels) %>%
datatable
-
-
+
+
@@ -372,71 +372,51 @@ mtcars %>%
Simple usage
Creating a comparative table (between groups defined by a factor) using desctable
is as easy as creating a descriptive table.
It uses the well-known group_by
function from dplyr:
-iris %>%
- group_by(Species) %>%
+iris %>%
+ group_by(Species) %>%
desctable -> iris_by_Species
iris_by_Species
-## Species: setosa (n=50) / N Species: setosa (n=50) / Mean
-## Sepal.Length 50 5.006
-## Sepal.Width 50 3.428
-## Petal.Length 50 NA
-## Petal.Width 50 NA
-## Species: setosa (n=50) / sd Species: setosa (n=50) / Med
-## Sepal.Length 0.3524897 NA
-## Sepal.Width 0.3790644 NA
-## Petal.Length NA 1.5
-## Petal.Width NA 0.2
-## Species: setosa (n=50) / IQR Species: versicolor (n=50) / N
-## Sepal.Length NA 50
-## Sepal.Width NA 50
-## Petal.Length 0.175 50
-## Petal.Width 0.100 50
-## Species: versicolor (n=50) / Mean
-## Sepal.Length 5.936
-## Sepal.Width 2.770
-## Petal.Length 4.260
-## Petal.Width NA
-## Species: versicolor (n=50) / sd
-## Sepal.Length 0.5161711
-## Sepal.Width 0.3137983
-## Petal.Length 0.4699110
-## Petal.Width NA
-## Species: versicolor (n=50) / Med
-## Sepal.Length NA
-## Sepal.Width NA
-## Petal.Length NA
-## Petal.Width 1.3
-## Species: versicolor (n=50) / IQR
-## Sepal.Length NA
-## Sepal.Width NA
-## Petal.Length NA
-## Petal.Width 0.3
-## Species: virginica (n=50) / N
-## Sepal.Length 50
-## Sepal.Width 50
-## Petal.Length 50
-## Petal.Width 50
-## Species: virginica (n=50) / Mean
-## Sepal.Length 6.588
-## Sepal.Width 2.974
-## Petal.Length 5.552
-## Petal.Width NA
-## Species: virginica (n=50) / sd
-## Sepal.Length 0.6358796
-## Sepal.Width 0.3224966
-## Petal.Length 0.5518947
-## Petal.Width NA
-## Species: virginica (n=50) / Med
-## Sepal.Length NA
-## Sepal.Width NA
-## Petal.Length NA
-## Petal.Width 2
-## Species: virginica (n=50) / IQR tests / p tests / test
-## Sepal.Length NA 8.918734e-22 kruskal.test
-## Sepal.Width NA 4.492017e-17 ANOVA
-## Petal.Length NA 4.803974e-29 kruskal.test
-## Petal.Width 0.5 3.261796e-29 kruskal.test
+## Species: setosa (n=50) / N Species: setosa (n=50) / Mean
+## 1 Sepal.Length 50 5.006
+## 2 Sepal.Width 50 3.428
+## 3 Petal.Length 50 NA
+## 4 Petal.Width 50 NA
+## Species: setosa (n=50) / sd Species: setosa (n=50) / Med
+## 1 0.3524897 NA
+## 2 0.3790644 NA
+## 3 NA 1.5
+## 4 NA 0.2
+## Species: setosa (n=50) / IQR Species: versicolor (n=50) / N
+## 1 NA 50
+## 2 NA 50
+## 3 0.175 50
+## 4 0.100 50
+## Species: versicolor (n=50) / Mean Species: versicolor (n=50) / sd
+## 1 5.936 0.5161711
+## 2 2.770 0.3137983
+## 3 4.260 0.4699110
+## 4 NA NA
+## Species: versicolor (n=50) / Med Species: versicolor (n=50) / IQR
+## 1 NA NA
+## 2 NA NA
+## 3 NA NA
+## 4 1.3 0.3
+## Species: virginica (n=50) / N Species: virginica (n=50) / Mean
+## 1 50 6.588
+## 2 50 2.974
+## 3 50 5.552
+## 4 50 NA
+## Species: virginica (n=50) / sd Species: virginica (n=50) / Med
+## 1 0.6358796 NA
+## 2 0.3224966 NA
+## 3 0.5518947 NA
+## 4 NA 2
+## Species: virginica (n=50) / IQR tests / p tests / test
+## 1 NA 8.918734e-22 kruskal.test
+## 2 NA 4.492017e-17 ANOVA
+## 3 NA 4.803974e-29 kruskal.test
+## 4 0.5 3.261796e-29 kruskal.test
The result is a table containing a descriptive subtable for each level of the grouping factor (the statistical functions rules are applied to each subtable independently), with the statistical tests performed, and their p values.
When displayed as a flat dataframe, the grouping header appears in each variable.
You can also see the grouping headers by inspecting the resulting object, which is a deep list of dataframes, each dataframe named after the grouping factor and its levels (with sample size for each).
@@ -468,13 +448,13 @@ iris_by_Species
## - attr(*, "class")= chr "desctable"
You can specify groups based on any variable, not only factors:
# With pander output
-mtcars %>%
- group_by(cyl) %>%
- desctable %>%
+mtcars %>%
+ group_by(cyl) %>%
+ desctable %>%
pander
-
+
@@ -489,7 +469,7 @@ mtcars %>%
-
+
cyl: 4 (n=11)
N
Med
IQR
@@ -505,7 +485,7 @@ mtcars %>%
-mpg
+mpg
11
26
7.6
@@ -519,7 +499,7 @@ mtcars %>%
kruskal.test
-disp
+disp
11
108
42
@@ -533,7 +513,7 @@ mtcars %>%
kruskal.test
-hp
+hp
11
91
30
@@ -547,7 +527,7 @@ mtcars %>%
kruskal.test
-drat
+drat
11
4.1
0.35
@@ -561,7 +541,7 @@ mtcars %>%
kruskal.test
-wt
+wt
11
2.2
0.74
@@ -575,7 +555,7 @@ mtcars %>%
kruskal.test
-qsec
+qsec
11
19
1.4
@@ -589,7 +569,7 @@ mtcars %>%
kruskal.test
-vs
+vs
11
1
0
@@ -603,7 +583,7 @@ mtcars %>%
kruskal.test
-am
+am
11
1
0.5
@@ -617,7 +597,7 @@ mtcars %>%
kruskal.test
-gear
+gear
11
4
0
@@ -631,7 +611,7 @@ mtcars %>%
kruskal.test
-carb
+carb
11
2
1
@@ -648,20 +628,20 @@ mtcars %>%
Also with conditions:
# With datatable output
-iris %>%
- group_by(Petal.Length > 5) %>%
- desctable %>%
+iris %>%
+ group_by(Petal.Length > 5) %>%
+ desctable %>%
datatable
-
-
+
+
And even on multiple nested groups:
-mtcars %>%
- dplyr::mutate(am = factor(am, labels = c("Automatic", "Manual"))) %>%
- group_by(vs, am, cyl) %>%
- desctable %>%
+mtcars %>%
+ dplyr::mutate(am = factor(am, labels = c("Automatic", "Manual"))) %>%
+ group_by(vs, am, cyl) %>%
+ desctable %>%
datatable
-
-
+
+
In the case of nested groups (a.k.a. sub-group analysis), statistical tests are performed only between the groups of the deepest grouping level.
Statistical tests are automatically selected depending on the data and the grouping factor.
@@ -698,12 +678,12 @@ iris %>%
This function will be used on every variable and every grouping factor to determine the appropriate test.
# Strictly equivalent to iris %>% group_by(Species) %>% desctable %>% datatable
-iris %>%
- group_by(Species) %>%
- desctable(tests = tests_auto) %>%
+iris %>%
+ group_by(Species) %>%
+ desctable(tests = tests_auto) %>%
datatable
-
-
+
+
List of statistical test functions
@@ -715,21 +695,21 @@ iris %>%
You can also provide overrides to use specific tests for specific variables.
This is done using list items named as the variable and containing a single-term formula function.
-iris %>%
- group_by(Petal.Length > 5) %>%
+iris %>%
+ group_by(Petal.Length > 5) %>%
desctable(tests = list(.auto = tests_auto,
- Species = ~chisq.test)) %>%
+ Species = ~chisq.test)) %>%
datatable
-
-
-mtcars %>%
- dplyr::mutate(am = factor(am, labels = c("Automatic", "Manual"))) %>%
- group_by(am) %>%
- desctable(tests = list(.default = ~wilcox.test,
- mpg = ~t.test)) %>%
+
+
+mtcars %>%
+ dplyr::mutate(am = factor(am, labels = c("Automatic", "Manual"))) %>%
+ group_by(am) %>%
+ desctable(tests = list(.default = ~wilcox.test,
+ mpg = ~t.test)) %>%
datatable
-
-
+
+
You might wonder why a formula expression is required. It is needed to capture the test name, and to provide it in the resulting table.
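The base R behavior behind this is that a one-sided formula stores its right-hand side unevaluated, so both the name and the function can be recovered from it. The sketch below shows only that mechanism; how desctable extracts the name internally is an assumption not confirmed here.

```r
# A one-sided formula keeps its right-hand side as an unevaluated expression
test_spec <- ~t.test

# The expression can be turned into a label for the table...
test_name <- deparse(test_spec[[2]])

# ...and evaluated to obtain the actual function
test_fun <- eval(test_spec[[2]])

test_name
is.function(test_fun)
```

Passing `t.test` directly would hand over only the function object, from which the name cannot be reliably recovered.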
As with statistical functions, any statistical test function defined in R can be used.
The conditions are that the function
@@ -744,37 +724,37 @@ This is done using list items named as the variable and containing a single-term
Tips and tricks
In the stats argument, you can not only feed function names, but even arbitrary function definitions, functional sequences (a feature provided with the pipe (%>%
)), or partial applications (with the purrr package):
-mtcars %>%
+mtcars %>%
desctable(stats = list("N" = length,
- "Sum of squares" = function(x) sum(x^2),
- "Q1" = . %>% quantile(prob = .25),
- "Q3" = purrr::partial(quantile, probs = .75))) %>%
+ "Sum of squares" = function(x) sum(x^2),
+ "Q1" = . %>% quantile(prob = .25),
+ "Q3" = purrr::partial(quantile, probs = .75))) %>%
datatable
-
-
+
+
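For readers without magrittr or purrr at hand, the same kind of one-argument statistic can be built in base R alone. This is a sketch of equivalent forms, with hypothetical names (`q1_anonymous`, `quantile_at`, `q3_closure`); it is not a replacement suggested by the package.

```r
# Anonymous function: fixes the probability inside the body
q1_anonymous <- function(x) quantile(x, probs = .25)

# Closure factory: a base R analogue of partial application
quantile_at <- function(probs) {
  function(x) quantile(x, probs = probs)
}
q3_closure <- quantile_at(.75)

# Both take a single vector, as a statistic in the stats list does
q1 <- unname(q1_anonymous(mtcars$mpg))
q3 <- unname(q3_closure(mtcars$mpg))
```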
In the tests argument, you can also provide function definitions, functional sequences, and partial applications in the formulas:
-iris %>%
- group_by(Species) %>%
+iris %>%
+ group_by(Species) %>%
desctable(tests = list(.auto = tests_auto,
- Sepal.Width = ~function(f) oneway.test(f, var.equal = F),
- Petal.Length = ~. %>% oneway.test(var.equal = T),
- Sepal.Length = ~purrr::partial(oneway.test, var.equal = T))) %>%
+ Sepal.Width = ~function(f) oneway.test(f, var.equal = F),
+ Petal.Length = ~. %>% oneway.test(var.equal = T),
+ Sepal.Length = ~purrr::partial(oneway.test, var.equal = T))) %>%
datatable
-
-
+
+
This allows you to modulate the behavior of desctable
in every detail, such as using paired tests, or tests that do not return an htest object.
# This is a contrived example, which would be better solved with a dedicated function
library(survival)
-bladder$surv <- Surv(bladder$stop, bladder$event)
+bladder$surv <- Surv(bladder$stop, bladder$event)
-bladder %>%
- group_by(rx) %>%
- desctable(tests = list(.default = ~wilcox.test,
- surv = ~. %>% survdiff %>% .$chisq %>% pchisq(1, lower.tail = F) %>% list(p.value = .))) %>%
+bladder %>%
+ group_by(rx) %>%
+ desctable(tests = list(.default = ~wilcox.test,
+ surv = ~. %>% survdiff %>% .$chisq %>% pchisq(1, lower.tail = F) %>% list(p.value = .))) %>%
datatable
-
-
+
+
@@ -784,7 +764,7 @@ bladder %>%
(function () {
var script = document.createElement("script");
script.type = "text/javascript";
- script.src = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML";
+ script.src = "https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML";
document.getElementsByTagName("head")[0].appendChild(script);
})();