---
title: "desctable tips"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{desctable tips}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, echo = F, message = F, warning = F}
library(desctable)
```

Here is a collection of tips and tricks to go further with *desctable*.

#

##

### Label variables

You can define labels for variables using the `.labels` argument in `desc_table`:

```{r}
labels <- c(mpg = "Miles/(US) gallon",
            cyl = "Number of cylinders",
            disp = "Displacement (cu.in.)",
            hp = "Gross horsepower",
            drat = "Rear axle ratio",
            wt = "Weight (1000 lbs)",
            qsec = "1/4 mile time",
            vs = "Engine",
            am = "Transmission",
            gear = "Number of forward gears",
            CARBURATOR = "Number of carburetors")

mtcars %>%
  desc_table(.labels = labels) %>%
  desc_output("DT")
```

As you can see with `CARBURATOR` instead of `carb`, not all variables need to have a label, and unused labels are discarded.
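
To make the matching rule concrete, here is a minimal base R sketch of how a named label vector lines up with column names. It only illustrates the behaviour described above and is not *desctable*'s actual implementation; `example_labels`, `displayed` and `unused` are names made up for this illustration.

```{r}
# Hypothetical label vector: one name that matches a column, one that does not
example_labels <- c(mpg = "Miles/(US) gallon",
                    CARBURATOR = "Number of carburetors")

vars <- names(mtcars)

# Use the label when one exists for the column, the column name otherwise
displayed <- ifelse(vars %in% names(example_labels),
                    unname(example_labels[vars]),
                    vars)

# Labels whose name matches no column are never looked up
unused <- setdiff(names(example_labels), vars)

displayed
unused
```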

### Default statistics

`desc_table` chooses its own statistics this way:

- always show `N = length`
- show `"%" = percent` if there is at least one factor
- show `min`, `max`, `Q1`, `Q3`, `median`, `mean`, `sd`, `IQR` if there is at least one numeric variable

### Defining your own default statistics

You can define your own automatic statistic function using the `.auto` argument in `desc_table`.

This function should accept one argument, the table to choose statistics for (for a grouped dataframe, each subtable is passed to the function), and return a list of statistics.

Here is the code of `stats_auto`, the default value of `.auto`:

```{r, eval = F}
stats_auto <- function(data) {
  # Is there at least one numeric variable?
  data %>%
    lapply(is.numeric) %>%
    unlist() %>%
    any() -> numeric

  # Is there at least one factor?
  data %>%
    lapply(is.factor) %>%
    unlist() %>%
    any() -> fact

  stats <- list("Min" = min,
                "Q1" = ~quantile(., .25),
                "Med" = stats::median,
                "Mean" = mean,
                "Q3" = ~quantile(., .75),
                "Max" = max,
                "sd" = stats::sd,
                "IQR" = IQR)

  if (fact & numeric)
    c(list("N" = length,
           "%" = percent),
      stats)
  else if (fact & !numeric)
    list("N" = length,
         "%" = percent)
  else if (!fact & numeric)
    stats
}
```
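
As an illustration of the contract described above (one argument in, a list of statistics out), here is a sketch of a custom function that could be passed as `.auto`. The name `stats_mean_sd` is hypothetical, and the sketch deliberately sticks to base R functions; it is not taken from the package.

```{r}
# Hypothetical replacement for stats_auto: N for every table,
# plus mean and standard deviation when a numeric column is present
stats_mean_sd <- function(data) {
  has_numeric <- any(vapply(data, is.numeric, logical(1)))

  if (has_numeric)
    list(N = length, Mean = mean, SD = stats::sd)
  else
    list(N = length)
}

# The function can be checked on its own before handing it to desc_table
names(stats_mean_sd(iris))
names(stats_mean_sd(iris["Species"]))
```

It would then be supplied as `desc_table(.auto = stats_mean_sd)`.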

### Reuse a list of defined statistics

If you often reuse the same statistics for multiple tables and don't want to repeat yourself, you can splice a list into `desc_table` using the `!!!` operator from *rlang*:

```{r}
stats <- list(N = length,
              Mean = mean,
              SD = sd)

mtcars %>%
  desc_table(!!!stats) %>%
  desc_output("DT")
```

When splicing, all statistics need to be explicitly named. The list below leaves `mean` and `sd` unnamed for comparison:

```{r}
stats2 <- list(N = length,
               mean,
               sd)

mtcars %>%
  desc_table(!!!stats2) %>%
  desc_output("DT")
```
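
One way to check that requirement before splicing is to inspect the names of the list. The helper below is a hypothetical convenience written in base R, not part of *desctable* or *rlang*.

```{r}
# Hypothetical helper: TRUE only if every element of the list has a non-empty name
all_named <- function(x) {
  nms <- names(x)
  !is.null(nms) && all(!is.na(nms) & nzchar(nms))
}

fully_named  <- list(N = length, Mean = mean, SD = sd)
partly_named <- list(N = length, mean, sd)

names(partly_named)      # the last two names are empty strings
all_named(fully_named)
all_named(partly_named)
```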

You can also define a "dumb" automatic function that ignores its input and always returns the same statistics:

```{r}
default_stats <- function(data)
{
  list(N = length,
       mean,
       sd)
}
```

### Default statistical tests

`desc_tests` chooses its own statistical tests this way:

- if the variable is a factor, use `fisher.test`
- if `fisher.test` fails, fall back on `chisq.test`
- if the variable is numeric, use
    - `wilcox.test` if there are two groups
    - `kruskal.test` if there are more than two groups

### Defining your own default statistical tests

You can define your own automatic test function using the `.auto` argument in `desc_tests`.

This function should accept two arguments, the variable to compare and the grouping variable, and return a statistical test that accepts a `formula` argument and returns an object with a `p.value` element.

Here is the code of `tests_auto`, the default value of `.auto`:

```{r, eval = F}
tests_auto <- function(var, grp) {
  grp <- factor(grp)

  if (nlevels(grp) < 2)
    ~no.test
  else if (is.factor(var)) {
    # Use fisher.test if it runs without error, chisq.test otherwise
    if (tryCatch(is.numeric(fisher.test(var ~ grp)$p.value), error = function(e) F))
      ~fisher.test
    else
      ~chisq.test
  } else if (nlevels(grp) == 2)
    ~wilcox.test
  else
    ~kruskal.test
}
```
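
Following the same contract (two arguments in, a one-sided formula naming the test out), a parametric variant could look like the sketch below. `tests_parametric` is a hypothetical name; `t.test` and `oneway.test` come from the *stats* package and both accept a formula and return a `p.value`, while the other two branches simply reuse `~no.test` and `~chisq.test` as the default above does.

```{r}
# Hypothetical alternative to tests_auto using parametric tests for numerics
tests_parametric <- function(var, grp) {
  grp <- factor(grp)

  if (nlevels(grp) < 2)
    ~no.test          # same placeholder as in the default tests_auto
  else if (is.factor(var))
    ~chisq.test       # same choice as the fallback in the default tests_auto
  else if (nlevels(grp) == 2)
    ~t.test
  else
    ~oneway.test
}

# The returned formula only names the test; nothing is computed yet
tests_parametric(mtcars$mpg, mtcars$am)
tests_parametric(iris$Sepal.Length, iris$Species)
```

It would then be supplied as `desc_tests(.auto = tests_parametric)`.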

You can also provide a default statistical test using the `.default` argument:

```{r}
mtcars %>%
  group_by(am) %>%
  desc_table(mean, sd) %>%
  desc_tests(.default = ~t.test) %>%
  desc_output("DT")
```

Note that, as with named tests, the test name must be prefixed with a tilde (`~`).
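
The tilde turns the test into a one-sided formula, which stores the name of the test together with the environment it was written in instead of calling it. The base R sketch below shows that mechanism in isolation; how *desctable* resolves the formula internally is not shown here and may differ. `test_spec`, `test_fun` and `result` are names made up for the illustration.

```{r}
test_spec <- ~t.test

class(test_spec)       # "formula"
all.vars(test_spec)    # "t.test": the name of the test, stored unevaluated

# Resolve the name to a function in the formula's environment, then apply it
test_fun <- eval(test_spec[[2]], environment(test_spec))
result <- test_fun(mpg ~ am, data = mtcars)

is.numeric(result$p.value)
```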

You can still choose individual tests when you define either a `.auto` or a `.default` test:

```{r, warning = F}
mtcars %>%
  group_by(am) %>%
  desc_table(mean, sd, median, IQR) %>%
  desc_tests(.default = ~t.test, carb = ~wilcox.test) %>%
  desc_output("DT")
```

Note that if a `.default` test is provided, `.auto` is ignored.

### Output options

You can set the number of significant digits to display with the `digits` argument.
The p-values are truncated at `1E-digits`, that is, at 10 to the power of minus `digits`.

```{r}
iris %>%
  group_by(Species) %>%
  desc_table(mean, sd) %>%
  desc_tests() %>%
  desc_output("DT", digits = 10)
```
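
As a worked example of the threshold arithmetic only: with `digits = 3`, `1E-digits` is 0.001, so only p-values below 0.001 are affected. The values below are made up for the illustration, and the sketch does not reproduce how `desc_output` formats them.

```{r}
digits <- 3
threshold <- 10^-digits                 # 1E-3, i.e. 0.001

p_values <- c(0.2, 0.004, 0.00002)      # made-up p-values
p_values < threshold                    # FALSE FALSE TRUE
```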

Any additional argument given to `desc_output` is passed on to the output function:

```{r}
iris %>%
  group_by(Species) %>%
  desc_table(mean, sd) %>%
  desc_output("DT", filter = "top")
```