Moved to an all-Rmd based workflow

6 years ago · 69c8d9b03f
--- a/01_Import.R
+++ b/01_Import.R
@@ -1,9 +0,0 @@
 library(tidyverse)
 #library(readxl)
 #library(haven)
 #library(rvest)
 #library(jsonlite)

 # Import the raw data in csv ----
 read_ ("Data/Raw/") %>%
  write.csv("Data/.csv", row.names = F)
--- a/01_Import.Rmd
+++ b/01_Import.Rmd
@@ -0,0 +1,23 @@
 ```{r libs}
 library(tidyverse)
 #library(readxl)
 #library(haven)
 #library(rvest)
 #library(jsonlite)
 ```

 # Import

 This is where you would import all of your data, from every source needed (flat files, excel files, database queries, etc.).

 ```{r import}
 read_ ("Data/Raw/") -> data
 ```

 All data should undergo a cursory check (are the types correctly attributed for example?), and be saved as-is in a flat file format (preferably csv).

 ```{r export}
 data %>%
  write.csv("Data/.csv", row.names = FALSE)
 ```

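A cursory type check after import could look like the sketch below. It uses base R only so that it runs without extra packages; the temporary files and the column names (`id`, `visit_date`, `score`) are hypothetical stand-ins for a real file in Data/Raw/.

```r
# Hypothetical raw file, written to a temporary location for the example.
raw_path <- tempfile(fileext = ".csv")
writeLines(c("id,visit_date,score",
             "1,2019-01-15,12.5",
             "2,2019-02-03,NA"),
           raw_path)

# Import without converting strings to factors.
raw <- read.csv(raw_path, stringsAsFactors = FALSE)

# Cursory check: first class of each column.
col_types <- vapply(raw, function(x) class(x)[1], character(1))
print(col_types)

# Save as-is in a flat file, as the export chunk does.
csv_path <- tempfile(fileext = ".csv")
write.csv(raw, csv_path, row.names = FALSE)
```

Here the date column is read as plain character; converting it to a proper date type is the kind of work left to the Tidy step.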
--- a/02_Tidy.R
+++ b/02_Tidy.R
@@ -1,11 +0,0 @@
 library(tidyverse)
 # library(lubridate)

 # Load the csv data and tidy it ----
 read_csv("Data/.csv") %>%
  select() %>%
  filter() %>%
  mutate() %>%
  mutate_if(is.character, factor) %>%
 # Save the tidy-ed data ----
  saveRDS(file = "Data/tidy.rds")
--- a/02_Tidy.Rmd
+++ b/02_Tidy.Rmd
@@ -0,0 +1,31 @@
 ```{r libs}
 library(tidyverse)
 # library(lubridate)
 ```

 # Tidy

 First the data are read from the flat file produced in the **Import** step.

 ```{r import}
 read_csv("Data/.csv") -> data
 ```

 This is where you would tidy your data.  
 This step shouldn't apply destructive transformations to the data, only fix types and obvious errors.

 ```{r tidy}
 data %>%
  select() %>%
  filter() %>%
  mutate() %>%
  mutate_if(is.character, factor) -> data
 ```

 The data are then exported in Rds to keep the formatting.  
 You can export as many objects as you want, as long as they are inside a named list.

 ```{r export}
 list(data = data) %>%
  saveRDS(file = "Data/tidy.rds")
 ```
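The named-list export and the matching import in the next step can be sketched as follows. This is an illustration only: the objects `patients` and `visits` are hypothetical, a temporary file stands in for Data/tidy.rds, and a fresh environment is used instead of `globalenv()` so the example does not touch the session.

```r
# Two hypothetical objects produced by a tidy step.
patients <- data.frame(id = 1:3, sex = factor(c("F", "M", "F")))
visits   <- data.frame(id = c(1, 1, 2), day = c(0, 30, 0))

# Export: a single named list holds every object.
rds_path <- tempfile(fileext = ".rds")  # stands in for "Data/tidy.rds"
saveRDS(list(patients = patients, visits = visits), file = rds_path)

# Import in the next step: each list element becomes a variable
# in the target environment, under its list name.
step_env <- new.env()
list2env(readRDS(rds_path), envir = step_env)

print(ls(step_env))
print(is.factor(step_env$patients$sex))
```

The list has to be named because `list2env()` uses the element names as variable names. Column types such as factors survive the round trip, which a csv file would not guarantee.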
--- a/03_Transform.R
+++ b/03_Transform.R
@@ -1,9 +0,0 @@
 library(tidyverse)
 # library(lubridate)

 # Load the tidy-ed data ----
 readRDS("Data/tidy.rds") %>%
 # Transform the data

 # Save the transformed data ----
 saveRDS("Data/transformed.rds")
--- a/03_Transform.Rmd
+++ b/03_Transform.Rmd
@@ -0,0 +1,32 @@
 ```{r libs}
 library(tidyverse)
 # library(lubridate)
 ```

 # Transform

 First the tidy-ed data are read from the Rds and exposed in the global environment.

 ```{r import}
 readRDS("Data/tidy.rds") %>%
  list2env(envir = globalenv())
 ```

 This is where you would transform your data.  
 Destructive transformations are allowed here; feel free to experiment!

 ```{r transform}
 data %>%
  mutate() %>%
  select() %>%
  filter() -> data
 ```

 Data are exported again in Rds, after transformation.

 ```{r export}
 list(data = data) %>%
  saveRDS("Data/transformed.rds")
 ```


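The split between the non-destructive Tidy step and the destructive Transform step might look like the sketch below. The data frame, the age threshold of 120, and the column names are hypothetical. It assumes dplyr 1.0.0 or later, where `across(where(is.character), factor)` is the current spelling of the `mutate_if(is.character, factor)` call used in the template.

```r
library(dplyr)

raw <- data.frame(
  id    = 1:4,
  group = c("a", "b", "c", "b"),
  age   = c(34, 51, 290, 47),   # 290 is an obvious entry error
  stringsAsFactors = FALSE
)

# Tidy (non-destructive): fix types and censor obvious errors,
# keeping every row and every column.
tidy <- raw %>%
  mutate(age = if_else(age > 120, NA_real_, age)) %>%
  mutate(across(where(is.character), factor))

# Transform (destructive): drop observations and variables,
# then drop the factor levels that no longer occur.
transformed <- tidy %>%
  filter(!is.na(age)) %>%
  select(-id) %>%
  mutate(group = droplevels(group))
```

Because the tidy data are saved before any row is dropped, the destructive choices can be revisited later without going back to the raw files.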
--- a/04_Analyse.R
+++ b/04_Analyse.R
@@ -1,20 +0,0 @@
 library(tidyverse)
 # library(broom)
 # library(modelr)

 # Initialize an empty list to store the results ----
 results <- list()

 # Load the transformed data ----
 readRDS("Data/transformed.rds") -> df

 # Specific transformations ----

 # Models ----

 # Tables ----

 # Plots ----

 # Save the results object ----
 saveRDS(results, file = "Data/results.rds")
--- a/04_Analyse.Rmd
+++ b/04_Analyse.Rmd
@@ -0,0 +1,35 @@
 ```{r libs}
 library(tidyverse)
 # library(broom)
 # library(modelr)
 ```

 # Analyse

 Initialize an empty list to store the results.

 ```{r init-list}
 results <- list()
 ```

 The transformed data are imported into the global environment.

 ```{r import}
 readRDS("Data/transformed.rds") %>%
  list2env(envir = globalenv())
 ```

 # Specific transformations

 # Models

 # Tables

 # Plots

 The results of the analyses are saved in an object that is used to produce the report.
 ```{r export}
 results %>%
  saveRDS(file = "results.rds")
 ```

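One way to fill the results object in the Models and Tables sections is sketched below. The sub-list names (`models`, `tables`, `values`) and the model itself are hypothetical, and the built-in `mtcars` dataset stands in for the transformed data.

```r
results <- list()

# Hypothetical model on a built-in dataset.
fit <- lm(mpg ~ wt, data = mtcars)
results$models <- list(mpg_by_wt = fit)

# Table: coefficient estimates as a plain data frame.
results$tables <- list(coefficients = as.data.frame(coef(summary(fit))))

# Plain values that the report may need in running text.
results$values <- list(n_obs = nrow(mtcars))

# Overview of what will be saved.
str(results, max.level = 2)
```

Keeping everything in one nested list means a single `saveRDS()` call captures all that the report needs.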
--- a/Rmd/report.Rmd
+++ b/Rmd/report.Rmd
@@ -13,7 +13,7 @@ library(knitr)
 #library(pander)
 #library(DT)

-readRDS("../results.rds") %>%
+readRDS("results.rds") %>%
  list2env(envir = globalenv())

 opts_chunk$set(echo = F,
--- a/README.md
+++ b/README.md
@@ -27,30 +27,29 @@ Also run `install.packages(c("tidyverse", "rmarkdown", "knitr"))` to install the

 ## Directory structure

-The project contains three subdirectories: **Data/**, **Docs/** and **Rmd/**.  
+The project contains two subdirectories: **Data/** and **Docs/**.  
 **Data/** also contains a **Raw/** subdirectory.

 **Data/Raw/** should contain the raw data when they exist as files (csv, xls(x), SQLite databases, SAS files, SPSS files, etc.).

 **Docs/** should contain all external documents you have about the project (synopsis, context articles/presentations, etc.)

-**Rmd/** will contain the files used to communicate the results.
-
 ## Scripts

-Four scripts are already present, populated with boilerplate code for each of the steps.  
+Five scripts are already present, populated with boilerplate code for each of the steps.  
+Each script is an Rmd file, although they are not meant to be knitted; use them more or less like notebooks (see [blogpost] for an idea of my workflow).  
 Packages `dplyr`, `magrittr`, `tidyr`, and `purrr` are useful throughout.

 **Every step makes a "savepoint" of your work, allowing you to rapidly iterate on any of the steps without having to re-run the previous ones (unless you've changed something up in the chain).**  

-### 01_Import.R
+### 01_Import.Rmd

 The first script is used to import raw data (whatever the source) and save a local csv copy in **Data/**.  
 Useful packages from the tidyverse here are `readr`, `readxl`, `rvest`, `haven`, and `jsonlite`.

 Having the data ready as simple csv is useful to always be able to start from the beginning, even if the original source is unavailable.

-### 02_Tidy.R
+### 02_Tidy.Rmd

 This step consists mostly of "non-destructive" data management: assign types to columns (factors with correct/human-readable levels, dates, etc.), correct/censor obviously abnormal values and errors, transform between *long* and *wide* format, etc.  
 Useful packages here are `lubridate`, `stringr`, and `forcats`.
@@ -59,7 +58,7 @@ The results are saved in a **Data/tidy.rds** file.

 After this second step, you will have your full data ready to use in R and shouldn't have to run the first two steps anymore (unless you get hold of new data).

-### 03_Transform.R
+### 03_Transform.Rmd

 This script is for data transforming. It will contain all transformations of the data to make them ready for analyses.  
 Some "destructive" data management can occur here, such as dropping variables or observations, or modifying the levels of some factors.  
@@ -67,7 +66,7 @@ Useful packages here are `forcats`, `lubridate`, and `stringr`.

 The results are saved in a **Data/transformed.rds** file.

-### 04_Analyze.R
+### 04_Analyze.Rmd

 This script will contain more data transforming, and the analyses with production of the resulting tables and plots.  
 There is a bit of an overlap between **03_Transform.R** and **04_Analyze.R** as it is often an iterative process. Both files can be merged into one, but it can be useful to have some time-consuming transformations in a separate script and have the results handy.  
@@ -93,7 +92,7 @@ results
   └─ figure3
 ```

-### Rmd/report.Rmd
+### 05_Report.Rmd

 The Rmd file should not contain *any* literal values: every number, table, graph *has* to come from the results object (in its original form).  
 Only minor cosmetic modifications should be made at that point (running `prettyNum` on numerics or table columns, `select`/`filter`/`arrange`/`rename` on the full tables, etc.).
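
The "no literal values" rule for the report can be illustrated with a small sketch. The results object, its field names, and the sentence are hypothetical; in the real report the object would come from the saved results file rather than being built inline.

```r
# Hypothetical results object, as it might be loaded in the report.
results <- list(
  values = list(n_obs = 1234567, mean_age = 46.7812),
  tables = list(by_group = data.frame(group = c("b", "a"),
                                      n = c(700000, 534567)))
)

# Cosmetic formatting only: the numbers themselves come from `results`.
n_txt    <- prettyNum(results$values$n_obs, big.mark = ",")
mean_txt <- format(round(results$values$mean_age, 1), nsmall = 1)
sentence <- sprintf("The study included %s observations (mean age %s years).",
                    n_txt, mean_txt)
print(sentence)

# Reordering a full table is also a cosmetic change.
by_group <- results$tables$by_group
by_group <- by_group[order(by_group$group), ]
```

If the analysis changes, re-running the Analyse step and re-knitting the report updates every figure in the text, since nothing was typed in by hand.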