Moved to an all-Rmd based workflow

master
Maxime Wack 5 years ago
parent
commit
69c8d9b03f
10 changed files with 130 additions and 59 deletions
  1. 01_Import.R (+0, -9)
  2. 01_Import.Rmd (+23, -0)
  3. 02_Tidy.R (+0, -11)
  4. 02_Tidy.Rmd (+31, -0)
  5. 03_Transform.R (+0, -9)
  6. 03_Transform.Rmd (+32, -0)
  7. 04_Analyse.R (+0, -20)
  8. 04_Analyse.Rmd (+35, -0)
  9. 05_Report.Rmd (+1, -1)
  10. README.md (+8, -9)

01_Import.R (+0, -9)

@@ -1,9 +0,0 @@
library(tidyverse)
#library(readxl)
#library(haven)
#library(rvest)
#library(jsonlite)

# Import the raw data in csv ----
read_ ("Data/Raw/") %>%
write.csv("Data/.csv", row.names = F)

01_Import.Rmd (+23, -0)

@@ -0,0 +1,23 @@
```{r libs}
library(tidyverse)
#library(readxl)
#library(haven)
#library(rvest)
#library(jsonlite)
```

# Import

This is where you would import all of your data, from every source needed (flat files, excel files, database queries, etc.).

```{r import}
read_ ("Data/Raw/") -> data
```

All data should undergo a cursory check (are the types correctly attributed for example?), and be saved as-is in a flat file format (preferably csv).

```{r export}
data %>%
write.csv("Data/.csv", row.names = F)
```
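
Illustration (not part of the diff): a minimal sketch of the cursory type check described in the Import step, assuming a small hypothetical csv file and base R's `read.csv` in place of the `read_` placeholder. The file contents, column names, and temporary paths are invented for the example.

```r
# Hypothetical raw file, written to a temporary location for this sketch
raw_path <- tempfile(fileext = ".csv")
writeLines(c("id,value,label", "1,1.5,a", "2,2.5,b", "3,3.5,c"), raw_path)

data <- read.csv(raw_path, stringsAsFactors = FALSE)

# Cursory check: first class of each column, compared with what is expected
col_types <- vapply(data, function(col) class(col)[1], character(1))
expected <- c(id = "integer", value = "numeric", label = "character")
stopifnot(identical(col_types, expected))

# Save the data as-is in a flat file (stands in for the csv in Data/)
csv_path <- tempfile(fileext = ".csv")
write.csv(data, csv_path, row.names = FALSE)
```

Failing early on an unexpected column type keeps import problems from reaching the later steps.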


02_Tidy.R (+0, -11)

@@ -1,11 +0,0 @@
library(tidyverse)
# library(lubridate)

# Load the csv data and tidy it ----
read_csv("Data/.csv") %>%
select() %>%
filter() %>%
mutate() %>%
mutate_if(is.character, factor) %>%
# Save the tidy-ed data ----
saveRDS(file = "Data/tidy.rds")

02_Tidy.Rmd (+31, -0)

@@ -0,0 +1,31 @@
```{r libs}
library(tidyverse)
# library(lubridate)
```

# Tidy

First the data are read from the flat file produced in the **Import** step.

```{r import}
read_csv("Data/.csv") -> data
```

This is where you would tidy your data.
This shouldn't contain destructive transformation of data, just handling of types and obvious errors.

```{r tidy}
data %>%
select() %>%
filter() %>%
mutate() %>%
mutate_if(is.character, factor) -> data
```

The data are then exported in Rds to keep the formatting.
You can export as many objects as you want, as long as they are inside a named list.

```{r export}
list(data = data) %>%
saveRDS(file = "Data/tidy.rds")
```
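
Illustration (not part of the diff): a minimal sketch of the named-list savepoint, assuming two hypothetical objects (`data` and `lookup`) and a temporary file in place of `Data/tidy.rds`. It restores the objects into a fresh environment rather than `globalenv()`, only so that the sketch does not overwrite anything in the session.

```r
# Hypothetical objects standing in for the tidied data
data <- data.frame(id = 1:3, group = factor(c("a", "b", "a")))
lookup <- data.frame(group = c("a", "b"), label = c("Control", "Treated"))

path <- tempfile(fileext = ".rds")  # stands in for "Data/tidy.rds"

# Several objects can share one file as long as the list is named
saveRDS(list(data = data, lookup = lookup), file = path)

# Each list element is restored as an object under its own name
restored <- new.env()
list2env(readRDS(path), envir = restored)
ls(restored)
```

Because Rds preserves R types, the `group` column comes back as a factor, which a csv round trip would not guarantee.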

03_Transform.R (+0, -9)

@@ -1,9 +0,0 @@
library(tidyverse)
# library(lubridate)

# Load the tidy-ed data ----
readRDS("Data/tidy.rds") %>%
# Transform the data

# Save the transformed data ----
saveRDS("Data/transformed.rds")

03_Transform.Rmd (+32, -0)

@@ -0,0 +1,32 @@
```{r libs}
library(tidyverse)
# library(lubridate)
```

# Transform

First the tidy-ed data are read from the Rds and exposed in the global environment.

```{r import}
readRDS("Data/tidy.rds") %>%
list2env(envir = globalenv())
```

This is where you would transform your data.
Destructive transformations are allowed here, feel free to experiment!

```{r transform}
data %>%
mutate() %>%
select() %>%
filter() -> data
```

Data are exported again in Rds, after transformation.

```{r export}
list(data = data) %>%
saveRDS("Data/transformed.rds")
```



04_Analyse.R (+0, -20)

@@ -1,20 +0,0 @@
library(tidyverse)
# library(broom)
# library(modelr)

# Initialize an empty list to store the results ----
results <- list()

# Load the transformed data ----
readRDS("Data/transformed.rds") -> df

# Specific transformations ----

# Models ----

# Tables ----

# Plots ----

# Save the results object ----
saveRDS(results, file = "Data/results.rds")

04_Analyse.Rmd (+35, -0)

@@ -0,0 +1,35 @@
```{r libs}
library(tidyverse)
# library(broom)
# library(modelr)
```

# Analyse

Initialize an empty list to store the results

```{r init list}
results <- list()
```

The transformed data is imported into the global environment.

```{r import}
readRDS("Data/transformed.rds") %>%
list2env(envir = globalenv())
```

# Specific transformations

# Models

# Tables

# Plots

The result from the analyses is saved in an object to produce the report
```{r export}
results %>%
saveRDS(file = "results.rds")
```
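
Illustration (not part of the diff): a minimal sketch of how the `results` list might be filled before it is saved, assuming a hypothetical model on the built-in `mtcars` dataset. The element names (`models`, `tables`, `mpg_by_weight`, `coefficients`) are assumptions made for the example, loosely following the section headings above.

```r
results <- list()

# Hypothetical model on a built-in dataset, standing in for the real analysis
fit <- lm(mpg ~ wt, data = mtcars)

# Store the model and a derived table under descriptive names
results$models$mpg_by_weight <- fit
results$tables$coefficients <- as.data.frame(coef(summary(fit)))

path <- tempfile(fileext = ".rds")  # stands in for "results.rds"
saveRDS(results, file = path)

# The report step can then read everything back from this single file
names(readRDS(path))
```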


Rmd/report.Rmd → 05_Report.Rmd (+1, -1)

@@ -13,7 +13,7 @@ library(knitr)
#library(pander)
#library(DT)

-readRDS("../results.rds") %>%
+readRDS("results.rds") %>%
list2env(envir = globalenv())

opts_chunk$set(echo = F,

README.md (+8, -9)

@@ -27,30 +27,29 @@ Also run `install.packages(c("tidyverse", "rmarkdown", "knitr"))` to install the

## Directory structure

-The project contains three subdirectories: **Data/**, **Docs/** and **Rmd/**.
+The project contains two subdirectories: **Data/** and **Docs/**.
**Data/** also contains a **Raw/** subdirectory.

**Data/Raw/** should contain the raw data when they exist as files (csv, xls(x), SQLite databases, SAS files, SPSS files, etc.).

**Docs/** should contain all external documents you have about the project (synopsis, context articles/presentations, etc.)

-**Rmd/** will contain the files used to communicate the results.

## Scripts

-Four scripts are already present, populated with boilerplate code for each of the steps.
+Five scripts are already present, populated with boilerplate code for each of the steps.
+Each of the scripts is an Rmd file, though they are not supposed to be knitted but more or less used like a notebook. (See [blogpost] for an idea of my workflow)
Packages `dplyr`, `magrittr`, `tidyr`, and `purrr` can be useful all the way.

**Every step makes a "savepoint" of your work, allowing you to rapidly iterate on any of the steps without having to re-run the previous ones (unless you've changed something up in the chain).**

-### 01_Import.R
+### 01_Import.Rmd

The first script is used to import raw data (whatever the source) and save a local csv copy in **Data/**.
Useful packages from the tidyverse here are `readr`, `readxl`, `rvest`, `haven`, and `jsonlite`.

Having the data ready as simple csv is useful to always be able to start from the beginning, even if the original source is unavailable.

-### 02_Tidy.R
+### 02_Tidy.Rmd

This step consists mostly of "non-destructive" data management: assign types to columns (factors with correct/human readable levels, dates, etc.), correct/censor obviously abnormal values and errors, transform between *long* and *wide* format, etc.
Useful packages here are `lubridate`, `stringr`, and `forcats`.
@@ -59,7 +58,7 @@ The results are saved in a **Data/tidy.rds** file.

After this second step, you will have your full data ready to use in R and shouldn't have to run the first two steps anymore (unless you get hold of new data).

-### 03_Transform.R
+### 03_Transform.Rmd

This script is for data transforming. It will contain all transformations of the data to make them ready for analyses.
Some "destructive" data management can occur here, such as dropping variables or observations, or modifying the levels of some factors.
@@ -67,7 +66,7 @@ Useful packages here are `forcats`, `lubridate`, and `stringr`.

The results are saved in a **Data/transformed.rds** file.

-### 04_Analyze.R
+### 04_Analyze.Rmd

This script will contain more data transforming, and the analyses with production of the resulting tables and plots.
There is a bit of an overlap between **03_Transform.R** and **04_Analyze.R** as it is often an iterative process. Both files can be merged into one, but it can be useful to have some time-consuming transformations in a separate script and have the results handy.
@@ -93,7 +92,7 @@ results
└─ figure3
```

-### Rmd/report.Rmd
+### 05_Report.Rmd

The Rmd file should not contain *any* literal values: every number, table, graph *has* to come from the results object (in its original form).
Only some really minor cosmetic modifications should be made then (running `prettyNum` on numerics or table columns, `select`/`filter`/`arrange`/`rename` on the full tables, etc.)

