@@ -1,9 +0,0 @@
-library(tidyverse)
-#library(readxl)
-#library(haven)
-#library(rvest)
-#library(jsonlite)
-# Import the raw data in csv ----
-read_ ("Data/Raw/") %>%
-write.csv("Data/.csv", row.names = F)
@@ -0,0 +1,23 @@
+```{r libs}
+library(tidyverse)
+#library(readxl)
+#library(haven)
+#library(rvest)
+#library(jsonlite)
+```
+# Import
+This is where you would import all of your data, from every source needed (flat files, Excel files, database queries, etc.).
+```{r import}
+read_ ("Data/Raw/") -> data
+```
+All data should undergo a cursory check (for example, are the column types correctly detected?) and then be saved as-is in a flat file format (preferably csv).
+```{r export}
+data %>%
+write.csv("Data/.csv", row.names = FALSE)
+```
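A cursory type check before the export can be as simple as comparing each column's class with what you expect. The base-R sketch below is only an illustration: the column names, the expected classes and the temporary output file (standing in for the template's `Data/` path) are all hypothetical.

```r
# Hypothetical imported data standing in for the result of `read_*()`
data <- data.frame(
  id = c(1L, 2L, 3L),
  visit_date = as.Date(c("2020-01-15", "2020-02-03", "2020-03-10")),
  group = c("treated", "control", "treated"),
  stringsAsFactors = FALSE
)

# Expected class of each column (an assumption for this example)
expected <- c(id = "integer", visit_date = "Date", group = "character")

# Observed class of each column; vapply() guarantees one string per column
observed <- vapply(data, function(col) class(col)[1], character(1))

# Stop on any mismatch instead of silently saving wrongly typed data
mismatch <- names(expected)[observed[names(expected)] != expected]
if (length(mismatch) > 0) {
  stop("Unexpected column types: ", paste(mismatch, collapse = ", "))
}

# Save the data as-is in a flat file (temporary path used for the example)
out_file <- tempfile(fileext = ".csv")
write.csv(data, out_file, row.names = FALSE)
```

Reading that csv back with `read.csv()` returns `visit_date` as text rather than as a `Date`, which is one reason the Tidy step (re)assigns types.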
@@ -1,11 +0,0 @@
-library(tidyverse)
-# library(lubridate)
-# Load the csv data and tidy it ----
-read_csv("Data/.csv") %>%
-select() %>%
-filter() %>%
-mutate() %>%
-mutate_if(is.character, factor) %>%
-# Save the tidy-ed data ----
-saveRDS(file = "Data/tidy.rds")
@@ -0,0 +1,31 @@
+```{r libs}
+library(tidyverse)
+# library(lubridate)
+```
+# Tidy
+First, the data are read from the flat file produced in the **Import** step.
+```{r import}
+read_csv("Data/.csv") -> data
+```
+This is where you would tidy your data.
+It shouldn't contain destructive transformations of the data, only the handling of types and obvious errors.
+```{r tidy}
+data %>%
+select() %>%
+filter() %>%
+mutate() %>%
+mutate_if(is.character, factor) -> data
+```
+The data are then exported as an Rds file, which preserves column types such as factors and dates (a csv would not).
+You can export as many objects as you want, as long as they are inside a named list.
+```{r export}
+list(data = data) %>%
+saveRDS(file = "Data/tidy.rds")
+```
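The named-list convention is what lets the next step restore every saved object under its original name. A minimal base-R sketch of that round trip; the second object `lookup` is hypothetical, and `list2env()` targets a fresh environment rather than `globalenv()` so the example stays self-contained.

```r
# Objects produced by the Tidy step (hypothetical content)
data <- data.frame(id = 1:3, score = c(10.5, 7.2, 8.9))
lookup <- c(a = "Group A", b = "Group B")

# Save every object inside one named list (temporary file for the example)
rds_file <- tempfile(fileext = ".rds")
saveRDS(list(data = data, lookup = lookup), file = rds_file)

# In the next step: read the list back and assign each element by name
restored <- new.env()
list2env(readRDS(rds_file), envir = restored)

# Both objects are now available under their original names
ls(restored)
```

In the template itself, `envir = globalenv()` makes `data` (and any other saved object) directly available to the following chunks.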
@@ -1,9 +0,0 @@
-library(tidyverse)
-# library(lubridate)
-# Load the tidy-ed data ----
-readRDS("Data/tidy.rds") %>%
-# Transform the data
-# Save the transformed data ----
-saveRDS("Data/transformed.rds")
@@ -0,0 +1,32 @@
+```{r libs}
+library(tidyverse)
+# library(lubridate)
+```
+# Transform
+First, the tidied data are read from the Rds file, and each element of the saved list is assigned in the global environment.
+```{r import}
+readRDS("Data/tidy.rds") %>%
+list2env(envir = globalenv())
+```
+This is where you would transform your data.
+Destructive transformations are allowed here, so feel free to experiment!
+```{r transform}
+data %>%
+mutate() %>%
+select() %>%
+filter() -> data
+```
+After transformation, the data are exported again as an Rds file.
+```{r export}
+list(data = data) %>%
+saveRDS("Data/transformed.rds")
+```
@@ -1,20 +0,0 @@
-library(tidyverse)
-# library(broom)
-# library(modelr)
-# Initialize an empty list to store the results ----
-results <- list()
-# Load the transformed data ----
-readRDS("Data/transformed.rds") -> df
-# Specific transformations ----
-# Models ----
-# Tables ----
-# Plots ----
-# Save the results object ----
-saveRDS(results, file = "Data/results.rds")
@@ -0,0 +1,35 @@
+```{r libs}
+library(tidyverse)
+# library(broom)
+# library(modelr)
+```
+# Analyse
+An empty list is initialized to store the results.
+```{r init list}
+results <- list()
+```
+The transformed data are imported into the global environment.
+```{r import}
+readRDS("Data/transformed.rds") %>%
+list2env(envir = globalenv())
+```
+# Specific transformations
+# Models
+# Tables
+# Plots
+The results of the analyses are saved in a single object, which is then used to produce the report.
+```{r export}
+results %>%
+saveRDS(file = "results.rds")
+```
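The results object can be filled as the analysis progresses, with one named element per model, table or plot. A minimal sketch using the built-in `mtcars` data; the element names (`models`, `tables`, `mpg_wt`, `table1`) are assumptions for illustration, and plots (for example ggplot objects) could be stored in the same way.

```r
# Start from an empty list, as in the template
results <- list()

# Model: a simple linear regression on a built-in dataset
results$models$mpg_wt <- lm(mpg ~ wt, data = mtcars)

# Table: a data frame of coefficients computed from the stored model
results$tables$table1 <- data.frame(
  term = names(coef(results$models$mpg_wt)),
  estimate = unname(coef(results$models$mpg_wt))
)

# Save everything the report needs in one file (temporary path here)
results_file <- tempfile(fileext = ".rds")
saveRDS(results, file = results_file)
```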
@@ -13,7 +13,7 @@ library(knitr)
 #library(pander)
 #library(DT)
-readRDS("../results.rds") %>%
+readRDS("results.rds") %>%
 list2env(envir = globalenv())
 opts_chunk$set(echo = F,
@@ -27,30 +27,29 @@ Also run `install.packages(c("tidyverse", "rmarkdown", "knitr"))` to install the
 ## Directory structure
-The project contains three subdirectories: **Data/**, **Docs/** and **Rmd/**.
+The project contains two subdirectories: **Data/** and **Docs/**.
 **Data/** also contains a **Raw/** subdirectory.
 **Data/Raw/** should contain the raw data when they exist as files (csv, xls(x), SQLite databases, SAS files, SPSS files, etc.).
 **Docs/** should contain all external documents you have about the project (synopsis, context articles/presentations, etc.).
-**Rmd/** will contain the files used to communicate the results.
 ## Scripts
-Four scripts are already present, populated with boilerplate code for each of the steps.
+Five scripts are already present, populated with boilerplate code for each of the steps.
 Each script is an Rmd file, but it is meant to be used more or less like a notebook rather than knitted (see [blogpost] for an idea of my workflow).
 Packages `dplyr`, `magrittr`, `tidyr`, and `purrr` can be useful at every step.
 **Every step makes a "savepoint" of your work, allowing you to rapidly iterate on any of the steps without having to re-run the previous ones (unless you've changed something up in the chain).**
-### 01_Import.R
+### 01_Import.Rmd
 The first script is used to import raw data (whatever the source) and save a local csv copy in **Data/**.
 Useful packages from the tidyverse here are `readr`, `readxl`, `rvest`, `haven`, and `jsonlite`.
 Keeping the data as simple csv files means you can always start again from the beginning, even if the original source becomes unavailable.
-### 02_Tidy.R
+### 02_Tidy.Rmd
 This step consists mostly of "non-destructive" data management: assigning types to columns (factors with correct, human-readable levels, dates, etc.), correcting or censoring obviously abnormal values and errors, transforming between *long* and *wide* format, etc.
 Useful packages here are `lubridate`, `stringr`, and `forcats`.
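As an illustration of this kind of non-destructive tidying, the base-R sketch below parses dates, recodes a numeric code into a factor with human-readable levels, and censors an obviously impossible value to `NA`. The columns, codes and the plausible age limit are hypothetical; in the template you would more likely reach for `lubridate`, `stringr` and `forcats`.

```r
# Hypothetical raw data as read from the csv: plain text and numbers
raw <- data.frame(
  visit = c("2021-03-01", "2021-03-15", "2021-04-02"),
  sex = c("1", "2", "1"),
  age = c(34, 250, 41),
  stringsAsFactors = FALSE
)

tidy <- within(raw, {
  # Parse ISO 8601 strings into proper dates
  visit <- as.Date(visit)
  # Recode the numeric codes into a factor with readable labels
  sex <- factor(sex, levels = c("1", "2"), labels = c("Male", "Female"))
  # Censor an obviously abnormal value (an age of 250 is impossible)
  age <- ifelse(age > 120, NA, age)
})

str(tidy)
```

No row or column is dropped here: every observation is kept, only its representation changes.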
@@ -59,7 +58,7 @@ The results are saved in a **Data/tidy.rds** file.
 After this second step, you will have your full data ready to use in R and shouldn't have to run the first two steps anymore (unless you get hold of new data).
-### 03_Transform.R
+### 03_Transform.Rmd
 This script is for transforming the data: it contains all the transformations needed to make the data ready for analysis.
 Some "destructive" data management can occur here, such as dropping variables or observations, or modifying the levels of some factors.
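A minimal base-R sketch of such destructive steps: dropping a variable, dropping observations, and merging factor levels. The data and the level grouping are hypothetical; with `forcats`, `fct_collapse()` serves the same purpose as the `levels<-` call used here.

```r
# Hypothetical tidy data
data <- data.frame(
  id = 1:5,
  comment = c("ok", "", "late", "ok", ""),
  region = factor(c("North", "South", "East", "West", "North")),
  score = c(12, NA, 15, 9, 11)
)

# Drop a variable that is not needed for the analyses
data$comment <- NULL

# Drop observations with a missing outcome
data <- data[!is.na(data$score), ]

# Merge factor levels: each new level lists the old levels it replaces
levels(data$region) <- list(
  "North/East" = c("North", "East"),
  "South/West" = c("South", "West")
)
```

Every old level must appear in the list; a level left out would become `NA`.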
@@ -67,7 +66,7 @@ Useful packages here are `forcats`, `lubridate`, and `stringr`.
 The results are saved in a **Data/transformed.rds** file.
-### 04_Analyze.R
+### 04_Analyze.Rmd
 This script contains further data transformations, as well as the analyses that produce the resulting tables and plots.
 There is a bit of an overlap between **03_Transform.Rmd** and **04_Analyze.Rmd**, as it is often an iterative process. Both files can be merged into one, but it can be useful to keep some time-consuming transformations in a separate script and have their results handy.
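Keeping a time-consuming transformation handy follows the same savepoint idea: compute once, save to Rds, and reload on later runs. A minimal sketch; the helper `cached()`, the counter and the file path are hypothetical and not part of the template.

```r
# Hypothetical helper: reuse a saved result if it exists, otherwise compute it
cached <- function(path, compute) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  value <- compute()
  saveRDS(value, file = path)
  value
}

# Counter showing how many times the expensive step actually runs
n_runs <- 0
slow_transform <- function() {
  n_runs <<- n_runs + 1
  Sys.sleep(0.1) # stand-in for a slow computation
  data.frame(x = 1:3, x_squared = (1:3)^2)
}

cache_file <- tempfile(fileext = ".rds")
first <- cached(cache_file, slow_transform)  # computes and saves
second <- cached(cache_file, slow_transform) # reloads from disk
```

As with the other savepoints, the cached file has to be deleted (or the step re-run) whenever something changes up in the chain.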
@@ -93,7 +92,7 @@ results
 └─ figure3
 ```
-### Rmd/report.Rmd
+### 05_Report.Rmd
 The Rmd file should not contain *any* literal values: every number, table, and graph *has* to come from the results object (in its original form).
 Only minor cosmetic modifications should be made there (running `prettyNum` on numerics or table columns, `select`/`filter`/`arrange`/`rename` on the full tables, etc.).
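In the report, numbers are then only formatted, never typed in. A minimal sketch of that cosmetic step, assuming a hypothetical `results$tables$table1` with a numeric `n` column; in the Rmd, a value like `total` would be placed in the text with inline R code.

```r
# Hypothetical results object, as loaded from results.rds in the report
results <- list(
  tables = list(
    table1 = data.frame(region = c("North", "South"), n = c(1234567, 89012))
  )
)

# Cosmetic formatting only: the numbers themselves come from `results`
table1 <- results$tables$table1
table1$n <- prettyNum(table1$n, big.mark = ",")

# A value for inline use in the text, again taken from the results object
total <- prettyNum(sum(results$tables$table1$n), big.mark = ",")
```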