Maxime Wack 1b074858ce Corrected jsonlite 6 years ago
Data Empty files 7 years ago
Docs Empty files 7 years ago
Rmd Script files 7 years ago
.gitignore Initial commit 7 years ago
01_Import.R Corrected jsonlite 6 years ago
02_Tidy.R Use RDS files for storage and put them in Data/ 7 years ago
03_Transform.R Use RDS files for storage and put them in Data/ 7 years ago
04_Analyse.R Use RDS files for storage and put them in Data/ 7 years ago
LICENSE Updated license to MIT 7 years ago
README.md Updated README 7 years ago

README.md

tidyflow: a workflow that fits the tidyverse

Tidyflow is not a package, but a project skeleton that you can clone/fork to start your own projects.

It follows the project structure proposed by Hadley Wickham in R for Data Science.

Image under CC-BY-NC-ND

Install

If you are on GitHub, simply fork the repository.

If you don't want to use GitHub as your remote, clone the repository into a new directory:

git clone https://www.github.com/maximewack/tidyflow new_project

Then point the origin remote to your own repository:

git remote set-url origin your_repo_url

The project already contains a .gitignore file for R projects.
Add rules for your data files if you don't want them to be shared.

Also run install.packages(c("tidyverse", "rmarkdown", "knitr")) to install the necessary dependencies.

Directory structure

The project contains three subdirectories: Data/, Docs/ and Rmd/.
Data/ also contains a Raw/ subdirectory.

Data/Raw/ should contain the raw data when they exist as files (csv, xls(x), SQLite databases, SAS files, SPSS files, etc.).

Docs/ should contain all the external documents you have about the project (synopsis, background articles and presentations, etc.).

Rmd/ will contain the files used to communicate the results.
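Cloning the repository gives you this layout directly. If you start a project without cloning, the same skeleton can be created from R; a minimal sketch, where a temporary directory stands in for the project root (an assumption so the snippet runs anywhere; in practice use your project directory):

```r
# Stand-in for the project root; use "." or your project path in practice
root <- file.path(tempdir(), "new_project")

# The three subdirectories, with Raw/ nested inside Data/
dirs <- file.path(root, c("Data/Raw", "Docs", "Rmd"))

# recursive = TRUE also creates Data/ on the way to Data/Raw/
for (d in dirs) dir.create(d, recursive = TRUE, showWarnings = FALSE)

list.dirs(root, full.names = FALSE, recursive = TRUE)
```

Git does not track empty directories, so each one needs at least one file in it before it can be committed.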

Scripts

Four scripts are already present, populated with boilerplate code for each of the steps.
Packages dplyr, magrittr, tidyr, and purrr are useful at every step.

Every step saves a “savepoint” of your work, so you can iterate quickly on any step without re-running the previous ones (unless you have changed something upstream in the chain).
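A savepoint is simply a file written at the end of one script and read at the start of the next. The sketch below uses a hypothetical toy data frame and a temporary directory standing in for Data/. It also illustrates why the later steps store .rds files: saveRDS()/readRDS() restore the R object exactly, including factor levels and Date columns, which a csv file cannot carry.

```r
data_dir <- file.path(tempdir(), "Data")  # stands in for the project's Data/
dir.create(data_dir, showWarnings = FALSE)

# Hypothetical tidied data: a factor with explicit levels and a Date column
tidy <- data.frame(
  id       = 1:3,
  severity = factor(c("mild", "severe", "mild"), levels = c("mild", "severe")),
  visit    = as.Date(c("2020-01-15", "2020-02-01", "2020-03-10"))
)

# End of one script: write the savepoint
saveRDS(tidy, file.path(data_dir, "tidy.rds"))

# Start of the next script: reload it instead of re-running the earlier steps
restored <- readRDS(file.path(data_dir, "tidy.rds"))
identical(restored, tidy)   # types and factor levels survive the round trip

# The same data written to and read back from csv: the Date class is lost
write.csv(tidy, file.path(data_dir, "tidy.csv"), row.names = FALSE)
from_csv <- read.csv(file.path(data_dir, "tidy.csv"))
class(from_csv$visit)       # not "Date"
```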

01_Import.R

The first script is used to import raw data (whatever the source) and save a local csv copy in Data/.
Useful packages from the tidyverse here are readr, readxl, rvest, haven, and jsonlite.

Keeping a plain csv copy means you can always restart from the beginning, even if the original source becomes unavailable.
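A minimal sketch of what 01_Import.R might contain, assuming a hypothetical csv export named patients_export.csv in Data/Raw/ (written by the snippet itself so it runs on its own) and a temporary directory standing in for Data/. Reading every column as text is a design choice made here, leaving type assignment to 02_Tidy.R:

```r
library(readr)

data_dir <- file.path(tempdir(), "Data")  # stands in for the project's Data/
raw_dir  <- file.path(data_dir, "Raw")
dir.create(raw_dir, recursive = TRUE, showWarnings = FALSE)

# Hypothetical raw export dropped into Data/Raw/ (created here so the sketch runs)
writeLines(c("id,sex,birth_date,weight",
             "1,1,1980-05-02,72.5",
             "2,2,1975-11-23,-1",
             "3,2,1990-01-30,61"),
           file.path(raw_dir, "patients_export.csv"))

# Import from the raw source; other sources follow the same pattern, e.g.
#   readxl::read_excel(), haven::read_sas(), haven::read_sav(), jsonlite::fromJSON()
patients <- read_csv(file.path(raw_dir, "patients_export.csv"),
                     col_types = cols(.default = col_character()))

# Local csv copy in Data/: the starting point for the tidying step
write_csv(patients, file.path(data_dir, "patients.csv"))
```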

02_Tidy.R

This step consists mostly of “non-destructive” data management: assigning types to columns (factors with correct, human-readable levels, dates, etc.), correcting or censoring obviously abnormal values and errors, converting between long and wide formats, etc.
Useful packages here are lubridate, stringr, and forcats.

The results are saved in a Data/tidy.rds file.

After this second step, you will have your full data ready to use in R and shouldn't have to run the first two steps anymore (unless you get hold of new data).
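A minimal sketch of 02_Tidy.R under stated assumptions: the column names, the 1/2 coding of sex, and the rule that a weight of zero or less is an error are all hypothetical, and the data frame below stands in for reading Data/patients.csv. Reshaping between long and wide formats (tidyr::pivot_longer() / tidyr::pivot_wider()) is left out to keep it short.

```r
library(dplyr)
library(lubridate)

# Hypothetical import result: every column still stored as text
patients <- data.frame(
  id         = c("1", "2", "3"),
  sex        = c("1", "2", "2"),
  birth_date = c("1980-05-02", "1975-11-23", "1990-01-30"),
  weight     = c("72.5", "-1", "61"),
  stringsAsFactors = FALSE
)

tidy <- patients %>%
  mutate(
    id         = as.integer(id),
    # Coded values become a factor with human-readable levels
    sex        = factor(sex, levels = c("1", "2"), labels = c("Male", "Female")),
    birth_date = ymd(birth_date),
    weight     = as.numeric(weight),
    # Censor an obviously abnormal value instead of dropping the observation
    weight     = if_else(weight <= 0, NA_real_, weight)
  )

data_dir <- file.path(tempdir(), "Data")  # stands in for the project's Data/
dir.create(data_dir, showWarnings = FALSE)
saveRDS(tidy, file.path(data_dir, "tidy.rds"))
```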

03_Transform.R

This script transforms the data: it contains all the transformations needed to make them ready for analysis.
Some “destructive” data management can occur here, such as dropping variables or observations, or modifying the levels of some factors.
Useful packages here are forcats, lubridate, and stringr.

The results are saved in a Data/transformed.rds file.
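A minimal sketch of the “destructive” operations in 03_Transform.R. The data frame stands in for readRDS() on Data/tidy.rds, and the inclusion criterion (age 18 or over), the dropped notes column, the age groups, and the decision to keep only the Paris center are all hypothetical choices for illustration:

```r
library(dplyr)
library(forcats)

data_dir <- file.path(tempdir(), "Data")  # stands in for the project's Data/
dir.create(data_dir, showWarnings = FALSE)

# Hypothetical tidy data, standing in for readRDS("Data/tidy.rds")
tidy <- data.frame(
  id     = 1:5,
  sex    = factor(c("Male", "Female", "Female", "Male", "Female")),
  age    = c(34, 71, 45, 17, 58),
  center = factor(c("Paris", "Lyon", "Nancy", "Paris", "Brest")),
  notes  = c("", "moved", "", "minor", ""),
  stringsAsFactors = FALSE
)

transformed <- tidy %>%
  filter(age >= 18) %>%     # drop observations outside the study population
  select(-notes) %>%        # drop a variable not used in the analyses
  mutate(
    # Derived variable: [18, 50) and [50, Inf)
    age_group = cut(age, breaks = c(18, 50, Inf), right = FALSE,
                    labels = c("18-49", "50+")),
    # Modify factor levels: every center except Paris becomes "Other"
    center    = fct_other(center, keep = "Paris", other_level = "Other")
  )

saveRDS(transformed, file.path(data_dir, "transformed.rds"))
```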

04_Analyse.R

This script contains further data transformations, along with the analyses that produce the resulting tables and plots.
03_Transform.R and 04_Analyse.R overlap somewhat, as the process is often iterative. The two files could be merged into one, but keeping time-consuming transformations in a separate script means their results are saved and ready to use.
Useful packages here are broom, ggplot2, and modelr.

In this script, all the “interesting results”, full tables, and ggplot graphs are collected in a single hierarchical list, saved in a Data/results.rds file.
All the results from the analyses should be saved as-is, without transformation, so that any of them can be used in the Rmd. Because everything is pre-computed, the Rmd takes only seconds to re-compile, while every result remains available if you want or need to use it somewhere in the manuscript or report.

The results object can look like this:

results
├─ tables
│  ├─ demographics
│  ├─ ttt_vs_control
│  └─ table3
├─ list_of_interesting_values
├─ interesting_values2
└─ plots
   ├─ figure1
   ├─ figure2
   └─ figure3
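A minimal sketch of how the end of 04_Analyse.R could build and save part of this tree. The simulated data and the variable names (group, age, weight) are hypothetical, and a temporary directory stands in for Data/. The coefficient table from broom::tidy() is stored in full, and the plot is stored as a ggplot object rather than an image, so the Rmd can still print or adjust it:

```r
library(dplyr)
library(ggplot2)
library(broom)

data_dir <- file.path(tempdir(), "Data")  # stands in for the project's Data/
dir.create(data_dir, showWarnings = FALSE)

# Hypothetical transformed data, standing in for readRDS("Data/transformed.rds")
set.seed(1)
transformed <- data.frame(
  group  = factor(rep(c("control", "treatment"), each = 20)),
  age    = round(runif(40, 18, 80)),
  weight = rnorm(40, mean = 70, sd = 10)
)

model <- lm(weight ~ group + age, data = transformed)

results <- list(
  tables = list(
    demographics   = transformed %>%
      group_by(group) %>%
      summarise(n = n(), mean_age = mean(age)),
    ttt_vs_control = tidy(model)        # full coefficient table, untouched
  ),
  list_of_interesting_values = list(
    n_patients = nrow(transformed)
  ),
  plots = list(
    figure1 = ggplot(transformed, aes(age, weight, colour = group)) +
      geom_point()
  )
)

saveRDS(results, file.path(data_dir, "results.rds"))
```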

Rmd/report.Rmd

The Rmd file should not contain any literal values: every number, table, graph has to come from the results object (in its original form).
Only minor cosmetic changes should be made there (running prettyNum on numbers or table columns, select/filter/arrange/rename on the full tables, etc.).
Multiple Rmds can be made using the same results: one for a full blown scientific article, one for a quick report, one for a presentation, etc.
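A minimal sketch of the R code a report chunk might run. The results list here is a hypothetical stand-in for loading Data/results.rds; since knitr evaluates chunks in the Rmd's own directory by default, the real report would read it with a "../" prefix from Rmd/. Only cosmetic changes are applied: number formatting, row order, and column names.

```r
library(dplyr)
library(knitr)

# In Rmd/report.Rmd, a setup chunk would load the pre-computed results:
#   results <- readRDS("../Data/results.rds")
# Hypothetical stand-in so this sketch runs on its own:
results <- list(
  list_of_interesting_values = list(n_patients = 12345),
  tables = list(
    demographics = data.frame(group = c("control", "treatment"),
                              n = c(6170, 6175), mean_age = c(51.234, 49.876))
  )
)

# For inline use in the text, e.g. `r prettyNum(results$list_of_interesting_values$n_patients, big.mark = ",")`
n_text <- prettyNum(results$list_of_interesting_values$n_patients, big.mark = ",")

# Cosmetic-only changes to a full table before printing it
demo_table <- results$tables$demographics %>%
  arrange(desc(n)) %>%
  rename(Group = group, N = n, `Mean age` = mean_age)

kable(demo_table, digits = 1)
```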

You will never again have to check for discrepancies between tables/figures and text, or even between different media.