SRC PhD R course Module 3

RMarkdown, Github & Co

Stefan Daume

16. June 2022

 

Why learn RMarkdown & git/Github?

Key motivation

Avoid repetitive and error-prone tasks.

You should use RMarkdown if you want to …

  • concentrate on content rather than formatting
  • share one document in many different formats (Markdown, PDF, Word, HTML)
  • ensure correct citations and bibliographies
  • switch between different citation formats
  • integrate your data analysis automatically, not statically
  • … and much more

(R)Markdown

Markdown vs markup

Markdown allows us to concentrate on document structure and content. We can then worry about styling and presentation later.

Markdown is a type of markup language (like HTML), but it is lightweight and more readable.

Some text with simple formatting

This is a list:

  • with some bold and
  • some italic text.

And a hyperlink for good measure.

Markup samples

HTML

<p>This is a list:</p>
<ul>
<li>with some <strong>bold</strong> and</li>
<li>some <em>italic</em> text.</li>
</ul>
<p>And a <a href="https://bookdown.org/yihui/rmarkdown/">hyperlink</a>
for good measure.</p>

LaTeX

This is a list:

\begin{itemize}
\tightlist
\item
  with some \textbf{bold} and
\item
  some \emph{italic} text.
\end{itemize}

And a \href{https://bookdown.org/yihui/rmarkdown/}{hyperlink} for good
measure.

The same with Markdown

Basic Markdown

This is a list:

* with some **bold** and 
* some *italic* text.

And a [hyperlink](https://bookdown.org/yihui/rmarkdown/) for good measure.

Typical workflow with markdown:

  1. write content as a Markdown document,
  2. generate the final document in a suitable output format (commonly HTML, PDF, Word)
  3. publish
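
The workflow above can be sketched in R roughly as follows. This is a minimal sketch, assuming the rmarkdown package and Pandoc are installed; the temporary file is only used for illustration.

```r
# Step 1: write the content as a Markdown document (here: a temporary file).
md_file <- tempfile(fileext = ".md")
writeLines(c(
  "This is a list:",
  "",
  "* with some **bold** and",
  "* some *italic* text.",
  "",
  "And a [hyperlink](https://bookdown.org/yihui/rmarkdown/) for good measure."
), md_file)

# Step 2: generate the output document.
# Assumption: rmarkdown and Pandoc are available; otherwise this step is skipped.
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  html_file <- rmarkdown::render(md_file, output_format = "html_document", quiet = TRUE)
  print(html_file)  # path of the generated HTML file
}

# Step 3: publish, e.g. by sharing or uploading the generated file.
```

Swapping "html_document" for "word_document" or "pdf_document" changes the output format without touching the content; PDF output additionally needs a LaTeX installation.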

Essential markdown syntax

Basic formatting and structuring

Even tables

An overview of core Markdown syntax can be found in this RMarkdown book chapter; even more options are available in condensed form in an RMarkdown cheat sheet.

‘R Markdown’ vs ‘Markdown’

  • Purpose: dynamically weave together text, data and analysis workflows.
  • This is accomplished with knitr, an R package that is conveniently integrated into the RStudio UI.

Differences to basic Markdown

  • R Markdown files use the file extension .Rmd instead of .md.
  • R Markdown files usually start with a so-called YAML header section.

YAML - Yet Another Markup Language?

The YAML header must be placed at the very beginning of the document and is enclosed by two lines of three dashes (---).

---
title: "Untitled"
output: html_document
date: '2022-06-14'
---

This is the default YAML header that RStudio generates for a new RMarkdown file.

YAML Ain’t Markup Language!

The YAML header contains metadata (e.g. title, date, author(s)) as well as information about the output format and style.

A YAML header with more options might look like this:

---
title: "R Course SRC"
subtitle: "Module 3"
date: "`r Sys.Date()`"
author: 'Stefan Daume' 
output: 
  html_document:
    toc: yes
bibliography: references.bib 
link-citations: yes
---
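
To see how such a header is structured, it can be parsed in R. The following is only a sketch, assuming the yaml package is installed (it is a dependency of knitr and rmarkdown); it is not necessarily how knitr processes the header internally.

```r
# The header from above as plain text (without the enclosing --- lines).
header_text <- '
title: "R Course SRC"
subtitle: "Module 3"
date: "`r Sys.Date()`"
author: "Stefan Daume"
output:
  html_document:
    toc: yes
bibliography: references.bib
link-citations: yes
'

# Parse the YAML text into a nested R list.
header <- yaml::yaml.load(header_text)

str(header)                      # overview of all fields
header$title                     # simple key-value pair
header$output$html_document$toc  # nested option of the output format
header$date                      # still the raw inline R expression; it is only evaluated when knitting
```

The indentation in the YAML text defines the nesting: toc is an option of html_document, which in turn is one entry under output.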

Exercise

  1. Create a default ‘R Markdown’ document in RStudio.
  2. Knit the document to HTML and view the result.
  3. Use the Knit button to select different output formats and check how the YAML header has changed afterwards.

R Markdown: data-driven documents

  • You can now integrate your analysis as R code into the document.
  • The analysis (i.e. the R code) is executed and the results are updated every time you knit the document.
  • Text and code are interspersed.
  • Code is placed in code chunks like this:
```{r some-explanatory-label, echo=FALSE}
# here goes your R code
```
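
That code is really executed at knit time can be tried out directly from the R console. This is a minimal sketch, assuming the knitr package is installed; the chunk label squares and the example text are made up for illustration.

```r
library(knitr)

# A tiny R Markdown document as a character vector:
# one line of text with inline R code and one code chunk.
rmd <- c(
  "The mean of 1 to 10 is `r mean(1:10)`.",
  "",
  "```{r squares, echo=FALSE}",
  "(1:3)^2",
  "```"
)

# knit() runs the R code and returns the resulting Markdown as a string.
out <- knit(text = rmd, quiet = TRUE)
cat(out)
```

Because of echo=FALSE the knitted output contains only the result of the chunk, not the code itself; the inline expression is replaced by its value.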

An example from the previous sessions

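```{r life-expectancy, echo=FALSE}
library(dplyr)     # %>%, group_by(), summarise()
library(ggplot2)   # ggplot() and related functions
library(gapminder) # the gapminder dataset

# plot the average life expectancy across all countries per year
gapminder %>% 
    group_by(year) %>%
    summarise(ale = mean(lifeExp)) %>%
    ggplot(aes(x = year, y = ale)) +
    geom_line(color = "orange") +
    labs(x = "Year", 
         y = "Average life expectancy") +
    theme_classic(base_size = 16)
```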

Plots in R Markdown

Knitting the document executes the life-expectancy chunk shown above and embeds the resulting plot, a line chart of average life expectancy per year, directly in the output.

Remember the Markdown table format?

Dynamic tables with R Markdown

This code …

```{r}
# summarize gapminder data by continent for the selected year
# (year_of_interest is defined in the central setup chunk)
gapminder_latest <- gapminder %>% 
  filter(year == year_of_interest) %>%
  group_by(continent) %>%
  summarise(avrg_le = mean(lifeExp),
            avrg_gdp = mean(gdpPercap))

# print the results as a table
gapminder_latest %>%
  knitr::kable()
```

… creates this table:

| continent | avrg_le  | avrg_gdp  |
|:----------|---------:|----------:|
| Africa    | 54.80604 |  3089.033 |
| Americas  | 73.60812 | 11003.032 |
| Asia      | 70.72848 | 12473.027 |
| Europe    | 77.64860 | 25054.482 |
| Oceania   | 80.71950 | 29810.188 |

Customizing kable tables

This code …

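```{r}
# summarize gapminder data by continent for the selected year
# (year_of_interest is defined in the central setup chunk)
gapminder_latest <- gapminder %>% 
  filter(year == year_of_interest) %>%
  group_by(continent) %>%
  summarise(avrg_le = mean(lifeExp),
            avrg_gdp = mean(gdpPercap))

# print the results as a table; digits are set per column:
# continent (text, unaffected), avrg_le (1 digit), avrg_gdp (2 digits)
gapminder_latest %>%
  knitr::kable(digits = c(0,1,2))
```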

… creates this table:

| continent | avrg_le | avrg_gdp |
|:----------|--------:|---------:|
| Africa    |    54.8 |  3089.03 |
| Americas  |    73.6 | 11003.03 |
| Asia      |    70.7 | 12473.03 |
| Europe    |    77.6 | 25054.48 |
| Oceania   |    80.7 | 29810.19 |
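
Besides digits, kable() also accepts arguments such as col.names and caption. The following sketch assumes the knitr package is installed and, to stay self-contained, types in the summary values from the table above instead of recomputing them from the gapminder data.

```r
library(knitr)

# Summary values copied from the table above (year 2007).
gapminder_latest <- data.frame(
  continent = c("Africa", "Americas", "Asia", "Europe", "Oceania"),
  avrg_le   = c(54.80604, 73.60812, 70.72848, 77.64860, 80.71950),
  avrg_gdp  = c(3089.033, 11003.032, 12473.027, 25054.482, 29810.188)
)

# Round per column, use readable column names and add a caption.
tbl <- kable(
  gapminder_latest,
  digits    = c(0, 1, 2),
  col.names = c("Continent", "Mean life expectancy", "Mean GDP"),
  caption   = "Gapminder data summarised by continent for the year 2007."
)
print(tbl)
```

Inside an R Markdown chunk the print() call is not needed; the table is rendered in the format of the output document.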

More expressive tables with kableExtra

The kableExtra package offers even more options:

  • data-driven colouring

  • interactive tables

  • grouped headers

  • tables with (interactive) sparklines

  • and more …

Table caption: Dynamic formatting with the help of kableExtra. This example shows Gapminder data summarised by continent for the year 2007.

| Continent | Mean life expectancy | Mean GDP |
|:----------|---------------------:|---------:|
| Africa    |                 54.8 |  3089.03 |
| Americas  |                 73.6 | 11003.03 |
| Asia      |                 70.7 | 12473.03 |
| Europe    |                 77.6 | 25054.48 |
| Oceania   |                 80.7 | 29810.19 |
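
A table with data-driven colouring could be built roughly like this. It is only a sketch of one possible approach, assuming the kableExtra package is installed; the styling choices are illustrative and not necessarily those used for the table above.

```r
library(kableExtra)

# Summary values copied from the table above (year 2007).
continent_summary <- data.frame(
  continent = c("Africa", "Americas", "Asia", "Europe", "Oceania"),
  avrg_le   = c(54.8, 73.6, 70.7, 77.6, 80.7),
  avrg_gdp  = c(3089.03, 11003.03, 12473.03, 25054.48, 29810.19)
)

# Basic HTML table with readable column names.
styled <- kbl(
  continent_summary,
  format    = "html",
  digits    = c(0, 1, 2),
  col.names = c("Continent", "Mean life expectancy", "Mean GDP")
)
styled <- kable_styling(styled, full_width = FALSE)

# Data-driven colouring: the background of column 2 depends on life expectancy.
styled <- column_spec(
  styled, 2,
  color      = "white",
  background = spec_color(continent_summary$avrg_le, end = 0.7)
)

styled
```

spec_color() maps the numeric values to a colour scale, so the colouring adapts automatically when the underlying data changes.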

Central ‘Setup’ code section

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)

library(readr)
library(dplyr)
library(ggplot2)
library(gapminder)

year_of_interest <- 2007
```

A central setup chunk loads all libraries once and defines variables and datasets (here year_of_interest) that every later chunk in the document can reference.
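
What the opts_chunk$set() call in the setup chunk does can be checked from the R console. This is a minimal sketch, assuming the knitr package is installed.

```r
library(knitr)

# Set a document-wide default: do not show the code of any chunk.
opts_chunk$set(echo = FALSE)

# Query the current default value of the option.
echo_default <- opts_chunk$get("echo")
print(echo_default)

# A single chunk can still override the default in its own header, e.g.
#   {r some-explanatory-label, echo=TRUE}

# Reset all chunk options to the knitr defaults again.
opts_chunk$restore()
```

Options set this way apply to all chunks that follow, which is why the call is usually placed in the first chunk of the document.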

Handling citations

Citations and bibliographies

One of the most useful and powerful features for researchers using R Markdown.

Requires a BibTeX database

A BibTeX database is simply a text file with the extension .bib and entries such as:

@misc{XieAllaire_et_2022,
  author = {Xie, Yihui and Allaire, J. J. and Grolemund, Garrett},
  title = {{R Markdown: The Definitive Guide}},
  url = {https://bookdown.org/yihui/rmarkdown/},
  urldate = {2022-06-07},
  year = {2022}
}

There is no need to write these entries by hand. Export them from your reference manager or from journal pages.
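
BibTeX entries for R itself and for R packages can also be generated from within R. The following sketch assumes the knitr package is installed; the file name packages.bib is only an example.

```r
library(knitr)

# Write BibTeX entries for base R and the knitr package to a .bib file.
bib_file <- file.path(tempdir(), "packages.bib")
write_bib(c("base", "knitr"), file = bib_file)

# Show the first line of each entry; it contains the citation key.
bib_lines <- readLines(bib_file)
grep("^@", bib_lines, value = TRUE)
```

By default write_bib() prefixes the package name with R- to form the citation key, so knitr could then be cited as [@R-knitr].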

Include citations

Point to the .bib file in the YAML header.

---
title: "R Course SRC"
subtitle: "Module 3"
date: "2022-06-16"
author: 'Stefan Daume' 
output: 
  html_document:
    toc: yes
bibliography: references.bib 
link-citations: yes
---

Then include citations in the text using the format [@CitationKey]. For the BibTeX entry shown earlier this is [@XieAllaire_et_2022], which is rendered as (Xie, Allaire, and Grolemund 2022).

Include a bibliography

After presenting all results we have now reached the end of the document. The bibliography should follow here.

# References

Add the header # References at the end of your document and knit: the complete bibliography is appended to the output document.

Switch citation and bibliography styles dynamically

Specify citation style in the YAML header.

---
title: "R Course SRC"
subtitle: "Module 3"
date: "2022-06-16"
author: 'Stefan Daume' 
output: 
  html_document:
    toc: yes
bibliography: references.bib 
link-citations: yes
csl: ecology-and-society.csl
---

The Citation Style Database contains thousands of journal citation styles. Download the relevant one, reference it in the YAML header, and the output document will use the required citation style.

Easy sharing and online publishing

  1. knit your R Markdown document to HTML
  2. push the HTML to Github (next part of this module)
  3. enable sharing of Github Pages

This is how this presentation works (and the others before).

“Continuous Analysis” as the ultimate goal

Thank You!

Key Resources

References

Bryan, Jennifer. 2017. “Excuse me, do you have a moment to talk about version control?” PeerJ Preprints 5:e3159v2 (August). https://doi.org/10.7287/PEERJ.PREPRINTS.3159V2.

———. 2021. “Happy Git and GitHub for the useR.” https://happygitwithr.com/.

Chacon, Scott, and Ben Straub. 2014. Pro Git. Apress. https://doi.org/10.1007/978-1-4842-0076-6.

Xie, Yihui, J. J. Allaire, and Garrett Grolemund. 2022. “R Markdown: The Definitive Guide.” https://bookdown.org/yihui/rmarkdown/.

Colophon

"SRC PhD R course Module 3: RMarkdown, Github & Co" by Stefan Daume

 

Presented on 16. June 2022.

 

This presentation can be cited using: doi:…

 

PRESENTATION DETAILS

Author/Affiliation: Stefan Daume, Stockholm Resilience Centre, Stockholm University

Presentation URL: https://sdaume.github.io/r-course-module-3/slides

Presentation Source: [TBD]

Presentation PDF: [TBD]

 

CREDITS & LICENSES

This presentation is delivered with the help of several free and open source tools and libraries. It utilises the reveal.js presentation framework and has been created using RMarkdown, knitr, RStudio and Pandoc. highlight.js provides syntax highlighting for code sections. MathJax supports the rendering of mathematical notations. PDF and JPG copies of this presentation were generated with DeckTape. Please note the respective licenses of these tools and libraries.

 

If not noted and attributed otherwise, the contents (text, charts, images) of this presentation are Copyright © 2022 of the Author and provided under a CC BY 4.0 license.

Appendix (some notes on Github)

You need git and Github if … (non-exhaustive list)

  • … you have files like this, but realise that this is not efficient
    • my_paper_draft_2021_05_16.docx
    • my_paper_draft_2021_05_18.docx
    • my_paper_draft_2021_05_19.docx
    • my_paper_draft_2021_05_19_v1.docx
    • my_paper_draft_2021_05_19_v2.docx
    • my_paper_draft_2021_05_19_v3_with_comments.docx
  • … you are not creating regular backups of your work
  • … you want to collaborate with others
  • … you want to maintain projects rather than a single file (Google Doc)
  • … you want to be able to easily revert to previous versions of your work

Git history

  • created for Linux kernel development, started in 2005
  • a version management system, i.e. it tracks changes in project resources
  • git takes snapshots of a managed project
  • a distributed version control system (that means you always have a complete copy of your version history on your local computer)

Key concepts

  • repo
  • staging
  • commit
  • diff
  • push
  • pull
  • branch (advanced)
  • remote origin

Useful things

  • .gitignore
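
A .gitignore file is a plain text file that lists files and folders git should not track. It could, for example, be created from R as sketched below; the project folder is hypothetical and the listed patterns are typical for an RStudio project, not a complete or required set.

```r
# Hypothetical project folder, created in a temporary location for illustration.
project_dir <- file.path(tempdir(), "my-project")
dir.create(project_dir, showWarnings = FALSE)

# Files that are typically not worth tracking in an RStudio project.
ignore_patterns <- c(
  ".Rproj.user",  # RStudio's project-specific temporary files
  ".Rhistory",    # console history
  ".RData",       # saved workspace
  ".Ruserdata"    # user-specific data
)

# Write one pattern per line into the .gitignore file of the project.
gitignore_file <- file.path(project_dir, ".gitignore")
writeLines(ignore_patterns, gitignore_file)

readLines(gitignore_file)
```

Large raw data files or files containing credentials are other common candidates for this list.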

How to write a great commit message

Most important:

  • Keep things atomic!

Document consistently:

  • Keep the subject line short.
  • Use the imperative mood in the subject line (Because a commit message should always complete the following line: “If applied, this commit will [YOUR_SUBJECT_LINE].”)
  • Use the body to explain what and why vs. how (Because “the how” can be obtained from the diff. The commit message should provide the context for “the how”.)

Setting Git up with R Studio

Do this once:

  • install git locally (see Bryan 2021)
  • sign up for a Github account
  • create a personal access token

Do this for every new project:

  • create a Github repo first (follow the “New project, Github first” workflow in Bryan 2021)
    • say yes to creating a README
    • Why? It is the easiest way: you have everything in place to create remote backups.
  • copy the HTTPS link of your new repo
  • then create an RStudio project using the “version control > git” option

When your new project is set up

  • make a change to the README.md (a useful project description)
  • commit the changes of the README file
  • and push to the remote Github repo
  • check the Github repo