Working more efficiently with RStudio

For most social science work, Stata is all we "need". But it costs money, it is awkward when you need to show model comparisons, and it is not the tool you would pick up for machine learning tasks. Recently, I have built my datasets in Stata and written interactive documents and calculations in R.

A few tips from hard-won experience.

  1. Realistically, you will often use library() to load some packages that you do not need for a given task or session. But there are risks. If you always load every package you like, what your code does can hinge on the order in which you loaded them, because a package attached later masks same-named functions from packages attached earlier. It's better to be conservative and load only the packages that you are actually using.
  2. Run old.packages() from time to time. If an important package (like stargazer) has an update, you'll want to grab it.
  3. Get to know a small number of packages well, rather than one function from each of a dozen packages.
    Doing some text analysis? I'd say it's worth learning the functionality of quanteda and probably tidytext before diving into too many other packages.
    Doing some disciplined data mining? It is probably a good idea to start by figuring out what Matt Taddy's gamlr does and seeing how far you can get with a single package.
  4. Use R Markdown (see 4B and 4C below).
  5. For the most computationally demanding chunks of code, consider caching the results so that they are not re-executed each time you knit the document. Do this by setting cache=TRUE in the chunk header: {r chunkXYZ, cache=TRUE}.
  6. Relatedly, use lapply() or sapply() instead of for loops, if you can. For example, imagine you have a tibble or a data frame in which each row is a tweet. You want to know how long the typical tweet in the dataset is, possibly to check whether text past the old 140-character limit is included. If the column labeled "text" contains the strings you want to measure, you can apply the nchar() function to each tweet by running sapply(tweets$text, nchar). Because nchar() is vectorized, nchar(tweets$text) returns the same lengths.
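
Here is a minimal sketch of the tweet-length check from tip 6. The data frame and its three strings are made up for illustration (they are not real tweets), and the sketch assumes the text column is a plain character vector.

```r
# Hypothetical toy data: one row per "tweet", stored in a column named text
tweets <- data.frame(
  text = c("Hello world",
           "R is fun",
           "Stata builds the data, R writes the report"),
  stringsAsFactors = FALSE
)

# Apply nchar() to each element of the column; USE.NAMES = FALSE keeps
# the result as an unnamed integer vector
tweet_lengths <- sapply(tweets$text, nchar, USE.NAMES = FALSE)

# nchar() is vectorized, so the direct call should agree with sapply()
same_result <- identical(tweet_lengths, nchar(tweets$text))

# Length of the typical tweet, and whether any exceed the old limit
typical_length <- median(tweet_lengths)
over_limit <- any(tweet_lengths > 140)
```

Working on the column vector (tweets$text) rather than on a one-column subset behaves the same way for tibbles and plain data frames, which is why the sketch avoids row-wise apply().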

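To see why load order matters (tip 1), the following sketch uses only functions that ship with R. It assumes a fresh session in which no add-on package that defines its own filter() has been attached; the object name moving_avg is mine.

```r
# Where does the name "filter" currently resolve? In a fresh session this
# points at the stats package; attaching a package that also exports
# filter() would put that package first in the result.
where_filter <- find("filter")

# An explicit namespace prefix always calls the intended function,
# no matter which packages were attached afterwards
moving_avg <- stats::filter(1:5, rep(1 / 3, 3))

# Names that are defined in more than one place on the search path
masked_names <- conflicts(detail = FALSE)
head(masked_names)
```

Writing pkg::fun() for the handful of functions that tend to collide is one way to keep a script from depending on the order of the library() calls.
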
4B: It's better to send the code to the console and to view plots in the Plots pane, rather than right below your code chunk in the Rmd file.

In RStudio preferences, uncheck the "Show output inline for all R Markdown documents" option.

4C: My yaml preferences and initial code chunks.

Time savers

I often reuse these lines to quickly start an HTML or PDF file containing some R code. If you want to use the code, feel free to grab it:

Make a quick HTML file with R code

---
title: "Homework"
author: "Your_Name"
date: "April 2018"
output: 
  html_document:
    keep_md: true
---

```{r Setup, include=FALSE, results='hide', warning=FALSE}
# Packages used in this document
packages <- c("devtools", "knitr", "tidyverse", "lubridate")

# Attach each package, installing it first if it is missing
invisible(lapply(packages, FUN = function(x) {
  if (!require(x, character.only = TRUE)) {
    install.packages(x)
    library(x, character.only = TRUE)
  }
}))
```

Make a quick pdf report with R code

I'd say Yihui Xie's bookdown is the way to go.

The top of the Rmd document will contain:

---
title: "Code & Charts"
author: "A project by XYZ"
date: "2018"
output:
  bookdown::pdf_document2:
    toc: false
    fig_caption: true
    number_sections: false
    latex_engine: pdflatex
geometry: margin=1in
fontsize: 11pt
fontfamily: mathpazo
---

Voting patterns in 2016: Partisanship

Most partisans - Democrats and Republicans - voted for the candidate their own party nominated....