Data science and visualization

A collection of 18 posts

"Data science and visualization" Stories Page 1 of 2  

Learning Git and GitHub

Grant McDermott: Version control with Git(Hub); Daniel M. Sullivan: A brief intro to Git; Learn Git Branching; Michael Stepner: git vs. Dropbox; worldbank/DIME-Resources: Repo for all...

Degrees and courses in Data Science

A year ago, I saw this announcement inside Harvard Yard: "Observed in the habitat earlier today: pic.twitter.com/PCMUJOXQcE" (Jan Zilinsky, @janzilinsky, August 30,...

Choosing colors for charts: examples and references

After ColorBrewer, now a very popular helper, what's next? Using these two tools, I stored several colors I want to use in my charts made in...

Out of sample predictions from OLS regressions: a K-folds tutorial in R

Evaluating predictions out of sample (OOS). Splitting datasets into training and test (holdout) data using R....

Pre-processing text data with tm, quanteda & tidytext packages

Suppose you start with some sentences / passages / documents, and you want to pre-process the corpus before generating a document-term matrix (DTM, or DFM). This post will...

Working more efficiently with RStudio

For most social science work, Stata is all we "need". But it costs money, it's not friendly if you need to show model...

NAs in R: some warnings (and a worked example; calculating standard deviations)

This post shows why is.na and !is.na are not ideal approaches to “clean” a dataset with missing values when we want to compute summary...

What every Stata user needs to know - how missing values are treated

This is a post for people who are learning Stata. A common source of mistakes is generating a binary variable that should classify observations according to...

How to talk honestly about your (descriptive) regression

After running a regression, even if you just want to look at empirical correlations (i.e. you do not claim observed associations are causal) you will...

Using Stata: Bar charts with multiple groups using by() and over()

Let's compare Q1 GDP growth vs. the rest of each year, starting in 2009: Here is the code to make the above chart: graph bar ann_...

Data visualization principle: Does the chart need to be interactive?

Re-posting a reminder that: Yes, interactive charts can be engaging. But many viewers will not see the data that is not shown by default. It can...
