Using Stata: Bar charts with multiple groups using by() and over()

Let's compare Q1 GDP growth vs. the rest of each year, starting in 2009:

Here is the code to make the above chart:

graph bar ann_growthQ1 ann_growthRest, ///  
bargap(5) ///  
graphregion(color(white)) ///  
over(year, gap(50) label(angle(45))) ///  
ytitle("Real GDP growth (percent)") ///  
ylabel(, angle(horizontal)) ///  
bar(1, color(red*0.3)) bar(2, color(blue*0.7)) ///  
legend(label(1 "First quarter") label(2 "Other quarters") rows(2) ring(0) pos(4) region(lcolor(white))) ///  
title("GDP growth in Q1 vs. average of Q2-Q4") ///  
nofill  
graph export gdp-bar.png, width(1200) replace  

That's an improved version of what we'd get with graph bar ann_growthQ1 ann_growthRest, over(year):

How to get the data

freduse GDPC1  
gen dd=yq(year(daten), quarter(daten))  
tsset dd, quarterly

* Compute annualized quarterly growth:
gen ann_growth = ((GDPC1/l.GDPC1)^4-1)*100

* Tag first-quarter observations (characters 6-7 of the string date hold the month, so "01" marks Q1):
gen quarter = substr(date,6,2)  
gen q1 = (quarter == "01")  
gen year = substr(date,1,4)  
destring year, replace

* Compute average growth rates in Q1 and the remainder of each year
egen avg = mean(ann_growth), by(year q1)

keep if year>=2009 // restrict the sample to crisis and post-crisis years

collapse (mean) ann_growth, by(year q1)

reshape wide ann_growth, i(year) j(q1)

rename ann_growth1 ann_growthQ1  
rename ann_growth0 ann_growthRest  
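
If you want to check the steps above without downloading anything, here is a minimal sketch that runs the same pipeline on toy data. The series `level` and its 1%-per-quarter growth are made up for illustration (they stand in for GDPC1 and are not real GDP), and the dates are built directly instead of coming from freduse.

```stata
* Toy data (made-up values, NOT real GDP): 9 quarters growing 1% per quarter
clear
set obs 9
gen dd = yq(2008, 4) + _n - 1             // quarterly dates 2008q4 to 2010q4
tsset dd, quarterly
gen double level = 100 * 1.01^(_n - 1)    // hypothetical stand-in for GDPC1

* Annualized quarterly growth, same formula as above
gen double ann_growth = ((level / l.level)^4 - 1) * 100

* 1% per quarter compounds to about 4.06% per year
assert abs(ann_growth - 4.060401) < 1e-4 if !missing(ann_growth)

* Tag first quarters, restrict the sample, and average within year and group
gen year = yofd(dofq(dd))
gen q1 = (quarter(dofq(dd)) == 1)
keep if year >= 2009
collapse (mean) ann_growth, by(year q1)

* Long to wide: one row per year, one column per group
reshape wide ann_growth, i(year) j(q1)
rename ann_growth1 ann_growthQ1
rename ann_growth0 ann_growthRest
list
```

After the reshape there is one row per year, which is the layout graph bar needs when the two growth rates are passed as separate variables.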

Other ways to show the data
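
The commands in this section work on data that has not been reshaped, and they group by an indicator called q_other that the data steps above never create. Here is a rough sketch of one way to build it, assuming q_other is simply the complement of q1 (0 for the first quarter, 1 for the other quarters), which is what the labels used further down suggest. The growth rates are made-up toy values, not real GDP.

```stata
* Toy long-format data (made-up growth rates, NOT real GDP)
clear
input year q1 ann_growth
2009 1 -4
2009 0  1
2009 0  2
2009 0  3
2010 1  1
2010 0  2
2010 0  3
2010 0  4
end

* Assumed definition: 0 = first quarter, 1 = other quarters
gen q_other = 1 - q1

* graph bar plots the mean by default, so the Q2-Q4 rows are averaged per year
graph bar ann_growth, ///
over(year) ///
over(q_other, relabel(1 "Q1" 2 "Average of Q2-Q4")) ///
nofill
```

Because graph bar computes means on its own, this works on quarterly rows as well as on collapsed data.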

If you do not reshape the data, you can use over() twice, like this:

graph bar ann_growth if year >=2008, ///  
graphregion(color(white)) ///  
over(year,label(angle(45) labsize(small))) ///  
over(q_other, relabel(1 "Q1" 2 "Average of Q2-Q4")) ///  
ytitle("Real GDP growth (percent)") ///  
ylabel(, angle(horizontal)) ///  
title("Seasonally-adjusted GDP growth" "early vs. late in the year") ///  
nofill ///  
intensity(*.7)  
graph export gdp-over.png, width(1200) replace  

This gets us:

Combining over() and by() is a bit more complicated because I haven't seen a way to declare labels inside by(), so I labeled the groups before creating the chart:

label define qo 0 "First quarter" 1 "Other quarters"  
label values q_other qo

graph bar ann_growth if year >=2008, ///  
graphregion(color(white)) ///  
over(year,label(angle(45) labsize(small))) /// make the x-axis readable by changing the angle and decreasing font size  
by(q_other, cols(2) note("")) /// change to cols(1) if stacked exhibits are preferred; note("") removes the default note on groups
ytitle("Real GDP growth (percent)") ///  
ylabel(, angle(horizontal)) ///  
nofill ///  
intensity(*.7) // stylistic
graph export gdp.png, width(1200) replace  

This gives us:

Finally, we could show the above with a dot plot, which would need a bit more work:

graph dot ann_growth if year >=2008, ///  
over(q_other, relabel(1 "Q1" 2 "Q2-Q4")) ///  
over(year, label(angle(45) labsize(small))) ///  
ytitle("Real GDP growth (percent)") ///  
graphregion(color(white)) nofill  
*graph save Graph dot,gph
graph export dot2.png, width(1200) replace  
*marker(1,mcolor(purple))

It seems like a better idea to reshape the dataset, rather than using over() twice:

reshape wide ann_growth, i(year) j(q1)
rename ann_growth1 ann_growthQ1  
rename ann_growth0 ann_growthRest  
graph dot ann_growthQ1 ann_growthRest if year >=2009, ///  
over(year) ///  
ytitle("Real GDP growth (percent)") ///  
graphregion(color(white)) nofill ///  
marker(1,mcolor(purple*.5) msize(*1.5)) ///  
marker(2,mcolor(midgreen) msize(*1.5)) ///  
legend(label(1 "Q1 growth") label(2 "Q2-Q4 growth"))  
graph export dot-plot.png, width(1200) replace