Using Stata: freduse command, reshape dataset, split years, generate decade-level observations


Question: Is U.S. GDP growth systematically slower in the first half of the year (Q1-Q2) than in the second half (Q3-Q4)? (We already know that a "Q1-effect" exists, as Justin Wolfers has shown.)

This Stata exercise can be broken down into the following steps:

  • Download real GDP growth data for the U.S.
  • Calculate annualized growth rates
  • Compare GDP growth in the first vs. second half of the year
  • Check for differences across years and decades


Get the data and declare the time-series structure. freduse is a user-written command; if it is not installed, run ssc install freduse first:

freduse GDPC1  
gen dd=yq(year(daten), quarter(daten))  
tsset dd, quarterly  

Compute annualized quarterly growth, in percent:

gen ann_growth = ((GDPC1/L.GDPC1)^4-1)*100
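The growth formula compounds the quarter-on-quarter change over four quarters. Written out, with $Y_t$ denoting real GDP (GDPC1) in quarter $t$:

$$
g_t = \left[\left(\frac{Y_t}{Y_{t-1}}\right)^{4} - 1\right] \times 100
$$

For example, a 0.5% quarter-on-quarter increase annualizes to $(1.005^4 - 1)\times 100 \approx 2.02$ percent.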

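Tag each quarter by half-year. freduse stores date as a "YYYY-MM-DD" string, and each quarterly observation is dated to the first month of its quarter, so months "01" and "04" mark Q1 and Q2: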

gen month = substr(date,6,2)  
gen H1 = (month == "01" | month == "04")  
label var H1 "First half of the year"  
gen year = substr(date,1,4)  
destring year, replace  
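Slicing the string date depends on its "YYYY-MM-DD" layout. A sketch of an alternative that derives the year and half-year from the numeric quarterly date dd instead; the four-quarter toy dataset below is hypothetical and only stands in for the FRED download, and the variable name quarter is introduced here for illustration:

```stata
* Toy quarterly data (hypothetical): 1960q1 through 1960q4
clear
set obs 4
gen dd = yq(1960, 1) + _n - 1
format dd %tq

* Convert the quarterly date to a daily date, then extract year and quarter
gen year    = year(dofq(dd))
gen quarter = quarter(dofq(dd))

* First half of the year = Q1 or Q2
gen H1 = inlist(quarter, 1, 2)
label var H1 "First half of the year"
list dd year quarter H1
```

On the real data, dd already exists after tsset, so only the gen and label lines would be needed, in place of the string-based lines.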

Compute the average growth rate in the first and second half of each year. collapse replaces the quarterly data in memory with one row per year and half:

collapse (mean) ann_growth, by(year H1)

The dataset is now long: each year appears twice, once with H1==1 and once with H1==0.

To make each row correspond to ONE year, reshape the dataset to wide format:

reshape wide ann_growth, i(year) j(H1)

For each year, we now have two variables: ann_growth1 is the average growth rate in the first half of the year (the rows that previously had H1==1), and ann_growth0 is the average for the second half (the rows that previously had H1==0).
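To see what reshape does, here is a toy version with two years of made-up half-year averages (the numbers are hypothetical, not FRED data):

```stata
* Toy long data: one row per year and half (hypothetical numbers)
clear
input year H1 ann_growth
2014 1 1.5
2014 0 4.0
2015 1 2.5
2015 0 1.5
end

* One row per year; the j() values 0 and 1 become suffixes of ann_growth
reshape wide ann_growth, i(year) j(H1)
list
```

After the reshape, each year has a single row holding ann_growth0 (second half) and ann_growth1 (first half) side by side, which is what makes the row-wise comparison in the next step possible.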

Create an indicator variable equal to 1 if, in a given year, growth in H1 was slower than growth in H2:

gen H1Slower = (ann_growth1 < ann_growth0) if !missing(ann_growth0)

The if qualifier is necessary because Stata treats missing values as larger than any nonmissing number. Without if !missing(ann_growth0), the comparison ann_growth1 < ann_growth0 would be true for 2016, setting H1Slower==1 even though second-half data for 2016 had not been released yet.
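A minimal demonstration of the missing-value pitfall, using two hypothetical years in which 2016 has no second-half value. The variable names naive and guarded are illustrative only:

```stata
* Two toy years; 2016 lacks second-half data (hypothetical numbers)
clear
input year ann_growth1 ann_growth0
2015 2.5 1.5
2016 1.0 .
end

* Without a guard, 1.0 < . is true, so 2016 is wrongly flagged as "H1 slower"
gen naive = (ann_growth1 < ann_growth0)

* With a guard, 2016 is left missing instead
gen guarded = (ann_growth1 < ann_growth0) if !missing(ann_growth1, ann_growth0)
list
```

The guard here checks both halves; missing() with several arguments returns 1 if any of them is missing.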

Restrict the sample to 1960 onward:

keep if year >= 1960

Group the years into decades, starting with 1960:

gen decade=year

recode decade (1960/1969 = 1960) (1970/1979 = 1970) (1980/1989 = 1980) (1990/1999 = 1990) (2000/2009 = 2000) (2010/2016 = 2010)
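A sketch of a more compact alternative to the recode ranges: integer-divide the year by 10 and scale back up. Unlike the recode above, which stops at 2016, this rule also covers later years without edits. The years below are toy inputs:

```stata
* Toy years (hypothetical inputs)
clear
input year
1960
1969
1975
2016
end

* floor(year/10) drops the last digit: 1969 -> 196 -> 1960, 2016 -> 201 -> 2010
gen decade = 10*floor(year/10)
list
```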

Now calculate some quantities of interest, for example the number of years in each decade in which H1 growth was slower than H2 growth:

tabstat H1Slower, by(decade) stat(sum)

Or count the number of years since 2000 in which H1 growth was slower than H2 growth:

tabstat H1Slower if year >= 2000, stat(sum)

Or calculate the share of such years by decade (the mean of a 0/1 indicator is the share of 1s):

tabstat H1Slower, by(decade) stat(mean)

Summary for variables: H1Slower
     by categories of: decade

  decade |      mean
---------+----------
    1960 |        .4
    1970 |        .2
    1980 |        .7
    1990 |        .5
    2000 |        .4
    2010 |        .5
---------+----------
   Total |  .4464286
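As a consistency check on the Total row, the overall mean can be rebuilt from the decade shares, assuming each decade from the 1960s through the 2000s contributes 10 years and the 2010s contribute six (2010 to 2015, since H1Slower is missing for 2016):

$$
\frac{0.4\cdot 10 + 0.2\cdot 10 + 0.7\cdot 10 + 0.5\cdot 10 + 0.4\cdot 10 + 0.5\cdot 6}{5\cdot 10 + 6} = \frac{4+2+7+5+4+3}{56} = \frac{25}{56} \approx 0.4464
$$

This matches the reported .4464286, which suggests H1 growth was slower than H2 growth in 25 of the 56 years.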

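Conclusion: There is no "H1-effect". Seasonally adjusted GDP growth in the first half of the year is not systematically slower than in the second half (Q3-Q4): since 1960, H1 growth fell short of H2 growth in fewer than half of all years (a share of about 0.45).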