# Using STATA: freduse command, reshape dataset, split years, generate decade-level observations

**Question:** Is GDP growth systematically slower in the first half of a typical year? (We already know that a "Q1-effect" exists, as Justin Wolfers has shown.)

This STATA assignment can be broken down into the following chunks:

- Download real GDP data for the U.S.
- Calculate annualized growth rates
- Compare GDP growth in the first vs. second half of the year
- Check for differences across years and decades

**Solution:**

Get the data and declare time-series structure:

```
freduse GDPC1
gen dd=yq(year(daten), quarter(daten))
tsset dd, quarterly
```

Compute annualized quarterly growth:

`gen ann_growth = ((GDPC1/l.GDPC1)^4-1)*100`
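As a quick sanity check on the annualization formula, here is a small sketch with a hypothetical quarterly growth rate of 0.5% (this number is an illustration, not taken from the data). Compounding it over four quarters gives roughly 2.015% at an annual rate, slightly more than 4 × 0.5%.

```stata
* Hypothetical example: GDP rises 0.5% from one quarter to the next
display ((1.005)^4 - 1)*100

* Inspect the first few rows of the real series; the first observation of
* ann_growth is missing because there is no lagged value of GDPC1
list dd GDPC1 ann_growth in 1/5
```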

Generate variables for observation tagging:

```
gen month = substr(date,6,2)
gen H1 = (month == "01" | month == "04")
label var H1 "First half of the year"
gen year = substr(date,1,4)
destring year, replace
```
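The same tags can probably be built from the numeric date `daten` rather than by slicing the string `date`. The sketch below assumes `daten` is the daily date variable created by `freduse`, as used in the `yq()` call earlier; the names `H1_alt` and `year_alt` are made up for this comparison.

```stata
* Alternative tagging based on Stata date functions (names are hypothetical)
gen H1_alt   = (quarter(daten) <= 2)
gen year_alt = year(daten)

* Both approaches should agree on every observation
assert H1_alt == H1
assert year_alt == year
drop H1_alt year_alt
```

Working from the numeric date avoids depending on the exact layout of the date string.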

Compute average growth rates in the first and second half of each year:

```
egen avg = mean(ann_growth), by(year H1)
* collapse keeps one row per year-half, so the avg variable above is dropped
collapse (mean) ann_growth, by(year H1)
```

This is now a long dataset.

For each row to correspond to ONE year, make the dataset wide using `reshape`:

`reshape wide ann_growth, i(year) j(H1)`
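To see what `reshape wide` does, here is a minimal sketch on a toy dataset. The growth numbers are invented for illustration, and `preserve`/`restore` is used so that the real data in memory is not lost.

```stata
preserve
clear
* Toy long dataset: two rows per year, one for each value of H1
input year H1 ann_growth
2014 0 4.0
2014 1 1.5
2015 0 1.0
2015 1 2.5
end

* One row per year; the values of H1 become suffixes of ann_growth
reshape wide ann_growth, i(year) j(H1)
list, noobs
restore
```

After the reshape the toy data has two rows (2014 and 2015) and the variables `year`, `ann_growth0` and `ann_growth1`.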

For each year, we now have two variables: `ann_growth1` is the average growth rate in the first half of the year, and `ann_growth0` is the average growth rate in the second half (the cells that previously corresponded to `H1==0`).

Set a new indicator variable equal to 1 if growth in H1 is slower than growth in H2 in a given year:

`gen H1Slower = (ann_growth1 < ann_growth0) if !missing(ann_growth0)`

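The `if` qualifier is necessary because Stata treats missing values as larger than any number, so the comparison `ann_growth1 < ann_growth0` is true whenever `ann_growth0` is missing. Without `if !missing(ann_growth0)` we would be asserting that `H1Slower==1` in 2016, even though the second-half data has not come out yet.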
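The effect of the missing-value rule can be seen in a small sketch. The two rows below are invented for illustration; the second one mimics a year whose second-half figure is not yet available.

```stata
preserve
clear
* Hypothetical values: the second half of 2016 is missing
input year ann_growth1 ann_growth0
2015 2.0 1.0
2016 1.0 .
end

* Without the qualifier, the 2016 row is wrongly flagged as 1
gen naive   = (ann_growth1 < ann_growth0)
* With the qualifier, the 2016 row is left missing
gen guarded = (ann_growth1 < ann_growth0) if !missing(ann_growth0)
list, noobs
restore
```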

Restrict the time period:

`keep if year >=1960`

Label decades starting in 1960:

`gen decade=year`

`recode decade (1960/1969 = 1960) (1970/1979 = 1970) (1980/1989 = 1980) (1990/1999 = 1990) (2000/2009 = 2000) (2010/2016 = 2010)`
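An arithmetic shortcut should give the same labels without listing every range. This is a sketch of an alternative, not a replacement for the `recode` above; `decade_alt` is a made-up name.

```stata
* Round each year down to the start of its decade (e.g. 1987 becomes 1980)
gen decade_alt = 10*floor(year/10)

* Confirm that it matches the recode-based variable
assert decade_alt == decade
drop decade_alt
```

The shortcut keeps working when new years are added, whereas the `recode` ranges have to be extended by hand.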

Now calculate some quantities of interest, for example the number of times H1 growth was slower than H2 in each decade:

`tabstat H1Slower, by(decade) stat(sum)`

Or count the number of times H1 growth was slower than H2 since 2000:

`tabstat H1Slower if year>=2000, stat(sum)`

Or calculate the share of such years, by decade:

`tabstat H1Slower, by(decade) stat(mean)`

```
Summary for variables: H1Slower
     by categories of: decade

  decade |      mean
---------+----------
    1960 |        .4
    1970 |        .2
    1980 |        .7
    1990 |        .5
    2000 |        .4
    2010 |        .5
---------+----------
   Total |  .4464286
--------------------
```
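If a more formal check is wanted, one possible approach (an addition here, not part of the original assignment) is to test whether the share of H1-slower years differs from one half, and whether the two half-year averages differ on average. Both commands below are standard Stata; which test is appropriate is a judgment call.

```stata
* Two-sided binomial test of H0: P(H1Slower == 1) = 0.5
bitest H1Slower == 0.5

* Paired t-test comparing average H1 and H2 growth across years
ttest ann_growth1 == ann_growth0
```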

**Conclusion:** There is no "H1-effect". Seasonally adjusted GDP growth in Q1-Q2 is not typically slower than average growth in Q3-Q4: since 1960, H1 growth was slower than H2 growth in about 45% of years.