# Chapter 10 Panel Data - Fixed Effects and some Random Effects

## 10.1 Seminar

In this seminar, you will be asked to work more on your own. Start by clearing your workspace and setting your working directory. We will then briefly introduce the necessary R code for today using the example from the lecture. Afterwards, you can analyse for yourself whether more guns lead to less crime.

rm(list = ls())
setwd("Your directory")

We start by loading the resource curse data and checking the data with the str() function.

a <- read.csv("resourcecurse.csv")
str(a)
'data.frame':   876 obs. of  10 variables:
 $ country     : Factor w/ 73 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ countrycode : Factor w/ 73 levels "AFG","ALB","ARG",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ year        : int  1996 1998 2000 2002 2003 2004 2005 2006 2007 2008 ...
 $ aid         : num  NA NA NA 1.15 1.21 ...
 $ oil         : Factor w/ 532 levels "..","0","0.000156118640282417",..: 1 1 1 57 58 61 66 48 44 47 ...
 $ gdp.capita  : num  NA NA NA NA NA NA NA NA NA NA ...
 $ institutions: num  -2.06 -2.09 -2.13 -1.75 -1.58 ...
 $ polity2     : int  -7 -7 -7 NA NA NA NA NA NA NA ...
 $ population  : int  17822884 18863999 20093756 21979923 23064851 24118979 25070798 25893450 26616792 27294031 ...
 $ mortality   : num  106 104 104 103 104 ...

The oil variable is coded as a factor variable but it should be numeric. Missing values are coded as “..”. We recode these as NA and convert the variable to a numeric variable.

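# recode missing values (coded as "..") to NA
a$oil[which(a$oil == "..")] <- NA
# convert to numeric; going through as.character() returns the stored
# values rather than the factor's internal level codes
a$oil <- as.numeric(as.character(a$oil))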
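Why the conversion route matters can be seen in a small sketch. The values below are made up for illustration and are not taken from the resource curse data. A factor stores integer level codes, so calling `as.numeric()` directly on it returns those codes, while converting to character first returns the numbers written in the labels.

```r
# Hypothetical values for illustration; ".." marks a missing entry
oil_raw <- factor(c("..", "0", "12.5", "3"))

# Recode the missing-value marker to NA
oil_raw[oil_raw == ".."] <- NA

# Direct conversion returns the internal level codes, not the data
codes <- as.numeric(oil_raw)

# Converting to character first returns the numbers stored in the labels
values <- as.numeric(as.character(oil_raw))

print(codes)
print(values)
```

If `read.csv()` returns the column as a character vector instead of a factor (the default in recent R versions), the `as.character()` step is redundant but harmless.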

To estimate panel data models, we need to install the plm package. You only need to do this once.

install.packages("plm")

Every time we want to use the package (that is, whenever we start a new R session), we load the plm library like so:

library(plm)
Loading required package: Formula

We log-transform gdp per capita and population size.

a$log.gdp <- log(a$gdp.capita)
a$log.pop <- log(a$population)

### 10.1.1 Our data

| Variable | Description |
|---|---|
| country | country name |
| countrycode | 3 letter country abbreviation |
| year | |
| aid | net aid flow (in per cent of GDP) |
| oil | oil rents (in per cent of GDP) |
| gdp.capita | GDP per capita in constant 2000 US dollars |
| institutions | world governance indicator index for quality of institutions |
| polity2 | polity IV project index |
| population | |
| mortality | mortality rate (per 1000 live births) |

We test the rentier states theory and the resource curse that we discussed in the lecture. It states that rentier capitalism can be a curse on the systemic level. States that extract rents from easily lootable resources instead of taxing their people develop institutions that become unresponsive to their citizens and provide fewer public goods. North and Weingast (academic heroes), for instance, relate the advent of democracy in Britain to the struggle for property rights.

### 10.1.2 Unit fixed effects (country fixed effects)

In class, our first fixed effects model was called m3. It was the unit fixed effects model. Recall that the unit fixed effects model is the same as including dummy variables for all countries except the baseline country. Therefore, we control for all potential confounders that vary across countries but are constant over time (e.g., the colonial heritage of a country).

# run fixed effects model
m3 <- plm(
institutions ~ oil + aid + log.gdp + polity2 + log.pop + mortality,
data = a,
index = c("country", "year"),
model = "within",
effect = "individual"
)

# model output
summary(m3)
Oneway (individual) effect Within Model

Call:
plm(formula = institutions ~ oil + aid + log.gdp + polity2 +
log.pop + mortality, data = a, effect = "individual", model = "within",
index = c("country", "year"))

Unbalanced Panel: n = 58, T = 1-12, N = 672

Residuals:
Min.    1st Qu.     Median    3rd Qu.       Max.
-0.3936224 -0.0622048 -0.0019414  0.0580157  0.3903817

Coefficients:
Estimate   Std. Error t-value       Pr(>|t|)
oil       -0.000077706  0.000092452 -0.8405       0.400961
aid        0.002250157  0.000980402  2.2951       0.022065 *
log.gdp    0.190834199  0.032396694  5.8905 0.000000006374 ***
polity2    0.016004181  0.002707903  5.9102 0.000000005696 ***
log.pop   -0.190493863  0.070707709 -2.6941       0.007253 **
mortality  0.008294374  0.001553846  5.3380 0.000000132901 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Total Sum of Squares:    8.8269
Residual Sum of Squares: 7.3822
R-Squared:      0.16367
F-statistic: 19.8307 on 6 and 608 DF, p-value: < 0.000000000000000222

Similar to the F-test, we check whether country fixed effects explain any variation at all, using the Lagrange Multiplier test.

# check for unit(country) fixed effects
plmtest(m3, effect="individual")

Lagrange Multiplier Test - (Honda) for unbalanced panels

data:  institutions ~ oil + aid + log.gdp + polity2 + log.pop + mortality
normal = 53.332, p-value < 0.00000000000000022
alternative hypothesis: significant effects

The null hypothesis is that country fixed effects do not have any effect and that would mean, statistically, that we could leave them out. However, in this case we reject the null hypothesis and hence we do need to control for country fixed effects.
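The equivalence between the unit fixed effects model and a regression with one dummy variable per unit, mentioned at the start of this section, can be checked on simulated data. The following is a minimal base R sketch under stated assumptions: the data, the variable names (`unit`, `x`, `y`) and the true slope of 1.5 are all invented for illustration. It compares the slope from a dummy-variable regression with the slope from a regression on unit-demeaned data.

```r
set.seed(42)
n_units <- 5
n_periods <- 8

# Simulated balanced panel (hypothetical data)
d <- data.frame(
  unit = rep(seq_len(n_units), each = n_periods),
  time = rep(seq_len(n_periods), times = n_units)
)
alpha <- rnorm(n_units, sd = 2)              # time-invariant unit effects
d$x <- alpha[d$unit] + rnorm(nrow(d))        # x is correlated with the unit effect
d$y <- 1.5 * d$x + alpha[d$unit] + rnorm(nrow(d), sd = 0.5)

# (1) Dummy-variable regression: one dummy per unit except the baseline
fit_lsdv <- lm(y ~ x + factor(unit), data = d)

# (2) Within transformation: subtract each unit's mean, then regress
d$x_dm <- d$x - ave(d$x, d$unit)
d$y_dm <- d$y - ave(d$y, d$unit)
fit_within <- lm(y_dm ~ x_dm - 1, data = d)

b_lsdv <- unname(coef(fit_lsdv)["x"])
b_within <- unname(coef(fit_within)["x_dm"])

# The two slope estimates agree up to numerical precision
print(c(lsdv = b_lsdv, within = b_within))
```

The standard errors printed by `lm()` for the demeaned regression are not directly comparable, because `lm()` does not know that the unit means were estimated. Functions such as `plm()` apply the degrees-of-freedom correction for you.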

### 10.1.3 Time fixed effects

We now estimate the time fixed effects model to illustrate how this would be done. However, we already know that we do need to include country fixed effects. Not estimating country fixed effects would be a mistake. The time fixed effects model does not include country fixed effects and, therefore, it makes that mistake. Generally, in the time fixed effects model, we control for all sources of confounding that vary over time but are constant across the units (the countries) such as technological change, for instance (you can argue whether technological change really affects all countries in our sample in the same way). The time fixed effects model includes a dummy variable for every time period except the baseline.

# time fixed effects model
m4 <- plm(
institutions ~ oil + aid + log.gdp + polity2 + log.pop + mortality,
data = a,
index = c("country", "year"),
model = "within",
effect = "time")

# model output time fixed effects
summary(m4)
Oneway (time) effect Within Model

Call:
plm(formula = institutions ~ oil + aid + log.gdp + polity2 +
log.pop + mortality, data = a, effect = "time", model = "within",
index = c("country", "year"))

Unbalanced Panel: n = 58, T = 1-12, N = 672

Residuals:
Min.   1st Qu.    Median   3rd Qu.      Max.
-1.196568 -0.282023 -0.028316  0.291527  0.865248

Coefficients:
Estimate  Std. Error t-value              Pr(>|t|)
oil       -0.00094474  0.00010632 -8.8855 < 0.00000000000000022 ***
aid        0.01147113  0.00307715  3.7278             0.0002099 ***
log.gdp    0.45007149  0.01913597 23.5197 < 0.00000000000000022 ***
polity2    0.03248425  0.00280650 11.5746 < 0.00000000000000022 ***
log.pop   -0.01333510  0.01052619 -1.2668             0.2056601
mortality  0.00360009  0.00119458  3.0137             0.0026806 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Total Sum of Squares:    457.29
Residual Sum of Squares: 106.31
R-Squared:      0.76752
F-statistic: 359.866 on 6 and 654 DF, p-value: < 0.000000000000000222

Notice that R^2 is much larger in the time fixed effects model than in the country fixed effects model. That does not mean that the time fixed effects model is better. In fact, R^2 cannot be compared between country fixed effects and time fixed effects models. In the country fixed effects model, R^2 is the share of the variation in the dependent variable within countries that is explained by our independent variables. It is the explained within-country variation. In a time fixed effects model, R^2 gives us the explained within-time variation.

The time fixed effects model gives us different results than the country fixed effects model. We don’t like the time fixed effects model here because we already saw from the plmtest() that we need to include country fixed effects. We can, however, check whether we need to include time fixed effects or, put differently, whether time fixed effects matter jointly. We do this using the plmtest() again.

# test for time fixed effects
plmtest(m4, effect="time")

Lagrange Multiplier Test - time effects (Honda) for unbalanced
panels

data:  institutions ~ oil + aid + log.gdp + polity2 + log.pop + mortality
normal = 1.5508, p-value = 0.06048
alternative hypothesis: significant effects

The test comes back insignificant at the 0.05 level (p = 0.06). Statistically speaking, that means we do not need to control for time fixed effects to have a consistent model, and the test gives you justification to stick with the country fixed effects model. We will nevertheless ignore the test. In the country fixed effects model, we have 608 degrees of freedom, so we can afford to estimate time fixed effects in addition. There are 12 time periods (the upper end of the range reported for capital T in the summary output), and you can verify this like so:

# frequency table of year (i.e., number of observations per period)
table(a$year)

1996 1998 2000 2002 2003 2004 2005 2006 2007 2008 2009 2010
  73   73   73   73   73   73   73   73   73   73   73   73

# number of time periods
length(table(a$year))
[1] 12

With 608 degrees of freedom, we can easily afford to estimate another 11 parameters (1 for each year where 1 year is the baseline category). Having 608 degrees of freedom is like having 608 free observations (that is a lot of information).
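The residual degrees of freedom reported on the F-statistic lines of the summary outputs can be reproduced by hand. The short sketch below uses only numbers printed in those outputs (N = 672 observations, n = 58 countries, 6 slope coefficients, 12 years). The variable names are chosen here for illustration.

```r
n_obs <- 672        # observations used in estimation (N in the plm output)
n_countries <- 58   # countries in the estimation sample (n in the plm output)
n_years <- 12       # time periods
k <- 6              # slope coefficients

# Country fixed effects: one intercept per country plus k slopes
df_unit <- n_obs - n_countries - k

# Time fixed effects: one intercept per year plus k slopes
df_time <- n_obs - n_years - k

# Two-way fixed effects: country intercepts, 11 extra year dummies, k slopes
df_twoway <- n_obs - n_countries - (n_years - 1) - k

print(c(unit = df_unit, time = df_time, twoway = df_twoway))
```

Adding the 11 year dummies to the country fixed effects model therefore costs 11 of the remaining degrees of freedom.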

We do not make a mistake by controlling for potential confounders that vary across countries and are constant over time (unit fixed effects) and confounders that vary across time but are constant across units (time fixed effects). Therefore, we do that.

### 10.1.4 Twoway fixed effects

We now estimate the twoway fixed effects model. We control for all confounders that vary across units (countries) but are constant over time and we control for all confounders that vary over time but are constant across units.

# two-way fixed effects model
m5 <-  plm(
institutions ~ oil + aid + log.gdp + polity2 + log.pop + mortality,
data = a,
index = c("country", "year"),
model = "within",
effect = "twoways"
)

summary(m5)
Twoways effects Within Model

Call:
plm(formula = institutions ~ oil + aid + log.gdp + polity2 +
log.pop + mortality, data = a, effect = "twoways", model = "within",
index = c("country", "year"))

Unbalanced Panel: n = 58, T = 1-12, N = 672

Residuals:
Min.     1st Qu.      Median     3rd Qu.        Max.
-0.37357541 -0.06093757  0.00020216  0.05919668  0.45397954

Coefficients:
Estimate  Std. Error t-value            Pr(>|t|)
oil       0.000013209 0.000096684  0.1366            0.891380
aid       0.002925254 0.000984319  2.9719            0.003079 **
log.gdp   0.298727506 0.038019594  7.8572 0.00000000000001837 ***
polity2   0.016062925 0.002665367  6.0265 0.00000000293299449 ***
log.pop   0.016589819 0.080469078  0.2062            0.836733
mortality 0.004167650 0.001725965  2.4147            0.016049 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Total Sum of Squares:    8.3506
Residual Sum of Squares: 6.967
R-Squared:      0.16568
F-statistic: 19.7587 on 6 and 597 DF, p-value: < 0.000000000000000222

### 10.1.5 Serial correlation/auto-correlation

In a panel model, we always have serial correlation. Maybe always is an overstatement but just maybe. Serial correlation means that a variable at time t (let’s say 2000) and in country i (let’s say Greece) is related to its value at t-1 (in 1999). Anything that is path dependent would fall into this category. Surely, institutional quality is path dependent. There is a statistical test for auto-correlation but really your default assumption should be that auto-correlation is present.
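To build some intuition for what serial correlation looks like, here is a small simulation in base R. It is a sketch under assumptions: a single made-up series with a persistence parameter of 0.8, not the institutions data. Each value is a fraction of the previous value plus a new shock, which is one simple way to represent path dependence.

```r
set.seed(1)
n_periods <- 500
rho <- 0.8                      # assumed persistence, chosen for illustration

shocks <- rnorm(n_periods)      # independent shocks
u <- numeric(n_periods)
u[1] <- shocks[1]
for (t in 2:n_periods) {
  u[t] <- rho * u[t - 1] + shocks[t]   # today's value depends on yesterday's
}

# Correlation between the series and its own first lag
lag1_cor <- cor(u[-1], u[-n_periods])

# The same quantity for the independent shocks, for comparison
white_cor <- cor(shocks[-1], shocks[-n_periods])

print(c(path_dependent = lag1_cor, independent = white_cor))
```

The path-dependent series is strongly correlated with its own past, whereas the independent shocks are not. Tests for serial correlation look for this kind of pattern in the model residuals.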

Let’s carry out the test. The null hypothesis is that we do not have auto-correlation.

# Breusch-Godfrey test
pbgtest(m5)

Breusch-Godfrey/Wooldridge test for serial correlation in panel
models

data:  institutions ~ oil + aid + log.gdp + polity2 + log.pop + mortality
chisq = 229.21, df = 1, p-value < 0.00000000000000022
alternative hypothesis: serial correlation in idiosyncratic errors

Clearly, we do have auto-correlation, so we need to correct our standard errors. We need two libraries for this: first, sandwich and second, lmtest.

library(sandwich)
library(lmtest)

# heteroskedasticity and autocorrelation consistent standard errors
m5.hac <- coeftest(m5, vcov = vcovHC(m5, method = "arellano", type = "HC3"))
m5.hac

t test of coefficients:

Estimate  Std. Error t value Pr(>|t|)
oil       0.000013209 0.000148966  0.0887 0.929375
aid       0.002925254 0.001684554  1.7365 0.082989 .
log.gdp   0.298727506 0.132230608  2.2591 0.024234 *
polity2   0.016062925 0.006201084  2.5903 0.009822 **
log.pop   0.016589819 0.307818867  0.0539 0.957037
mortality 0.004167650 0.005529193  0.7538 0.451294
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

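The difference is noticeable: the standard errors are considerably larger. It is a mistake not to correct for serial correlation. With the corrected standard errors, we fail to reject the null hypothesis for the effect of aid at the 0.05 level, and mortality is no longer significant either.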
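The idea behind standard errors that are robust to serial correlation within units can be sketched by hand. The base R example below is an illustration under assumptions, not the exact computation that vcovHC() performs: the data are simulated, and the sandwich formula is the plain version clustered by unit, without the HC3 small-sample adjustment used above. Residuals are allowed to be correlated in any way within a unit, and only independence across units is assumed.

```r
set.seed(7)
n_units <- 30
n_periods <- 10
unit <- rep(seq_len(n_units), each = n_periods)

# Helper (hypothetical): one AR(1) series of length n with persistence rho
ar1 <- function(n, rho) {
  e <- numeric(n)
  e[1] <- rnorm(1)
  for (t in 2:n) e[t] <- rho * e[t - 1] + rnorm(1)
  e
}

# Regressor and error are both serially correlated within each unit
x <- unlist(lapply(seq_len(n_units), function(i) ar1(n_periods, 0.8)))
u <- unlist(lapply(seq_len(n_units), function(i) ar1(n_periods, 0.8)))
y <- 0.5 * x + u

# Within transformation, then OLS on the demeaned data
x_dm <- x - ave(x, unit)
y_dm <- y - ave(y, unit)
X <- cbind(x_dm)
beta <- solve(crossprod(X), crossprod(X, y_dm))
res <- as.numeric(y_dm - X %*% beta)

bread <- solve(crossprod(X))

# Classical standard error: assumes independent, homoskedastic errors
sigma2 <- sum(res^2) / (length(y) - n_units - ncol(X))
se_classical <- sqrt(diag(sigma2 * bread))

# Clustered standard error: sum the score contributions unit by unit
meat <- matrix(0, ncol(X), ncol(X))
for (g in unique(unit)) {
  idx <- unit == g
  score <- crossprod(X[idx, , drop = FALSE], res[idx])
  meat <- meat + tcrossprod(score)
}
se_cluster <- sqrt(diag(bread %*% meat %*% bread))

print(c(classical = se_classical, clustered = se_cluster))
```

When both the regressor and the errors are positively autocorrelated, the clustered standard error is typically the larger of the two, which mirrors the pattern in the corrected output above.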

### 10.1.6 Cross-sectional dependence/ spatial dependence

Spatial dependence is common in panel data sets but, unlike serial correlation, it is not always present. Spatial correlation means that some units that cluster together (usually geographically) are affected by some external shock in the same way. For instance, the Arab Spring affected countries in the MENA region in the same way.

We test for cross-sectional dependence. If it exists, we need to correct for it. The null hypothesis is that we do not have spatial dependence.

# Pesaran test for cross-sectional dependence
pcdtest(m5)
Warning in pcdres(tres = tres, n = n, w = w, form =
paste(deparse(x$formula)), : Some pairs of individuals (7 percent) do
not have any or just one time period in common and have been omitted from
calculation

Pesaran CD test for cross-sectional dependence in panels

data:  institutions ~ oil + aid + log.gdp + polity2 + log.pop + mortality
z = -2.2516, p-value = 0.02435
alternative hypothesis: cross-sectional dependence

The test comes back significant. Therefore, we need to adjust our standard errors for serial correlation, heteroskedasticity and spatial dependency.
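To see what the CD statistic measures, the following base R sketch computes it by hand for a simulated balanced panel. The assumptions are labelled in the comments: the residuals are made up, every unit is hit by the same period-specific shock, and the formula is the textbook version for balanced panels. pcdtest() additionally handles unbalanced panels such as ours.

```r
set.seed(123)
n_units <- 20
n_periods <- 30

# Hypothetical residuals: a shock common to all units plus unit-specific noise
common <- rnorm(n_periods)
E <- sapply(seq_len(n_units), function(i) common + rnorm(n_periods))
# E has one row per period and one column per unit

# Pairwise correlations between the residual series of all units
R <- cor(E)
rho_sum <- sum(R[upper.tri(R)])

# Pesaran CD statistic for a balanced panel; standard normal under the null
cd <- sqrt(2 * n_periods / (n_units * (n_units - 1))) * rho_sum
p_value <- 2 * pnorm(-abs(cd))

print(c(cd = cd, p_value = p_value))
```

Because all units share the same shock, the pairwise correlations are large and the statistic lies far out in the tail of the standard normal distribution.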

Some political scientists like to estimate the so-called panel corrected standard errors (PCSE). In fact, Beck and Katz 1995 is one of the most cited political science papers of all time. However, Driscoll and Kraay (1998) propose standard errors that work even better in short panels (where we have few observations per unit). Their standard errors are sometimes called the SCC estimator. We correct for spatial correlation using SCC standard errors.

# Driscoll and Kraay SCC standard errors
m5.scc <- coeftest(m5, vcov = vcovSCC(m5, type = "HC3", cluster = "group"))
m5.scc

t test of coefficients:

Estimate  Std. Error t value Pr(>|t|)
oil       0.000013209 0.000143564  0.0920 0.926725
aid       0.002925254 0.001628816  1.7959 0.073010 .
log.gdp   0.298727506 0.133686296  2.2345 0.025817 *
polity2   0.016062925 0.005974345  2.6887 0.007374 **
log.pop   0.016589819 0.314543279  0.0527 0.957955
mortality 0.004167650 0.004991541  0.8349 0.404084
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This is our final model. We find no evidence for hypotheses 1 and 2: neither oil nor aid is significantly related to institutional quality at the 0.05 level. (Note that this is different from what you saw in the lecture. I had an error in the code. This version is correct.)

### 10.1.7 The random effects model

We show you the random effects model only because you see it applied often in political science. However, the model rests on a heroic assumption. Recall from our lecture that the random effects model assumes that the time invariant confounders are unrelated to our regressors. The assumption says: “There are no confounders. By assumption. Basta!” That’s unsatisfactory. In fact, this assumption will almost always be violated. The random effects model is weak from a causal inference standpoint. However, it tends to do well in prediction tasks where we are interested in predicting outcomes but don’t really care whether X is causally related to Y.

Let’s estimate the random effects model.

# random effects model
ran.effects <- plm(
institutions ~ oil + aid + log.gdp + polity2 + log.pop + mortality,
data = a,
index = c("country", "year"),
model = "random")

# model output
summary(ran.effects)
Oneway (individual) effect Random Effect Model
(Swamy-Arora's transformation)

Call:
plm(formula = institutions ~ oil + aid + log.gdp + polity2 +
log.pop + mortality, data = a, model = "random", index = c("country",
"year"))

Unbalanced Panel: n = 58, T = 1-12, N = 672

Effects:
var std.dev share
idiosyncratic 0.01214 0.11019 0.071
individual    0.15870 0.39837 0.929
theta:
Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
0.7334  0.9204  0.9204  0.9194  0.9204  0.9204

Residuals:
Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
-0.42270 -0.06989 -0.00032  0.00070  0.08034  0.37430

Coefficients:
Estimate  Std. Error t-value              Pr(>|t|)
(Intercept) -1.33884120  0.62959170 -2.1265              0.033827 *
oil         -0.00021206  0.00009369 -2.2634              0.023933 *
aid          0.00206725  0.00103411  1.9991              0.046009 *
log.gdp      0.31213762  0.02902202 10.7552 < 0.00000000000000022 ***
polity2      0.01942826  0.00273234  7.1105  0.000000000002993849 ***
log.pop     -0.09216364  0.03199441 -2.8806              0.004097 **
mortality    0.01026414  0.00129407  7.9317  0.000000000000009172 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Total Sum of Squares:    11.862
Residual Sum of Squares: 9.0511
R-Squared:      0.23701
F-statistic: 34.4236 on 6 and 665 DF, p-value: < 0.000000000000000222

As mentioned, you will have an extremely hard time convincing anyone of a causal claim made based on a random effects model. However, sometimes you cannot estimate a fixed effects model. For instance, if you wish to estimate the effect of the electoral system on some outcome, you have the problem that the electoral system does not vary within countries (countries tend to choose an electoral system and stick with it). That means you cannot estimate a unit fixed effects model. You can, however, estimate the random effects model in that case.
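The reason a unit fixed effects model cannot estimate the effect of a time-invariant variable becomes visible once the variable is demeaned within units. The sketch below uses a made-up indicator `pr_system` for three hypothetical countries. After subtracting each country's mean, nothing is left to explain the outcome.

```r
# Hypothetical panel: three countries observed in four periods each
d <- data.frame(
  country = rep(c("A", "B", "C"), each = 4),
  pr_system = rep(c(1, 0, 1), each = 4)   # electoral system never changes
)

# Within transformation: subtract the country mean
d$pr_dm <- d$pr_system - ave(d$pr_system, d$country)

# Every demeaned value is zero, so the variable carries no within-country variation
print(d$pr_dm)
all_zero <- all(d$pr_dm == 0)
print(all_zero)
```

A regressor that is zero everywhere after demeaning is perfectly collinear with the unit fixed effects, which is why its coefficient is not identified.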

The absolute minimum hurdle that you need to pass to be allowed to use the random effects model is to carry out the Hausman test. The test assesses whether the errors are correlated with the X variables. It thus tests the assumption that the random effects model is based on.

However, we have to caution against the Hausman test! The Hausman test does not take heteroskedastic errors into account and it does not take serial correlation into account. That’s a big problem. Even if the Hausman test confirms that the random effects model is consistent, it may be wrong. We should always be skeptical of the random effects model (when it’s used to make a causal claim).
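For reference, the statistic behind the test can be written in its textbook form. This is stated as background rather than as a description of the exact computation inside phtest(), and the symbols are introduced here for illustration: $\hat\beta_{FE}$ and $\hat\beta_{RE}$ are the fixed effects and random effects estimates of the $k$ coefficients that both models share, and $\hat V_{FE}$ and $\hat V_{RE}$ are their estimated covariance matrices.

$$
H = \left(\hat\beta_{FE} - \hat\beta_{RE}\right)' \left[\hat V_{FE} - \hat V_{RE}\right]^{-1} \left(\hat\beta_{FE} - \hat\beta_{RE}\right) \;\sim\; \chi^2_k \quad \text{under } H_0
$$

The statistic grows when the two sets of estimates drift apart. Because the covariance matrices in the middle are the classical ones, the test inherits their assumptions, which is the source of the caution above. With six regressors in our model, $k = 6$.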

Let’s run the Hausman test. Its null hypothesis is that the errors and the X’s are uncorrelated and hence the random effects model is consistent.

# hausman test
phtest(m5, ran.effects)

Hausman Test

data:  institutions ~ oil + aid + log.gdp + polity2 + log.pop + mortality
chisq = 136.39, df = 6, p-value < 0.00000000000000022
alternative hypothesis: one model is inconsistent

The Hausman test rejects the null hypothesis. The random effects model is inconsistent. You now have all the tools to carry out your own analysis. Go ahead and show us whether more guns lead to less crime or not.

### 10.1.8 More guns, less crime

More guns, less crime. This is the claim of an (in)famous book. It argues that violent crime rates in the United States decrease when gun ownership restrictions are relaxed. The data used in Lott’s research compare violent crimes, robberies, and murders across 50 states to determine whether the so-called “shall” laws that remove discretion from license granting authorities actually decrease crime rates. So far 41 states have passed these “shall” laws where a person applying for a licence to carry a concealed weapon doesn’t have to provide justification or “good cause” for requiring a concealed weapon permit.

Load the guns.csv dataset directly into R by running the following line:

a <- read.csv("http://philippbroniecki.github.io/philippbroniecki.github.io/assets/data/guns.csv")

The data includes the following variables:

| Variable | Description |
|---|---|
| mur | Murder rate (incidents per 100,000) |
| shall | =1 if state has a shall-carry law in effect in that year, 0 otherwise |
| incarc_rate | Incarceration rate in the state in the previous year (sentenced prisoners per 100,000 residents) |
| pm1029 | Percent of state population that is male, ages 10 to 29 |
| stateid | ID number of states (Alabama = 1, Alaska = 2, etc.) |
| year | Year (1977 - 1999) |

### 10.1.9 Question 1

Estimate the effect of shall using a simple linear model and interpret it.

summary(lm(mur~shall+incarc_rate+pm1029,data=a))

Call:
lm(formula = mur ~ shall + incarc_rate + pm1029, data = a)

Residuals:
Min      1Q  Median      3Q     Max
-18.020  -2.486  -0.161   2.123  40.141

Coefficients:
Estimate Std. Error t value             Pr(>|t|)
(Intercept) -26.676300   1.527265 -17.467 < 0.0000000000000002 ***
shall        -1.964093   0.316082  -6.214       0.000000000718 ***
incarc_rate   0.037136   0.000814  45.624 < 0.0000000000000002 ***
pm1029        1.641943   0.087414  18.784 < 0.0000000000000002 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.441 on 1169 degrees of freedom
Multiple R-squared:  0.6524,    Adjusted R-squared:  0.6515
F-statistic: 731.4 on 3 and 1169 DF,  p-value: < 0.00000000000000022

Answer: According to our simple linear model, lax gun laws reduce the murder rate. It decreases by roughly 2 incidents per 100,000.

### 10.1.10 Question 2

Estimate a unit fixed effects model and a random effects model. Are both models consistent? If not, which is the appropriate model? Use a consistent model to estimate the effect of the shall laws on the murder rate.

# panel data library
library(plm)

# fixed effects
m.fe <- plm(mur ~ shall + incarc_rate + pm1029,
data = a,
index = c("stateid", "year"),
model = "within",
effect = "individual")

# random effects
m.re <- plm(mur ~ shall + incarc_rate + pm1029,
data = a,
index = c("stateid", "year"),
model = "random")

# hausman test
phtest(m.fe, m.re)

Hausman Test

data:  mur ~ shall + incarc_rate + pm1029
chisq = 147.59, df = 3, p-value < 0.00000000000000022
alternative hypothesis: one model is inconsistent
# effect
summary(m.fe)
Oneway (individual) effect Within Model

Call:
plm(formula = mur ~ shall + incarc_rate + pm1029, data = a, effect = "individual",
model = "within", index = c("stateid", "year"))

Balanced Panel: n = 51, T = 23, N = 1173

Residuals:
Min.    1st Qu.     Median    3rd Qu.       Max.
-21.102428  -0.958945   0.016047   1.082008  29.031961

Coefficients:
Estimate Std. Error t-value              Pr(>|t|)
shall       -1.4513886  0.3154300 -4.6013           0.000004678 ***
incarc_rate  0.0174551  0.0011261 15.4998 < 0.00000000000000022 ***
pm1029       0.9582993  0.0859610 11.1481 < 0.00000000000000022 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Total Sum of Squares:    12016
Residual Sum of Squares: 9800
R-Squared:      0.18444
F-statistic: 84.3526 on 3 and 1119 DF, p-value: < 0.000000000000000222

Answer: The Hausman test rejects the null hypothesis that both the random effects model and the fixed effects model are consistent. The unit-specific errors u_i are correlated with the regressors. Therefore, we must rely on the fixed effects model.

The effect of the shall laws has decreased slightly but is still significantly related to the murder rate. Lax gun laws reduce the murder rate by 1.45 incidents per 100,000.

### 10.1.11 Question 3

Think of a theoretical reason to control for time fixed effects (what confounding sources could bias our estimate of the shall laws?). Test for time fixed effects using the appropriate test. If time fixed effects are required, re-estimate the fixed effects model as a twoway fixed effects model and interpret the effect of lax gun laws.

m.tfe <- plm(
mur ~ shall + incarc_rate + pm1029,
data = a,
index = c("stateid", "year"),
model = "within",
effect = "time"
)

plmtest(m.tfe, effect = "time")

Lagrange Multiplier Test - time effects (Honda) for balanced
panels

data:  mur ~ shall + incarc_rate + pm1029
normal = 16.104, p-value < 0.00000000000000022
alternative hypothesis: significant effects
# twoway FE model
m.2wfe <- plm(
mur ~ shall + incarc_rate + pm1029,
data = a,
index = c("stateid", "year"),
model = "within",
effect = "twoway")
summary(m.2wfe)
Twoways effects Within Model

Call:
plm(formula = mur ~ shall + incarc_rate + pm1029, data = a, effect = "twoway",
model = "within", index = c("stateid", "year"))

Balanced Panel: n = 51, T = 23, N = 1173

Residuals:
Min.     1st Qu.      Median     3rd Qu.        Max.
-19.2097691  -0.9748749  -0.0069663   1.0119176  27.1354552

Coefficients:
Estimate Std. Error t-value              Pr(>|t|)
shall       -0.5640474  0.3325054 -1.6964             0.0901023 .
incarc_rate  0.0209756  0.0011252 18.6411 < 0.00000000000000022 ***
pm1029       0.7326357  0.2189770  3.3457             0.0008485 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Total Sum of Squares:    11263
Residual Sum of Squares: 8519.4
R-Squared:      0.24357
F-statistic: 117.746 on 3 and 1097 DF, p-value: < 0.000000000000000222

Answer: In the 90s, crime rates in inner cities dropped across many Western countries. This trend will have affected U.S. states in a relatively similar way. This source of confounding will be correlated with the murder rate. Such a strong theoretical foundation for confounding should be controlled for using time fixed effects independent of the test for time fixed effects.

We reject the null hypothesis that time fixed effects are insignificant (make no difference). We, therefore, control for time fixed effects to reduce omitted variable bias from sources that vary over time but are constant across states.

The effect of the shall laws is indistinguishable from zero (at the 0.05 alpha level). We conclude that the shall laws do not increase or decrease the murder rate.

### 10.1.12 Question 4

Correct the standard errors to account for heteroskedasticity and serial correlation. Does the conclusion regarding the effect of the shall laws change?

m.2wfe.hac <- coeftest(m.2wfe, vcov = vcovBK(m.2wfe, type = "HC3", cluster = "group"))
m.2wfe.hac

t test of coefficients:

Estimate Std. Error t value           Pr(>|t|)
shall       -0.5640474  0.7662556 -0.7361             0.4618
incarc_rate  0.0209756  0.0028249  7.4253 0.0000000000002254 ***
pm1029       0.7326357  0.5118496  1.4313             0.1526
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Answer: The standard error more than doubled. Our substantive conclusion does not change: The shall laws have no effect on the murder rate in our sample.

### 10.1.13 Question 5

Test for cross-sectional dependence and, if present, use the SCC estimator to correct for heteroskedasticity, serial correlation, and spatial dependence. Does our conclusion regarding the effect of the shall laws change?

# test for cross-sectional dependence
pcdtest(m.2wfe)

Pesaran CD test for cross-sectional dependence in panels

data:  mur ~ shall + incarc_rate + pm1029
z = 3.9121, p-value = 0.00009148
alternative hypothesis: cross-sectional dependence
# correct standard errors
m.2wfe.scc <- coeftest(m.2wfe, vcov = vcovSCC(m.2wfe, type = "HC3", cluster = "group"))
m.2wfe.scc

t test of coefficients:

Estimate Std. Error t value Pr(>|t|)
shall       -0.564047   0.542698 -1.0393  0.29888
incarc_rate  0.020976   0.010321  2.0324  0.04236 *
pm1029       0.732636   0.551066  1.3295  0.18396
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Answer: The effect of the shall laws remains insignificant. The standard error decreased slightly.

Overall, we find no evidence for the claim made in the book. Guns do not appear to decrease the number of violent crimes. There is also no evidence for the opposite effect.