3.3 Specification Analysis
Our analysis has been based on the assumption that the correct specification of the regression model is known to be $Y = X\beta + \varepsilon$.

3.3.1 Omission of Relevant Variables

The true model: $Y = X\beta + \varepsilon = X_1\beta_1 + X_2\beta_2 + \varepsilon$
The specified model: $Y = X_1\beta_1 + \varepsilon$
$$\hat\beta_1 = (X_1'X_1)^{-1}X_1'Y = (X_1'X_1)^{-1}X_1'(X_1\beta_1 + X_2\beta_2 + \varepsilon) = \beta_1 + (X_1'X_1)^{-1}X_1'X_2\beta_2 + (X_1'X_1)^{-1}X_1'\varepsilon$$

1) $E(\hat\beta_1 \mid X) = \beta_1 + (X_1'X_1)^{-1}X_1'X_2\beta_2$

Unless $X_1'X_2 = 0$, $\hat\beta_1$ is biased.
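The decomposition above can be checked numerically. A minimal numpy sketch (dimensions and coefficients are illustrative assumptions; $\varepsilon$ is set to zero so the bias term holds exactly rather than only in expectation):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k1, k2 = 50, 2, 2
X1 = rng.normal(size=(n, k1))
X2 = rng.normal(size=(n, k2))
beta1 = np.array([1.0, -1.0])
beta2 = np.array([0.5, 2.0])
y = X1 @ beta1 + X2 @ beta2          # true model, with eps = 0 for exactness

# Short regression of y on X1 only
b1 = np.linalg.solve(X1.T @ X1, X1.T @ y)

# Omitted-variable bias term: (X1'X1)^{-1} X1'X2 beta2
bias = np.linalg.solve(X1.T @ X1, X1.T @ X2 @ beta2)
print(np.allclose(b1, beta1 + bias))  # True
```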
Example:
The specified model: $Y = \beta_1^* + X_2\beta_2 + \varepsilon \iff Y - \bar Y = (X_2 - \bar X_2)\beta_2 + \varepsilon \Rightarrow \hat\beta_2 = \dfrac{\sum_i (x_{2i} - \bar x_2)(y_i - \bar y)}{\sum_i (x_{2i} - \bar x_2)^2}$

The true model: $Y = \beta_1 + X_2\beta_2 + X_3\beta_3 + \varepsilon \iff Y - \bar Y = (X_2 - \bar X_2)\beta_2 + (X_3 - \bar X_3)\beta_3 + \varepsilon$

$$\Rightarrow \hat\beta_2 = \beta_2 + \frac{\sum_i (x_{2i} - \bar x_2)(x_{3i} - \bar x_3)}{\sum_i (x_{2i} - \bar x_2)^2}\,\beta_3 + \frac{\sum_i (x_{2i} - \bar x_2)\varepsilon_i}{\sum_i (x_{2i} - \bar x_2)^2}$$

$$\Rightarrow E(\hat\beta_2 \mid X) = \beta_2 + \frac{\sum_i (x_{2i} - \bar x_2)(x_{3i} - \bar x_3)}{\sum_i (x_{2i} - \bar x_2)^2}\,\beta_3 = \beta_2 + \frac{Cov(X_2, X_3)}{Var(X_2)}\,\beta_3$$
① When $X_2, X_3$ are positively correlated ($Cov(X_2, X_3) > 0$) and $\beta_3 > 0$: $E(\hat\beta_2 \mid X) > \beta_2$, so $\hat\beta_2$ is biased upward.
② When $X_2, X_3$ are negatively correlated ($Cov(X_2, X_3) < 0$) and $\beta_3 < 0$: $\hat\beta_2$ is also biased upward.
③ When $Cov(X_2, X_3) > 0$ and $\beta_3 < 0$, or $Cov(X_2, X_3) < 0$ and $\beta_3 > 0$: $\hat\beta_2$ is biased downward.
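The direction of the bias can be illustrated with a short simulation. A numpy sketch of case ① (the assumed values $Cov(X_2, X_3) > 0$ and $\beta_3 > 0$, and all names, are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
beta2, beta3 = 1.0, 2.0
x2 = rng.normal(size=n)
x3 = 0.5 * x2 + rng.normal(size=n)       # Cov(x2, x3) ≈ 0.5 > 0
y = 1.0 + beta2 * x2 + beta3 * x3 + rng.normal(size=n)

# Short regression of y on x2 alone (x3 omitted)
x2c, yc = x2 - x2.mean(), y - y.mean()
beta2_hat = (x2c @ yc) / (x2c @ x2c)

# Predicted bias: beta2 + Cov(x2, x3)/Var(x2) * beta3 ≈ 1 + 0.5*2 = 2
predicted = beta2 + np.cov(x2, x3)[0, 1] / np.var(x2) * beta3
print(beta2_hat, predicted)              # both close to 2: upward bias
```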
2) $e_1 = M_1 Y = M_1(X_1\beta_1 + X_2\beta_2 + \varepsilon) = M_1 X_2\beta_2 + M_1\varepsilon$

$$E(e_1'e_1 \mid X) = (n - K_1)\sigma^2 + \beta_2'X_2'M_1X_2\beta_2$$

$$E\left(\frac{e_1'e_1}{n - K_1} \,\Big|\, X\right) = \sigma^2 + \frac{\beta_2'X_2'M_1X_2\beta_2}{n - K_1} \ge \sigma^2$$

$s^2$ is biased upward.

3) $Var(\hat\beta_1 \mid X) = \sigma^2(X_1'X_1)^{-1}$, while $Var(\hat\beta_{1.2} \mid X) = \sigma^2(X_1'M_2X_1)^{-1}$.

$$\because Var(\hat\beta_{1.2} \mid X) = \sigma^2 \left[\begin{pmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{pmatrix}^{-1}\right]_{\text{upper-left submatrix}} = \sigma^2[X_1'X_1 - X_1'X_2(X_2'X_2)^{-1}X_2'X_1]^{-1} = \sigma^2(X_1'M_2X_1)^{-1}$$

where $M_2 = I - X_2(X_2'X_2)^{-1}X_2'$.

$$Var(\hat\beta_1 \mid X)^{-1} - Var(\hat\beta_{1.2} \mid X)^{-1} = (1/\sigma^2)\,X_1'X_2(X_2'X_2)^{-1}X_2'X_1: \text{ a positive semidefinite matrix.}$$

Although $\hat\beta_1$ is biased, it has a smaller variance than $\hat\beta_{1.2}$.
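The partitioned-inverse step, namely that the upper-left block of $(X'X)^{-1}$ is $(X_1'M_2X_1)^{-1}$, can be verified directly (a numpy sketch with illustrative dimensions):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k1, k2 = 40, 2, 3
X1 = rng.normal(size=(n, k1))
X2 = rng.normal(size=(n, k2))
X = np.column_stack([X1, X2])

XtX_inv = np.linalg.inv(X.T @ X)
M2 = np.eye(n) - X2 @ np.linalg.solve(X2.T @ X2, X2.T)

# Upper-left K1 x K1 block of (X'X)^{-1} vs (X1' M2 X1)^{-1}
print(np.allclose(XtX_inv[:k1, :k1], np.linalg.inv(X1.T @ M2 @ X1)))  # True
```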
For example, suppose $X_1, X_2$ are each a single variable. Then

$$Var(\hat\beta_1 \mid X) = \sigma^2/s_{11}, \qquad s_{11} = \sum_i (x_{1i} - \bar x_1)^2$$

$$Var(\hat\beta_{1.2} \mid X) = \frac{\sigma^2}{s_{11}\left(1 - \dfrac{(x_1'x_2)^2}{x_1'x_1 \cdot x_2'x_2}\right)}$$

The more highly correlated $x_1$ and $x_2$ are, the larger the variance of $\hat\beta_{1.2}$. So it is possible that $\hat\beta_1$ is a more precise estimator than $\hat\beta_{1.2}$ based on the mean-squared-error criterion.
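The inflation factor $1/(1 - r^2)$ in this two-variable case is easy to see numerically (a numpy sketch; $\sigma^2$ and the correlation level are assumed values):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
sigma2 = 1.0
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + rng.normal(scale=0.3, size=n)   # highly correlated with x1
x1c, x2c = x1 - x1.mean(), x2 - x2.mean()

s11 = x1c @ x1c
r2 = (x1c @ x2c) ** 2 / ((x1c @ x1c) * (x2c @ x2c))

var_short = sigma2 / s11                 # Var(beta1_hat): x1 alone
var_long = sigma2 / (s11 * (1 - r2))     # Var(beta1.2_hat): x1 and x2 included
print(var_long / var_short, 1 / (1 - r2))  # equal; both well above 1
```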
4) Conclusion:

When we omit relevant variables in a regression, the estimators of both $\beta_1$ and $\sigma^2$ are biased. Since we cannot estimate $\beta_1$ and $\sigma^2$ unbiasedly, we cannot test hypotheses about $\beta_1$ correctly.

3.3.2 Inclusion of Irrelevant Variables
The true model: $Y = X_1\beta_1 + \varepsilon$
The specified model: $Y = X_1\beta_1 + X_2\beta_2 + \varepsilon$ ($X_2$: irrelevant variables)
1) $\hat\beta = (X'X)^{-1}X'Y = (X'X)^{-1}X'(X_1\beta_1 + \varepsilon)$

$$E(\hat\beta \mid X) = E\left(\begin{pmatrix}\hat\beta_1 \\ \hat\beta_2\end{pmatrix} \Big|\, X\right) = (X'X)^{-1}X'X_1\beta_1 = (X'X)^{-1}X'X\begin{pmatrix}I_{K_1} \\ 0\end{pmatrix}\beta_1 = \begin{pmatrix}\beta_1 \\ 0\end{pmatrix}$$

$\therefore E(\hat\beta_1 \mid X) = \beta_1$, $E(\hat\beta_2 \mid X) = 0 = \beta_2$.

2) $Var(\hat\beta_1 \mid X) = \sigma^2(X_1'X_1)^{-1}$ and $Var(\hat\beta_{1,2} \mid X) = \sigma^2(X_1'M_2X_1)^{-1}$, so $Var(\hat\beta_{1,2} \mid X) \ge Var(\hat\beta_1 \mid X)$.

3) $e = MY = M(X_1\beta_1 + \varepsilon) = M\varepsilon$, where $M = I - X(X'X)^{-1}X'$ and $X = (X_1, X_2)$.
$\because MX = 0 \Rightarrow M(X_1, X_2) = 0 \Rightarrow MX_1 = 0$

$$\therefore E(e'e \mid X) = \sigma^2(n - K) \Rightarrow E\left(\frac{e'e}{n - K} \,\Big|\, X\right) = \sigma^2 \quad \text{(unbiased)}$$

4) $Est.Var(\hat\beta_{1,2} \mid X) \ge Est.Var(\hat\beta_1 \mid X)$
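Results 1) and 2) can be seen in a small Monte Carlo (a numpy sketch; the data-generating process and sample sizes are assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 200, 2000
b1_short, b1_long, b2_long = [], [], []

for _ in range(reps):
    x1 = rng.normal(size=n)
    x2 = 0.7 * x1 + rng.normal(size=n)   # irrelevant, but correlated with x1
    y = 2.0 * x1 + rng.normal(size=n)    # true model: y depends on x1 only
    X = np.column_stack([x1, x2])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    b1_long.append(b[0])
    b2_long.append(b[1])
    b1_short.append((x1 @ y) / (x1 @ x1))

# E(b2_hat) = 0: unbiased at zero; but including x2 inflates Var(b1_hat)
print(np.mean(b2_long), np.var(b1_long) / np.var(b1_short))
```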
5) Extension

① $\hat\sigma_1^2 = e_1'e_1/(n - K_1)$, $\hat\sigma_2^2 = e_2'e_2/(n - K)$

$$\frac{(n - K_1)\hat\sigma_1^2}{\sigma^2} = \frac{\varepsilon'M_1\varepsilon}{\sigma^2} \sim \chi^2(n - K_1), \qquad \frac{(n - K)\hat\sigma_2^2}{\sigma^2} = \frac{\varepsilon'M\varepsilon}{\sigma^2} \sim \chi^2(n - K)$$

$$Var\left(\frac{(n - K_1)\hat\sigma_1^2}{\sigma^2} \,\Big|\, X\right) = 2(n - K_1) \Rightarrow Var(\hat\sigma_1^2 \mid X) = 2\sigma^4/(n - K_1)$$

$$Var\left(\frac{(n - K)\hat\sigma_2^2}{\sigma^2} \,\Big|\, X\right) = 2(n - K) \Rightarrow Var(\hat\sigma_2^2 \mid X) = 2\sigma^4/(n - K)$$

Since $n - K < n - K_1$, $Var(\hat\sigma_2^2 \mid X) > Var(\hat\sigma_1^2 \mid X)$: in the specified model, the estimated variance is more volatile.
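The two $\chi^2$ results and the implied variances can be checked by simulation (a numpy sketch; $n$, $K_1$, $K$, and $\sigma^2 = 1$ are assumed values):

```python
import numpy as np

rng = np.random.default_rng(5)
n, K1, K, reps = 30, 2, 6, 20000
X = rng.normal(size=(n, K))              # columns 0..K1-1 form the X1 block
X1 = X[:, :K1]

M1 = np.eye(n) - X1 @ np.linalg.solve(X1.T @ X1, X1.T)
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

eps = rng.normal(size=(n, reps))         # sigma^2 = 1
# Column-wise quadratic forms eps' M1 eps and eps' M eps
s2_1 = np.einsum('ij,ij->j', eps, M1 @ eps) / (n - K1)
s2_2 = np.einsum('ij,ij->j', eps, M @ eps) / (n - K)

# Theoretical variances: 2*sigma^4/(n-K1) and 2*sigma^4/(n-K)
print(np.var(s2_1), 2 / (n - K1))        # both ≈ 2/28
print(np.var(s2_2), 2 / (n - K))         # both ≈ 2/24: more volatile
```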
② t-test

In the correct model: $\dfrac{\hat\beta_1^{(j)}}{se(\hat\beta_1^{(j)})} \sim t(n - K_1)$

In the specified model: $\dfrac{\hat\beta_{1,2}^{(j)}}{se(\hat\beta_{1,2}^{(j)})} \sim t(n - K)$

Since $n - K < n - K_1$, the critical values satisfy $t_{\alpha/2}(n - K) > t_{\alpha/2}(n - K_1)$. So if the t-statistic lies in $[t_{\alpha/2}(n - K_1), t_{\alpha/2}(n - K)]$, $H_0$ will be accepted in the specified model while it should be rejected in the correct model $\Rightarrow$ loss of power of the t-test.
6) Conclusion:

① Exclusion of a relevant variable: (i) biased estimators; (ii) incorrect inference procedures (t, F).
② Inclusion of an irrelevant variable: (i) loss of efficiency. The specified model remains unbiased, $E(\hat\beta_1 \mid X) = \beta_1$, but $Var(\hat\beta_{1,2} \mid X)$ increases except when $X_1'X_2 = 0 \iff X_1'X_1 = X_1'M_2X_1$, in which case $Var(\hat\beta_1 \mid X) = Var(\hat\beta_{1,2} \mid X)$. (ii) Loss of testing power, except when $X_1'X_2 = 0$.
It would seem that one would generally want to "overfit" the model. From a theoretical standpoint, the difficulty with this view is that the failure to use correct information is always costly. In this instance, the cost is the reduced precision of the estimates. As we have shown, the covariance matrix in the short regression (omitting $X_2$) is never larger than the covariance matrix for the estimator obtained in the presence of the superfluous variables.

3.3.3 A More General Test of Specification Error
The Ramsey RESET test (Regression Specification Error Test)
Ramsey argued that the various specification errors listed above (omitted variables, incorrect functional form, correlation between $X$ and $\varepsilon$) give rise to a nonzero mean of the $\varepsilon$ vector.

$H_0: \varepsilon \sim N(0, \sigma^2 I_n)$: no specification error
$H_1: \varepsilon \sim N(\mu, \sigma^2 I_n)$, $\mu \ne 0$: specification error

Augmented regression: $Y = X\beta + Z\alpha + \varepsilon$ with $Z = (\hat Y^2, \hat Y^3, \ldots)$. The test for specification error is then a test of $\alpha = 0$.

Shortcoming: limited power.
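A minimal version of the RESET test (a numpy sketch; the powers $\hat Y^2, \hat Y^3$ and the F-form follow the standard construction, but the function and variable names, and the data-generating processes, are illustrative):

```python
import numpy as np

def reset_F(y, X, powers=(2, 3)):
    """Ramsey RESET: augment with powers of fitted values, F-test alpha = 0."""
    n = len(y)
    yhat = X @ np.linalg.lstsq(X, y, rcond=None)[0]
    Z = np.column_stack([yhat ** p for p in powers])
    XZ = np.column_stack([X, Z])
    e_r = y - yhat                                        # restricted residuals
    e_u = y - XZ @ np.linalg.lstsq(XZ, y, rcond=None)[0]  # unrestricted
    q, k = Z.shape[1], XZ.shape[1]
    return ((e_r @ e_r - e_u @ e_u) / q) / ((e_u @ e_u) / (n - k))

rng = np.random.default_rng(6)
n = 300
x = rng.uniform(1, 3, size=n)
X = np.column_stack([np.ones(n), x])
y_ok = 1 + 2 * x + rng.normal(size=n)       # correct linear specification
y_bad = 1 + x ** 2 + rng.normal(size=n)     # omitted nonlinearity
print(reset_F(y_ok, X), reset_F(y_bad, X))  # small F vs large F
```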
3.4 Choosing between Nonnested Models
Hypothesis test: $Y = X\beta + \varepsilon$, $H_0: R\beta = q$ vs. $H_1: R\beta \ne q$.

Exceptions this framework does not cover:
(i) which of two possible sets of regressors is more appropriate;
(ii) whether a linear or a log-linear model is more appropriate.

3.4.1 An Encompassing Model
$H_0: Y = X\beta + \varepsilon$ vs. $H_1: Y = Z\gamma + \varepsilon$

An artificial nesting of the two models, i.e. a "supermodel":

$$Y = \tilde X\tilde\beta + \tilde Z\tilde\gamma + \omega\delta + \varepsilon$$

$\tilde X$: the set of variables in $X$ that are not in $Z$
$\tilde Z$: the set of variables in $Z$ that are not in $X$
$\omega$: the variables that the models have in common

Problems:
(i) F test of $H_0: \tilde\gamma = 0$.
Accept $H_0 \Rightarrow$ model $Y = X\beta + \varepsilon$; reject $H_0 \Rightarrow$ model $Y = \tilde X\tilde\beta + \tilde Z\tilde\gamma + \omega\delta + \varepsilon$.
(ii) In a time-series setting, multicollinearity.
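The encompassing approach reduces to an F test on the variables one model adds to the other. A numpy sketch (the data-generating process, in which the $Z$ model is the true one, and all names are assumed for illustration):

```python
import numpy as np

def f_test_added(y, X, Z_tilde):
    """F test that the coefficients on Z_tilde are zero in the supermodel."""
    n = len(y)
    e_r = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    W = np.column_stack([X, Z_tilde])           # artificial nesting
    e_u = y - W @ np.linalg.lstsq(W, y, rcond=None)[0]
    q = Z_tilde.shape[1]
    return ((e_r @ e_r - e_u @ e_u) / q) / ((e_u @ e_u) / (n - W.shape[1]))

rng = np.random.default_rng(7)
n = 200
w = rng.normal(size=n)        # common variable (omega)
x_only = rng.normal(size=n)   # in X but not in Z
z_only = rng.normal(size=n)   # in Z but not in X
y = 1 + 2 * w + 1.5 * z_only + rng.normal(size=n)

X = np.column_stack([np.ones(n), w, x_only])
print(f_test_added(y, X, z_only[:, None]))  # large F: reject gamma_tilde = 0
```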
3.4.2 The J Test (Davidson & MacKinnon, 1981)

Model: $Y = (1 - \alpha)X\beta + \alpha Z\gamma + \varepsilon$
$H_0: \alpha = 0$ (model $Y = X\beta + \varepsilon$); $H_1: \alpha = 1$ (model $Y = Z\gamma + \varepsilon_1$)

1) Problems: (i) nonlinearity; (ii) $\alpha$ cannot be estimated separately; we can only get $(1 - \alpha)\beta$ and $\alpha\gamma$.

J test: $t = \hat\alpha / se(\hat\alpha) \xrightarrow{d} N(0, 1)$

2) Steps

① $H_0$: $X$ is the appropriate set of variables to explain $Y$.
a. Regress $Y = Z\gamma + \varepsilon_1$ and form $\hat Y = Z\hat\gamma$.
b. Regress $Y = X\beta + \lambda\hat Y + \varepsilon$ to obtain $\hat\lambda$.
c. Test the significance of $\lambda$. Accept $\lambda = 0$: accept $H_0: Y = X\beta + \varepsilon$; $Z$ cannot help to explain $Y$. Reject $\lambda = 0$: $Z$ can significantly help to explain $Y$.

② $H_1$: $Z$ is the appropriate set of variables to explain $Y$.
a. Regress $Y = X\beta + \varepsilon_0$ and form $\hat Y = X\hat\beta$.
b. Regress $Y = Z\gamma + \lambda\hat Y + \varepsilon_1$ to obtain $\hat\lambda$.
c. Test the significance of $\lambda$. Accept $\lambda = 0$: accept $H_1$; $X$ cannot help to explain $Y$. Reject $\lambda = 0$: $X$ can significantly help to explain $Y$.

③ If both $H_0$ and $H_1$ are accepted, the data are not sufficient to distinguish $H_0$ from $H_1$. If both are rejected, neither set of variables by itself adequately explains $Y$.

Example:
$H_0: C_t = \beta_1 + \beta_2 Y_t + \beta_3 Y_{t-1} + \varepsilon$: consumption responds to changes in income over two periods.
$H_1: C_t = \gamma_1 + \gamma_2 Y_t + \gamma_3 C_{t-1} + \varepsilon$: consumption responds to changes in income over many periods.

3.4.3 The Cox Test (1961, 1962)
1) $H_0$: $X$ is the appropriate set of variables.

$$q = \frac{c_{01}}{\sqrt{v_{01}}} \xrightarrow{d} N(0, 1)$$

$$c_{01} = \frac{n}{2}\ln\left[\frac{s_Z^2}{s_{ZX}^2}\right] = \frac{n}{2}\ln\left[\frac{s_Z^2}{s_X^2 + (1/n)\,b'X'M_Z Xb}\right], \qquad v_{01} = \frac{s_X^2 \cdot b'X'M_Z M_X M_Z Xb}{s_{ZX}^4}$$

where
$M_Z = I - Z(Z'Z)^{-1}Z'$, $M_X = I - X(X'X)^{-1}X'$, $b = (X'X)^{-1}X'Y$
$s_Z^2 = \frac{1}{n}e_Z'e_Z$: mean-squared residual in the regression of $Y$ on $Z$
$s_X^2 = \frac{1}{n}e_X'e_X$: mean-squared residual in the regression of $Y$ on $X$
$s_{ZX}^2 = s_X^2 + \frac{1}{n}b'X'M_Z Xb$

2) $H_0$: $Z$ is the appropriate set of variables.

$$q = \frac{c_{10}}{\sqrt{v_{10}}} \xrightarrow{d} N(0, 1)$$

$$c_{10} = \frac{n}{2}\ln\left[\frac{s_X^2}{s_{XZ}^2}\right], \qquad v_{10} = \frac{s_Z^2 \cdot d'Z'M_X M_Z M_X Zd}{s_{XZ}^4}$$

$$s_{XZ}^2 = s_Z^2 + \frac{1}{n}d'Z'M_X Zd, \qquad d = (Z'Z)^{-1}Z'Y$$
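The statistic in 1) can be computed directly from the definitions above. A numpy sketch (the data-generating process and all names are illustrative assumptions):

```python
import numpy as np

def cox_q(y, X, Z):
    """Cox statistic for H0: X is the appropriate set of regressors (vs Z)."""
    n = len(y)
    MX = np.eye(n) - X @ np.linalg.pinv(X)          # I - X(X'X)^{-1}X'
    MZ = np.eye(n) - Z @ np.linalg.pinv(Z)          # I - Z(Z'Z)^{-1}Z'
    Xb = X @ np.linalg.lstsq(X, y, rcond=None)[0]   # fitted values under H0
    s2_X = (y - Xb) @ (y - Xb) / n
    s2_Z = (MZ @ y) @ (MZ @ y) / n
    s2_ZX = s2_X + (Xb @ MZ @ Xb) / n
    c01 = (n / 2) * np.log(s2_Z / s2_ZX)
    v01 = s2_X * (Xb @ MZ @ MX @ MZ @ Xb) / s2_ZX ** 2   # s2_ZX**2 = s_ZX^4
    return c01 / np.sqrt(v01)

rng = np.random.default_rng(8)
n = 200
x = rng.normal(size=n)
z = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])
y = 1 + 2 * x + rng.normal(size=n)   # X is the true model

print(cox_q(y, X, Z))   # near 0: do not reject "X is appropriate"
print(cox_q(y, Z, X))   # large negative: reject "Z is appropriate"
```

Swapping the roles of the two regressor sets, as in the second call, gives the statistic of part 2).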