

Question

Please answer question (b)

Calculus Perspective of Normal Equations

3. In the lecture, we discussed a geometric argument to get the least squares estimator. Based on the properties of orthogonality, we can obtain the normal equations below:

$X^T (Y - X\hat{\theta}) = 0.$

We can rearrange the equation to solve for $\hat{\theta}$ when $X$ is full column rank:

$\hat{\theta} = (X^T X)^{-1} X^T Y.$

Here, we are using $X$ to denote the design matrix:

$X = \begin{bmatrix} 1 & x_{1,1} & x_{1,2} & \cdots & x_{1,p} \\ 1 & x_{2,1} & x_{2,2} & \cdots & x_{2,p} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n,1} & x_{n,2} & \cdots & x_{n,p} \end{bmatrix} = \begin{bmatrix} \mathbf{1} & x_1 & x_2 & \cdots & x_p \end{bmatrix},$

where $\mathbf{1}$ is the vector of all 1s of length $n$ and $x_j$ is the $n$-vector $[x_{1,j}, \ldots, x_{n,j}]^T$, i.e., the $j$th feature vector.

To build intuition for these equations and relate them to the SLR estimating equations, we will derive them algebraically using calculus.
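Before the two subparts, a quick numerical sketch (not part of the original problem) can make the closed form above concrete. The snippet below builds a small made-up design matrix and response, applies $\hat{\theta} = (X^T X)^{-1} X^T Y$ literally, and cross-checks it against NumPy's least squares solver; all names and data are illustrative only.

```python
import numpy as np

# Made-up data: n = 6 observations, p = 2 features (purely for illustration).
rng = np.random.default_rng(0)
n, p = 6, 2
features = rng.normal(size=(n, p))
X = np.column_stack([np.ones(n), features])   # design matrix [1, x_1, x_2]
Y = rng.normal(size=n)

# Closed form from the normal equations (assumes X has full column rank).
theta_hat = np.linalg.inv(X.T @ X) @ X.T @ Y

# Cross-check against NumPy's built-in least squares solver.
theta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.allclose(theta_hat, theta_lstsq))    # expected: True
```

In practice one would solve the linear system (or call `lstsq`) rather than form the explicit inverse; the inverse is written out here only to mirror the formula.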
(a) Show that finding the optimal estimator $\hat{\theta}$ by solving the normal equations is equivalent to requiring that the residual vector $e = Y - X\hat{\theta}$ should average to zero, and that the residual vector $e$ should be orthogonal to $x_j$ for every $j$. That is, show that the matrix form of the normal equation can be written as:

$\sum_{i=1}^{n} e_i = 0$

and

$x_j^T e = \sum_{i=1}^{n} x_{i,j} e_i = 0$

for all $j = 1, \ldots, p$. (Hint: Expand the normal equation above and perform the matrix multiplication for the first few terms. Can you find a pattern?)
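As a sanity check on the two conditions in part (a), the following sketch (again with made-up data, not part of the problem) fits by least squares and verifies that the residuals sum to zero and that every feature column is orthogonal to the residual vector.

```python
import numpy as np

# Illustrative data only: n = 8 observations, p = 3 features.
rng = np.random.default_rng(1)
n, p = 8, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
Y = rng.normal(size=n)

theta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
e = Y - X @ theta_hat                      # residual vector

print(np.isclose(e.sum(), 0.0))            # residuals average to zero
print(np.allclose(X[:, 1:].T @ e, 0.0))    # each feature column x_j is orthogonal to e
```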
(b) Remember that the (empirical) MSE for multiple linear regression is

$\text{MSE}(\theta) = \frac{1}{n} \sum_{i=1}^{n} \left( Y_i - \theta_0 - \theta_1 x_{i,1} - \cdots - \theta_p x_{i,p} \right)^2.$

Use calculus to show that any $\hat{\theta} = [\hat{\theta}_0, \hat{\theta}_1, \ldots, \hat{\theta}_p]^T$ that minimizes the MSE must solve the normal equations.

(Hint: Recall that, at a minimum of the MSE, the partial derivatives of the MSE with respect to every $\theta_j$ must all be zero. Find these partial derivatives and compare them to your answer in Q3a.)
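The derivation itself is left to the exercise, but the claim can be checked numerically without writing down the partial derivatives: at the least squares solution, a central finite-difference estimate of every $\partial \text{MSE} / \partial \theta_j$ should be (approximately) zero. The sketch below uses made-up data and a hypothetical step size solely for illustration.

```python
import numpy as np

# Illustrative data only: n = 50 observations, p = 2 features.
rng = np.random.default_rng(2)
n, p = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
Y = rng.normal(size=n)

def mse(theta):
    """Empirical MSE of the linear model with parameter vector theta."""
    return np.mean((Y - X @ theta) ** 2)

theta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Central finite differences for each coordinate theta_j at the minimizer.
h = 1e-6
for j in range(p + 1):
    step = np.zeros(p + 1)
    step[j] = h
    partial = (mse(theta_hat + step) - mse(theta_hat - step)) / (2 * h)
    print(j, np.isclose(partial, 0.0, atol=1e-6))   # expected: True for every j
```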
Remark: Together, the two subparts above show that the geometric perspective is equivalent to the calculus approach of taking the derivative and setting it to zero for OLS. This is a desirable property of a linear model with L2 loss, and it generally does not hold for other models and loss types. We hope these exercises clear up some mysteries about the geometric derivation!