AMATH 383 Fall 2023
Problem Set 5
Last Problem Set!!
Due: Monday 11/20 at 11:59pm
Note: Submit electronically to Gradescope. Please show all work.
1. (30 points) Hessians and linear compositions.

Recall that the Hessian of a function is a (generally symmetric) matrix of second partial derivatives:

$$\nabla^2 f(\beta) = \begin{bmatrix} \dfrac{\partial^2 f}{\partial\beta_1\,\partial\beta_1} & \cdots & \dfrac{\partial^2 f}{\partial\beta_1\,\partial\beta_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f}{\partial\beta_n\,\partial\beta_1} & \cdots & \dfrac{\partial^2 f}{\partial\beta_n\,\partial\beta_n} \end{bmatrix}$$
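As a quick illustration (not part of the assignment), for $f(\beta) = \beta_1^2 \beta_2$ the second partials are $\partial^2 f/\partial\beta_1^2 = 2\beta_2$, $\partial^2 f/\partial\beta_1\partial\beta_2 = 2\beta_1$, and $\partial^2 f/\partial\beta_2^2 = 0$, so
$$\nabla^2 f(\beta) = \begin{bmatrix} 2\beta_2 & 2\beta_1 \\ 2\beta_1 & 0 \end{bmatrix},$$
which is symmetric, as expected.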
Recall our gradient chain rule for compositions: if $g(\beta) = f(X\beta - y)$ then
$$\nabla g(\beta) = X^T \nabla f(X\beta - y).$$
This rule also extends nicely to Hessian computation:
$$\nabla^2 g(\beta) = X^T \nabla^2 f(X\beta - y)\, X.$$
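The composition rule is easy to sanity-check numerically by comparing $X^T \nabla^2 f(X\beta - y)\,X$ against finite differences of $\nabla g$. A minimal sketch in Python (the inner function, random data, and tolerance are illustrative assumptions, not part of the assignment):

import numpy as np

# illustrative inner function: f(r) = 0.5*||r||^2, so grad f(r) = r, Hess f(r) = I
def grad_f(r):
    return r

def hess_f(r):
    return np.eye(r.size)

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))
y = rng.standard_normal(5)
beta = rng.standard_normal(3)

# chain rule: Hessian of g(beta) = f(X beta - y) is X^T Hess_f(X beta - y) X
H_chain = X.T @ hess_f(X @ beta - y) @ X

# central finite differences of the gradient of g
def grad_g(b):
    return X.T @ grad_f(X @ b - y)

eps = 1e-6
H_fd = np.zeros((3, 3))
for j in range(3):
    e = np.zeros(3)
    e[j] = eps
    H_fd[:, j] = (grad_g(beta + e) - grad_g(beta - e)) / (2 * eps)

print(np.allclose(H_chain, H_fd, atol=1e-4))   # expect True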
• (a) Compute the Hessian of $f(\beta) = \frac{1}{2}\|X\beta - y\|_2^2 + \frac{\lambda}{2}\|\beta\|^2$.

$$\nabla f(\beta) = X^T \nabla\!\left(\tfrac{1}{2}\|X\beta - y\|^2\right) + \lambda\beta = X^T(X\beta - y) + \lambda\beta = X^T X\beta - X^T y + \lambda\beta$$
$$\nabla^2 f(\beta) = X^T X + \lambda I$$
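Since this Hessian is constant in $\beta$, the stationary point solves $(X^T X + \lambda I)\beta = X^T y$, which is easy to check numerically. A small sketch with made-up data (sizes and $\lambda$ are assumptions for illustration):

import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((20, 4))
y = rng.standard_normal(20)
lam = 0.5

# gradient and constant Hessian from part (a)
grad = lambda b: X.T @ (X @ b - y) + lam * b
H = X.T @ X + lam * np.eye(4)

# the minimizer solves (X^T X + lam I) beta = X^T y
beta_star = np.linalg.solve(H, X.T @ y)
print(np.linalg.norm(grad(beta_star)))   # expect a value near 0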
• (b) Compute the Hessian of $f(\beta) = \sum_{i=1}^{m} \exp(x_i^T\beta) - y_i\, x_i^T\beta$.

$$\nabla f(\beta) = \sum_{i=1}^{m} x_i \exp(x_i^T\beta) - y_i x_i$$
$$\nabla^2 f(\beta) = \sum_{i=1}^{m} x_i \exp(x_i^T\beta)\, x_i^T$$
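In matrix form this Hessian is $X^T \operatorname{diag}(\exp(X\beta))\, X$, which is how it would usually be assembled in code. A minimal sketch (the data and variable names are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((50, 3))
beta = rng.standard_normal(3)

# sum_i x_i exp(x_i^T beta) x_i^T  ==  X^T diag(exp(X beta)) X
w = np.exp(X @ beta)                     # per-sample weights exp(x_i^T beta)
H_vectorized = X.T @ (w[:, None] * X)    # avoids forming the diagonal matrix explicitly

H_loop = sum(w[i] * np.outer(X[i], X[i]) for i in range(X.shape[0]))
print(np.allclose(H_vectorized, H_loop))   # expect True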
• (c) Compute the Hessian of $f(\beta) = \sum_{i=1}^{m} \log(1 + \exp(x_i^T\beta)) - y_i\, x_i^T\beta$.

Let $f(z) = \log(1 + \exp(z))$ where $z = x_i^T\beta$. Then
$$f'(z) = \frac{\exp(z)}{1 + \exp(z)}, \qquad f''(z) = \frac{\exp(z)}{(\exp(z) + 1)^2}$$
Then
$$\nabla f(\beta) = \sum_{i=1}^{m} \frac{1}{1 + \exp(x_i^T\beta)}\, x_i \exp(x_i^T\beta) - y_i x_i$$
$$\nabla^2 f(\beta) = \sum_{i=1}^{m} x_i \left(\frac{\exp(x_i^T\beta)}{(1 + \exp(x_i^T\beta))^2}\right) x_i^T$$
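Writing $\sigma(z) = \exp(z)/(1 + \exp(z))$ for the logistic sigmoid, the weight $\exp(z)/(1+\exp(z))^2$ equals $\sigma(z)(1 - \sigma(z))$, so this Hessian is $X^T \operatorname{diag}\big(\sigma(X\beta)(1-\sigma(X\beta))\big)\,X$. A small numerical check of that identity (data is an illustrative assumption):

import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((40, 3))
beta = rng.standard_normal(3)

z = X @ beta
sigma = 1.0 / (1.0 + np.exp(-z))   # logistic sigmoid
w = sigma * (1.0 - sigma)          # equals exp(z) / (1 + exp(z))^2

H = X.T @ (w[:, None] * X)         # logistic-regression Hessian
print(np.allclose(w, np.exp(z) / (1.0 + np.exp(z))**2))   # expect True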
• (d) Compute the Hessian of $f(\beta) = \sum_{i=1}^{m} \log\!\left(1 + (x_i^T\beta)^2/\nu\right)$.

Let $f(z) = \log\!\left(1 + \frac{z^2}{\nu}\right)$ where $z = x_i^T\beta$. Then
$$f'(z) = \frac{2z}{z^2 + \nu}, \qquad f''(z) = \frac{-2(z^2 - \nu)}{(z^2 + \nu)^2}$$
Then
$$\nabla^2 f(\beta) = \sum_{i=1}^{m} -x_i \left(\frac{2\left((x_i^T\beta)^2 - \nu\right)}{\left((x_i^T\beta)^2 + \nu\right)^2}\right) x_i^T$$
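A short numerical aside (not part of the assignment): unlike parts (b) and (c), the curvature weight $f''(z) = -2(z^2 - \nu)/(z^2 + \nu)^2$ changes sign at $|z| = \sqrt{\nu}$, so this Hessian is not guaranteed to be positive semidefinite. A tiny check with an illustrative $\nu$:

import numpy as np

# per-sample curvature weight from part (d): f''(z) = -2(z^2 - nu) / (z^2 + nu)^2
nu = 1.0                      # illustrative value
z = np.array([0.0, 0.5, 2.0])
w = -2.0 * (z**2 - nu) / (z**2 + nu)**2
print(w)                      # positive for |z| < sqrt(nu), negative for |z| > sqrt(nu)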
2. (20 points) Show that Newton's method finds the solution to any regularized linear least squares problem in one step.

The minimum of $f(\beta)$ needs to be found. Consider the regularized linear least squares problem
$$f(\beta) = \|X\beta - y\|_2^2 + \frac{\lambda}{2}\|\beta\|^2$$
for some given $X$, $y$, and $\lambda$.
$$\nabla f(\beta) = 2X^T X\beta - 2X^T y + \lambda\beta$$
$$\nabla^2 f(\beta) = 2X^T X + \lambda I$$
Then
$$\beta_1 = \beta_0 - (2X^T X + \lambda I)^{-1}\left(2X^T X\beta_0 - 2X^T y + \lambda\beta_0\right)$$
$$\beta_1 = \beta_0 - (2X^T X + \lambda I)^{-1}\left((2X^T X + \lambda I)\beta_0 - 2X^T y\right)$$
$$\Rightarrow \beta_1 = \beta_0 - \beta_0 + (2X^T X + \lambda I)^{-1}\,2X^T y = (2X^T X + \lambda I)^{-1}\,2X^T y$$
To find the true solution, $\nabla f(\beta)$ should be set to zero, so
$$2X^T X\beta - 2X^T y + \lambda\beta = 0 \;\Rightarrow\; (2X^T X + \lambda I)\beta = 2X^T y$$
$$\beta = (2X^T X + \lambda I)^{-1}\,2X^T y = \beta_1$$
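The one-step claim is also easy to confirm numerically: a single Newton step from any starting point lands on the closed-form minimizer. A small sketch (data, sizes, and $\lambda$ are assumptions for illustration):

import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((30, 5))
y = rng.standard_normal(30)
lam = 0.1

# gradient and constant Hessian of ||X b - y||^2 + (lam/2) ||b||^2
grad = lambda b: 2 * X.T @ (X @ b - y) + lam * b
H = 2 * X.T @ X + lam * np.eye(5)

beta0 = rng.standard_normal(5)
beta1 = beta0 - np.linalg.solve(H, grad(beta0))   # one Newton step (unit step size)

beta_star = np.linalg.solve(H, 2 * X.T @ y)       # closed-form minimizer
print(np.allclose(beta1, beta_star))              # expect True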
3. (30 points) Recall that gradient descent to minimize $f(\beta)$ takes the form
$$\beta_{k+1} = \beta_k - \alpha_k \nabla f(\beta_k)$$
while Newton's method takes the form
$$\beta_{k+1} = \beta_k - \alpha_k H_k^{-1} \nabla f(\beta_k)$$
where $H_k$ can be the Hessian $H_k = \nabla^2 f(\beta_k)$ or a suitable approximation, and $\alpha_k$ is a suitable step size that guarantees descent of the function values.
Using the starter code for your homework, finish implementing these two methods (look for TODO items in the Python code). Make sure hw1_supp.py is in the same directory as your main Python file.
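For orientation only (this is not the starter code, just a generic sketch of the Newton update above), one damped Newton step can be written as:

import numpy as np

def newton_step(beta, grad, hess, alpha=1.0):
    # One damped Newton update: beta - alpha * H^{-1} grad(beta).
    # `grad` and `hess` are callables returning the gradient vector and Hessian
    # matrix at `beta`; this helper is hypothetical, not the assignment's API.
    return beta - alpha * np.linalg.solve(hess(beta), grad(beta))

Solving the linear system with np.linalg.solve avoids forming $H_k^{-1}$ explicitly, which is the usual way to implement the update.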
4. (50 points) Implement binomial and Poisson regression methods and run the cells in the coding notebook that produce the plots. Please save your final notebook as a pdf that includes the plots and printouts that are automatically generated by the notebook.

To make things a bit easier, you can go ahead and always use the line search with the option use_line_search=True in the call, instead of solving for an upper bound in the case of logistic regression. I'll discuss that part in class.
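As a hedged illustration of how the Poisson objective and gradient from Problem 1(b) could be passed to the gradient descent solver defined later in the notebook (the synthetic data and this wiring are assumptions, not the notebook's actual cells):

import numpy as np

# synthetic Poisson-regression data (illustrative only)
rng = np.random.default_rng(5)
X = rng.standard_normal((100, 3)) * 0.3
beta_true = np.array([1.0, -0.5, 0.2])
y = rng.poisson(np.exp(X @ beta_true))

# objective and gradient from Problem 1(b)
func = lambda b: np.sum(np.exp(X @ b) - y * (X @ b))
grad = lambda b: X.T @ (np.exp(X @ b) - y)

# with the solver defined below, a call could look like:
# beta_hat, obj_his, err_his, flag = optimizeWithGD(np.zeros(3), func, grad,
#                                                   None, use_line_search=True)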
AMATH 515 Homework 1
Due Date: 01/27/2021

Homework Instruction: Please fill in the gaps in this template where commented as TODO. When submitting, make sure your file is still named 515Hw1_Coding.ipynb -- that's what Gradescope will be looking for. There is no need to submit hw1_supp.py, it will already be on the server when you submit your work. You'll have 10 attempts to pass the tests.
# require numpy module
import numpy as np
from numpy.linalg import norm
from numpy.linalg import solve
import matplotlib.pyplot as plt

# import supplementary functions for Homework 1
import sys
sys.path.insert(0, './')
from hw1_supp import *

# You can change this seed if you want to test that your program works with different seeds,
# but please set it back to 123 when you submit your work.
seed = 123
Gradient Descent Solver

Recall the gradient descent algorithm we learned in class and complete the gradient descent solver.
def optimizeWithGD(x0, func, grad, step_size, tol=1e-6, max_iter=1000, use_line_search=False):
    """
    Optimize with Gradient Descent

    input
    -----
    x0 : array_like
        Starting point for the solver.
    func : function
        Takes x and returns the function value.
    grad : function
        Takes x and returns the gradient of "func".
    step_size : float or None
        If it is a float number and `use_line_search=False`, it will be used as the step size.
        Otherwise, line search will be used.
    tol : float, optional
        Gradient tolerance for terminating the solver.
    max_iter : int, optional
        Maximum number of iterations for terminating the solver.
    use_line_search : bool, optional
        When it is true a line search will be used, otherwise `step_size` has to be provided.

    output
    ------
    x : array_like
        Final solution
    obj_his : array_like
        Objective function's values convergence history
    err_his : array_like
        Norm of gradient convergence history
    exit_flag : int
        0, norm of gradient below `tol`
        1, exceed maximum number of iteration
        2, line search fail
        3, other
    """
    # safeguard
    if not use_line_search and step_size is None:
        print('Please specify the step_size or use the line search.')
        return x0, np.array([]), np.array([]), 3

    # initial step
    x = np.copy(x0)
    g = grad(x)
    #
    obj = func(x)
    err = norm(g)
    #
    obj_his = np.zeros(max_iter + 1)
    err_his = np.zeros(max_iter + 1)
    #
    obj_his[0] = obj
    err_his[0] = err

    # start iterations
    iter_count = 0
    while err >= tol:
        if use_line_search:
            step_size = lineSearch(x, g, g, func)
            #
            # if line search fail step_size will be None
            if step_size is None:
                print('Gradient descent line search fail.')
                return x, obj_his[:iter_count + 1], err_his[:iter_count + 1], 2
        #
        # gradient descent step
        #####
        # TODO: with given step_size, complete gradient descent step
        x = x - step_size * g
        #####
        #
        # update function and gradient
        g = grad(x)
        #
        obj = func(x)
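        # NOTE: the extracted preview cuts off at this point, mid-loop. The
        # remainder below is a sketch of how the bookkeeping could be completed,
        # following the docstring's histories and exit flags; it is an assumption,
        # not the original starter code.
        err = norm(g)
        #
        iter_count += 1
        obj_his[iter_count] = obj
        err_his[iter_count] = err
        #
        # check if exceeded the maximum number of iterations
        if iter_count >= max_iter:
            print('Gradient descent reach maximum number of iteration.')
            return x, obj_his[:iter_count + 1], err_his[:iter_count + 1], 1
    #
    return x, obj_his[:iter_count + 1], err_his[:iter_count + 1], 0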