
AMATH 383 Fall 2023 Problem Set 5

Last Problem Set!! Due: Monday 11/20 at 11:59pm

Note: Submit electronically to Gradescope. Please show all work.

1. (30 points) Hessians and linear compositions. Recall that the Hessian of a function is a (generally symmetric) matrix of second partial derivatives:

$$\nabla^2 f(\beta) = \begin{bmatrix} \dfrac{\partial^2 f}{\partial \beta_1 \partial \beta_1} & \cdots & \dfrac{\partial^2 f}{\partial \beta_1 \partial \beta_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f}{\partial \beta_n \partial \beta_1} & \cdots & \dfrac{\partial^2 f}{\partial \beta_n \partial \beta_n} \end{bmatrix}$$

Recall our gradient chain rule for compositions: if $g(\beta) = f(X\beta)$, then $\nabla g(\beta) = X^T \nabla f(X\beta)$. This rule also extends nicely to Hessian computation:

$$\nabla^2 g(\beta) = X^T \nabla^2 f(X\beta)\, X.$$

(a) Compute the Hessian of $f(\beta) = \tfrac{1}{2}\|X\beta - y\|_2^2 + \tfrac{\lambda}{2}\|\beta\|^2$.

Applying the chain rule to the first term with inner map $X\beta$:
$$\nabla f(\beta) = X^T(X\beta - y) + \lambda\beta = X^T X\beta - X^T y + \lambda\beta,$$
$$\nabla^2 f(\beta) = X^T X + \lambda I.$$
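As a quick sanity check on part (a), here is a short sketch comparing the derived Hessian $X^T X + \lambda I$ against a finite-difference approximation of the gradient. The data X, y, the value of lam, and the helper names are made up for illustration; they are not part of the assignment.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 20, 5
X = rng.standard_normal((m, n))
y = rng.standard_normal(m)
lam = 0.1

grad = lambda b: X.T @ (X @ b - y) + lam * b          # gradient from part (a)
hess = X.T @ X + lam * np.eye(n)                      # Hessian from part (a)

# central finite differences of the gradient recover the Hessian column by column
b0, eps = rng.standard_normal(n), 1e-6
H_fd = np.column_stack([(grad(b0 + eps * e) - grad(b0 - eps * e)) / (2 * eps)
                        for e in np.eye(n)])
print(np.max(np.abs(H_fd - hess)))                    # should be at round-off level
```

Because the gradient is linear in beta, the finite-difference Hessian agrees with $X^T X + \lambda I$ up to floating-point error.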
(b) Compute the Hessian of
$$f(\beta) = \sum_{i=1}^m \exp(x_i^T\beta) - y_i x_i^T\beta.$$

$$\nabla f(\beta) = \sum_{i=1}^m x_i \exp(x_i^T\beta) - y_i x_i$$
$$\nabla^2 f(\beta) = \sum_{i=1}^m x_i \exp(x_i^T\beta)\, x_i^T$$

(c) Compute the Hessian of
$$f(\beta) = \sum_{i=1}^m \log\!\big(1 + \exp(x_i^T\beta)\big) - y_i x_i^T\beta.$$

Let $g(z) = \log(1 + \exp(z))$ with $z = x_i^T\beta$. Then
$$g'(z) = \frac{\exp(z)}{1 + \exp(z)}, \qquad g''(z) = \frac{\exp(z)}{(\exp(z) + 1)^2}.$$
Then
$$\nabla f(\beta) = \sum_{i=1}^m \frac{\exp(x_i^T\beta)}{1 + \exp(x_i^T\beta)}\, x_i - y_i x_i$$
$$\nabla^2 f(\beta) = \sum_{i=1}^m x_i \left(\frac{\exp(x_i^T\beta)}{\big(1 + \exp(x_i^T\beta)\big)^2}\right) x_i^T$$

(d) Compute the Hessian of
$$f(\beta) = \sum_{i=1}^m \log\!\left(1 + \frac{(x_i^T\beta)^2}{\nu}\right).$$

Let $g(z) = \log(1 + z^2/\nu)$ with $z = x_i^T\beta$. Then
$$g'(z) = \frac{2z}{z^2 + \nu}, \qquad g''(z) = \frac{2(\nu - z^2)}{(z^2 + \nu)^2}.$$
Then
$$\nabla^2 f(\beta) = \sum_{i=1}^m x_i \left(\frac{2\big(\nu - (x_i^T\beta)^2\big)}{\big((x_i^T\beta)^2 + \nu\big)^2}\right) x_i^T$$

2. (20 points) Show that Newton's method finds the solution to any regularized linear least squares problem in one step.

We need to find the minimum of $f(\beta)$. Consider the regularized linear least squares problem
$$f(\beta) = \|X\beta - y\|_2^2 + \frac{\lambda}{2}\|\beta\|^2$$
for given $X$, $y$, and $\lambda$. Then
$$\nabla f(\beta) = 2X^T X\beta - 2X^T y + \lambda\beta, \qquad \nabla^2 f(\beta) = 2X^T X + \lambda I.$$
A single Newton step (with unit step size) from any starting point $\beta_0$ gives
$$\beta_1 = \beta_0 - (2X^T X + \lambda I)^{-1}\big(2X^T X\beta_0 - 2X^T y + \lambda\beta_0\big)$$
$$= \beta_0 - (2X^T X + \lambda I)^{-1}\big((2X^T X + \lambda I)\beta_0 - 2X^T y\big)$$
$$= \beta_0 - \beta_0 + (2X^T X + \lambda I)^{-1}\, 2X^T y = (2X^T X + \lambda I)^{-1}\, 2X^T y.$$
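To back up the Problem 2 argument numerically, here is a small sketch (with made-up X, y, lam, and starting point) showing that a single Newton step lands exactly on the closed-form minimizer of the regularized least squares objective above.

```python
import numpy as np
from numpy.linalg import norm, solve

rng = np.random.default_rng(1)
m, n = 30, 4
X = rng.standard_normal((m, n))
y = rng.standard_normal(m)
lam = 0.5

# gradient and (constant) Hessian of ||X b - y||_2^2 + (lam/2) ||b||^2
grad = lambda b: 2 * X.T @ (X @ b - y) + lam * b
H = 2 * X.T @ X + lam * np.eye(n)

beta0 = rng.standard_normal(n)            # arbitrary starting point
beta1 = beta0 - solve(H, grad(beta0))     # one Newton step with unit step size

beta_star = solve(H, 2 * X.T @ y)         # stationary point: (2 X^T X + lam I) beta = 2 X^T y
print(norm(beta1 - beta_star))            # ~1e-15: the single step is already exact
```

Since the Hessian is constant and the gradient is affine in beta, the quadratic model Newton's method minimizes is the objective itself, which is why one step suffices.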
To find the true solution, set $\nabla f(\beta)$ to zero:
$$2X^T X\beta - 2X^T y + \lambda\beta = 0 \;\Longrightarrow\; (2X^T X + \lambda I)\beta = 2X^T y \;\Longrightarrow\; \beta = (2X^T X + \lambda I)^{-1}\, 2X^T y = \beta_1.$$
So Newton's method reaches the minimizer in a single step.

3. (30 points) Recall that gradient descent to minimize $f(\beta)$ takes the form
$$\beta_{k+1} = \beta_k - \alpha_k \nabla f(\beta_k),$$
while Newton's method takes the form
$$\beta_{k+1} = \beta_k - \alpha_k H_k^{-1} \nabla f(\beta_k),$$
where $H_k$ can be the Hessian $H_k = \nabla^2 f(\beta_k)$ or a suitable approximation, and $\alpha_k$ is a suitable step that guarantees descent of function values. Using the starter code for your homework, finish implementing these two methods (look for TODO items in the Python code). Make sure hw1_supp.py is in the same directory as your main Python file.

4. (50 points) Implement binomial and Poisson regression methods and run the cells in the coding notebook that produce the plots. Please save your final notebook as a pdf that includes the plots and printouts that are automatically generated by the notebook. To make things a bit easier, you can go ahead and always use the line search with the option use_line_search=True in the call, instead of solving for an upper bound in the case of logistic regression. I'll discuss that part in class.
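Problem 4's Poisson regression objective is exactly the function differentiated in Problem 1(b). Here is a hedged sketch of how its objective, gradient, and Hessian could be coded for the solvers above, along with one Newton-style update in the form of Problem 3; the function names are illustrative, and the starter code's actual interface may differ.

```python
import numpy as np

def poisson_obj(beta, X, y):
    """Objective from Problem 1(b): sum_i exp(x_i^T beta) - y_i x_i^T beta."""
    z = X @ beta
    return np.sum(np.exp(z) - y * z)

def poisson_grad(beta, X, y):
    """Gradient: sum_i x_i exp(x_i^T beta) - y_i x_i = X^T (exp(X beta) - y)."""
    z = X @ beta
    return X.T @ (np.exp(z) - y)

def poisson_hess(beta, X, y):
    """Hessian: sum_i x_i exp(x_i^T beta) x_i^T."""
    z = X @ beta
    return X.T @ (np.exp(z)[:, None] * X)

def newton_step(beta, X, y, alpha=1.0):
    """One damped Newton step, matching the Problem 3 update with H_k = the Hessian."""
    return beta - alpha * np.linalg.solve(poisson_hess(beta, X, y), poisson_grad(beta, X, y))
```

The binomial (logistic) case follows the same pattern with the gradient and Hessian from Problem 1(c).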
AMATH 515 Homework 1

Due Date: 01/27/2021

Homework Instruction: Please fill in the gaps in this template where commented as TODO. When submitting, make sure your file is still named 515Hw1_Coding.ipynb -- that's what Gradescope will be looking for. There is no need to submit hw1_supp.py, it will already be on the server when you submit your work. You'll have 10 attempts to pass the tests.

```python
# require numpy module
import numpy as np
from numpy.linalg import norm
from numpy.linalg import solve
import matplotlib.pyplot as plt
```

```python
# import supplementary functions for Homework 1
import sys
sys.path.insert(0, './')
from hw1_supp import *
```

```python
# You can change this seed if you want to test that your program works with a different seed,
# but please set it back to 123 when you submit your work.
seed = 123
```

Gradient Descent Solver

Recall the gradient descent algorithm we learned in class and complete the gradient descent solver.

```python
def optimizeWithGD(x0, func, grad, step_size, tol=1e-6, max_iter=1000, use_line_search=False):
    """
    Optimize with Gradient Descent

    input
    -----
    x0 : array_like
        Starting point for the solver.
    func : function
        Takes x and returns the function value.
    grad : function
        Takes x and returns the gradient of "func".
    step_size : float or None
        If it is a float number and `use_line_search=False`, it will be used as the step size.
        Otherwise, line search will be used.
    tol : float, optional
        Gradient tolerance for terminating the solver.
    max_iter : int, optional
        Maximum number of iterations for terminating the solver.
    use_line_search : bool, optional
        When it is true a line search will be used, otherwise `step_size` has to be specified.

    output
    ------
    x : array_like
        Final solution
    obj_his : array_like
        Objective function's values convergence history
    err_his : array_like
        Norm of gradient convergence history
    exit_flag : int
        0, norm of gradient below `tol`
        1, exceed maximum number of iteration
        2, line search fail
        3, other
    """
    # safeguard
    if not use_line_search and step_size is None:
        print('Please specify the step_size or use the line search.')
        return x0, np.array([]), np.array([]), 3

    # initial step
    x = np.copy(x0)
    g = grad(x)
    #
    obj = func(x)
    err = norm(g)
    #
    obj_his = np.zeros(max_iter + 1)
    err_his = np.zeros(max_iter + 1)
    #
    obj_his[0] = obj
    err_his[0] = err

    # start iterations
    iter_count = 0
    while err >= tol:
        if use_line_search:
            step_size = lineSearch(x, g, g, func)
            #
            # if line search fails, step_size will be None
            if step_size is None:
                print('Gradient descent line search fail.')
                return x, obj_his[:iter_count + 1], err_his[:iter_count + 1], 2
        #
        # gradient descent step
        #####
        # TODO: with given step_size, complete gradient descent step
        x = x - step_size * g
        #####
        #
        # update function and gradient
        g = grad(x)
        #
        obj = func(x)
```
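The preview of the notebook stops inside the main loop, right after `obj = func(x)`. Below is a minimal sketch of one way the remaining bookkeeping could be written, reusing the history arrays and exit flags documented in the docstring above; it is an assumed continuation, not the original starter code.

```python
        # assumed continuation of the loop body above (not the original starter code)
        err = norm(g)
        #
        # record the new objective value and gradient norm
        iter_count += 1
        obj_his[iter_count] = obj
        err_his[iter_count] = err
        #
        # stop if the maximum number of iterations is exceeded
        if iter_count >= max_iter:
            print('Gradient descent reached the maximum number of iterations.')
            return x, obj_his[:iter_count + 1], err_his[:iter_count + 1], 1
    #
    # norm of gradient is below `tol`
    return x, obj_his[:iter_count + 1], err_his[:iter_count + 1], 0
```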