Question

2.

learning rate of $\alpha = 0.5$, $\gamma = 0.5$, an initial Q-table of all zeros, and the following experience traces (given as (s, a, s', r) tuples):

(a) (s2, Up, s1, -0.04)

(b) (s1, Right, s4, 1.0)

(c) (s2, Right, s3, -0.04)

(d) (s3, Up, s2, -0.04)

(e) (s2, Up, s1, -0.04)

(f) (s1, Right, s4, 1.0)
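
As a rough sketch of how these traces would fill in the Q-table, the snippet below applies the standard one-step Q-learning update, Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)], to each tuple in order. It rests on assumptions that the question does not state: γ is read as the discount factor, the action set is limited to the two actions that appear in the traces (`Up` and `Right`), and s4 is treated as terminal, so its Q-values stay at zero. The names `ACTIONS`, `TRACES`, and `run_q_learning` are hypothetical.

```python
from collections import defaultdict

ALPHA = 0.5  # learning rate from the question
GAMMA = 0.5  # assumed to be the discount factor

# Assumption: only the actions seen in the traces exist. A grid world
# with more actions (e.g. Down, Left) would change some of the maxima.
ACTIONS = ("Up", "Right")

# Experience traces (a) through (f) as (s, a, s', r) tuples.
TRACES = [
    ("s2", "Up", "s1", -0.04),
    ("s1", "Right", "s4", 1.0),
    ("s2", "Right", "s3", -0.04),
    ("s3", "Up", "s2", -0.04),
    ("s2", "Up", "s1", -0.04),
    ("s1", "Right", "s4", 1.0),
]


def run_q_learning(traces, actions=ACTIONS, alpha=ALPHA, gamma=GAMMA):
    """Apply the one-step Q-learning update to each trace, in order."""
    q = defaultdict(float)  # all-zero initial Q-table
    for s, a, s_next, r in traces:
        # Value of the best action in the next state. s4 is never
        # updated, so it contributes 0, as a terminal state would.
        best_next = max(q[(s_next, a2)] for a2 in actions)
        q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
    return dict(q)


if __name__ == "__main__":
    for (state, action), value in sorted(run_q_learning(TRACES).items()):
        print(f"Q({state}, {action}) = {value:.4f}")
```

Passing a different `actions` tuple shows how sensitive the table is to the assumed action set, because unvisited actions keep their initial value of zero and can dominate the max.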

 

A. Assuming that the world resets after the agent visits state s4, and the agent starts in state s2, do these experience traces suggest that this is a greedy agent? Why or why not?
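
One way to probe part A mechanically is to replay the traces and, before each update, test whether the chosen action has the highest current Q-value in its state. The sketch below does that under several assumptions not given in the question: the agent updates its Q-table after every step, γ is the discount factor, ties count as greedy-consistent, and the action set is supplied by the caller. The names `TRACES` and `greedy_consistency` are hypothetical.

```python
from collections import defaultdict

# Experience traces (a) through (f) as (s, a, s', r) tuples.
TRACES = [
    ("s2", "Up", "s1", -0.04),
    ("s1", "Right", "s4", 1.0),
    ("s2", "Right", "s3", -0.04),
    ("s3", "Up", "s2", -0.04),
    ("s2", "Up", "s1", -0.04),
    ("s1", "Right", "s4", 1.0),
]


def greedy_consistency(traces, actions, alpha=0.5, gamma=0.5, tol=1e-12):
    """Return one flag per trace: True if the chosen action was among the
    highest-valued actions in its state at the time it was taken."""
    q = defaultdict(float)  # all-zero initial Q-table
    flags = []
    for s, a, s_next, r in traces:
        # Greedy check against the table as it stood before this step.
        best_here = max(q[(s, a2)] for a2 in actions)
        flags.append(q[(s, a)] >= best_here - tol)
        # Standard one-step Q-learning update.
        best_next = max(q[(s_next, a2)] for a2 in actions)
        q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
    return flags


if __name__ == "__main__":
    # Two hypothetical action sets; the question does not list the actions.
    print(greedy_consistency(TRACES, ("Up", "Right")))
    print(greedy_consistency(TRACES, ("Up", "Down", "Left", "Right")))
```

The flags depend on the assumed action set and on how ties are treated: untried actions keep their initial value of zero, which can exceed the slightly negative values learned for actions the agent has already taken. A written answer should therefore state which actions are available and how the agent breaks ties.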

 

B. If a greedy agent were being used to generate experience traces for Q-Learning in this environment, would we be guaranteed to visit every state (in the limit)? What single aspect of the environment could be changed to flip your answer (yes to no, no to yes)?
