2.
learning rate of \alpha = 0.5, discount factor \gamma = 0.5, an initial Q-table of all zeros, and the following experience traces (given as (s, a, s', r) tuples):
(a) (s2, Up, s1, -0.04)
(b) (s1, Right, s4, 1.0)
(c) (s2, Right, s3, -0.04)
(d) (s3, Up, s2, -0.04)
(e) (s2, Up, s1, -0.04)
(f) (s1, Right, s4, 1.0)
A. Assuming that the world resets after the agent visits state s4, and the agent starts in state s2, do these experience traces suggest that this is a greedy agent? Why or why not?
B. If a greedy agent were being used to generate experience traces for Q-Learning in this environment, would we be guaranteed to visit every state (in the limit)? What single aspect of the environment could be changed to flip your answer (yes to no, no to yes)?
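As a starting point for both parts, it helps to know what the Q-table looks like after each trace. The sketch below applies the standard Q-learning update, Q(s, a) <- Q(s, a) + \alpha [r + \gamma max_a' Q(s', a') - Q(s, a)], to traces (a) through (f) in order. Two things are assumptions rather than part of the problem statement: the action set (taken here to be Up, Down, Left, Right) and the treatment of s4 as a terminal state with no future value, which follows from the world resetting there. The function name run_q_learning is made up for illustration.

```python
ALPHA = 0.5   # learning rate from the problem statement
GAMMA = 0.5   # discount factor from the problem statement

# Assumption: the problem does not list the action set; four grid moves are assumed.
ACTIONS = ("Up", "Down", "Left", "Right")
# Assumption: s4 is terminal because the world resets after the agent visits it.
TERMINAL = {"s4"}

# Experience traces (s, a, s', r), in the order given in the question.
TRACES = [
    ("s2", "Up", "s1", -0.04),     # (a)
    ("s1", "Right", "s4", 1.0),    # (b)
    ("s2", "Right", "s3", -0.04),  # (c)
    ("s3", "Up", "s2", -0.04),     # (d)
    ("s2", "Up", "s1", -0.04),     # (e)
    ("s1", "Right", "s4", 1.0),    # (f)
]


def run_q_learning(traces):
    """Apply the tabular Q-learning update to each trace, starting from zeros."""
    q = {}  # missing (state, action) entries are treated as 0.0
    for s, a, s_next, r in traces:
        if s_next in TERMINAL:
            best_next = 0.0
        else:
            # Max over every assumed action, including untried ones still at 0.0.
            best_next = max(q.get((s_next, b), 0.0) for b in ACTIONS)
        old = q.get((s, a), 0.0)
        q[(s, a)] = old + ALPHA * (r + GAMMA * best_next - old)
    return q


q_table = run_q_learning(TRACES)
for (state, action), value in sorted(q_table.items()):
    print(f"Q({state}, {action}) = {value:.4f}")
```

The values depend on the assumed action set: with untried actions still at zero, the maximum over a state's actions cannot drop below zero, which affects the update in trace (d). Comparing the Q-values at s2 just before traces (c) and (e) against the action actually taken is one way to approach part A.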