
Consider the graph below analyzing the size of tree vs. accuracy for a decision tree which has been
pruned back to the red line.
[Figure 2: Pruned decision tree. Accuracy (y-axis, 0.5 to 0.9) versus size of tree in number of nodes (x-axis, 10 to 100). Curves: on training data; on validation data; on validation data (during pruning).]
Refer to Figure 2. Suppose we have a third dataset, Dnew, drawn from the same data distribution and not used for training or pruning. If we evaluate the tree on this new dataset, approximately what is its accuracy when the tree has 25 nodes, and why?
Select one:
Around 0.76 (slightly higher than the accuracy for validation data at 25 nodes)
Around 0.73 (the same as the accuracy for validation data at 25 nodes)
Around 0.70 (slightly lower than the accuracy for validation data at 25 nodes)
None of the above
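
One way to build intuition for this question is a small simulation of selection bias. The sketch below is not the tree from Figure 2. It assumes a hypothetical setup in which several candidate models all have the same true accuracy, the one with the best validation score is kept (as pruning does when it picks a subtree using validation data), and the kept model is then scored on fresh data. The function name `simulate_selection_bias` and all of its parameters are illustrative assumptions.

```python
import random

def simulate_selection_bias(n_candidates=20, n_val=100, n_new=100,
                            true_acc=0.70, n_trials=100, seed=0):
    """Compare the validation score of the selected candidate with its
    score on data that played no part in the selection.

    Assumption: every candidate has the same true accuracy, so any gap
    between the two averages comes from the selection step alone.
    """
    rng = random.Random(seed)
    selected_val_scores = []
    fresh_scores = []
    for _ in range(n_trials):
        # Validation accuracy of each candidate: fraction of n_val
        # examples it happens to classify correctly.
        candidate_scores = [
            sum(rng.random() < true_acc for _ in range(n_val)) / n_val
            for _ in range(n_candidates)
        ]
        # Keep the candidate that looks best on the validation data.
        selected_val_scores.append(max(candidate_scores))
        # Score the kept candidate on a new, untouched dataset.
        fresh = sum(rng.random() < true_acc for _ in range(n_new)) / n_new
        fresh_scores.append(fresh)
    mean_val = sum(selected_val_scores) / n_trials
    mean_new = sum(fresh_scores) / n_trials
    return mean_val, mean_new

mean_val, mean_new = simulate_selection_bias()
print(f"mean validation accuracy of selected candidate: {mean_val:.3f}")
print(f"mean accuracy of selected candidate on new data: {mean_new:.3f}")
```

Under these assumptions, the validation score of the selected candidate is optimistic on average, because the same data was used both to choose the candidate and to score it, while the score on the new data stays close to the true accuracy.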
Which of the following gives us the best approximation of the true error?
Line corresponding to training data
Line corresponding to validation data
Line corresponding to new dataset Dnew
Which of the following are valid ways to avoid overfitting?
Select all that apply:
Decrease the training set size.
Set a threshold for a minimum number of samples required to split at an internal node.
Prune the tree so that cross-validation error is minimal.
Maximize the tree depth.
None of the above.
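
The effect of a minimum-samples threshold on tree size can be sketched with a toy one-feature tree. This is a minimal illustration under stated assumptions, not a full decision tree learner: the data is synthetic (labels follow `x > 0.5` with 20% of them flipped at random), splits are chosen by weighted Gini impurity, and the function `grow` is hypothetical. Its parameter is named `min_samples_split` after the scikit-learn parameter that plays the same role.

```python
import random

def grow(labels, min_samples_split):
    """Grow a 1-D binary tree on 0/1 labels sorted by feature value.

    Returns (number_of_nodes, training_errors), where training_errors
    counts the mistakes made by majority vote in each leaf.
    """
    n, ones = len(labels), sum(labels)
    errors = min(ones, n - ones)  # mistakes if this node becomes a leaf
    # Stop if the node is pure or too small to be split.
    if n < min_samples_split or errors == 0:
        return 1, errors
    best_i, best_score, left_ones = None, None, 0
    for i in range(1, n):  # split between positions i-1 and i
        left_ones += labels[i - 1]
        right_ones = ones - left_ones
        # Weighted Gini impurity of the two children.
        score = (2 * left_ones * (i - left_ones) / i
                 + 2 * right_ones * (n - i - right_ones) / (n - i))
        if best_score is None or score < best_score:
            best_i, best_score = i, score
    left_nodes, left_err = grow(labels[:best_i], min_samples_split)
    right_nodes, right_err = grow(labels[best_i:], min_samples_split)
    return 1 + left_nodes + right_nodes, left_err + right_err

# Synthetic data: the true rule is x > 0.5, with 20% label noise.
rng = random.Random(0)
xs = sorted(rng.random() for _ in range(200))
labels = [int((x > 0.5) != (rng.random() < 0.2)) for x in xs]

full_nodes, full_err = grow(labels, min_samples_split=2)
small_nodes, small_err = grow(labels, min_samples_split=40)
print("no effective threshold:", full_nodes, "nodes,", full_err, "training errors")
print("min_samples_split=40:  ", small_nodes, "nodes,", small_err, "training errors")
```

With no effective threshold the tree keeps splitting until every leaf is pure, which means it memorizes the flipped labels. Raising the threshold stops splitting earlier, giving a smaller tree that tolerates some training errors instead of fitting the noise.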