1. Generate dataset
Write Python code to generate a regression dataset that contains 250 examples. Each input x is drawn uniformly at random from (0, 2), and the corresponding output is y = x^3 - 3x^2 + 2x + ε, where ε is Gaussian noise with zero mean and standard deviation 0.04. You don't need to define a function, but your code should create variables that contain all the inputs and outputs. You should also fix a random seed so that your results are reproducible.
2. Split dataset into train/test sets
Write code to randomly split the dataset above into a train set of 150 examples and a test set of 100 examples.
3. Define a K-NN model
Write code to define a K-NN regression model with K = 5 in which neighbors are weighted by the inverse of their distance.
4. Train your K-NN model
Write code to train your K-NN model on your train set.
5. Make predictions with your K-NN model
Write code to predict the outputs of your K-NN model on the test set.
6. Compute the MSE
Write code to compute and print the mean squared error (MSE) of your K-NN model on the test set.
7. Visualize your K-NN model
Write code to plot the predictions of your K-NN model on [0, 2]. Generate 100 evenly spaced points on [0, 2], use your K-NN model to predict their outputs, and plot them as a line graph. Your plot must also contain the train set, the test set, and a legend. Adjust the settings to make your plot clear and readable.
8. Train and test a decision tree model
Write code to train a decision tree model on your train set and print its MSE on the test set. Use sklearn's default settings for your model.
9. Visualize your decision tree model
Repeat Question 7 with your trained decision tree model; your code should generate a plot similar to that of Question 7. Do not plot the actual tree with graphviz.
10. Compare decision tree and K-NN
According to the above results, which model is better on your dataset (K-NN or decision tree)? Why?
1. Set a random seed for reproducibility.
2. Generate a regression dataset with 250 examples:
a. Generate 250 random values for input (X) uniformly drawn from the range (0, 2).
b. Add Gaussian noise with zero mean and standard deviation 0.04 to create the output (y) using the formula: y = X^3 - 3*X^2 + 2*X + ε.
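Steps 1–2 can be sketched with NumPy as follows; the seed value 0 is an arbitrary choice, any fixed seed gives reproducibility.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility

n = 250
X = rng.uniform(0, 2, size=n)            # inputs drawn uniformly from (0, 2)
noise = rng.normal(0, 0.04, size=n)      # Gaussian noise, mean 0, std 0.04
y = X**3 - 3*X**2 + 2*X + noise          # y = x^3 - 3x^2 + 2x + eps
```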
3. Split the dataset into a train set (150 examples) and a test set (100 examples) using train_test_split.
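A minimal sketch of the split using sklearn's `train_test_split`; the dataset is regenerated here so the snippet runs on its own, and `random_state=0` is an arbitrary fixed seed.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# regenerate the dataset so this snippet is self-contained
rng = np.random.default_rng(0)
X = rng.uniform(0, 2, 250)
y = X**3 - 3*X**2 + 2*X + rng.normal(0, 0.04, 250)

# 150 training examples, 100 test examples
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=150, test_size=100, random_state=0)
```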
4. Define a K-NN regression model with K=5 and weighted by the inverse of distance.
5. Train the K-NN model on the train set.
6. Make predictions with the K-NN model on the test set.
7. Compute the Mean Squared Error (MSE) for the K-NN model by comparing its predictions to the true values in the test set.
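Steps 4–7 can be sketched in one self-contained snippet. Note that sklearn estimators expect a 2-D feature array, hence the `reshape(-1, 1)`; `weights="distance"` gives the inverse-distance weighting the question asks for.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

# rebuild the dataset so the snippet runs on its own
rng = np.random.default_rng(0)
X = rng.uniform(0, 2, 250).reshape(-1, 1)   # sklearn expects a 2-D feature array
y = X[:, 0]**3 - 3*X[:, 0]**2 + 2*X[:, 0] + rng.normal(0, 0.04, 250)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=150, test_size=100, random_state=0)

# K = 5, neighbors weighted by the inverse of their distance
knn = KNeighborsRegressor(n_neighbors=5, weights="distance")
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
print(f"K-NN test MSE: {mse:.5f}")
```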
8. Visualize the K-NN model:
a. Generate 100 evenly spaced points between 0 and 2.
b. Use the trained K-NN model to predict the outputs for these points.
c. Plot the train set, test set, and K-NN predictions on the same graph.
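The plotting step might look like the sketch below; the `Agg` backend and the output filename `knn_plot.png` are assumptions so the script runs headlessly, and marker sizes/colors are just one readable choice.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 2, 250).reshape(-1, 1)
y = X[:, 0]**3 - 3*X[:, 0]**2 + 2*X[:, 0] + rng.normal(0, 0.04, 250)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=150, test_size=100, random_state=0)
knn = KNeighborsRegressor(n_neighbors=5, weights="distance").fit(X_train, y_train)

grid = np.linspace(0, 2, 100).reshape(-1, 1)   # 100 evenly spaced points on [0, 2]
grid_pred = knn.predict(grid)

plt.scatter(X_train, y_train, s=15, alpha=0.6, label="train set")
plt.scatter(X_test, y_test, s=15, alpha=0.6, label="test set")
plt.plot(grid, grid_pred, color="red", label="K-NN prediction")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.savefig("knn_plot.png")
```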
9. Train a Decision Tree regression model on the train set using sklearn's default settings.
10. Make predictions with the Decision Tree model on the test set.
11. Compute the MSE for the Decision Tree model on the test set.
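Steps 9–11 can be sketched as below. The question asks for sklearn's defaults; `random_state=0` is an added assumption that only pins the tree's tie-breaking so the result is reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(0, 2, 250).reshape(-1, 1)
y = X[:, 0]**3 - 3*X[:, 0]**2 + 2*X[:, 0] + rng.normal(0, 0.04, 250)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=150, test_size=100, random_state=0)

# default hyperparameters; random_state only makes tie-breaking reproducible
tree = DecisionTreeRegressor(random_state=0)
tree.fit(X_train, y_train)
tree_mse = mean_squared_error(y_test, tree.predict(X_test))
print(f"Decision tree test MSE: {tree_mse:.5f}")
```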
12. Visualize the Decision Tree model in a manner similar to step 8.
13. Compare the MSE values of the K-NN and Decision Tree models.
a. If K-NN's MSE is lower, print "K-NN performs better on the dataset."
b. Otherwise, print "Decision Tree performs better on the dataset."
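The comparison in step 13 can be sketched by computing both MSEs side by side (dataset and split regenerated so the snippet stands alone). On a smooth 1-D target like this, the distance-weighted K-NN often edges out the unpruned tree, which tends to fit the noise, but the printed verdict should come from your own run rather than be assumed.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(0, 2, 250).reshape(-1, 1)
y = X[:, 0]**3 - 3*X[:, 0]**2 + 2*X[:, 0] + rng.normal(0, 0.04, 250)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=150, test_size=100, random_state=0)

knn = KNeighborsRegressor(n_neighbors=5, weights="distance").fit(X_train, y_train)
tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)

knn_mse = mean_squared_error(y_test, knn.predict(X_test))
tree_mse = mean_squared_error(y_test, tree.predict(X_test))

if knn_mse < tree_mse:
    print("K-NN performs better on the dataset.")
else:
    print("Decision Tree performs better on the dataset.")
```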