As you may notice when looking at the reports, the differences between the dummy models and a very good model are not as clear any more. Picking which class is declared the positive class has a big impact on the metrics. While the f-score for the dummy classifier is 0.13 (vs. 0.89 for the logistic regression) on the "nine" class, for the "not nine" class it is 0.90 vs. 0.99, and both of the latter seem like reasonable results. Looking at all the numbers together paints a pretty accurate picture, though, and we can clearly see the superiority of the logistic regression model.

Taking Uncertainty into Account

The confusion matrix and the classification report provide a very detailed analysis of a particular set of predictions. However, the predictions themselves already threw away a lot of the information that is contained in the model. As we discussed in Chapter 2, most classifiers provide a decision_function or a predict_proba method to assess degrees of certainty about predictions. Making predictions can be seen as thresholding the output of decision_function or predict_proba at a certain fixed point: in binary classification, 0 for the decision function and 0.5 for predict_proba.

The following is an example of an imbalanced binary classification task, with 400 points in the negative class classified against 50 points in the positive class. The training data is shown on the left in Figure 5-12. We train a kernel SVM model on this data, and the plots to the right of the training data illustrate the values of the decision function as a heat map. You can see a black circle in the plot in the top center, which denotes the threshold of the decision_function being exactly zero. Points inside this circle will be classified as the positive class, and points outside as the negative class:

In[49]:

    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC
    from mglearn.datasets import make_blobs

    # create an imbalanced two-class dataset: 400 negative and 50 positive points
    X, y = make_blobs(n_samples=(400, 50), centers=2, cluster_std=[7.0, 2],
                      random_state=22)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # fit a kernel SVM on the training data
    svc = SVC(gamma=.05).fit(X_train, y_train)

In[50]:

    import mglearn
    mglearn.plots.plot_decision_threshold()
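The black circle at zero is only the default cutoff. As a minimal sketch of the thresholding idea described above (not part of the figure code), the snippet below compares the default predictions with predictions obtained by lowering the decision_function cutoff; the value -0.8 is an arbitrary threshold chosen purely for illustration.

    from sklearn.metrics import classification_report

    # default predictions: equivalent to thresholding decision_function at 0
    y_pred_default = svc.predict(X_test)

    # lowering the cutoff makes the model more willing to predict the positive class
    # (-0.8 is an arbitrary illustrative value, not a recommended setting)
    y_pred_lower_threshold = svc.decision_function(X_test) > -.8

    print(classification_report(y_test, y_pred_default))
    print(classification_report(y_test, y_pred_lower_threshold))

With a lower cutoff, recall for the minority (positive) class typically goes up while its precision goes down, which is the trade-off the decision-function heat maps in Figure 5-12 are meant to illustrate.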