
Question

Text categorization: consider the following document-term matrix, in which each value
is the frequency of a specific term in that document.
      T1  T2  T3  T4  T5  T6  T7  T8
doc1   2   0   4   3   0   1   0   2
doc2   0   2   4   0   2   3   0   0
doc3   4   0   1   3   0   1   0   1
doc4   0   1   0   2   0   0   1   0
doc5   0   0   2   0   0   4   0   0
doc6   1   1   0   2   0   1   1   3
doc7   2   1   3   4   0   2   0   2
Assume that documents have been manually assigned to two pre-specified categories
as follows: Class_1 = {Doc1, Doc2, Doc5}, Class_2 = {Doc3, Doc4, Doc6, Doc7}
(a) Use the Naïve Bayes Multinomial Model and the Naïve Bayes Bernoulli Model, respectively,
to calculate how Doc 8 and Doc 9, given below, will be classified. Use add-one smoothing
for the conditional probabilities in the calculation.
      T1  T2  T3  T4  T5  T6  T7  T8
doc8   3   1   0   4   1   0   2   1
doc9   0   0   3   0   1   5   0   1
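
One way to check the hand calculation for part (a) is the short sketch below. It is not a posted solution; it assumes the usual textbook definitions: class priors taken as the fraction of training documents per class, multinomial estimates (term count + 1) / (class tokens + vocabulary size), and Bernoulli estimates (class document frequency + 1) / (class documents + 2). The names TRAIN, LABELS, multinomial_log_scores, bernoulli_log_scores and predict are illustrative only.

```python
import math

# Term frequencies for doc1..doc7 (columns T1..T8), copied from the question.
TRAIN = [
    [2, 0, 4, 3, 0, 1, 0, 2],
    [0, 2, 4, 0, 2, 3, 0, 0],
    [4, 0, 1, 3, 0, 1, 0, 1],
    [0, 1, 0, 2, 0, 0, 1, 0],
    [0, 0, 2, 0, 0, 4, 0, 0],
    [1, 1, 0, 2, 0, 1, 1, 3],
    [2, 1, 3, 4, 0, 2, 0, 2],
]
# Class_1 = {doc1, doc2, doc5}, Class_2 = {doc3, doc4, doc6, doc7}
LABELS = [1, 1, 2, 2, 1, 2, 2]
DOC8 = [3, 1, 0, 4, 1, 0, 2, 1]
DOC9 = [0, 0, 3, 0, 1, 5, 0, 1]


def multinomial_log_scores(doc):
    """Log P(c) + sum over terms of tf * log P(t|c), with add-one smoothing."""
    vocab = len(doc)
    scores = {}
    for c in sorted(set(LABELS)):
        rows = [r for r, y in zip(TRAIN, LABELS) if y == c]
        counts = [sum(col) for col in zip(*rows)]  # term totals in class c
        total = sum(counts)                        # all tokens in class c
        score = math.log(len(rows) / len(TRAIN))   # assumed prior: doc fraction
        for tf, n in zip(doc, counts):
            score += tf * math.log((n + 1) / (total + vocab))
        scores[c] = score
    return scores


def bernoulli_log_scores(doc):
    """Log P(c) + sum over ALL terms of log P(t|c) or log(1 - P(t|c))."""
    scores = {}
    for c in sorted(set(LABELS)):
        rows = [r for r, y in zip(TRAIN, LABELS) if y == c]
        df = [sum(1 for v in col if v > 0) for col in zip(*rows)]
        score = math.log(len(rows) / len(TRAIN))
        for tf, n in zip(doc, df):
            p = (n + 1) / (len(rows) + 2)          # add-one smoothing, 2 outcomes
            score += math.log(p if tf > 0 else 1 - p)
        scores[c] = score
    return scores


def predict(scores):
    """Return the class label with the highest log score."""
    return max(scores, key=scores.get)


for name, doc in (("doc8", DOC8), ("doc9", DOC9)):
    print(name, "multinomial:", predict(multinomial_log_scores(doc)),
          "bernoulli:", predict(bernoulli_log_scores(doc)))
```

Under these assumptions both models agree with each other on each document; a different prior or smoothing convention would change the scores and possibly the decision.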
(b) Redo the classification using the K-Nearest-Neighbor approach for document
categorization with K = 3 to classify Doc 8 and Doc 9. Show calculation details.
Note: there is no need to normalize the vectors; use raw tf*idf as the weight of
each term and cosine similarity to compute similarities.
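
The sketch below illustrates one possible reading of part (b). The question does not define idf, so the code assumes idf = log2(N / df) with N = 7 and df counted over the seven training documents only; the log base does not affect cosine similarity, but a different idf formula could. The names knn_classify, tfidf and cosine are illustrative, and majority voting among the 3 nearest documents is assumed.

```python
import math
from collections import Counter

# Term frequencies for doc1..doc7 (columns T1..T8), copied from the question.
TRAIN = [
    [2, 0, 4, 3, 0, 1, 0, 2],
    [0, 2, 4, 0, 2, 3, 0, 0],
    [4, 0, 1, 3, 0, 1, 0, 1],
    [0, 1, 0, 2, 0, 0, 1, 0],
    [0, 0, 2, 0, 0, 4, 0, 0],
    [1, 1, 0, 2, 0, 1, 1, 3],
    [2, 1, 3, 4, 0, 2, 0, 2],
]
LABELS = [1, 1, 2, 2, 1, 2, 2]  # Class_1 = {1, 2, 5}, Class_2 = {3, 4, 6, 7}
DOC8 = [3, 1, 0, 4, 1, 0, 2, 1]
DOC9 = [0, 0, 3, 0, 1, 5, 0, 1]

# Assumed idf: log2(N / df), estimated from the training documents only.
N = len(TRAIN)
DF = [sum(1 for v in col if v > 0) for col in zip(*TRAIN)]
IDF = [math.log2(N / df) for df in DF]


def tfidf(doc):
    """Raw (non-normalized) tf*idf weights."""
    return [tf * w for tf, w in zip(doc, IDF)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def knn_classify(doc, k=3):
    """Return (majority label, [(similarity, doc number, label), ...])."""
    q = tfidf(doc)
    sims = sorted(
        ((cosine(q, tfidf(r)), i + 1, y)
         for i, (r, y) in enumerate(zip(TRAIN, LABELS))),
        reverse=True,
    )
    top = sims[:k]
    votes = Counter(y for _, _, y in top)
    return votes.most_common(1)[0][0], top


for name, doc in (("doc8", DOC8), ("doc9", DOC9)):
    label, top = knn_classify(doc)
    print(name, "-> class", label)
    for sim, idx, y in top:
        print(f"  doc{idx} (class {y}): cos = {sim:.4f}")
```

With an odd K and two classes, the vote cannot tie, which is one reason K = 3 is convenient here.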
(c) Redo the classification using the Rocchio-based vector space model to
determine how Doc 8 and Doc 9 will be classified. As in (b), use non-normalized vectors,
raw tf*idf for the weight of each term, and cosine similarity.
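
For part (c), a minimal sketch follows. It assumes the simplest Rocchio variant, in which each class prototype is the mean of that class's tf*idf vectors (no negative-example term), and the same assumed idf = log2(N / df) over the seven training documents. A document is assigned to the class whose prototype has the highest cosine similarity. The names prototype and rocchio_classify are illustrative.

```python
import math

# Term frequencies for doc1..doc7 (columns T1..T8), copied from the question.
TRAIN = [
    [2, 0, 4, 3, 0, 1, 0, 2],
    [0, 2, 4, 0, 2, 3, 0, 0],
    [4, 0, 1, 3, 0, 1, 0, 1],
    [0, 1, 0, 2, 0, 0, 1, 0],
    [0, 0, 2, 0, 0, 4, 0, 0],
    [1, 1, 0, 2, 0, 1, 1, 3],
    [2, 1, 3, 4, 0, 2, 0, 2],
]
LABELS = [1, 1, 2, 2, 1, 2, 2]  # Class_1 = {1, 2, 5}, Class_2 = {3, 4, 6, 7}
DOC8 = [3, 1, 0, 4, 1, 0, 2, 1]
DOC9 = [0, 0, 3, 0, 1, 5, 0, 1]

# Assumed idf: log2(N / df), estimated from the training documents only.
N = len(TRAIN)
DF = [sum(1 for v in col if v > 0) for col in zip(*TRAIN)]
IDF = [math.log2(N / df) for df in DF]


def tfidf(doc):
    """Raw (non-normalized) tf*idf weights."""
    return [tf * w for tf, w in zip(doc, IDF)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def prototype(c):
    """Mean tf*idf vector of the training documents labeled c."""
    rows = [tfidf(r) for r, y in zip(TRAIN, LABELS) if y == c]
    return [sum(col) / len(rows) for col in zip(*rows)]


def rocchio_classify(doc):
    """Return (best label, {label: cosine similarity to that prototype})."""
    q = tfidf(doc)
    sims = {c: cosine(q, prototype(c)) for c in sorted(set(LABELS))}
    return max(sims, key=sims.get), sims


for name, doc in (("doc8", DOC8), ("doc9", DOC9)):
    label, sims = rocchio_classify(doc)
    detail = ", ".join(f"class {c}: {s:.4f}" for c, s in sims.items())
    print(name, "-> class", label, f"({detail})")
```

Because cosine similarity ignores vector length, using the sum of the class vectors instead of the mean as the prototype would give the same decision.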
