

Write a Java program that interacts with a user to process information retrieval queries. Your
program should prompt the user for a data directory, then a query, then display the documents
that contain the query term. The documents should be ranked in order of their TF-IDF score.

This assignment involves several tasks. First, prompt for the directory containing the
collection of data, as in the earlier assignments. Then, you will
need to build an inverted index or incidence matrix. Each entry in the inverted index should
consist of a vocabulary word, the word’s document frequency, and the word’s postings. Each
posting should contain a document ID and the term frequency of the word with respect to the
document.
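
The inverted index described above can be sketched as follows. This is only a minimal illustration, not the required design: the class name `InvertedIndexSketch`, the `Posting` class, the tokenizing rule (lowercase, split on non-alphanumeric characters), and the sample documents `docA` and `docB` are all assumptions made for the example. The document frequency is not stored separately here; it is taken as the length of the postings list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class InvertedIndexSketch {
    /** One posting: a document ID and the term frequency in that document. */
    static final class Posting {
        final String docId;
        final int termFrequency;

        Posting(String docId, int termFrequency) {
            this.docId = docId;
            this.termFrequency = termFrequency;
        }
    }

    // Vocabulary word -> postings list (one posting per document containing the word).
    private final Map<String, List<Posting>> postings = new TreeMap<>();

    /** Tokenizes the text (assumed rule: lowercase, split on non-alphanumerics) and indexes it. */
    void addDocument(String docId, String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            postings.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                    .add(new Posting(docId, e.getValue()));
        }
    }

    /** Returns the postings for a word, or an empty list if the word is not in the vocabulary. */
    List<Posting> postingsFor(String term) {
        return postings.getOrDefault(term, Collections.emptyList());
    }

    /** Document frequency: the number of documents that contain the word. */
    int documentFrequency(String term) {
        return postingsFor(term).size();
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        // Made-up sample documents, used only to exercise the sketch.
        index.addDocument("docA", "Database systems and database design");
        index.addDocument("docB", "Design of networks");
        for (Posting p : index.postingsFor("database")) {
            System.out.println(p.docId + " tf=" + p.termFrequency);
        }
        System.out.println("df(design) = " + index.documentFrequency("design"));
    }
}
```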

Alternatively, you may build a (non-boolean) incidence matrix. This would contain a table
where each row corresponds to a vocabulary word, and each column corresponds to a
document. Each cell in the table contains the term frequency (which is an integer representing
the number of times the row’s word appears in the column’s document). With that
information, the term frequency and inverse document frequency can be calculated when
needed.
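
As a rough illustration of computing these values on demand from the incidence matrix, the sketch below assumes one common weighting, tf × log10(N / df), where N is the number of documents and df is the number of columns with a nonzero count in the term's row. The assignment text does not fix a formula, so this choice, the class name `TfIdfMatrixSketch`, and the sample matrix are assumptions.

```java
class TfIdfMatrixSketch {
    /** Document frequency: the number of columns (documents) with a nonzero count in this row. */
    static int documentFrequency(int[] termRow) {
        int df = 0;
        for (int count : termRow) {
            if (count > 0) {
                df++;
            }
        }
        return df;
    }

    /**
     * TF-IDF of the term in row termRow for the document in column docCol,
     * using the assumed weighting tf * log10(N / df).
     */
    static double tfIdf(int[][] matrix, int termRow, int docCol) {
        int tf = matrix[termRow][docCol];
        int df = documentFrequency(matrix[termRow]);
        if (tf == 0 || df == 0) {
            return 0.0; // the term does not occur in this document
        }
        int totalDocs = matrix[termRow].length;
        return tf * Math.log10((double) totalDocs / df);
    }

    public static void main(String[] args) {
        // Made-up matrix: rows are vocabulary words, columns are documents.
        int[][] matrix = {
            {2, 0, 1, 0}, // a word found in 2 of 4 documents
            {1, 1, 1, 1}, // a word found in every document
        };
        for (int doc = 0; doc < 4; doc++) {
            System.out.println("row 0, doc " + doc + ": " + tfIdf(matrix, 0, doc));
        }
    }
}
```

Under this weighting, a word that appears in every document scores zero everywhere, because log10(N / N) = 0.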

Next, you will need to build the permuterm index. This will contain the information in
Assignment #2 where each permuterm points back to the original vocabulary term. Thus, you
will need an array where each record contains a permuterm and the vocabulary term that
generated it.
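
One possible way to build such records is sketched below. It assumes the usual permuterm construction (append an end marker `$` to the term and generate every rotation), which may differ in detail from what Assignment #2 specified. The names `PermutermIndexSketch` and `PermutermRecord` are hypothetical, and a sorted `List` stands in for the array.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class PermutermIndexSketch {
    /** One record: a permuterm and the vocabulary term that generated it. */
    static final class PermutermRecord {
        final String permuterm;
        final String term;

        PermutermRecord(String permuterm, String term) {
            this.permuterm = permuterm;
            this.term = term;
        }
    }

    /** Generates every rotation of term + "$" for each term, sorted by permuterm. */
    static List<PermutermRecord> build(List<String> vocabulary) {
        List<PermutermRecord> records = new ArrayList<>();
        for (String term : vocabulary) {
            String marked = term + "$"; // "$" marks the end of the original term
            for (int i = 0; i < marked.length(); i++) {
                String rotation = marked.substring(i) + marked.substring(0, i);
                records.add(new PermutermRecord(rotation, term));
            }
        }
        // Sorting allows prefix lookups (for example by binary search) at query time.
        records.sort(Comparator.comparing((PermutermRecord r) -> r.permuterm));
        return records;
    }

    public static void main(String[] args) {
        for (PermutermRecord r : build(Arrays.asList("data", "base"))) {
            System.out.println(r.permuterm + " -> " + r.term);
        }
    }
}
```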

Finally, you will need to build a querying component. The program should prompt the user for
a query term and read it in. If the query contains an asterisk, your program should rotate the
query into its permuterm form, in which the asterisk is at the end. It should then search the
permuterm index for the matching permuterms, which indicate the vocabulary terms to look up
in the inverted index/incidence matrix. At that point, your program can compute the TF-IDF
score of each matching document and display the documents to the user in ranked order.
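
The wildcard step can be sketched as follows, assuming a single asterisk per query and the usual `$` end marker. For `da*base`, the marked query `da*base$` is rotated to `base$da*`, so the search prefix is `base$da`. For brevity the sketch keeps the permuterms in a `TreeMap` and uses a range view for the prefix search, rather than the array of records the assignment asks for; the class and method names are hypothetical, as is the small vocabulary in `main`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class WildcardQuerySketch {
    /** Builds permuterm -> vocabulary term for every rotation of term + "$". */
    static TreeMap<String, String> buildIndex(List<String> vocabulary) {
        TreeMap<String, String> index = new TreeMap<>();
        for (String term : vocabulary) {
            String marked = term + "$";
            for (int i = 0; i < marked.length(); i++) {
                index.put(marked.substring(i) + marked.substring(0, i), term);
            }
        }
        return index;
    }

    /** Rotates query + "$" so the asterisk falls at the end, then drops the asterisk. */
    static String toPermutermPrefix(String query) {
        String marked = query + "$";
        int star = marked.indexOf('*');
        return marked.substring(star + 1) + marked.substring(0, star);
    }

    /** Returns the vocabulary terms matching the query (at most one asterisk is assumed). */
    static Set<String> matchingTerms(String query, TreeMap<String, String> index) {
        Set<String> terms = new TreeSet<>();
        if (!query.contains("*")) {
            // No wildcard: only the unrotated permuterm "query$" is an exact match.
            String term = index.get(query + "$");
            if (term != null) {
                terms.add(term);
            }
            return terms;
        }
        String prefix = toPermutermPrefix(query);
        // All keys that start with the prefix lie in [prefix, prefix + '\uffff').
        terms.addAll(index.subMap(prefix, prefix + Character.MAX_VALUE).values());
        return terms;
    }

    public static void main(String[] args) {
        // Made-up vocabulary, used only to exercise the sketch.
        TreeMap<String, String> index =
                buildIndex(Arrays.asList("database", "data", "base", "dashboard"));
        System.out.println(toPermutermPrefix("da*base"));
        System.out.println(matchingTerms("da*base", index));
    }
}
```

The matched terms are then the ones to look up in the inverted index or incidence matrix.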

Sample Execution
Enter the name of the collection directory:
taglines
Please enter your query:
da*base
Results
COSC 5375
COSC 5360
COSC 4385
COSC 4373
COSC 3385
COSC 4315
COSC 1310
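
A ranked listing like the one in the sample execution could be produced along the following lines. The assignment does not say how to combine scores when a wildcard matches several vocabulary terms, so this sketch assumes a document's score is the sum of tf × log10(N / df) over the matched terms, with ties broken by document ID. The class name `RankingSketch`, the map-based index layout, and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RankingSketch {
    /**
     * Ranks documents by summed TF-IDF over the matched terms (an assumed scoring rule).
     * The index maps each term to (document ID -> term frequency).
     */
    static List<String> rank(Map<String, Map<String, Integer>> index,
                             Iterable<String> matchedTerms, int totalDocs) {
        Map<String, Double> scores = new HashMap<>();
        for (String term : matchedTerms) {
            Map<String, Integer> postings = index.getOrDefault(term, Collections.emptyMap());
            if (postings.isEmpty()) {
                continue; // the term is not in the vocabulary
            }
            double idf = Math.log10((double) totalDocs / postings.size());
            for (Map.Entry<String, Integer> p : postings.entrySet()) {
                scores.merge(p.getKey(), p.getValue() * idf, Double::sum);
            }
        }
        List<String> ranked = new ArrayList<>(scores.keySet());
        // Highest score first; equal scores are ordered by document ID.
        ranked.sort((a, b) -> {
            int byScore = Double.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return ranked;
    }

    public static void main(String[] args) {
        // Made-up postings, used only to exercise the sketch.
        Map<String, Integer> databasePostings = new HashMap<>();
        databasePostings.put("d1", 3);
        databasePostings.put("d2", 1);
        Map<String, Map<String, Integer>> index = new HashMap<>();
        index.put("database", databasePostings);
        System.out.println("Results");
        for (String docId : rank(index, Arrays.asList("database"), 4)) {
            System.out.println(docId);
        }
    }
}
```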


