To compute the conditional probabilities, you first need to determine the unigram
and bigram counts (you can do this in a single pass through the file if you are
careful) and store them in a Binary Search Tree (BST). Once the counts are stored,
you can compute the conditional probabilities.
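The single-pass counting step could look roughly like the C sketch below. It assumes a word is a maximal run of ASCII letters folded to lowercase, and that each bigram is stored under the key "prev word" (the two words joined by one space). The names Node, bst_increment, bst_count, count_ngrams, bst_write and bst_free are hypothetical, not part of the assignment. The "careful" part of a single pass is carrying the previous word from one read to the next. The tree is a plain unbalanced BST ordered with strcmp, so operations are O(log n) on average but degrade to O(n) if words arrive in sorted order.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_WORD 256 /* longer words are silently truncated (assumption) */

/* One BST node: a key (a word, or "prev word" for a bigram) and its count. */
typedef struct Node {
    char *key;
    long count;
    struct Node *left, *right;
} Node;

/* Add 1 to key's count, inserting it with count 1 if absent.
   Returns the (possibly new) root; keys are ordered with strcmp. */
Node *bst_increment(Node *root, const char *key) {
    Node **link = &root;
    while (*link != NULL) {
        int cmp = strcmp(key, (*link)->key);
        if (cmp == 0) {
            (*link)->count++;
            return root;
        }
        link = (cmp < 0) ? &(*link)->left : &(*link)->right;
    }
    Node *n = malloc(sizeof *n);
    char *copy = malloc(strlen(key) + 1);
    if (n == NULL || copy == NULL) {
        perror("malloc");
        exit(EXIT_FAILURE);
    }
    strcpy(copy, key);
    n->key = copy;
    n->count = 1;
    n->left = n->right = NULL;
    *link = n; /* attach the new node where the search ended */
    return root;
}

/* Return the stored count for key, or 0 if key is not in the tree. */
long bst_count(const Node *root, const char *key) {
    while (root != NULL) {
        int cmp = strcmp(key, root->key);
        if (cmp == 0) return root->count;
        root = (cmp < 0) ? root->left : root->right;
    }
    return 0;
}

/* Read the next word (lowercased ASCII letters) into buf.
   Returns 1 if a word was read, 0 at end of file. */
static int next_word(FILE *in, char *buf, size_t size) {
    int c;
    size_t len = 0;
    do {
        c = getc(in);
    } while (c != EOF && !isalpha(c)); /* skip non-letters */
    while (c != EOF && isalpha(c)) {
        if (len + 1 < size) buf[len++] = (char)tolower(c);
        c = getc(in);
    }
    buf[len] = '\0';
    return len > 0;
}

/* Single pass: each word bumps the unigram tree, and each adjacent
   pair of words bumps the bigram tree under the key "prev word". */
void count_ngrams(FILE *in, Node **uni, Node **bi) {
    char prev[MAX_WORD], word[MAX_WORD], pair[2 * MAX_WORD];
    int have_prev = 0;
    while (next_word(in, word, sizeof word)) {
        *uni = bst_increment(*uni, word);
        if (have_prev) {
            snprintf(pair, sizeof pair, "%s %s", prev, word);
            *bi = bst_increment(*bi, pair);
        }
        strcpy(prev, word); /* remember this word for the next bigram */
        have_prev = 1;
    }
}

/* In-order traversal: writes "key count" lines in sorted key order. */
void bst_write(const Node *root, FILE *out) {
    if (root == NULL) return;
    bst_write(root->left, out);
    fprintf(out, "%s %ld\n", root->key, root->count);
    bst_write(root->right, out);
}

/* Post-order traversal: frees every key and node. */
void bst_free(Node *root) {
    if (root == NULL) return;
    bst_free(root->left);
    bst_free(root->right);
    free(root->key);
    free(root);
}
```

In a full program you would call count_ngrams once per input file, then pass each tree to bst_write with the matching output file; the in-order walk lists keys alphabetically. This sketch forms bigrams across punctuation and sentence boundaries, so adjust next_word if your definition of a word differs.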
Input files
Test files can be found at Project Gutenberg (http://www.gutenberg.org/ebooks/).
For example, search for “Mark Twain,” click on any of his books, and download the
“Plain Text UTF-8” format.
You should also test your program on other, smaller input files for which you can
hand-compute the correct answer.
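As an illustration of a hand-computable test, the worked example below assumes the usual maximum-likelihood estimate, where the probability of a word given the previous word is the bigram count divided by the unigram count of the previous word; check this against the definition your assignment uses before relying on it. For the four-word input "the cat the dog":

```latex
\begin{aligned}
&P(w_k \mid w_{k-1}) = \frac{c(w_{k-1}\,w_k)}{c(w_{k-1})} \\
&c(\text{the}) = 2,\quad c(\text{cat}) = 1,\quad c(\text{dog}) = 1 \\
&c(\text{the cat}) = 1,\quad c(\text{cat the}) = 1,\quad c(\text{the dog}) = 1 \\
&P(\text{cat} \mid \text{the}) = \tfrac{1}{2},\quad
 P(\text{dog} \mid \text{the}) = \tfrac{1}{2},\quad
 P(\text{the} \mid \text{cat}) = \tfrac{1}{1} = 1
\end{aligned}
```

A quick sanity check: for any word that is never the last word of the file, the probabilities of its possible successors sum to 1, as P(cat|the) + P(dog|the) does here.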
Output files
Your program must accept the name of an input file as a command-line argument.
Call this file name fn. Your program must then produce the following set of
output files:
• Your program must write the unigram counts to a file named fn.uni in which
each unigram is listed on a separate line, and each line contains just the
unigram and its count (an integer), separated by a single space.
• Your program must write the bigram counts to a file named fn.bi in which
each bigram is listed on a separate line, and each line contains just the
bigram and its count (an integer), separated by a single space.
• Your program must write the conditional probabilities to a file named fn.cp,
reported in the form P(WORD(k)|WORD(k-1)) = p, where p is the conditional
probability of WORD(k) given WORD(k-1).
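The three output formats can be produced with ordinary stdio calls. The C sketch below is one possible approach, assuming fn is used verbatim (so book.txt yields book.txt.uni, book.txt.bi and book.txt.cp), bigram keys are stored as "prev word", and probabilities are printed with six decimal places, since the specification does not fix a precision. The helper names make_output_name, split_pair and write_cp_line are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Build "<fn>.<ext>" (e.g. "book.txt" + "uni" -> "book.txt.uni") into buf.
   Returns 0 on success, -1 if the name would not fit in size bytes. */
int make_output_name(char *buf, size_t size, const char *fn, const char *ext) {
    int n = snprintf(buf, size, "%s.%s", fn, ext);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Split a bigram key "prev word" (one space) into its two words.
   Returns 0 on success, -1 if there is no space or a part does not fit. */
int split_pair(const char *pair, char *prev, char *word, size_t size) {
    const char *space = strchr(pair, ' ');
    if (space == NULL)
        return -1;
    size_t len = (size_t)(space - pair);
    if (len >= size || strlen(space + 1) >= size)
        return -1;
    memcpy(prev, pair, len);
    prev[len] = '\0';
    strcpy(word, space + 1);
    return 0;
}

/* Write one fn.cp line, P(WORD(k)|WORD(k-1)) = p, where
   p = count("prev word") / count(prev); prev_count must be > 0.
   Six decimal places is an assumption. */
void write_cp_line(FILE *out, const char *prev, const char *word,
                   long bigram_count, long prev_count) {
    double p = (double)bigram_count / (double)prev_count;
    fprintf(out, "P(%s|%s) = %.6f\n", word, prev, p);
}
```

To fill fn.cp, one approach is an in-order walk of the bigram tree: for each node, split_pair recovers WORD(k-1) and WORD(k), a lookup in the unigram tree supplies the count of WORD(k-1), and write_cp_line prints the line.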
Notes
• You may use any BST implementation found online at your own risk, provided the
source of the code is cited properly. A 50-mark bonus will be given if you
implement the BST yourself (only the functionality needed to complete your
project).
• Your program should accept file name(s) as command line argument(s) (no
hard-coded file names in your code).
• Your code will be tested using the latest version of the GNU C compiler on a
Unix-based system. If you work on Windows, you may want to use the Windows
Subsystem for Linux (WSL) or a VirtualBox virtual machine, at your own risk!
• Your code must be well commented. When writing your comments, you
should focus on what the code does at a high level; for example, describe the
main steps of an
• A ReadMe.txt file (including instructions on how to compile and run your
program, along with any known problems) must be submitted.
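For the command-line requirement, a small helper can take the input name from argv and stop with a usage message instead of falling back on a hard-coded name. This is a sketch only: input_name_from_args, open_input, the program name ngrams and the gcc flags shown in the comment are illustrative assumptions; build and run commands like these are also what ReadMe.txt would need to document.

```c
#include <stdio.h>

/* Return the input file name given on the command line, or NULL after
   printing a usage message. Hypothetical build and run commands:
       gcc -Wall -Wextra -std=c11 -o ngrams ngrams.c
       ./ngrams tom_sawyer.txt                                        */
const char *input_name_from_args(int argc, char *argv[]) {
    if (argc != 2) {
        fprintf(stderr, "usage: %s <input-file>\n",
                argc > 0 ? argv[0] : "ngrams");
        return NULL;
    }
    return argv[1];
}

/* Open the input file for reading; on failure print the reason and return NULL. */
FILE *open_input(const char *fn) {
    FILE *in = fopen(fn, "r");
    if (in == NULL)
        perror(fn);
    return in;
}
```

If you choose to accept several file names, loop over argv[1] through argv[argc - 1] and process each one, rather than requiring exactly one argument.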