Database System Concepts
7th Edition
ISBN: 9780078022159
Author: Abraham Silberschatz Professor, Henry F. Korth, S. Sudarshan
Publisher: McGraw-Hill Education
Question

Create a script that parses data from Rotten Tomatoes, a movie review website. The work is identical to what we covered in lectures 4 and 5, albeit for a different website. Please read and follow the instructions below very carefully.


Step 1

Your script should begin by defining two variables (after importing libraries, etc.):

  1. movie: a string variable indicating the movie for which reviews will be parsed
  2. pageNum: the number of review pages to parse

 

For example, to parse the first 3 pages of the Gangs of New York reviews, set movie = 'gangs_of_new_york' and pageNum = 3. Your code should go to the movie’s All Critics reviews page on Rotten Tomatoes and parse the first three pages of reviews. Pagination on Rotten Tomatoes happens by clicking the “Next” button.
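A minimal sketch of how Step 1 might start. The variable names movie and pageNum come from the assignment; BASE_URL and reviews_url are hypothetical names, and the URL pattern is an assumption that should be checked against the live site before relying on it.

```python
# Step 1 variables, named as in the assignment.
movie = 'gangs_of_new_york'   # movie identifier as it appears in the site URL
pageNum = 3                   # number of review pages to parse

# ASSUMPTION: the reviews page lives at /m/<movie>/reviews.
# Verify this pattern in a browser; the site layout may differ or change.
BASE_URL = 'https://www.rottentomatoes.com/m/{}/reviews'


def reviews_url(movie_id):
    """Return the (assumed) address of the first reviews page for a movie."""
    return BASE_URL.format(movie_id)


start_url = reviews_url(movie)
```

Because the later pages are reached through the “Next” button rather than a documented page URL, this sketch only builds the address of the first page.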

Step 2

For each review contained in each of the pages you requested, parse the following information:

  1. The critic. This should be 'NA' if the review doesn't have a critic’s name.
  2. The rating. This should be 'rotten', 'fresh', or 'NA' if the review doesn't have a rating.
  3. The source. This should be 'NA' if the review doesn't have a source.
  4. The text. This should be 'NA' if the review doesn't have text.
  5. The date. This should be 'NA' if the review doesn't have a date.
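The 'NA' rule for the five fields can be sketched as below. This is only an illustration: clean_field, clean_rating and build_review are hypothetical helpers, and the raw dictionary stands in for whatever values your HTML parser pulls out of one review (the HTML parsing itself is not shown).

```python
NA = 'NA'


def clean_field(value):
    """Return the text with whitespace collapsed, or 'NA' if missing or blank."""
    if value is None:
        return NA
    # split()/join also removes tabs and newlines, which would break a TSV line
    text = ' '.join(str(value).split())
    return text if text else NA


def clean_rating(value):
    """Return 'fresh' or 'rotten'; anything else (including missing) is 'NA'."""
    text = clean_field(value).lower()
    return text if text in ('fresh', 'rotten') else NA


def build_review(raw):
    """Build the 5-value tuple in the order listed in Step 2.

    raw is a hypothetical dict holding the values scraped from one review;
    a missing key is treated the same as a missing value.
    """
    return (
        clean_field(raw.get('critic')),
        clean_rating(raw.get('rating')),
        clean_field(raw.get('source')),
        clean_field(raw.get('text')),
        clean_field(raw.get('date')),
    )
```

Keeping the fallback logic in one place means every field is handled the same way, whichever parsing library produces the raw values.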

 

Continuing with our Gangs of New York example:



Step 3

After parsing the data, save it in a file called firstname_lastname_movie.txt

  • The file should include one line for each review. 
  • The reviews in the file should appear in the same order as they do on the website. 
  • The 5 values that you write for each review should be written in the order listed in step 2. 
  • The 5 values should be separated by a TAB character ('\t'). 
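The file requirements above could be met with something like the following sketch. The function names output_filename and save_reviews are hypothetical, and each review is assumed to already be a sequence of five strings in the Step 2 order.

```python
def output_filename(first_name, last_name, movie):
    """Build the firstname_lastname_movie.txt file name."""
    return '{}_{}_{}.txt'.format(first_name, last_name, movie)


def save_reviews(reviews, path):
    """Write one line per review, values separated by TAB, in the given order."""
    with open(path, 'w', encoding='utf-8') as fh:
        for review in reviews:
            fh.write('\t'.join(review) + '\n')
```

Since the reviews are written in list order, appending them to the list in the order they appear on the website is enough to satisfy the ordering requirement.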

 

For example, I would save my data to “apostolos_filippas_gangs_of_new_york.txt”. If I had to parse the first three pages of reviews for that movie, my .txt output would look and be named like this (parsed on 10/09/2021).



Expert Solution
Program approach
  1. Take two inputs from the user: the movie name and the number of pages.
  2. Convert the movie name to the format used in Rotten Tomatoes URLs: all lowercase, with underscores (_) between words.
  3. Format the link and store it in url.
  4. Write a function that takes two parameters, url and pages.
  5. Inside the function, loop while the number of pages is greater than 0, fetching one page of reviews per iteration and decrementing the number of pages by 1 each time.
  6. Extract the reviews from each page and append them to a list. Return the list.
  7. The extracted data has many fields. Copy only the required fields into a dictionary.
  8. Convert the dictionary to a data frame, then write it to a .txt file.
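The name conversion and the page loop from the approach above might look like this sketch. Both to_slug and collect_reviews are hypothetical names. The fetch_page argument is an assumed callable that returns the list of reviews for a given page number; how it actually retrieves a page (following the “Next” button, for instance) is left out, and real movie identifiers on the site may not always match a simple lowercase-and-underscore conversion.

```python
import re


def to_slug(movie_name):
    """Lowercase the name and join its words with underscores."""
    words = re.findall(r'[a-z0-9]+', movie_name.lower())
    return '_'.join(words)


def collect_reviews(fetch_page, pages):
    """Gather reviews page by page while pages remain.

    fetch_page(page_number) is a hypothetical callable returning a list of
    reviews for that page, or an empty list when there is no such page.
    """
    all_reviews = []
    page = 1
    while pages > 0:
        batch = fetch_page(page)
        if not batch:          # stop early if the movie has fewer pages
            break
        all_reviews.extend(batch)
        page += 1
        pages -= 1             # one fewer page left to parse
    return all_reviews
```

Passing the fetching step in as a function keeps the loop independent of any particular scraping library and makes it easy to try out with made-up data.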