Create a script that parses data from Rotten Tomatoes, a movie review website. The work you have to do is identical to what we covered in lectures 4 and 5, albeit for a different website. Please read and follow the instructions below very carefully.
Step 1
Your script should begin by defining two variables (after importing libraries, etc.):
- movie: a string variable indicating the movie for which reviews will be parsed
- pageNum: the number of review pages to parse
For example, to parse the first 3 pages of the Gangs of New York reviews, set movie = 'gangs_of_new_york' and pageNum = 3. Your code should go to the movie’s All Critics reviews page on Rotten Tomatoes and parse the first three pages of reviews. Pagination on Rotten Tomatoes happens by clicking the “Next” button.
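As a minimal sketch of Step 1, the two variables can be defined at the top of the script and the reviews URL built from the movie string. The URL pattern below and the helper name build_reviews_url are assumptions for illustration, not something the assignment specifies; check the pattern against the live site before relying on it.

```python
# Step 1 variables, named as in the assignment.
movie = "gangs_of_new_york"
pageNum = 3

# ASSUMPTION: reviews live under /m/<movie>/reviews. Verify on the website.
BASE_URL = "https://www.rottentomatoes.com/m/{movie}/reviews"


def build_reviews_url(movie_slug):
    """Return the (assumed) All Critics reviews URL for a movie slug."""
    return BASE_URL.format(movie=movie_slug)


url = build_reviews_url(movie)
print(url)
```

Keeping the pattern in one constant means only one line changes if the site's URL layout turns out to be different.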
Step 2
For each review contained in each of the pages you requested, parse the following information:
- The critic. This should be 'NA' if the review doesn't have a critic’s name.
- The rating. This should be 'rotten', 'fresh', or 'NA' if the review doesn't have a rating.
- The source. This should be 'NA' if the review doesn't have a source.
- The text. This should be 'NA' if the review doesn't have text.
- The date. This should be 'NA' if the review doesn't have a date.
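The 'NA' rule above can be handled in one place, independently of how the HTML is parsed. The sketch below assumes the parser hands back either a string or None for each field; the helper names (or_na, normalize_rating, make_record) and the sample values are made up for illustration.

```python
def or_na(value):
    """Return the stripped text, or 'NA' when the field is missing or blank."""
    if value is None:
        return "NA"
    text = str(value).strip()
    return text if text else "NA"


def normalize_rating(raw):
    """Map a raw rating label to 'fresh', 'rotten', or 'NA'."""
    label = or_na(raw).lower()
    return label if label in ("fresh", "rotten") else "NA"


def make_record(critic=None, rating=None, source=None, text=None, date=None):
    """Build one review as a 5-tuple in the order listed in Step 2."""
    return (
        or_na(critic),
        normalize_rating(rating),
        or_na(source),
        or_na(text),
        or_na(date),
    )


# Made-up raw values, as a parser might return them for one review.
record = make_record(critic="  Jane Doe ", rating="Fresh", source=None,
                     text="", date="Jan 1, 2003")
print(record)
```

Centralizing the fallback keeps the parsing code free of repeated "if missing" checks and guarantees that every record has exactly five fields.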
Continuing with our Gangs of New York example:
Step 3
After parsing the data, save them in a file called firstname_lastname_movie.txt.
- The file should include one line for each review.
- The reviews in the file should appear in the same order as they do on the website.
- The 5 values that you write for each review should be written in the order listed in Step 2.
- The 5 values should be separated by a TAB character ('\t').
For example, I would save my data to “apostolos_filippas_gangs_of_new_york.txt”. If I had to parse the first three pages of reviews for that movie, my .txt output would look and be named like this (parsed on 10/09/2021).
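One possible way to produce the Step 3 file is sketched below. It assumes each review is already a sequence of five strings in the Step 2 order; the function names output_filename and write_reviews are hypothetical. Collapsing whitespace inside each field is an added precaution (not part of the assignment) so that a stray tab or newline in a review cannot break the one-line-per-review layout.

```python
def output_filename(first, last, movie):
    """Build the file name in the firstname_lastname_movie.txt form."""
    return f"{first}_{last}_{movie}.txt"


def write_reviews(path, reviews):
    """Write one line per review, with the 5 fields joined by TAB characters."""
    with open(path, "w", encoding="utf-8") as out:
        for fields in reviews:
            # Collapse internal tabs/newlines so each review stays on one line.
            cleaned = [" ".join(str(field).split()) for field in fields]
            out.write("\t".join(cleaned) + "\n")
```

Because the reviews are written in list order, appending them to the list in page order is enough to keep them in the same order as on the website.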
- Take two inputs from the user: movieName and the number of pages.
- Convert movieName to the format Rotten Tomatoes uses in its URLs: all lowercase, with underscores (_) between words.
- Build the link to the movie's reviews page and store it in url.
- Write a function that takes two parameters, url and pages.
- Inside the function, loop while the number of pages is greater than 0, parsing the reviews on the current page and decrementing the page count by 1 on each iteration.
- Append the extracted reviews to a list and return the list.
- The extracted data has many fields. Extract the required fields into a dictionary.
- Convert the dictionary to a data frame and write it to a .txt file.
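The name conversion and the page loop from the steps above could look roughly like this. It is a sketch under assumptions: to_slug drops punctuation as well as spaces, which may not match every title on the site, and fetch_page is a hypothetical callable standing in for whatever code actually downloads and parses one page of reviews.

```python
import re


def to_slug(movie_name):
    """Lowercase the title and join its words with underscores."""
    words = re.findall(r"[a-z0-9]+", movie_name.lower())
    return "_".join(words)


def collect_reviews(fetch_page, pages):
    """Collect reviews page by page, decrementing the counter each iteration.

    fetch_page is a hypothetical callable: given a 1-based page index it
    returns a list of reviews, or an empty list when no page is left.
    """
    reviews = []
    page_index = 1
    while pages > 0:
        batch = fetch_page(page_index)
        if not batch:
            break  # no further page to follow
        reviews.extend(batch)
        page_index += 1
        pages -= 1
    return reviews


# Stand-in data so the loop can be exercised without network access.
fake_pages = {1: ["r1", "r2"], 2: ["r3"]}
print(to_slug("Gangs of New York"))
print(collect_reviews(lambda i: fake_pages.get(i, []), 3))
```

Passing the page fetcher in as a parameter keeps the loop testable offline; the early break covers movies that have fewer review pages than requested.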