Use Python.
Web Scraping
A marketing company would like to know whether varsity (college) swimmers are, on average, taller than their volleyball counterparts. You have been asked to create a data-driven solution to answer this question.
The following web pages contain the roster of the Bearcats’ men’s and women’s swimming and volleyball teams.
Men’s Swimming Team
https://athletics.baruch.cuny.edu/sports/mens-swimming-and-diving/roster
Men’s Volleyball Team
https://athletics.baruch.cuny.edu/sports/mens-volleyball/roster
Women’s Swimming Team
https://athletics.baruch.cuny.edu/sports/womens-swimming-and-diving/roster
Women’s Volleyball Team
https://athletics.baruch.cuny.edu/sports/womens-volleyball/roster
The height of each player is listed on all web pages.
1. Scrape the heights of all the players on the men’s swimming team and find the average.
2. Scrape the heights of all the players on the men’s volleyball team and find the average.
3. Scrape the heights of all the players on the women’s swimming team and find the average.
4. Scrape the heights of all the players on the women’s volleyball team and find the average.
5. Compare the averages between the two men’s teams. Write a few lines about your findings.
6. Compare the averages between the two women’s teams. Write a few lines about your findings.
7. Are you able to determine whether, in general, the average swimmer is taller than the average volleyball player? Write a few lines about your findings.
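Once each team's heights are available as numbers in a common unit, steps 5 to 7 come down to comparing means. The sketch below is one possible way to do that, not a required approach. The function names `describe_difference` and `pooled_mean` are illustrative, and the list names in the trailing comments are hypothetical placeholders for the scraped data. For step 7 it pools the players of both rosters rather than averaging the two team averages, so a larger roster carries more weight.

```python
from statistics import mean


def describe_difference(label_a, heights_a, label_b, heights_b):
    """Return a one-line summary comparing the mean heights (in cm) of two groups."""
    mean_a = mean(heights_a)
    mean_b = mean(heights_b)
    diff = mean_a - mean_b
    if diff > 0:
        verdict = f"{label_a} are taller on average by {diff:.2f} cm"
    elif diff < 0:
        verdict = f"{label_b} are taller on average by {-diff:.2f} cm"
    else:
        verdict = "both groups have the same average height"
    return f"{label_a}: {mean_a:.2f} cm, {label_b}: {mean_b:.2f} cm; {verdict}."


def pooled_mean(*groups):
    """Mean over every player in every group, so larger rosters weigh more."""
    combined = [height for group in groups for height in group]
    return mean(combined)


# Hypothetical usage, assuming four lists of heights in centimeters exist:
# print(describe_difference("Men's swimmers", mens_swim_cm,
#                           "Men's volleyball players", mens_vb_cm))
# swimmers_cm = pooled_mean(mens_swim_cm, womens_swim_cm)
# volleyball_cm = pooled_mean(mens_vb_cm, womens_vb_cm)
```

A difference between these averages describes these four rosters only. Whether it generalizes to swimmers and volleyball players at large is a separate question that the write-up for step 7 should address.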
Hints:
Inspect the HTML on each page listed above. Determine which tag and class point to the players’ heights. Configure your web scraper accordingly. Follow the steps used in: https://github.com/avinashjairam/avinashjairam.github.io/blob/master/project_example.ipynb
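The idea of selecting elements by tag and class can be sketched with the standard library alone. This is a minimal illustration and not the method of the linked notebook, which may use a different library. The tag `td` and class `height` in the sample are assumptions made up for the demonstration; the real values have to be read off each roster page by inspecting it. `ClassTextCollector`, `scrape_texts`, and `fetch_html` are hypothetical helper names.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class ClassTextCollector(HTMLParser):
    """Collect the text inside every <tag> element whose class list contains css_class."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.depth = 0      # nesting depth of self.tag inside a matching element
        self.buffer = []    # text pieces of the element currently being read
        self.results = []   # one string per matching element

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        if self.depth:
            self.depth += 1
        elif self.css_class in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.buffer = []

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self.buffer).strip())

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


def scrape_texts(html, tag, css_class):
    """Return the text of all matching elements in an HTML string."""
    parser = ClassTextCollector(tag, css_class)
    parser.feed(html)
    parser.close()
    return parser.results


def fetch_html(url):
    """Download a page; some sites reject requests without a browser-like User-Agent."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


# Made-up fragment for illustration only; the real pages may be structured differently.
SAMPLE_HTML = '<tr><td class="name">Player A</td><td class="height">6\'2"</td></tr>'
print(scrape_texts(SAMPLE_HTML, "td", "height"))
```

For a real page the call would look like `scrape_texts(fetch_html(url), tag, css_class)` with the tag and class found during inspection. If a page fills in its roster with JavaScript after loading, the downloaded HTML may not contain the heights at all, and a different approach would be needed.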
After you have scraped the heights and have stored them in different lists, you may have to convert the data (the heights) from strings to a numeric type (and then perhaps to centimeters or meters?) to find the average.
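The conversion step might look like the sketch below. It assumes the scraped strings are in feet and inches, written like `6'2"` or `6-2`; that format is an assumption and should be checked against the actual pages. Entries that cannot be parsed are skipped rather than guessed. The names `height_to_cm` and `average_height_cm` are illustrative.

```python
import re
from statistics import mean

# Feet, then a separator (apostrophe variants, a hyphen, or "ft"), then optional inches.
HEIGHT_PATTERN = re.compile(r"(\d+)\s*(?:'|’|′|-|ft\.?)\s*(\d{1,2})?")

CM_PER_INCH = 2.54


def height_to_cm(text):
    """Convert a feet-and-inches string to centimeters; return None if it cannot be parsed."""
    match = HEIGHT_PATTERN.search(text)
    if match is None:
        return None
    feet = int(match.group(1))
    inches = int(match.group(2) or 0)
    return (feet * 12 + inches) * CM_PER_INCH


def average_height_cm(raw_heights):
    """Average the parseable heights in a list of strings; return None if none parse."""
    values = [cm for cm in map(height_to_cm, raw_heights) if cm is not None]
    return mean(values) if values else None
```

Skipping unparseable entries keeps a blank or missing height from distorting the average, but it is worth counting how many players were dropped and mentioning that in the findings.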
Note:
The tasks listed here span many different topics in Python. (There’s a huge clue in the previous sentence!)