School: University of Oregon
Course: 102
Subject: Statistics
Date: Apr 27, 2024
hw4
April 26, 2024
[ ]:
import otter
grader = otter.Notebook()
1 Homework 4: Advanced operations in pandas
Due Date: 11:59PM on the date posted to Canvas
Collaboration Policy
Data science is a collaborative activity. While you may talk with others about the homework, we ask that you write your solutions individually. If you do discuss the assignments with other students, please include their names below.
Collaborators: list collaborators here
Grading
Grading is broken down into autograded answers and free response.
For autograded answers, the results of your code are compared to provided and/or hidden tests.
For autograded probability questions, the provided tests will only check that your answer is within a reasonable range.
For free response, readers will evaluate how well you answered the question and/or fulfilled the requirements of the question.
For plots, make sure to be as descriptive as possible: include titles, axis labels, and units wherever applicable.
[ ]:
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
'imports completed'
1.1 Introduction
The purpose of this module is to expand your ‘pandas’ skillset by performing various new and old operations on ‘pandas’ dataframes. Many of these operations will be things you’ve done before in the datascience package, so you should reference the included notebook to translate between the two if need be.
You are expected to answer all relevant questions programmatically, i.e. use indexing and functions/methods to arrive at your answers. Your answers don’t need to be on a single line; you may use as many intermediate steps as you need.
1.1.1 Question 1
Reading in data from a file is made easy in the pandas package. We have included two datasets in your assignment folder to read in, ‘broadway.csv’ and ‘diseases.txt’.
Question 1.1 Read in broadway using pd.read_csv.
[ ]:
broadway = ...
broadway.head(6)
[ ]:
grader.check("q1_1")
Question 1.2 Now read in the diseases dataset. Diseases is not a .csv but a .txt file, i.e. a plain-text file. Because it’s not .csv, we can’t assume that the values are comma separated. Fortunately, pd.read_csv can be used on any file. It may not parse the data correctly, but it may reveal the values that do separate entries.
Identify the separator used in diseases.txt and use it to successfully read in your data with pd.read_csv.
[ ]:
separator = ...
diseases = pd.read_csv("diseases.txt", sep = ...)
diseases.head(6)
[ ]:
grader.check("q1_2")
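The idea of letting a default parse reveal the separator can be sketched on a small in-memory file. The pipe-delimited text and the column names below are made up for illustration; the real diseases.txt may use a different separator entirely.

```python
import io
import pandas as pd

# Hypothetical stand-in for a plain-text data file (not the real diseases.txt).
raw_text = "state|year|cases\nOregon|1950|12\nIdaho|1950|7\n"

# Parsing with the default comma separator squeezes every field into one column,
# which exposes the character that actually separates the values.
naive = pd.read_csv(io.StringIO(raw_text))
print(naive.columns.tolist())

# Re-reading with that character as `sep` yields one column per field.
parsed = pd.read_csv(io.StringIO(raw_text), sep="|")
print(parsed.columns.tolist())
```

A single wide column in the first parse is the usual sign that the separator guess was wrong.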
Question 1.3 Read in the DataFrame called nst-est2016-alldata.csv from the course GitHub. The URL path to the repository is https://github.com/oregon-data-science/DSCI101/raw/main/data/. You should do this with pd.read_csv.
[ ]:
pop_census = ...
[ ]:
grader.check("q1_3")
This DataFrame gives census-based population estimates for each state on both July 1, 2015 and July 1, 2016. The last four columns describe the components of the estimated change in population during this time interval. For all questions below, assume that the word “states” refers to all 52 rows including Puerto Rico & the District of Columbia.
The data was taken from here. If you want to read more about the different column descriptions, click here!
The raw data is a bit messy; run the cell below to clean the DataFrame and make it easier to work with.
[ ]:
# Don't change this cell; just run it.
pop_sum_level = pop_census['SUMLEV'] == 40
pop = pop_census[pop_sum_level]
# grab a numbered list of columns to use
columns_to_use = pop.columns[[1, 4, 12, 13, 27, 34, 62, 69]]
pop = pop[columns_to_use]
pop = pop.rename(columns={'POPESTIMATE2015': '2015',
                          'POPESTIMATE2016': '2016',
                          'BIRTHS2016': 'BIRTHS',
                          'DEATHS2016': 'DEATHS',
                          'NETMIG2016': 'MIGRATION',
                          'RESIDUAL2016': 'OTHER'})
#pop['REGION'].unique()
pop['REGION'] = pop['REGION'].replace({'1': 1, '2': 2, '3': 3, '4': 4, 'X': 0})
pop.head(12)
1.1.2 Question 2 - Census data
Question 2.1 Assign us_birth_rate to the total US annual birth rate during this time interval. The annual birth rate for a year-long period is the total number of births in that period as a proportion of the population size at the start of the time period.
Hint: Which year corresponds to the start of the time period?
[ ]:
us_birth_rate = ...
us_birth_rate
[ ]:
grader.check("q2_1")
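The definition of an annual rate as a proportion of the starting population can be illustrated on a toy table. The numbers below are invented, and the column names only mirror the cleaned census table; this is a sketch of the arithmetic, not the graded answer.

```python
import pandas as pd

# Toy data with made-up values; column names mirror the cleaned table.
toy = pd.DataFrame({
    "NAME": ["A", "B", "C"],
    "2015": [1000, 2000, 3000],   # population at the start of the interval
    "2016": [1010, 2030, 3020],   # population at the end of the interval
    "BIRTHS": [12, 26, 34],
})

# Total events over the interval divided by the total population at its start.
toy_birth_rate = toy["BIRTHS"].sum() / toy["2015"].sum()
print(toy_birth_rate)
```

Summing before dividing gives the rate for the whole group, which generally differs from the average of the per-row rates.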
Question 2.2 Assign movers to the number of states for which the absolute value (np.abs) of the annual rate of migration was higher than 1%. The annual rate of migration for a year-long period is the net number of migrations (in and out) as a proportion of the population size at the start of the period. The MIGRATION column contains estimated annual net migration counts by state.
[ ]:
...
movers = ...
movers
[ ]:
grader.check("q2_2")
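Counting rows whose absolute rate exceeds a threshold comes down to summing a boolean Series. The sketch below uses invented numbers and assumes the threshold of 1% is written as the proportion 0.01.

```python
import numpy as np
import pandas as pd

# Toy data with made-up values; column names mirror the cleaned table.
toy = pd.DataFrame({
    "NAME": ["A", "B", "C", "D"],
    "2015": [1000, 2000, 3000, 4000],
    "MIGRATION": [15, -30, 10, -20],
})

# Net migration as a proportion of the starting population (can be negative).
rates = toy["MIGRATION"] / toy["2015"]

# np.abs works elementwise on a Series; the comparison yields booleans,
# and summing booleans counts the True values.
exceeds = np.abs(rates) > 0.01
toy_movers = int(exceeds.sum())
print(toy_movers)
```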
Question 2.3 Assign west_births to the total number of births that occurred in region 4 (the Western US).
Hint: Make sure you double check the type of the values in the region column, and appropriately filter (i.e. the types must match!).
[ ]:
west_births = ...
west_births
[ ]:
grader.check("q2_3")
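The hint about matching types can be seen on a toy column: comparing integer values against a string silently matches nothing. The data below is invented for illustration.

```python
import pandas as pd

# Toy data with made-up values; REGION is stored as integers here.
toy = pd.DataFrame({
    "REGION": [1, 4, 4, 0],
    "BIRTHS": [10, 20, 30, 40],
})

# Inspect the dtype before filtering.
print(toy["REGION"].dtype)

# Filtering with the wrong type raises no error; it just returns no rows.
as_string = toy[toy["REGION"] == "4"]

# Filtering with a value of the matching type selects the intended rows.
as_int = toy[toy["REGION"] == 4]
toy_region_births = int(as_int["BIRTHS"].sum())
print(len(as_string), toy_region_births)
```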
Question 2.4 Assign less_than_west_births to the number of states that had a total population in 2016 that was smaller than the total number of births in region 4 (the Western US) during this time interval.
[ ]:
less_than_west_births = ...
less_than_west_births
[ ]:
grader.check("q2_4")
Question 2.5
In the next question, you will be creating a visualization to understand the relationship between
birth and death rates. The annual death rate for a year-long period is the total number of deaths
in that period as a proportion of the population size at the start of the time period.
What visualization is most appropriate to see if there is an association between birth and death
rates during a given time interval?
1. Line Graph
2. Scatter Plot
3. Bar Chart
Assign visualization below to the number corresponding to the correct visualization.
[ ]:
visualization = ...
[ ]:
grader.check("q2_5")
Question 2.6 In the code cell below, create a visualization that will help us determine if there is an association between birth rate and death rate during this time interval. It may be helpful to create an intermediate DataFrame here.
[ ]:
# Generate your chart in this cell
...
1.1.3 Question 3 - The diseases dataset
The U.S., like many places, was once afflicted by many diseases (many of them viral) that are no longer prominent today due to the advent of vaccines. Some of them, such as Polio, have been effectively eradicated, while others, like Measles, affect so few individuals that they are largely irrelevant in the public health landscape. Notably, even though many of these diseases persist in the population (e.g. measles, mumps and rubella), they are sufficiently diluted by uninfected and/or vaccinated individuals to undermine any potential for an outbreak.
Question 3.1 How many different diseases are represented in this dataset?
[ ]:
num_diseases = ...
[ ]:
grader.check("q3_1")
Question 3.2 We have disease prevalence in terms of total individuals infected in a year in a state. The absolute magnitude of infected individuals can be helpful, but it’ll be easier to directly compare between diseases and states if we weight these values by total population. Create a new column in diseases called “incidence_per” representing the disease incidence (“number”) as a percent of the state’s population.
Hint: If the variable is represented as a percent, then it should be between 0 and 100.
[ ]:
diseases["incidence_per"] = ...
[ ]:
grader.check("q3_2")
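Expressing a count as a percent of a population is a column-wise division followed by a scaling to the 0 to 100 range. In this sketch the values are invented, and the column name "population" is an assumption, since the text above only names the "number" column.

```python
import pandas as pd

# Toy data with made-up values; "population" is a hypothetical column name.
toy = pd.DataFrame({
    "disease": ["X", "Y"],
    "number": [50, 200],
    "population": [10000, 40000],
})

# Proportion of the population affected, scaled so values fall between 0 and 100.
toy["incidence_per"] = toy["number"] / toy["population"] * 100
print(toy)

# Quick range check matching the hint: a percent should lie in [0, 100].
in_range = bool(toy["incidence_per"].between(0, 100).all())
```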
Question 3.3 Using this new column you created, identify the disease that afflicted the greatest percentage of New York’s population in 1928. Provide your answer as a string.
[ ]:
...
worst_ny_disease_1928 = ...
worst_ny_disease_1928
[ ]:
grader.check("q3_3")
Question 3.4 Between the years 1928 and 1938 inclusive, which U.S. state had the highest average incidence of polio as a percentage of its total population?
[ ]:
...
worst_polio_state = ...
worst_polio_state
[ ]:
grader.check("q3_4")
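A question of the form "which group has the highest average within a range of years" can be approached by filtering, grouping, averaging, and taking the index of the maximum. The toy frame below uses invented values, and its "state" and "year" column names are assumptions about the layout.

```python
import pandas as pd

# Toy data with made-up values; column names are assumed for illustration.
toy = pd.DataFrame({
    "state": ["A", "A", "B", "B", "B"],
    "year": [1930, 1931, 1930, 1931, 1950],
    "incidence_per": [0.10, 0.30, 0.25, 0.05, 9.0],
})

# Series.between is inclusive on both ends by default.
in_window = toy[toy["year"].between(1930, 1931)]

# Average within each group, then take the label of the largest average.
mean_by_state = in_window.groupby("state")["incidence_per"].mean()
top_state = mean_by_state.idxmax()
print(top_state)
```

The 1950 row is excluded by the year filter, so its large value does not affect the result.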
Question 3.5 Identify the first year in which Polio was effectively eradicated in the US (fewer than 100 total cases).
[ ]:
...
first_year_eradicated = ...
first_year_eradicated
[ ]:
grader.check("q3_5")
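Finding the first year in which a national total drops below a threshold involves summing across states per year and then filtering those totals. The sketch below uses invented counts and assumed column names.

```python
import pandas as pd

# Toy data with made-up values; column names are assumed for illustration.
toy = pd.DataFrame({
    "year": [1960, 1960, 1961, 1961, 1962, 1962],
    "state": ["A", "B", "A", "B", "A", "B"],
    "number": [80, 70, 40, 30, 90, 60],
})

# Total cases per year across all states.
yearly_total = toy.groupby("year")["number"].sum()

# Keep only the years under the threshold, then take the earliest one.
below = yearly_total[yearly_total < 100]
first_year_below = int(below.index.min())
print(first_year_below)
```

Note that the total in this toy example rises above the threshold again in a later year, which is why "first" needs to be taken explicitly.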
Measles is a highly infectious viral disease that was once one of the most prominent childhood illnesses globally.
Prior to the development of a vaccine for measles, it was more or less a fact of life for children. The disease was a constant blight that perpetuated itself in large boom-bust cycles of outbreaks. However, the first measles vaccine was approved for distribution in 1963, which would have dramatic consequences for the future of measles’ presence in the public-health landscape.
The $R_0$ of a disease represents how many people we can expect to be infected by a single contagious individual under average conditions in a uniformly susceptible population (no vaccinations or acquired immunity). Measles has an $R_0 = 18$, an incredibly high value that indicates it is among the most infectious diseases that affect humans. For reference, the $R_0$ for a typical year’s flu is roughly 1.
[ ]:
# Total measles cases per year across all states
measles_sum = (diseases[diseases["disease"] == "MEASLES"]
               .groupby("year")["number"]
               .sum()
               .reset_index())
sns.lineplot(data=measles_sum, x="year", y="number")
plt.ylabel("Number of Cases (US)")
# Dashed line marks the year the first measles vaccine was approved
plt.axvline(x=1963, color="black", linestyle="dashed");
Clearly measles vaccination was incredibly successful at reducing and eventually eliminating Measles outbreaks.
1.1.4 Question 4 - The broadway dataset
The broadway dataset contains all plays put into production on Broadway between the years 1990 and 2016.
[ ]:
print(f"Over this time period there were {len(broadway['Show.Name'].unique())} different shows put on Broadway.")
That’s a lot of shows! Presumably there were some hits and some duds. Let’s separate the wheat
from the chaff and identify those shows that performed the best. But how do we define best?
Question 4.1 Create a Series of plays in order of most to least total gross.
[ ]:
broadway_grosses = ...
[ ]:
grader.check("q4_1")
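Ranking groups by a total typically combines groupby, sum, and sort_values, and selecting a single column before aggregating returns a Series rather than a DataFrame. In the sketch below the values are invented and "Gross" is an assumed column name; the real broadway file may label it differently.

```python
import pandas as pd

# Toy data with made-up values; "Gross" is a hypothetical column name.
toy = pd.DataFrame({
    "Show.Name": ["Alpha", "Beta", "Alpha", "Gamma", "Beta"],
    "Gross": [100, 300, 150, 50, 100],
})

# Selecting one column after groupby gives a Series indexed by show name.
totals = (toy.groupby("Show.Name")["Gross"]
             .sum()
             .sort_values(ascending=False))
print(totals)
```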
Question 4.2 Now create a Series of plays in order of most to least average amount grossed per seat filled (Gross / Attendance).
[ ]:
...
broadway_gross_seat = ...
broadway_gross_seat
[ ]:
grader.check("q4_2")
Question 4.3 Create a new variable representing date as a single continuous variable. This should combine year, month and day into a new column. Assume no leap years and that there are 30.44 days per month. Call this variable date_continuous.
Hint: Think about how you can convert months into days and days into the same units as years.
[ ]:
broadway["date_continuous"] = ...
broadway[["Show.Name", "date_continuous"]]
[ ]:
grader.check("q4_3")
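One possible reading of the unit conversion in the hint is sketched below: months become days at 30.44 days each, and days become a fraction of a 365-day year. The column names "Year", "Month" and "Day" are hypothetical, and whether months should be counted from zero or one is an assumption here, so the graded answer may use a different convention.

```python
import pandas as pd

DAYS_PER_MONTH = 30.44   # stated in the question
DAYS_PER_YEAR = 365      # no leap years, as stated in the question

# Toy data; the column names are hypothetical.
toy = pd.DataFrame({"Year": [1990, 1990], "Month": [1, 7], "Day": [1, 15]})

# Convert months to days, add the day of the month, then express the
# total number of days as a fraction of a year.
toy["date_continuous"] = (
    toy["Year"]
    + (toy["Month"] * DAYS_PER_MONTH + toy["Day"]) / DAYS_PER_YEAR
)
print(toy)
```

Whatever convention is used, later dates should map to larger values, which is a cheap sanity check on the new column.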
With this variable created, we can now identify the show that has had the longest tenure on Broadway. To do this, we’ll define our own function called span, which will return the difference between the max and minimum of a series. Using this function, we can find the length of time each show spent on Broadway and identify the longest running plays.
[ ]:
def span(series):
    return max(series) - min(series)
[ ]:
broadway_length = (broadway[["Show.Name", "date_continuous"]]
                   .groupby("Show.Name")
                   .agg(span)
                   .reset_index()
                   .rename({"date_continuous": "total_tenure_years"}, axis=1)
                   .sort_values("total_tenure_years", ascending=False)
                   )
broadway_length.head()
Question 4.4 This is some handy information, and we might find it useful to include the total tenure of a show in the original dataframe. Join the total tenure you just determined onto the original broadway frame using the merge function, ensuring that the new column is called “total_tenure_years”. Be sure that no information is lost from the original dataframe in your new, joined dataframe.
You should reference the help file for merge if you need guidance (help(pd.merge)).
[ ]:
broadway_merged = ...
broadway_merged.head()
[ ]:
grader.check("q4_4")
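Attaching a per-group summary back onto the original rows without dropping any of them is commonly done with a left merge. The frames below are invented stand-ins, and the "Gross" column is a hypothetical name.

```python
import pandas as pd

# Toy stand-in for the original frame: several rows per show.
weekly = pd.DataFrame({
    "Show.Name": ["Alpha", "Alpha", "Beta"],
    "Gross": [100, 150, 300],
})

# Toy stand-in for the per-show summary: one row per show.
tenure = pd.DataFrame({
    "Show.Name": ["Alpha", "Beta"],
    "total_tenure_years": [2.5, 0.4],
})

# how="left" keeps every row of the left frame; validate raises an error
# if the right frame unexpectedly has duplicate keys.
merged = pd.merge(weekly, tenure, on="Show.Name", how="left",
                  validate="many_to_one")
print(merged)
```

Comparing the row count of the merged frame with the original is a quick way to confirm that nothing was lost or duplicated.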
1.2 Submission
Make sure you have run all cells in your notebook in order. Then execute the following two commands from the File menu:
• Save and Checkpoint
• Close and Halt
Then upload your .ipynb file to the Canvas assignment HW4.
To double-check your work, the cell below will rerun all of the autograder tests.
[ ]:
grader.check_all()