project01
March 20, 2024
0.1 Project 1: World Progress
In this project, you’ll explore data from Gapminder.org, a website dedicated to providing a fact-based view of the world and how it has changed. That site includes several data visualizations and presentations, but also publishes the raw data that we will use in this project to recreate and extend some of their most famous visualizations.
The Gapminder website collects data from many sources and compiles them into tables that describe many countries around the world. All of the data they aggregate are published in the Systema Globalis. Their goal is “to compile all public statistics; Social, Economic and Environmental; into a comparable total dataset.” All data sets in this project are copied directly from the Systema Globalis without any changes.
This project is dedicated to Hans Rosling (1948-2017), who championed the use of data to understand and prioritize global development challenges.
0.1.1 Logistics
Deadline. This project is due at 11:59pm on the due date in Canvas. Late work will not be accepted as per the course policies. It’s much better to be early than late, so start working now.
Rules. Don’t share your code with anybody. You are welcome to discuss questions with other students, but don’t share the answers. The experience of solving the problems in this project will prepare you for exams (and life). If someone asks you for the answer, resist! Instead, you can demonstrate how you would solve a similar problem.
Support. You are not alone! Come to office hours, post on Slack/Canvas Chat, and talk to your classmates. If you want to ask about the details of your solution to a problem, make a private Slack/Chat post and the staff will respond. Take advantage of the plentiful help hours provided by the Learning Assistants.
Tests. The tests that are given are not comprehensive and passing the tests for a question does not mean that you answered the question correctly. Tests usually only check that your table has the correct column labels. However, more tests will be applied to verify the correctness of your submission in order to assign your final score, so be careful and check your work! You might want to create your own checks along the way to see if your answers make sense. Additionally, before you submit, make sure that none of your cells take a very long time to run (several minutes).
Free Response Questions: Make sure that you put the answers to the written questions in the indicated cell we provide.
Advice. Develop your answers incrementally. To perform a complicated table manipulation, break it up into steps, perform each step on a different line, give a new name to each result, and check that each intermediate result is what you expect. You can add any additional names or functions you want to the provided cells. Make sure that you are using distinct and meaningful variable names throughout the notebook. Along that line, DO NOT reuse the variable names that we use when we grade your answers. For example, in Question 1 of the Global Poverty section, we ask you to assign an answer to latest. Do not reassign the variable name latest to anything else in your notebook, otherwise there is the chance that our tests grade against what latest was reassigned to.
You never have to use just one line in this project or any others. Use intermediate variables and multiple lines as much as you would like!
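As a minimal sketch of this advice in plain Python (the rows are made-up stand-ins for a table, and the names `rows`, `one_country`, `in_range`, and `values` are hypothetical), each step gets its own name and a quick check before moving on:

```python
# Hypothetical stand-in for a table: one dict per row (all values are made up).
rows = [
    {"geo": "aaa", "time": 1969, "value": 10},
    {"geo": "aaa", "time": 1970, "value": 11},
    {"geo": "aaa", "time": 1971, "value": 12},
    {"geo": "bbb", "time": 1970, "value": 20},
]

# Step 1: keep one country, then check the intermediate result.
one_country = [r for r in rows if r["geo"] == "aaa"]
assert len(one_country) == 3

# Step 2: keep a range of years (both endpoints included), then check again.
in_range = [r for r in one_country if 1970 <= r["time"] <= 1971]
assert [r["time"] for r in in_range] == [1970, 1971]

# Step 3: pull out the column of interest.
values = [r["value"] for r in in_range]
print(values)
```

Checks like these are cheap to write and catch mistakes that label-only tests would miss.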
To get started, load datascience, numpy, plots, and otter.
[2]:
from datascience import *
import numpy as np

%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')

import otter
grader = otter.Notebook()
'imports complete'
[2]:
'imports complete'
0.2 1. Global Population Growth
The global population of humans reached 1 billion around 1800, 3 billion around 1960, and 7 billion
around 2011.
The potential impact of exponential population growth has concerned scientists,
economists, and politicians alike.
The UN Population Division estimates that the world population will likely continue to grow
throughout the 21st century, but at a slower rate, perhaps reaching 11 billion by 2100. However,
the UN does not rule out scenarios of more extreme growth.
In this section, we will examine some of the factors that influence population growth and how they
are changing around the world.
The first table we will consider is the total population of each country over time.
Run the cell
below.
[3]:
population = Table.read_table('population.csv')
population.show(3)
<IPython.core.display.HTML object>
Note: The population csv file can also be found here. The data for this project was downloaded in February 2017.
0.2.1 Bangladesh
In the population table, the geo column contains three-letter codes established by the International Organization for Standardization (ISO) in the Alpha-3 standard. We will begin by taking a close look at Bangladesh. Inspect the standard to find the 3-letter code for Bangladesh.
Question 1. Create a table called b_pop that has two columns labeled time and population_total. The first column should contain the years from 1970 through 2015 (including both 1970 and 2015) and the second should contain the population of Bangladesh in each of those years.
[4]:
b_pop = population.where('geo', 'bgd').select('time', 'population_total').where('time', are.between_or_equal_to(1970, 2015))
b_pop
[4]:
time | population_total
1970 | 65048701
1971 | 66417450
1972 | 67578486
1973 | 68658472
1974 | 69837960
1975 | 71247153
1976 | 72930206
1977 | 74848466
1978 | 76948378
1979 | 79141947
… (36 rows omitted)
[5]:
grader.check("q1_1")
[5]:
q1_1 results: All test cases passed!
Run the following cell to create a table called b_five that has the population of Bangladesh every five years. At a glance, it appears that the population of Bangladesh has been growing quickly indeed!
[6]:
b_pop.set_format('population_total', NumberFormatter)

fives = np.arange(1970, 2016, 5) # 1970, 1975, 1980, ...
b_five = b_pop.sort('time').where('time', are.contained_in(fives))
b_five
[6]:
time | population_total
1970 | 65,048,701
1975 | 71,247,153
1980 | 81,364,176
1985 | 93,015,182
1990 | 105,983,136
1995 | 118,427,768
2000 | 131,280,739
2005 | 142,929,979
2010 | 151,616,777
2015 | 160,995,642
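The cell above passes 2016, not 2015, as the stop value because the stop of an arange-style sequence is excluded. A small plain-Python sketch of the same idea, using the built-in `range` as a stand-in for `np.arange` (the name `too_short` is hypothetical):

```python
# Plain-Python analogue of np.arange(1970, 2016, 5): the stop value is excluded.
fives = list(range(1970, 2016, 5))
print(fives[0], fives[-1], len(fives))

# With a stop of 2015, the final year would be dropped.
too_short = list(range(1970, 2015, 5))
print(too_short[-1])
```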
Question 2. Assign initial to an array that contains the population for every five year interval from 1970 to 2010. Then, assign changed to an array that contains the population for every five year interval from 1975 to 2015. You should use the b_five table to create both arrays, first filtering the table to only contain the relevant years.
We have provided the code below that uses initial and changed in order to add a column to b_five called annual_growth. Don’t worry about the calculation of the growth rates; run the test below to test your solution.
If you are interested in how we came up with the formula for growth rates, consult the growth rates section of the textbook.
[7]:
#b_five.bin('population_total', bins=np.arange(1970,2010,5)
initial = b_five.where('time', are.between_or_equal_to(1970, 2010)).column('population_total')
changed = b_five.where('time', are.between_or_equal_to(1975, 2015)).column('population_total')

b_1970_through_2010 = b_five.where('time', are.below_or_equal_to(2010))
b_five_growth = b_1970_through_2010.with_column('annual_growth', (changed / initial) ** 0.2 - 1)
b_five_growth.set_format('annual_growth', PercentFormatter)
[7]:
time | population_total | annual_growth
1970 | 65,048,701 | 1.84%
1975 | 71,247,153 | 2.69%
1980 | 81,364,176 | 2.71%
1985 | 93,015,182 | 2.64%
1990 | 105,983,136 | 2.25%
1995 | 118,427,768 | 2.08%
2000 | 131,280,739 | 1.71%
2005 | 142,929,979 | 1.19%
2010 | 151,616,777 | 1.21%
[8]:
grader.check("q1_2")
[8]:
q1_2 results: All test cases passed!
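The annual_growth expression in the provided code, (changed / initial) ** 0.2 - 1, can be read as solving a compound-growth equation over five years. The sketch below works through the first row using the 1970 and 1975 populations from the b_five table; the variable names are illustrative only:

```python
# Populations of Bangladesh in 1970 and 1975, taken from the b_five table.
initial_pop = 65_048_701
changed_pop = 71_247_153
years = 5

# If the population grows by the same rate g every year, then
# changed = initial * (1 + g) ** years, so g = (changed / initial) ** (1 / years) - 1.
annual_growth = (changed_pop / initial_pop) ** (1 / years) - 1
print(f"{annual_growth:.2%}")

# Compounding that rate for five years recovers the 1975 population.
recovered = initial_pop * (1 + annual_growth) ** years
print(round(recovered))
```

The exponent 0.2 in the notebook is simply 1/5, one over the number of years between the two measurements.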
While the population has grown every five years since 1970, the annual growth rate decreased
dramatically from 1985 to 2005. Let’s look at some other information in order to develop a possible
explanation.
Run the next cell to load three additional tables of measurements about countries
over time.
[9]:
life_expectancy = Table.read_table('life_expectancy.csv')
child_mortality = Table.read_table('child_mortality.csv').relabel(2, 'child_mortality_under_5_per_1000_born')
fertility = Table.read_table('fertility.csv')
The life_expectancy table contains a statistic that is often used to measure how long people live, called life expectancy at birth. This number, for a country in a given year, does not measure how long babies born in that year are expected to live. Instead, it measures how long someone would live, on average, if the mortality conditions in that year persisted throughout their lifetime. These “mortality conditions” describe what fraction of people at each age survived the year. So, it is a way of measuring the proportion of people that are staying alive, aggregated over different age groups in the population.
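To make the definition concrete, here is a rough sketch of how one year's survival fractions can be turned into a single number. It is a simplified illustration, not Gapminder's actual method: the function name, the toy survival fractions, and the assumption that people who die in a year live half of it are all hypothetical.

```python
def period_life_expectancy(survival_by_age):
    """Average years lived if one year's survival fractions applied for a whole lifetime.

    survival_by_age[x] is the fraction of people aged x who survive the year.
    Simplifying assumption: people who die during a year live half of it.
    """
    alive = 1.0           # fraction of the cohort still alive at the start of age x
    expected_years = 0.0
    for survival in survival_by_age:
        alive_next = alive * survival
        # Survivors contribute a full year; those who die contribute half a year.
        expected_years += (alive + alive_next) / 2
        alive = alive_next
    return expected_years

# Made-up three-age example: half survive ages 0 and 1, nobody survives age 2.
toy_conditions = [0.5, 0.5, 0.0]
print(period_life_expectancy(toy_conditions))
```

Because every age's survival fraction comes from the same calendar year, the result summarizes that year's conditions rather than predicting the lifespan of babies born then.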
Run the following cells below to see life_expectancy, child_mortality, and fertility. Refer back to these tables as they will be helpful for answering further questions!
[10]:
life_expectancy
[10]:
geo | time | life_expectancy_years
afg | 1800 | 28.21
afg | 1801 | 28.2
afg | 1802 | 28.19
afg | 1803 | 28.18
afg | 1804 | 28.17
afg | 1805 | 28.16
afg | 1806 | 28.15
afg | 1807 | 28.14
afg | 1808 | 28.13
afg | 1809 | 28.12
… (43847 rows omitted)
[11]:
child_mortality
[11]:
geo | time | child_mortality_under_5_per_1000_born
afg | 1800 | 468.6
afg | 1801 | 468.6
afg | 1802 | 468.6
afg | 1803 | 468.6
afg | 1804 | 468.6
afg | 1805 | 468.6
afg | 1806 | 470
afg | 1807 | 470
afg | 1808 | 470
afg | 1809 | 470
… (40746 rows omitted)
[12]:
fertility
[12]:
geo | time | children_per_woman_total_fertility
afg | 1800 | 7
afg | 1801 | 7
afg | 1802 | 7
afg | 1803 | 7
afg | 1804 | 7
afg | 1805 | 7
afg | 1806 | 7
afg | 1807 | 7
afg | 1808 | 7
afg | 1809 | 7
… (43402 rows omitted)
Question 3. Perhaps population is growing more slowly because people aren’t living as long. Use the life_expectancy table to draw a line graph with the years 1970 and later on the horizontal axis that shows how the life expectancy at birth has changed in Bangladesh.
[13]:
#Fill in code here
plot = life_expectancy.where('geo', 'bgd').where('time', are.between_or_equal_to(1970, 2015))
life_plot = plot.plot('time', 'life_expectancy_years')
life_plot
Question 4. Assuming everything else stays the same, do the trends in life expectancy in the graph above directly explain why the population growth rate decreased from 1985 to 2010 in Bangladesh? Why or why not?
Hint: What happened in Bangladesh in 1991, and does that event explain the overall change in population growth rate?
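No. In 1991 a cyclone in Bangladesh killed about 135,000 people, but that is only about 0.1% of the roughly 106 million people living there in 1990, so a one-time event of that size cannot account for a growth rate that kept falling from 1985 to 2010. All else being equal, people living longer would raise the population growth rate rather than lower it, so the trend in life expectancy does not directly explain the decrease.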
The fertility table contains a statistic that is often used to measure how many babies are being born, the total fertility rate. This number describes the number of children a woman would have in her lifetime, on average, if the current rates of birth by age of the mother persisted throughout her child bearing years, assuming she survived through age 49.
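One way to read this definition is as a sum of age-specific birth rates over the childbearing years. The sketch below is an illustration under stated assumptions, not the method used to build the fertility table: the function name, the age range of 15 through 49, and the toy rates are all hypothetical.

```python
def total_fertility_rate(births_per_woman_by_age):
    """Sum the yearly birth rates a woman would experience from age 15 through 49.

    births_per_woman_by_age maps each age to the average number of births
    per woman of that age in a single year; missing ages count as zero.
    """
    return sum(births_per_woman_by_age.get(age, 0.0) for age in range(15, 50))

# Made-up rates: 0.2 births per woman per year at ages 20-29, 0.1 at ages 30-34.
toy_rates = {age: 0.2 for age in range(20, 30)}
toy_rates.update({age: 0.1 for age in range(30, 35)})
print(round(total_fertility_rate(toy_rates), 2))
```

As with life expectancy, all the rates come from a single year, so the statistic describes current conditions rather than any real woman's lifetime.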
Question 5. Write a function fertility_over_time that takes the Alpha-3 code of a country and a start year. It returns a two-column table with labels Year and Children per woman that can be used to generate a line chart of the country’s fertility rate each year, starting at the start year. The plot should include the start year and all later years that appear in the fertility table.
Then, in the next cell, call your fertility_over_time function on the Alpha-3 code for Bangladesh and the year 1970 in order to plot how Bangladesh’s fertility rate has changed since 1970. Note that the function fertility_over_time should not return the plot itself. The expression that draws the line plot is provided for you; please don’t change it.
[14]:
def fertility_over_time(country, start):
    """Create a two-column table that describes a country's total fertility rate each year."""
    country_fertility = fertility.where('geo', are.equal_to(country)).sort('time')
    country_fertility_after_start = country_fertility.where('time', are.above_or_equal_to(start))
    rate = country_fertility_after_start.column(2)
    year = country_fertility_after_start.column(1)
    return Table().with_columns("Year", year, "Children per woman", rate)

bangladesh_code = 'bgd'
fertility_over_time(bangladesh_code, 1970).plot(0, 1) # You should *not* change this line.
[15]:
grader.check("q1_5")
[15]:
q1_5 results: All test cases passed!
Question 6. Assuming everything else is constant, do the trends in fertility in the graph above help directly explain why the population growth rate decreased from 1985 to 2010 in Bangladesh? Why or why not?
Write your answer here, replacing this text.
It has been observed that lower fertility rates are often associated with lower child mortality rates. The link has been attributed to family planning: if parents can expect that their children will all survive into adulthood, then they will choose to have fewer children. We can see if this association is evident in Bangladesh by plotting the relationship between total fertility rate and child mortality rate per 1000 children.
Question 7. Using both the fertility and child_mortality tables, draw a scatter diagram that has Bangladesh’s total fertility on the horizontal axis and its child mortality on the vertical axis with one point for each year, starting with 1970.
The expression that draws the scatter diagram is provided for you; please don’t change it. Instead, create a table called post_1969_fertility_and_child_mortality with the appropriate column labels and data in order to generate the chart correctly. Use the label Children per woman to describe total fertility and the label Child deaths per 1000 born to describe child mortality.
[16]:
bgd_fertility = fertility.where('geo', are.equal_to('bgd')).where('time', are.above_or_equal_to(1970)).select('time', 'children_per_woman_total_fertility')
bgd_child_mortality = child_mortality.where('geo', are.equal_to('bgd')).where('time', are.above_or_equal_to(1970)).select('time', 'child_mortality_under_5_per_1000_born')
fertility_and_child_mortality = bgd_fertility.join('time', bgd_child_mortality)
post_1969_fertility_and_child_mortality = fertility_and_child_mortality.relabeled('children_per_woman_total_fertility', 'Children per woman').relabeled('child_mortality_under_5_per_1000_born', 'Child deaths per 1000 born')
post_1969_fertility_and_child_mortality.scatter('Children per woman', 'Child deaths per 1000 born') # You should *not* change this line.
[17]:
grader.check("q1_7")
[17]:
q1_7 results: All test cases passed!
Question 8. In one or two sentences, describe the association (if any) that is illustrated by this scatter diagram. Does the diagram show that reduced child mortality causes parents to choose to have fewer children?
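The diagram shows a positive association: years with higher child mortality also had higher fertility, and both fell together over time. An association alone does not show that reduced child mortality causes parents to choose to have fewer children. The family planning explanation (when more children survive, there is less reason to have more) is consistent with the pattern but is not established by it.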
0.3 You are about halfway through the project. Congratulate yourself and keep up the good work!
0.3.1 The World
The change observed in Bangladesh since 1970 can also be observed in many other developing countries: health services improve, life expectancy increases, and child mortality decreases. At the same time, the fertility rate often plummets, and so the population growth rate decreases despite increasing longevity.
Run the cell below to generate two overlaid histograms, one for 1960 and one for 2010, that show the distributions of total fertility rates for these two years among all 201 countries in the fertility table.
[18]:
Table().with_columns(
    '1960', fertility.where('time', 1960).column(2),
    '2010', fertility.where('time', 2010).column(2)
).hist(bins=np.arange(0, 10, 0.5), unit='child per woman')
_ = plots.xlabel('Children per woman')
_ = plots.ylabel('Percent per children per woman')
_ = plots.xticks(np.arange(10))
Question 9. Assign fertility_statements to an array of the numbers of each statement below that can be correctly inferred from these histograms.
1. About the same number of countries had a fertility rate between 3.5 and 4.5 in both 1960 and 2010.
2. In 2010, about 40% of countries had a fertility rate between 1.5 and 2.
3. In 1960, less than 20% of countries had a fertility rate below 3.
4. More countries had a fertility rate above 3 in 1960 than in 2010.
5. At least half of countries had a fertility rate between 5 and 8 in 1960.
6. At least half of countries had a fertility rate below 3 in 2010.
[19]:
fertility_statements = make_array(1, 3, 4, 5, 6)
[20]:
grader.check("q1_9")
[20]:
q1_9 results: All test cases passed!
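When judging statements like these, it helps to remember that the vertical axis of the histograms is in percent per child per woman, so the percent of countries in a bin is the bar's area (height times width), not its height. A small sketch with made-up bar heights (the function name and the numbers are hypothetical):

```python
def percent_in_bins(heights, bin_width):
    """Percent of all values that fall in the given bins (area = height * width)."""
    return sum(height * bin_width for height in heights)

# Made-up bar heights (percent per child per woman) for two adjacent 0.5-wide bins.
heights = [30, 50]
print(percent_in_bins(heights, 0.5))
```

With the 0.5-wide bins used in the cell above, a bar has to reach 80 on the vertical axis to represent 40% of countries.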
Question 10. Draw a line plot of the world population from 1800 through 2005. The world population is the sum of all countries’ populations.
[21]:
#Fill in code here
world_population = population.where('time', are.between_or_equal_to(1800, 2005)).group('time', sum).plot(0, 2)
Question 11. Create a function stats_for_year that takes a year and returns a table of statistics. The table it returns should have four columns: geo, population_total, children_per_woman_total_fertility, and child_mortality_under_5_per_1000_born. Each row should contain one Alpha-3 country code and three statistics: population, fertility rate, and child mortality for that year from the population, fertility and child_mortality tables. Only include rows for which all three statistics are available for the country and year.
In addition, restrict the result to country codes that appear in big_50, an array of the 50 most populous countries in 2010. This restriction will speed up computations later in the project.
After you write stats_for_year, try calling stats_for_year on any year between 1960 and 2010. Try to understand the output of stats_for_year.
Hint: The tests for this question are quite comprehensive, so if you pass the tests, your function is probably correct. However, without calling your function yourself and looking at the output, it will be very difficult to understand any problems you have, so try your best to write the function correctly and check that it works before you rely on the ok tests to confirm your work.
[22]:
# We first create a population table that only includes the
# 50 countries with the largest 2010 populations. We focus on
# these 50 countries only so that plotting later will run faster.
big_50 = population.where('time', are.equal_to(2010)).sort("population_total", descending=True).take(np.arange(50)).column('geo')
population_of_big_50 = population.where('time', are.above(1959)).where('geo', are.contained_in(big_50))

def stats_for_year(year):
    """Return a table of the stats for each country that year."""
    p = population_of_big_50.where('time', are.equal_to(year)).drop('time')
    f = fertility.where('time', are.equal_to(year)).drop('time')
    c = child_mortality.where('time', are.equal_to(year)).drop('time')
    stats_for_year = p.join('geo', f, 'geo').join('geo', c, 'geo')
    return stats_for_year
[23]:
grader.check("q1_11")
[23]:
q1_11 results: All test cases passed!
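The requirement to include only rows for which all three statistics are available is what joining on geo gives you: a country code survives only if it appears in every table being joined. A plain-Python sketch of that behavior, with made-up codes and values (all names here are hypothetical):

```python
# Made-up single-year statistics keyed by country code (all values hypothetical).
population_by_geo = {"aaa": 1000, "bbb": 2000, "ccc": 3000}
fertility_by_geo = {"aaa": 2.1, "bbb": 4.5}
child_mortality_by_geo = {"aaa": 12.0, "ccc": 80.0}

# Keep only the codes present in all three sources, as a join on geo would.
shared = sorted(set(population_by_geo) & set(fertility_by_geo) & set(child_mortality_by_geo))
stats = [
    (geo, population_by_geo[geo], fertility_by_geo[geo], child_mortality_by_geo[geo])
    for geo in shared
]
print(stats)
```

Here "bbb" and "ccc" drop out because each is missing from one of the three sources.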
Question 12. Create a table called pop_by_decade with two columns called decade and population. It has a row for each year since 1960 that starts a decade. The population column contains the total population of all countries included in the result of stats_for_year(year) for the first year of the decade. For example, 1960 is the first year of the 1960’s decade. You should see that these countries contain most of the world’s population.
Hint: One approach is to define a function pop_for_year that computes this total population, then apply it to the decade column. The stats_for_year function from the previous question may be useful here.
This first test is just a sanity check for your helper function if you choose to use it. You will not lose points for not implementing the function pop_for_year.
Note: The cell where you will generate the pop_by_decade table is below the cell where you can choose to define the helper function pop_for_year. You should define your pop_by_decade table in the cell that starts with the table decades being defined.
[24]:
def pop_for_year(year):
    total_pop = sum(stats_for_year(year).column(1))
    return total_pop
[25]:
grader.check("q1_12_0")
[25]:
q1_12_0 results: All test cases passed!
Now that you’ve defined your helper function (if you’ve chosen to do so), define the pop_by_decade table.
[26]:
decades = Table().with_column('decade', np.arange(1960, 2011, 10))

pop_by_decade = decades.with_column('population', decades.apply(pop_for_year, 'decade'))
pop_by_decade.set_format(1, NumberFormatter)
[26]:
decade | population
1960 | 2,624,944,597
1970 | 3,211,487,418
1980 | 3,880,722,003
1990 | 4,648,434,558
2000 | 5,367,553,063
2010 | 6,040,810,517
[27]:
grader.check("q1_12")
[27]:
q1_12 results: All test cases passed!
The countries table describes various characteristics of countries. The country column contains the same codes as the geo column in each of the other data tables (population, fertility, and child_mortality). The world_6region column classifies each country into a region of the world. Run the cell below to inspect the data.
[28]:
countries = Table.read_table('countries.csv').where('country', are.contained_in(population.group('geo').column('geo')))
countries.select('country', 'name', 'world_6region')
[28]:
country | name | world_6region
afg | Afghanistan | south_asia
akr_a_dhe | Akrotiri and Dhekelia | europe_central_asia
alb | Albania | europe_central_asia
dza | Algeria | middle_east_north_africa
asm | American Samoa | east_asia_pacific
and | Andorra | europe_central_asia
ago | Angola | sub_saharan_africa
aia | Anguilla | america
atg | Antigua and Barbuda | america
arg | Argentina | america
… (245 rows omitted)
Question 13. Create a table called region_counts that has two columns, region and count. The region column contains each region, and the count column contains the number of countries in that region that appear in the result of stats_for_year(1960). For example, one row would have south_asia as its world_6region value and an integer as its count value: the number of large South Asian countries for which we have population, fertility, and child mortality numbers from 1960.
[29]:
region_counts = stats_for_year(1960).join('geo', countries, 'country').group('world_6region').relabeled(0, 'region')
region_counts
[29]:
region | count
america | 8
east_asia_pacific | 10
europe_central_asia | 10
middle_east_north_africa | 7
south_asia | 5
sub_saharan_africa | 10
[30]:
grader.check("q1_13")
[30]:
q1_13 results: All test cases passed!
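Grouping by a column with no aggregation function counts the rows that share each value. The following plain-Python sketch shows the same counting idea with `collections.Counter`; the country codes and pairings are made up for illustration:

```python
from collections import Counter

# Made-up (country code, region) pairs standing in for the joined table.
country_regions = [
    ("aaa", "south_asia"),
    ("bbb", "america"),
    ("ccc", "south_asia"),
    ("ddd", "europe_central_asia"),
]

# Count how many countries fall in each region, listed alphabetically by region
# to match the order of the region_counts output above.
region_counts = sorted(Counter(region for _, region in country_regions).items())
print(region_counts)
```

A quick check on the real table is that the counts add up to the number of rows in stats_for_year(1960).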
The following scatter diagram compares total fertility rate and child mortality rate for each country
in 1960. The area of each dot represents the population of the country, and the color represents its
region of the world. Run the cell. Do you think you can identify any of the dots?
[31]:
from functools import lru_cache as cache

# This cache annotation makes sure that if the same year
# is passed as an argument twice, the work of computing
# the result is only carried out once.
@cache(None)
def stats_relabeled(year):
    """Relabeled and cached version of stats_for_year."""
    return stats_for_year(year).relabel(2, 'Children per woman').relabel(3, 'Child deaths per 1000 born')

def fertility_vs_child_mortality(year):
    """Draw a color scatter diagram comparing child mortality and fertility."""
    with_region = stats_relabeled(year).join('geo', countries.select('country', 'world_6region'), 'country')
    with_region.scatter(2, 3, sizes=1, group=4, s=500)
    plots.xlim(0,10)
    plots.ylim(-50, 500)
    plots.title(year)

fertility_vs_child_mortality(1960)
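The effect of the cache annotation in the cell above can be seen in isolation with a small standalone sketch. The function `slow_square` is a hypothetical stand-in for an expensive per-year computation; `cache_info()` is the standard way to inspect an `lru_cache`-wrapped function:

```python
from functools import lru_cache

@lru_cache(None)  # None means the cache may grow without bound
def slow_square(year):
    """Hypothetical stand-in for an expensive per-year computation."""
    return year * year

first_result = slow_square(1960)   # computed and stored
second_result = slow_square(1960)  # served from the cache
info = slow_square.cache_info()
print(first_result == second_result, info.hits, info.misses)
```

This is why dragging a slider back to a year that was already drawn does not repeat the table work for that year.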
Question 14. Assign scatter_statements to an array of the numbers of each statement below that can be inferred from this scatter diagram for 1960.
1. As a whole, the europe_central_asia region had the lowest child mortality rate.
2. The lowest child mortality rate of any country was from an east_asia_pacific country.
3. Most countries had a fertility rate above 5.
4. There was an association between child mortality and fertility.
5. The two largest countries by population also had the two highest child mortality rates.
[32]:
scatter_statements = make_array(1, 3, 4)
[33]:
grader.check("q1_14")
[33]:
q1_14 results: All test cases passed!
The result of the cell below is interactive. Drag the slider to the right to see how countries have
changed over time. You’ll find that the great divide between so-called “Western” and “developing”
countries that existed in the 1960’s has nearly disappeared. This shift in fertility rates is the reason
that the global population is expected to grow more slowly in the 21st century than it did in the
19th and 20th centuries.
Note: Don’t worry if a red warning pops up when running the cell below. You’ll still be able to run the cell!
[34]:
import ipywidgets as widgets

_ = widgets.interact(fertility_vs_child_mortality,
                     year=widgets.IntSlider(min=1960, max=2015, value=1960))
interactive(children=(IntSlider(value=1960, description='year', max=2015, min=1960), Output()), _dom_classes=(…
Now is a great time to take a break and watch the same data presented by Hans Rosling in a 2010 TEDx talk with smoother animation and witty commentary.
0.4 2. Global Poverty
In 1800, 85% of the world’s 1 billion people lived in extreme poverty, defined by the United Nations as “a condition characterized by severe deprivation of basic human needs, including food, safe drinking water, sanitation facilities, health, shelter, education and information.” A common measure of extreme poverty is a person living on less than $1.25 per day.
In 2018, the proportion of people living in extreme poverty was estimated to be 8%. Although the world rate of extreme poverty has declined consistently for hundreds of years, the number of people living in extreme poverty is still over 600 million. The United Nations recently adopted an ambitious goal: “By 2030, eradicate extreme poverty for all people everywhere.” In this section, we will examine extreme poverty trends around the world.
First, load the population and poverty rate by country and year and the country descriptions. While the population table has values for every recent year for many countries, the poverty table only includes certain years for each country in which a measurement of the rate of extreme poverty was available.
[35]:
population = Table.read_table('population.csv')
countries = Table.read_table('countries.csv').where('country', are.contained_in(population.group('geo').column('geo')))
poverty = Table.read_table('poverty.csv')
poverty.show(3)
<IPython.core.display.HTML object>
Question 1. Assign latest_poverty to a three-column table with one row for each country that appears in the poverty table. The first column should contain the 3-letter code for the country. The second column should contain the most recent year for which an extreme poverty rate is available for the country. The third column should contain the poverty rate in that year. Do not change the last line, so that the labels of your table are set correctly.
Hint: think about how group works: it does a sequential search of the table (from top to bottom) and collects values in the array in the order in which they appear, and then applies a function to that array. The first function may be helpful, but you are not required to use it.
[36]:
def first(values):
    return values.item(0)

latest_poverty = poverty.sort('time', True).group('geo', first)
latest_poverty = latest_poverty.relabeled(0, 'geo').relabeled(1, 'time').relabeled(2, 'poverty_percent') # You should *not* change this line.
latest_poverty
[36]:
geo | time | poverty_percent
ago | 2009 | 43.37
alb | 2012 | 0.46
arg | 2011 | 1.41
arm | 2012 | 1.75
aus | 2003 | 1.36
aut | 2004 | 0.34
aze | 2008 | 0.31
bdi | 2006 | 81.32
bel | 2000 | 0.5
ben | 2012 | 51.61
… (135 rows omitted)
[37]:
grader.check("q2_1")
[37]:
q2_1 results: All test cases passed!
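The solution above sorts the rows so that the most recent year comes first and then keeps the first value in each country's group. A plain-Python sketch of the same two-step idea, with made-up rows (the codes, years, and rates are hypothetical):

```python
# Made-up (geo, time, poverty_percent) rows; the values are hypothetical.
rows = [
    ("aaa", 2005, 30.0),
    ("bbb", 2010, 5.0),
    ("aaa", 2011, 25.0),
    ("bbb", 2002, 8.0),
]

# Sort so the most recent year comes first, then keep the first row seen per country.
latest = {}
for geo, time, percent in sorted(rows, key=lambda row: row[1], reverse=True):
    if geo not in latest:
        latest[geo] = (time, percent)

print(sorted(latest.items()))
```

The order of the two steps matters: taking the first row per country before sorting would return whichever year happened to be listed first.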
Question 2. Using both latest_poverty and population, create a four-column table called recent_poverty_total with one row for each country in latest_poverty. The four columns should have the following labels and contents, in the following order:
1. geo contains the 3-letter country code,
2. poverty_percent contains the most recent poverty percent,
3. population_total contains the population of the country in 2010,
4. poverty_total contains the number of people in poverty rounded to the nearest integer, based on the 2010 population and most recent poverty rate.
[38]:
poverty_and_pop = latest_poverty.drop('time').join('geo', population.where('time', 2010)).drop('time')
recent_poverty_total = poverty_and_pop.with_column('poverty_total', np.round(poverty_and_pop.column(1) * poverty_and_pop.column(2) / 100))
recent_poverty_total
[38]:
geo | poverty_percent | population_total | poverty_total
ago | 43.37 | 21219954 | 9.20309e+06
alb | 0.46 | 2901883 | 13349
arg | 1.41 | 41222875 | 581243
arm | 1.75 | 2963496 | 51861
aus | 1.36 | 22162863 | 301415
aut | 0.34 | 8391986 | 28533
aze | 0.31 | 9099893 | 28210
bdi | 81.32 | 9461117 | 7.69378e+06
bel | 0.5 | 10929978 | 54650
ben | 51.61 | 9509798 | 4.90801e+06
… (135 rows omitted)
[39]:
grader.check("q2_2")
[39]:
q2_2 results: All test cases passed!
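The poverty_total column is the poverty percent applied to the 2010 population. Working one row by hand is a useful check; the sketch below uses the Albania (alb) values from the output above, with illustrative variable names:

```python
# Values for Albania (alb) from the recent_poverty_total output.
poverty_percent_alb = 0.46        # most recent extreme poverty rate, in percent
population_2010_alb = 2_901_883   # population in 2010

# Percent means "per 100", so divide by 100 before rounding to whole people.
poverty_total_alb = round(poverty_percent_alb * population_2010_alb / 100)
print(poverty_total_alb)
```

The result agrees with the poverty_total shown for alb in the table.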
Question 3. Assign the name poverty_percent to the known percentage of the world’s 2010 population that was living in extreme poverty. Assume that the poverty_total numbers in the recent_poverty_total table describe all people in 2010 living in extreme poverty. You should find a number that is above the 2018 global estimate of 8%, since many country-specific poverty rates are older than 2018.
Hint: The sum of the population_total column in the recent_poverty_total table is not the world population, because only a subset of the world’s countries are included in the recent_poverty_total table (only some countries have known poverty rates). Use the population table to compute the world’s 2010 total population.
[40]:
poverty_percent = 100 * sum(recent_poverty_total.column('poverty_total')) / sum(population.where('time', 2010).column('population_total'))
poverty_percent
[40]:
14.299370218520854
[41]:
grader.check("q2_3")
[41]:
q2_3 results: All test cases passed!
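The hint in Question 3 is about choosing the right denominator. The sketch below illustrates the difference with made-up numbers: the three country codes and all figures are hypothetical and are not drawn from the Gapminder tables.

```python
# Hypothetical toy data: country code -> 2010 population.
world_population = {"aaa": 1000, "bbb": 3000, "ccc": 6000}

# Only some countries have a known number of people in poverty.
known_poverty_total = {"aaa": 200, "bbb": 300}

poor = sum(known_poverty_total.values())
covered = sum(world_population[geo] for geo in known_poverty_total)
world = sum(world_population.values())

# Dividing by the covered subset overstates the share of the world.
percent_of_covered = 100 * poor / covered
# Dividing by the whole world's population is what the question asks for.
percent_of_world = 100 * poor / world

print(percent_of_covered, percent_of_world)
```

Here the country "ccc" has no known poverty figure, so it contributes to the world total but not to the count of people in poverty.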
The countries table includes not only the name and region of countries, but also their positions on the globe.
[42]:
countries.select('country', 'name', 'world_4region', 'latitude', 'longitude')
[42]:
country | name | world_4region | latitude | longitude
afg | Afghanistan | asia | 33 | 66
akr_a_dhe | Akrotiri and Dhekelia | europe | nan | nan
alb | Albania | europe | 41 | 20
dza | Algeria | africa | 28 | 3
asm | American Samoa | asia | -11.056 | -171.082
and | Andorra | europe | 42.5078 | 1.52109
ago | Angola | africa | -12.5 | 18.5
aia | Anguilla | americas | 18.2167 | -63.05
atg | Antigua and Barbuda | americas | 17.05 | -61.8
arg | Argentina | americas | -34 | -64
… (245 rows omitted)
Question 4. Using both countries and recent_poverty_total, create a five-column table called poverty_map with one row for every country in recent_poverty_total. The five columns should have the following labels and contents:
1. latitude contains the country’s latitude,
2. longitude contains the country’s longitude,
3. name contains the country’s name,
4. region contains the country’s region from the world_4region column of countries,
5. poverty_total contains the country’s poverty total.
[43]:
poverty_map = countries.relabel('world_4region', 'region').select('country', 'latitude', 'longitude', 'name', 'region').join('country', recent_poverty_total.select(0, 3), 'geo').drop('country')
poverty_map
[43]:
latitude | longitude | name | region | poverty_total
-12.5 | 18.5 | Angola | africa | 9.20309e+06
41 | 20 | Albania | europe | 13349
-34 | -64 | Argentina | americas | 581243
40.25 | 45 | Armenia | europe | 51861
-25 | 135 | Australia | asia | 301415
47.3333 | 13.3333 | Austria | europe | 28533
40.5 | 47.5 | Azerbaijan | europe | 28210
-3.5 | 30 | Burundi | africa | 7.69378e+06
50.75 | 4.5 | Belgium | europe | 54650
9.5 | 2.25 | Benin | africa | 4.90801e+06
… (135 rows omitted)
[44]:
grader.check("q2_4")
[44]:
q2_4 results: All test cases passed!
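The join in Question 4 matches rows whose key columns have different labels (country in one table, geo in the other). A minimal sketch of the same idea in plain Python follows, using a few rows copied from the outputs above; the list-of-dicts layout and the variable names are our own illustration, not how the datascience library stores tables.

```python
# A few rows copied from the countries and recent_poverty_total outputs above.
countries_rows = [
    {"country": "afg", "name": "Afghanistan", "world_4region": "asia", "latitude": 33, "longitude": 66},
    {"country": "alb", "name": "Albania", "world_4region": "europe", "latitude": 41, "longitude": 20},
    {"country": "arg", "name": "Argentina", "world_4region": "americas", "latitude": -34, "longitude": -64},
]
poverty_rows = [
    {"geo": "alb", "poverty_total": 13349},
    {"geo": "arg", "poverty_total": 581243},
]

# Index one side by its key, then look up each row of the other side.
by_code = {row["country"]: row for row in countries_rows}

poverty_map_rows = []
for p in poverty_rows:
    c = by_code.get(p["geo"])
    if c is None:
        continue  # no matching country row, so the row is dropped
    poverty_map_rows.append({
        "latitude": c["latitude"],
        "longitude": c["longitude"],
        "name": c["name"],
        "region": c["world_4region"],  # relabeled from world_4region
        "poverty_total": p["poverty_total"],
    })

for row in poverty_map_rows:
    print(row)
```

Afghanistan has no row in this poverty sample, so it does not appear in the result, which mirrors the requirement of one row per country in recent_poverty_total.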
Run the cell below to draw a map of the world in which the areas of circles represent the number
of people living in extreme poverty. Double-click on the map to zoom in.
[45]:
# It may take a few seconds to generate this map.
colors = {'africa': 'blue', 'europe': 'black', 'asia': 'red', 'americas': 'green'}
scaled = poverty_map.with_columns(
    'poverty_total', 1e-4 * poverty_map.column('poverty_total'),
    'region', poverty_map.apply(colors.get, 'region')
)
Circle.map_table(scaled)
[45]:
<datascience.maps.Map at 0x7fab75ca08d0>
Although people live in extreme poverty throughout the world (with more than 5 million in the
United States), the largest numbers are in Asia and Africa.
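The map cell above does two things before drawing: it scales each poverty total by 1e-4 and replaces each region name with a color via colors.get. The sketch below shows just that preparation step in plain Python on two rows copied from the poverty_map output; it does not draw anything, and the tuple layout is our own illustration.

```python
colors = {'africa': 'blue', 'europe': 'black', 'asia': 'red', 'americas': 'green'}

# (name, region, poverty_total) copied from the poverty_map output above.
rows = [
    ("Albania", "europe", 13349),
    ("Argentina", "americas", 581243),
]

# Scale the totals down and translate region names into color names.
scaled = [(name, colors.get(region), 1e-4 * total) for name, region, total in rows]

for name, color, size in scaled:
    print(name, color, size)

# dict.get returns None for a key that is not in the dictionary,
# so a region outside the four listed would end up with no color.
print(colors.get("oceania"))
```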
Question 5. Assign largest to a two-column table with the name (not the 3-letter code) and poverty_total of the 10 countries with the largest number of people living in extreme poverty.
[46]:
largest = poverty_map.sort('poverty_total', descending=True).set_format('poverty_total', NumberFormatter).take(np.arange(10)).select(2, 4)
largest.set_format('poverty_total', NumberFormatter)
[46]:
name | poverty_total
India | 290,881,638.00
Nigeria | 98,891,167.00
China | 83,944,643.00
Bangladesh | 65,574,256.00
Congo, Dem. Rep. | 57,841,438.00
Indonesia | 39,141,326.00
Ethiopia | 32,213,991.00
Pakistan | 21,663,595.00
Tanzania | 19,847,979.00
Madagascar | 18,480,426.00
[47]:
grader.check("q2_5")
[47]:
q2_5 results: All test cases passed!
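The pattern in Question 5 is sort descending, then keep the first few rows. A small plain-Python sketch of that pattern follows, using five of the totals from the output above in shuffled order; the format string is only our approximation of how the formatted column is displayed.

```python
# (name, poverty_total) pairs copied from the output above, deliberately unsorted.
rows = [
    ("Pakistan", 21663595),
    ("India", 290881638),
    ("Nigeria", 98891167),
    ("Indonesia", 39141326),
    ("China", 83944643),
]

# Sort by the total, largest first, and keep the first three rows.
top3 = sorted(rows, key=lambda row: row[1], reverse=True)[:3]

for name, total in top3:
    # Thousands separators and two decimals, similar to the displayed column.
    print(f"{name} | {total:,.2f}")
```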
Question 6. Write a function called poverty_timeline that takes the name of a country (not the geo code) as its argument. It should draw a line plot of the number of people living in poverty in that country with time on the horizontal axis. The line plot should have a point for each row in the poverty table for that country. To compute the population living in poverty from a poverty percentage, multiply by the population of the country in that year.
Hint: To make your plot, you will first need to make a table.
Hint: This question is long. Feel free to create cells and experiment.
[48]:
def population_for_country_in_year(row_of_poverty_table):
    # Look up the population of this row's country in this row's year.
    return population.where('time', row_of_poverty_table.item('time')).where('geo', row_of_poverty_table.item('geo')).column('population_total').item(0)

def poverty_timeline(country):
    # Translate the country name into its 3-letter geo code.
    geo = countries.where('name', country).column('country').item(0)
    country_poverty = poverty.where('geo', geo)
    # People in poverty per year = poverty percent / 100 * population in that year.
    Table().with_columns(
        'Year', country_poverty.column(1),
        'Number in poverty', country_poverty.column(2) / 100 * country_poverty.apply(population_for_country_in_year)
    ).plot(0, 1)
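The key step in poverty_timeline is pairing each year's poverty percentage with the population of that same year. The sketch below isolates that calculation in plain Python; the figures describe a made-up country and are not taken from the project data, and timeline_points is a hypothetical helper name.

```python
# Hypothetical figures for one made-up country (not from the project tables).
poverty_percent_by_year = {2000: 40.0, 2005: 30.0, 2010: 20.0}
population_by_year = {2000: 1_000_000, 2005: 1_100_000, 2010: 1_200_000}


def timeline_points(percent_by_year, pop_by_year):
    """Return (year, people in poverty) pairs, one per year with a poverty figure."""
    return [
        (year, percent_by_year[year] / 100 * pop_by_year[year])
        for year in sorted(percent_by_year)
    ]


for year, count in timeline_points(poverty_percent_by_year, population_by_year):
    print(year, count)
```

Because the population changes from year to year, each percentage has to be multiplied by the population of its own year rather than by a single fixed population.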
Finally, draw the timelines below to see how the world is changing. You can check your work by comparing your graphs to the ones on gapminder.org.
poverty_timeline('India')
poverty_timeline('Nigeria')
poverty_timeline('China')
poverty_timeline('United States')
Although the number of people living in extreme poverty has been increasing in Nigeria and the
United States, the massive decreases in China and India have shaped the overall trend that extreme
poverty is decreasing worldwide, both in percentage and in absolute number.
To learn more, watch Hans Rosling in a 2015 film about the UN goal of eradicating extreme poverty from the world.
Below, we’ve also added an interactive dropdown menu for you to visualize poverty_timeline graphs for other countries. Note that each dropdown menu selection may take a few seconds to run.
[49]:
# Just run this cell
all_countries = poverty_map.column('name')
_ = widgets.interact(poverty_timeline, country=list(all_countries))
interactive(children=(Dropdown(description='country', options=('Angola', 'Albania', 'Argentina', 'Armenia', 'A…
You’re finished! Congratulations on mastering data visualization and table manipulation.
Be sure to run the tests and verify that they all pass, then Save your changes, then Download your file to your host machine (if you are using jupyterhub), then submit your file to the project 1 Canvas assignment by 11:59pm on the due date.