final_exam_math208_2021_Q12_sols.pdf

School: McGill University
Course: 208
Subject: Industrial Engineering
Date: Apr 3, 2024
Pages: 19

Question 1 [50 points]

    data(midwest)
    midwest_modified<-midwest %>%
      select(county,state,popdensity,
             popwhite,popblack,
             popamerindian,popasian,
             popother,inmetro)

The data for this question comes from a modified version of the midwest dataset from the ggplot2 package.

    str(midwest_modified)

    tibble [437 × 9] (S3: tbl_df/tbl/data.frame)
     $ county       : chr [1:437] "ADAMS" "ALEXANDER" "BOND" "BOONE" ...
     $ state        : chr [1:437] "IL" "IL" "IL" "IL" ...
     $ popdensity   : num [1:437] 1271 759 681 1812 324 ...
     $ popwhite     : int [1:437] 63917 7054 14477 29344 5264 35157 5298 16519 13384 146506 ...
     $ popblack     : int [1:437] 1702 3496 429 127 547 50 1 111 16 16559 ...
     $ popamerindian: int [1:437] 98 19 35 46 14 65 8 30 8 331 ...
     $ popasian     : int [1:437] 249 48 16 150 5 195 15 61 23 8033 ...
     $ popother     : int [1:437] 124 9 34 1139 6 221 0 84 6 1596 ...
     $ inmetro      : int [1:437] 0 0 0 1 0 0 0 0 0 1 ...

    midwest_modified %>% slice(1:5) %>% select(county:popblack)

    county      state  popdensity  popwhite  popblack
    <chr>       <chr>       <dbl>     <int>     <int>
    ADAMS       IL      1270.9615     63917      1702
    ALEXANDER   IL       759.0000      7054      3496
    BOND        IL       681.4091     14477       429
    BOONE       IL      1812.1176     29344       127
    BROWN       IL       324.2222      5264       547
    5 rows

    midwest_modified %>% slice(1:5) %>% select(county,popamerindian:popother)

    county      popamerindian  popasian  popother
    <chr>               <int>     <int>     <int>
    ADAMS                  98       249       124
    ALEXANDER              19        48         9
    BOND                   35        16        34
    BOONE                  46       150      1139
    BROWN                  14         5         6
    5 rows

The dataset contains population data from midwest counties in five states in the United States from an unspecified year. There are identifying variables for both the county (the name) and the state (the postal abbreviation). The variable popdensity is a measure of density (population per unspecified area units). The variable inmetro is equal to 1 if the county is classified as a metropolitan area and 0 otherwise. The other variables contain counts of population size within self-identified racial classifications.

a. [5 pts] Write a line of code that will generate the following tibble (or data.frame) containing the highest population density from each state:

    state  Highest_Pop_Den
    <chr>            <dbl>
    IL            88018.40
    IN            34659.09
    MI            60333.91
    OH            54313.08
    WI            63951.67
    5 rows

Solution:

    midwest_modified %>% group_by(state) %>%
      summarise(Highest_Pop_Den=max(popdensity))

    state  Highest_Pop_Den
    <chr>            <dbl>
    IL            88018.40
    IN            34659.09
    MI            60333.91
    OH            54313.08
    WI            63951.67
    5 rows

b. [5 pts] Write a line of code that adds a new column to the midwest_modified tibble called Metro where the elements of that column are equal to a string “Metro” if inmetro is equal to 1 and “NonMetro” if inmetro is equal to 0. The first five rows are given below for the county, state, inmetro and Metro columns:

    county      state  inmetro  Metro
    <chr>       <chr>    <int>  <chr>
    ADAMS       IL           0  NonMetro
    ALEXANDER   IL           0  NonMetro
    BOND        IL           0  NonMetro
    BOONE       IL           1  Metro
    BROWN       IL           0  NonMetro
    5 rows

Solution:

    midwest_modified<-midwest_modified %>%
      mutate(Metro=ifelse(inmetro==1,"Metro","NonMetro"))
    head(midwest_modified %>% select(county, state,inmetro,Metro),5)

    county      state  inmetro  Metro
    <chr>       <chr>    <int>  <chr>
    ADAMS       IL           0  NonMetro
    ALEXANDER   IL           0  NonMetro
    BOND        IL           0  NonMetro
    BOONE       IL           1  Metro
    BROWN       IL           0  NonMetro
    5 rows

c. [5 pts] Write a line of code that will generate the following tibble (or data.frame) containing the highest population density from each state for metropolitan and non-metropolitan counties separately, using the modified tibble from part (b).

    dens_table

    state  Metro     Highest_Pop_Den
    <chr>  <chr>               <dbl>
    IL     Metro           88018.397
    IL     NonMetro         2309.320
    IN     Metro           34659.087
    IN     NonMetro         3090.375
    MI     Metro           60333.914
    MI     NonMetro         2250.645
    OH     Metro           54313.077
    OH     NonMetro         5484.214
    WI     Metro           63951.667
    WI     NonMetro         2343.750
    1-10 of 10 rows

Solution:

    dens_table<-midwest_modified %>% group_by(state,Metro) %>%
      summarise(Highest_Pop_Den=max(popdensity))
    dens_table

    state  Metro     Highest_Pop_Den
    <chr>  <chr>               <dbl>
    IL     Metro           88018.397
    IL     NonMetro         2309.320
    IN     Metro           34659.087
    IN     NonMetro         3090.375
    MI     Metro           60333.914
    MI     NonMetro         2250.645
    OH     Metro           54313.077
    OH     NonMetro         5484.214
    WI     Metro           63951.667
    WI     NonMetro         2343.750
    1-10 of 10 rows
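
The grouped summaries in parts (a) to (c) all follow the same pattern: group, then reduce each group with max(). The sketch below repeats that pattern on a small made-up tibble (the name toy and its values are hypothetical, not taken from the midwest data). It also passes `.groups = "drop"`, an optional argument of summarise() that the solutions above do not use, so that the result is returned ungrouped; this is only one possible choice.

```r
library(dplyr)

# Hypothetical toy data (not the midwest counties).
toy <- tibble(
  state      = c("AA", "AA", "AA", "BB", "BB", "BB"),
  inmetro    = c(1L, 0L, 0L, 1L, 1L, 0L),
  popdensity = c(900, 120, 300, 450, 700, 80)
)

toy_table <- toy %>%
  # Same recoding as in part (b).
  mutate(Metro = ifelse(inmetro == 1, "Metro", "NonMetro")) %>%
  # One row per state/Metro combination, as in part (c).
  group_by(state, Metro) %>%
  # .groups = "drop" returns an ungrouped tibble.
  summarise(Highest_Pop_Den = max(popdensity), .groups = "drop")

toy_table
```

Without `.groups = "drop"`, the result of summarising by two variables stays grouped by the first one (state), which is harmless here but can affect later mutate() or summarise() calls.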

d. [5 pts] Assume the tibble from part (c) is called dens_table as above. Now write a line of code that produces a tibble which arranges the data above so that we have separate columns for “Metro” and “NonMetro”, as below:

    state     Metro  NonMetro
    <chr>     <dbl>     <dbl>
    IL     88018.40  2309.320
    IN     34659.09  3090.375
    MI     60333.91  2250.645
    OH     54313.08  5484.214
    WI     63951.67  2343.750
    5 rows

Solution:

    result_tibble<-dens_table %>%
      pivot_wider(id_cols=state,names_from=Metro,
                  values_from=Highest_Pop_Den)
    head(result_tibble)

    state     Metro  NonMetro
    <chr>     <dbl>     <dbl>
    IL     88018.40  2309.320
    IN     34659.09  3090.375
    MI     60333.91  2250.645
    OH     54313.08  5484.214
    WI     63951.67  2343.750
    5 rows

Now we will work with only a modified version of the population counts for each county.

e. [5 pts] Write a line of code to add a new variable to the data frame named HighDens which is equal to “High” if the population density for the county is higher than 1500 and “NotHigh” if the population density for the county is lower than 1500. Below are the first 5 rows of the data for the county, popdensity and HighDens columns:

    county      popdensity  HighDens
    <chr>            <dbl>  <chr>
    ADAMS        1270.9615  NotHigh
    ALEXANDER     759.0000  NotHigh
    BOND          681.4091  NotHigh
    BOONE        1812.1176  High
    BROWN         324.2222  NotHigh
    5 rows

Then we will compute the total number of people in each combination of state, Metro and HighDens using the code below:

    pop_xtabs<-xtabs(
      I(popwhite+popblack+popamerindian+popasian+popother)~
        state+Metro+HighDens,data=midwest_modified)
    pop_xtabs

    , , HighDens = High

         Metro
    state   Metro NonMetro
       IL 9323624   405933
       IN 3728008   689565
       MI 7697643   354081
       OH 8811604  1078957
       WI 3004347   386892

    , , HighDens = NotHigh

         Metro
    state   Metro NonMetro
       IL  250175  1450870
       IN  234438   892148
       MI       0  1243573
       OH   98555   857999
       WI  326825  1173705
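
No solution for part (e) appears above. One way it could be written, as a minimal sketch, is the same mutate() and ifelse() pattern used in part (b). The sketch rebuilds only the first five counties from the tables above (five_counties and small_xtabs are hypothetical names), and it assumes that a density of exactly 1500 counts as “NotHigh”, a case the question leaves open. It then builds the same kind of three-way table as pop_xtabs on this small subset to show how its cells and margins are addressed.

```r
library(dplyr)

# First five counties, with values copied from the tables above.
five_counties <- tibble(
  county        = c("ADAMS", "ALEXANDER", "BOND", "BOONE", "BROWN"),
  state         = "IL",
  popdensity    = c(1270.9615, 759.0000, 681.4091, 1812.1176, 324.2222),
  popwhite      = c(63917, 7054, 14477, 29344, 5264),
  popblack      = c(1702, 3496, 429, 127, 547),
  popamerindian = c(98, 19, 35, 46, 14),
  popasian      = c(249, 48, 16, 150, 5),
  popother      = c(124, 9, 34, 1139, 6),
  Metro         = c("NonMetro", "NonMetro", "NonMetro", "Metro", "NonMetro")
)

# Assumption: a density of exactly 1500 is labelled "NotHigh".
five_counties <- five_counties %>%
  mutate(HighDens = ifelse(popdensity > 1500, "High", "NotHigh"))

# Same formula as pop_xtabs, applied to the five-row subset only.
small_xtabs <- xtabs(
  I(popwhite + popblack + popamerindian + popasian + popother) ~
    state + Metro + HighDens,
  data = five_counties
)

# Cells can be addressed by level name or by position along each dimension.
small_xtabs["IL", "NonMetro", "NotHigh"]  # 97543
# Summing over everything except dimension 3 gives totals per HighDens level.
apply(small_xtabs, 3, sum)                # High 30806, NotHigh 97543
```

The dimensions follow the order of the terms in the formula (state, Metro, HighDens), and the levels within each dimension are sorted alphabetically, so position 1 of Metro is “Metro” and position 2 of HighDens is “NotHigh”.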

f. [5 pts] What will the code pop_xtabs["IL",1,2] return as output?

Solution:

    pop_xtabs["IL",1,2]

    [1] 250175

g. [5 pts] Using only the pop_xtabs object above, write a line of code to find the total number of people in high density areas (i.e. HighDens is “High”) and in the remaining areas, as below:

        High NotHigh
    35480654 6528288

Solution:

    apply(pop_xtabs,3,sum)

        High NotHigh
    35480654 6528288

h. [10 pts] Using only the pop_xtabs object above, write a line of code that computes the total population in each combination of state and HighDens to return the output below:

         HighDens
    state    High NotHigh
       IL 9729557 1701045
       IN 4417573 1126586
       MI 8051724 1243573
       OH 9890561  956554
       WI 3391239 1500530

Solution:

    apply(pop_xtabs,c(1,3),sum)

         HighDens
    state    High NotHigh
       IL 9729557 1701045
       IN 4417573 1126586
       MI 8051724 1243573
       OH 9890561  956554
       WI 3391239 1500530

i. [5 pts] Using only the pop_xtabs object above, write a line of code (or multiple lines of code) that computes the percentage of individuals in High and NotHigh density areas in each state as below:

         HighDens
    state     High   NotHigh
       IL 85.11850 14.881500
       IN 79.67977 20.320233
       MI 86.62148 13.378518
       OH 91.18149  8.818511
       WI 69.32541 30.674588

Solution:

    apply(pop_xtabs,c(1,3),sum)/apply(pop_xtabs,c(1),sum)*100

         HighDens
    state     High   NotHigh
       IL 85.11850 14.881500
       IN 79.67977 20.320233
       MI 86.62148 13.378518
       OH 91.18149  8.818511
       WI 69.32541 30.674588

    ##OR
    ## prop.table(apply(pop_xtabs,c(1,3),sum),1)*100

END OF QUESTION 1

Question 2 [50 points]

We will re-use the same midwest_modified data that was used in Question 1, with all the modifications from the other question parts. The description is repeated below for your convenience.

    str(midwest_modified)

    spec_tbl_df [437 × 11] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
     $ county       : chr [1:437] "ADAMS" "ALEXANDER" "BOND" "BOONE" ...
     $ state        : chr [1:437] "IL" "IL" "IL" "IL" ...
     $ popdensity   : num [1:437] 1271 759 681 1812 324 ...
     $ popwhite     : num [1:437] 63917 7054 14477 29344 5264 ...
     $ popblack     : num [1:437] 1702 3496 429 127 547 ...
     $ popamerindian: num [1:437] 98 19 35 46 14 65 8 30 8 331 ...
     $ popasian     : num [1:437] 249 48 16 150 5 ...
     $ popother     : num [1:437] 124 9 34 1139 6 ...
     $ inmetro      : num [1:437] 0 0 0 1 0 0 0 0 0 1 ...
     $ Metro        : chr [1:437] "NonMetro" "NonMetro" "NonMetro" "Metro" ...
     $ HighDens     : chr [1:437] "NotHigh" "NotHigh" "NotHigh" "High" ...
     - attr(*, "spec")=
      .. cols(
      ..   county = col_character(),
      ..   state = col_character(),
      ..   popdensity = col_double(),
      ..   popwhite = col_double(),
      ..   popblack = col_double(),
      ..   popamerindian = col_double(),
      ..   popasian = col_double(),
      ..   popother = col_double(),
      ..   inmetro = col_double(),
      ..   Metro = col_character(),
      ..   HighDens = col_character()
      .. )
     - attr(*, "problems")=<externalptr>

    midwest_modified %>% slice(1:5) %>% select(county:popblack)

    county      state  popdensity  popwhite  popblack
    <chr>       <chr>       <dbl>     <dbl>     <dbl>
    ADAMS       IL      1270.9615     63917      1702
    ALEXANDER   IL       759.0000      7054      3496
    BOND        IL       681.4091     14477       429
    BOONE       IL      1812.1176     29344       127
    BROWN       IL       324.2222      5264       547
    5 rows

    midwest_modified %>% slice(1:5) %>% select(county,popamerindian:HighDens)

    county      popamerindian  popasian  popother  inmetro  Metro     HighDens
    <chr>               <dbl>     <dbl>     <dbl>    <dbl>  <chr>     <chr>
    ADAMS                  98       249       124        0  NonMetro  NotHigh
    ALEXANDER              19        48         9        0  NonMetro  NotHigh
    BOND                   35        16        34        0  NonMetro  NotHigh
    BOONE                  46       150      1139        1  Metro     High
    BROWN                  14         5         6        0  NonMetro  NotHigh
    5 rows

The dataset contains population data from midwest counties in five states in the United States from an unspecified year. There are identifying variables for both the county (the name) and the state (the postal abbreviation). The variable popdensity is a measure of density (population per unspecified area units). The variable inmetro is equal to 1 if the county is classified as a metropolitan area and 0 otherwise. The other variables contain counts of population size within self-identified racial classifications.

a. [6 pts] Below are partially obscured code and three plots of the values of the log (base 10) of the population density for all counties:

    p1<-ggplot(midwest_modified,aes(x=popdensity)) +
      geom_XXXXX(nbins=30,fill="white",col="black") +
      ggtitle("Plot 1") + theme_bw() + scale_x_log10()
    p2<-ggplot(midwest_modified,aes(x=popdensity)) +
      geom_YYYYY() + ggtitle("Plot 2") + theme_bw()+ scale_x_log10()
    p3<-ggplot(midwest_modified,aes(x=popdensity)) +
      geom_ZZZZZ() + ggtitle("Plot 3") + theme_bw()+ scale_x_log10()
    grid.arrange(grobs=list(p1,p2,p3),nrow=3,ncol=1)

Identify these three plots by name:

    Plot 1
    Plot 2
    Plot 3

Solution:

    Plot 1: Histogram
    Plot 2: Density plot
    Plot 3: Boxplot

b. [10 pts] Now we make the same plots, but for each state. Do you believe there is evidence of an association between state and population density? In particular, do we see differences in the distributions of population density by state?

grid.arrange(grobs=list(p2,p3),nrow=2,ncol=1)

Solution: Yes, there is moderate evidence of an association. In particular, Ohio seems to have more counties with higher population densities than the other states. We also see that the spread for Michigan is much larger than for the other states.
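
The code that produced the per-state plots is not shown. As a rough sketch of how such plots could be drawn, the block below uses the midwest data shipped with ggplot2 and assumes, purely for illustration, that the histograms are faceted by state, the densities are overlaid with one colour per state, and the boxplots are placed side by side. The names p_hist, p_dens and p_box are hypothetical, and the original figures may have been built differently.

```r
library(ggplot2)

data(midwest)

# Histograms: one panel per state (assumed layout).
p_hist <- ggplot(midwest, aes(x = popdensity)) +
  geom_histogram(bins = 30, fill = "white", colour = "black") +
  facet_wrap(~state) +
  scale_x_log10() + theme_bw()

# Densities: overlaid curves, one colour per state (assumed layout).
p_dens <- ggplot(midwest, aes(x = popdensity, colour = state)) +
  geom_density() +
  scale_x_log10() + theme_bw()

# Boxplots: one horizontal box per state (assumed layout).
p_box <- ggplot(midwest, aes(x = popdensity, y = state)) +
  geom_boxplot() +
  scale_x_log10() + theme_bw()

# Each object draws when printed, e.g. print(p_box).
```

Mapping state to a panel, a colour or an axis is what lets the same three geometries be compared across states rather than pooled over all counties.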

c. [4 pts] Which plot(s) do you think best show the association between state and population density? Which plot(s) do you think do not show the association between state and population density as clearly? Explain your answer and reasoning in a few sentences.

Solution: The boxplots and histograms are probably the best. The boxplots show the differences in central location better; the histograms show the differences in spread and shape better. The density plot is less useful since it is very busy and it is influenced by the outliers.

d. [5 pts] Which of the following plots could be used to assess the association between the popwhite and popblack variables? List all that apply (or say None if none would be appropriate).

    A. 2-d density plot
    B. Barplot
    C. Boxplot
    D. 2-d histogram

Solution: A, D

We now would like to make plots to take a different look at the population variables. Unfortunately, the format of the midwest_modified data needs to be further changed so that we can use it in a ggplot.

e. [5 pts] Write a line of code that will create a new tibble, midwest_modified_new, that converts midwest_modified to “long” format, where each row contains a population count for a specific racial group called Count, the variable from which that count originated (e.g. popwhite), and the state, county, and Metro information for that population group. You should not include the columns for HighDens, inmetro or popdensity. The first 10 rows of the new tibble are below:

    midwest_modified_new %>% slice(1:10)

    county      state  Metro     Race_Variable  Count
    <chr>       <chr>  <chr>     <chr>          <dbl>
    ADAMS       IL     NonMetro  popwhite       63917
    ADAMS       IL     NonMetro  popblack        1702
    ADAMS       IL     NonMetro  popamerindian     98
    ADAMS       IL     NonMetro  popasian         249
    ADAMS       IL     NonMetro  popother         124
    ALEXANDER   IL     NonMetro  popwhite        7054
    ALEXANDER   IL     NonMetro  popblack        3496
    ALEXANDER   IL     NonMetro  popamerindian     19
    ALEXANDER   IL     NonMetro  popasian          48
    ALEXANDER   IL     NonMetro  popother           9
    1-10 of 10 rows

Solution:

    midwest_modified_new<-midwest_modified %>%
      select(-popdensity,-inmetro,-HighDens) %>%
      pivot_longer(cols=popwhite:popother,names_to="Race_Variable",
                   values_to="Count")
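
To see what pivot_longer() does to each row, the sketch below applies the same call to just the first two counties, rebuilt by hand from the tables above (two_counties and two_counties_long are hypothetical names). Each input row becomes five output rows, one per population column.

```r
library(dplyr)
library(tidyr)

# First two counties, with values copied from the tables above.
two_counties <- tibble(
  county        = c("ADAMS", "ALEXANDER"),
  state         = c("IL", "IL"),
  Metro         = c("NonMetro", "NonMetro"),
  popwhite      = c(63917, 7054),
  popblack      = c(1702, 3496),
  popamerindian = c(98, 19),
  popasian      = c(249, 48),
  popother      = c(124, 9)
)

# Column names move into Race_Variable; their values move into Count.
two_counties_long <- two_counties %>%
  pivot_longer(cols = popwhite:popother,
               names_to = "Race_Variable",
               values_to = "Count")

two_counties_long  # 10 rows: 2 counties x 5 population columns
```

The identifying columns (county, state, Metro) are simply repeated on every output row, which is why the unwanted columns have to be dropped with select() before pivoting.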

Below is a figure along with the code (partially obscured) which generated it.

    ggplot(midwest_modified_new,aes(x=______,fill=______,y=______, )) +
      geom_XXXXXX(stat="identity") + ggtitle("Plot f") + theme_bw()

f. [5 pts] What are the missing geometry and aesthetics that generated the figure on the previous page (that is, what are the words that are missing in the code above for Plot f)?

Solution: geom_bar is the geometry. The aesthetics are: aes(x=Metro, fill=Race_Variable,y=Count).

g. [5 pts] Note that the plot in part (f) is a bit difficult to use because it contains the counts, rather than the relative proportions. Write a line of code (or lines of code) to create a new tibble called metro_race_summaries which contains each racial population count and proportion relative to the level of the Metro variable as below:

    metro_race_summaries

    Metro     Race_Variable  Race_Count   Proportion
    <chr>     <chr>               <dbl>        <dbl>
    Metro     popamerindian       99145  0.002961743
    Metro     popasian           538463  0.016085421
    Metro     popblack          4672825  0.139590573
    Metro     popother           668449  0.019968473
    Metro     popwhite         27496337  0.821393790
    NonMetro  popamerindian       50794  0.005952150
    NonMetro  popasian            34210  0.004008801
    NonMetro  popblack           144611  0.016945828
    NonMetro  popother            36402  0.004265665
    NonMetro  popwhite          8267706  0.968827556
    1-10 of 10 rows

Solution:

    metro_race_summaries<-midwest_modified_new %>%
      group_by(Metro,Race_Variable) %>%
      summarise(Race_Count = sum(Count)) %>%
      ungroup() %>% group_by(Metro) %>%
      mutate(Proportion=Race_Count/sum(Race_Count))

h. [5 pts] Using the tibble from (g), write a line of code that created the barplot below.

Solution:

    ggplot(metro_race_summaries,
           aes(x=Metro,fill=Race_Variable,y=Proportion,)) +
      geom_bar(stat="identity") + ggtitle("Plot g") + theme_bw()

i. [5 pts] Based on the plot in part (h), would you conclude that the population distribution of race varies between Metro and NonMetro areas? Explain your answer in a few sentences.

Solution: Absolutely: white residents make up a substantially larger share of the population in NonMetro areas (about 97%, versus about 82% in Metro areas), and non-white residents a correspondingly smaller share.
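
The comparison in part (i) can be checked numerically from the Race_Count values printed in part (g). The sketch below is a base R alternative to the grouped mutate() in the solution, not the method used there: it re-enters those ten counts by hand (counts and white are hypothetical names) and divides each count by the total for its Metro level.

```r
# Race_Count values copied from the metro_race_summaries table in part (g).
counts <- data.frame(
  Metro = rep(c("Metro", "NonMetro"), each = 5),
  Race_Variable = rep(c("popamerindian", "popasian", "popblack",
                        "popother", "popwhite"), times = 2),
  Race_Count = c(99145, 538463, 4672825, 668449, 27496337,
                 50794, 34210, 144611, 36402, 8267706)
)

# ave() returns the group total on every row, so the division is row-wise.
counts$Proportion <- counts$Race_Count /
  ave(counts$Race_Count, counts$Metro, FUN = sum)

# White share of the population in each Metro level, in percent.
white <- subset(counts, Race_Variable == "popwhite")
round(100 * white$Proportion, 1)  # 82.1 (Metro), 96.9 (NonMetro)
```

Because the proportions sum to 1 within each Metro level, the stacked bars in part (h) have equal height, which is what makes the two areas directly comparable.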