Lab-7: Intro to ggplot2

School

University of California, Berkeley

Course

33B

Subject

Statistics

Date

Apr 3, 2024

Pages

23

Stat 33B
AUTHOR: Gaston Sanchez

Learning Objectives

Get started with "ggplot2" and its main functions:

- ggplot(): creates an object of class "ggplot"
- aes(): aesthetic function to map variables to visual attributes
- geom_...(): geometric-object functions to indicate what to plot
- theme(): primary function to specify secondary elements in a graph
- facet_...(): functions to create facets

General Instructions

- Write your code and content in a qmd (Quarto markdown) file. You can use the provided source qmd file (bCourses).
- Name this file lab07-first-last.qmd, where first and last are your first and last names (e.g. lab07-gaston-sanchez.qmd).
- Submit both your qmd and HTML files to the corresponding assignment submission in bCourses.
- Please note that submitting only one of the files will result in an automatic 10% deduction. Also, if you submit the incorrect files you will receive no credit.

    library(tidyverse)

Graphics with "ggplot2"

We’ll be working with the tibble starwars, which ships with "dplyr" and is available once tidyverse is loaded. These exercises are meant to help you:

- gain familiarity with the aes() function
- learn about the various geoms, or geometric objects, and recognize them
- understand why and how to facet
- try out different plot themes

You should have already loaded the package tidyverse. Type View(starwars) in the console to take a look at starwars. You can also look at the documentation by typing in ?starwars.
1 Aesthetic Mapping Function aes()

Let’s first take a look at the relationship between height and mass with a scatterplot. This involves using geom_point().

    ggplot(data = starwars, aes(x = height, y = mass)) +
      geom_point()

    Warning: Removed 28 rows containing missing values (`geom_point()`).

1.1 Your Turn:

a. Does anything happen if you don’t name the arguments to aes(), i.e. you type in aes(height, mass)? What if you type in aes(mass, height)?

    # your code
    aes(height, mass)

    Aesthetic mapping:
    * `x` -> `height`
    * `y` -> `mass`
    # switching the order of the variables swaps the mapping
    aes(mass, height)

    Aesthetic mapping:
    * `x` -> `mass`
    * `y` -> `height`

b. Let’s restrict the data set to just individuals of male sex, with height greater than 150 cm and mass above 100 kg. Create a table males, and make another scatterplot of height and mass.

    # your code
    males <- filter(starwars, sex == "male" & height > 150 & mass > 100)
    ggplot(data = males, aes(x = height, y = mass)) +
      geom_point()

1.2 Your Turn: using geom_text()

a. Refer to the table males. Let’s label each male individual using their name by adding a geom_text() layer, and don’t forget to put aes() in its argument.
    # your code
    ggplot(data = males, aes(x = height, y = mass)) +
      geom_point() +
      geom_text(aes(label = nam…

b. Some of the names are not fully displayed. Modify your code above by increasing the x-axis limits, and also possibly the y-axis limits, so that all names are fully displayed without being truncated. Hint: take a look at the function xlim().

    # your code
    ggplot(data = males, aes(x = height, y = mass)) +
      geom_point() +
      geom_text(aes(label = nam…
c. As you can tell, the names overlap with the points. Modify your code above by using the nudge_y argument in geom_text() so that the names don’t overlap with the points. Does it go inside or outside of aes()?

    # your code
    ggplot(data = males, aes(x = height, y = mass)) +
      geom_point() +
      geom_text(aes(label = nam…
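The geom_text() calls above are cut off in the source. As a hedged sketch of how parts b and c might be completed, the block below borrows the nudge_y value and the axis limits from the geom_label() answer in part d; those numbers are that answer's choices, not required values, and the object name labelled is only illustrative.

```r
library(dplyr)    # provides the starwars tibble
library(ggplot2)

males <- starwars %>%
  filter(sex == "male", height > 150, mass > 100)

labelled <- ggplot(males, aes(x = height, y = mass)) +
  geom_point() +
  # label is mapped to a column of the data, so it goes inside aes();
  # nudge_y is a fixed offset in data units, so it goes outside aes()
  geom_text(aes(label = name), nudge_y = 3) +
  # widen the axes so long names are not clipped at the panel edges
  xlim(170, 240) +
  ylim(100, 170)

labelled
```

Rows that fall outside the limits given to xlim() and ylim() are dropped with a warning, so limits that are too tight can silently hide points.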
d. Now, replace geom_text() with geom_label(). What difference do you notice? Did you have to modify the arguments to aes() at all?

    # your code
    ggplot(data = males, aes(x = height, y = mass)) +
      geom_point() +
      geom_label(aes(label = name), nudge_y = 3) +
      xlim(170, 240) +
      ylim(100, 170)
    # the labels negate all my hard work with the nudge_y function from part c. I didn't need…

e. Next, cut and paste the aes(x = height, y = mass) from the argument of ggplot() to the argument of geom_point(). Do you run into an error? What if you copy the x and y arguments over to the aes() function in geom_text()?

    # your code
    ggplot(data = males) +
      geom_point(aes(x = height, y = mass)) +
      geom_label(aes(label = name), nudge_y = 3) +
      xlim(170, 240) +
      ylim(100, 170)

    Error in `geom_label()`:
    ! Problem while setting up geom.
    Error occurred in the 2nd layer.
    Caused by error in `compute_geom_1()`:
    ! `geom_label()` requires the following missing aesthetics: x and y

    # that results in an error: `geom_label()` requires the following missing aesthetics: x and…

    # your code
    # if I copy the x and y arguments to the aes() function in geom_text():
    ggplot(data = males) +
      geom_point() +
      geom_label(aes(x = height, y = mass, label = name), nudge_y = 3) +
      xlim(170, 240) +
      ylim(100, 170)

    Error in `geom_point()`:
    ! Problem while setting up geom.
    Error occurred in the 1st layer.
    Caused by error in `compute_geom_1()`:
    ! `geom_point()` requires the following missing aesthetics: x and y

    # I get a similar error: geom_point() requires the following missing aesthetics: x and y

2 Adding color

Let’s go back to the full data set.

    ggplot(data = starwars, aes(x = height, y = mass)) +
      geom_point()

    Warning: Removed 28 rows containing missing values (`geom_point()`).
2.1 Your turn: color-coding points

a. First, make all the points blue. Should you use aes()?

    # your code here
    ggplot(data = starwars, aes(x = height, y = mass)) +
      geom_point(color = "blue")
    # don't use aes()

    Warning: Removed 28 rows containing missing values (`geom_point()`).

b. Next, let’s color the points by gender. Should you use aes() this time?

    # your code
    ggplot(data = starwars, aes(x = height, y = mass)) +
      geom_point(aes(color = gender))

    Warning: Removed 28 rows containing missing values (`geom_point()`).
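Parts a and b contrast setting an aesthetic to a constant with mapping it to a variable. The sketch below puts the two side by side and adds a third plot showing a common slip, writing aes(color = "blue"). The object names p_set, p_map and p_pitfall are illustrative only.

```r
library(dplyr)    # provides the starwars tibble
library(ggplot2)

# Setting: one fixed color for every point, so color goes outside aes()
p_set <- ggplot(starwars, aes(x = height, y = mass)) +
  geom_point(color = "blue")

# Mapping: color depends on a column of the data, so it goes inside aes()
p_map <- ggplot(starwars, aes(x = height, y = mass)) +
  geom_point(aes(color = gender))

# Pitfall: inside aes(), "blue" is treated as a one-level categorical value,
# so the points get the default palette color and a legend, not blue points
p_pitfall <- ggplot(starwars, aes(x = height, y = mass)) +
  geom_point(aes(color = "blue"))
```

Printing each object in turn (for example p_pitfall) makes the difference visible: only p_set draws blue points without a legend.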
c. Try coloring the points by other variables. Which variables display a reasonable number of colors? Which ones display far too many?

    # your code here
    base <- ggplot(data = starwars, aes(x = height, y = mass))
    base + geom_point(aes(color = hair_color)) # too many colors

    Warning: Removed 28 rows containing missing values (`geom_point()`).

    base + geom_point(aes(color = skin_color)) # too many colors

    Warning: Removed 28 rows containing missing values (`geom_point()`).

    base + geom_point(aes(color = eye_color)) # too many colors

    Warning: Removed 28 rows containing missing values (`geom_point()`).
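Instead of plotting every candidate, one way to judge in advance which variables give a readable legend is to count their distinct values. This is a sketch rather than part of the lab: the candidates vector is an illustrative selection of starwars columns, and what counts as "too many" colors is a judgment call.

```r
library(dplyr)

# Candidate variables for the color aesthetic (all are columns of starwars)
candidates <- c("sex", "gender", "hair_color", "skin_color",
                "eye_color", "species", "homeworld")

# Number of distinct non-missing values in each candidate column
level_counts <- starwars %>%
  summarise(across(all_of(candidates), ~ n_distinct(.x, na.rm = TRUE)))

level_counts
```

Columns with only a handful of distinct values are the ones likely to produce a legible color legend.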
    base + geom_point(aes(color = species)) # way too many colors

    Warning: Removed 28 rows containing missing values (`geom_point()`).

    base + geom_point(aes(color = homeworld)) # way too many colors

    Warning: Removed 28 rows containing missing values (`geom_point()`).

    ## honestly, I think only sex and gender have a reasonable number of colors

3 Adding smoothers

Let’s add a regression line to our scatterplot of the males table.

    males <- starwars %>%
      filter(sex == 'male' & height > 150 & mass > 100)

    ggplot(data = males, aes(x = height, y = mass)) +
      geom_point() +
      geom_smooth(method = 'lm')

    `geom_smooth()` using formula = 'y ~ x'
geom_smooth() with the argument method = 'lm' plots a least squares regression line of mass on height. The translucent gray band is a confidence band (95% by default) around the fitted values of mass.

3.1 Your Turn

a. Modify the code above by adding a vertical line using geom_vline(). Does it require aes()?

    # your code here
    ggplot(data = males, aes(x = height, y = mass)) +
      geom_point() +
      geom_smooth(method = 'lm') +
      geom_vline(xintercept = 200)

    `geom_smooth()` using formula = 'y ~ x'
    ## no, it doesn't require aes()

b. Now try adding a vertical line at the mean of height. Does it require aes() this time?

    # your code here
    ggplot(data = males, aes(x = height, y = mass)) +
      geom_point() +
      geom_smooth(method = 'lm') +
      geom_vline(xintercept = mean(males$height))

    `geom_smooth()` using formula = 'y ~ x'
    # still doesn't require aes()

c. Do the same with a horizontal line at the mean of mass. Play around with color and linewidth. Should those arguments go inside or outside of aes()?

    # your code here
    ggplot(data = males, aes(x = height, y = mass)) +
      geom_point() +
      geom_smooth(method = 'lm') +
      geom_hline(yintercept = mean(males$mass), color = "green", l…

    `geom_smooth()` using formula = 'y ~ x'
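The answer to part c is cut off in the source. As a hedged sketch of the idea, the block below draws both reference lines with fixed styling passed outside aes(). The colors, the line width and the dashed line type are arbitrary illustrative choices, and the linewidth argument assumes ggplot2 3.4.0 or later.

```r
library(dplyr)    # provides the starwars tibble
library(ggplot2)

males <- starwars %>%
  filter(sex == "male", height > 150, mass > 100)

# filter() has already dropped rows with missing height or mass, so mean()
# needs no na.rm here; on the full starwars tibble it would need na.rm = TRUE
p <- ggplot(males, aes(x = height, y = mass)) +
  geom_point() +
  geom_smooth(method = "lm") +
  # constants computed outside the plot are set directly, not mapped with aes()
  geom_vline(xintercept = mean(males$height),
             color = "steelblue", linewidth = 1) +
  geom_hline(yintercept = mean(males$mass),
             color = "darkgreen", linetype = "dashed")

p
```

Because a least squares line always passes through the point of means, the two reference lines should cross on the fitted line.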
    ## outside of aes()

4 Using facets

Let’s consider gender values "masculine" and "feminine".

    starwars2 <- starwars %>%
      filter(!is.na(gender))

    distinct(starwars2, gender)

    # A tibble: 2 × 1
      gender
      <chr>
    1 masculine
    2 feminine

In starwars2 there are only two unique values for gender. Let’s compare the relationship between height and mass across gender values using facet_wrap().

    ggplot(data = starwars2, aes(x = height, y = mass)) +
      geom_point() +
      facet_wrap(~ gender)

    Warning: Removed 25 rows containing missing values (`geom_point()`).

4.1 Your Turn:

a. Using starwars, create a table humans_droids by selecting columns name, height, mass, and species, of individuals that are either "Human" or "Droid" (see species column). Then create a scatterplot of height and mass with facets based on species.

    # your code
    humans_droids <- select(starwars, name, height, mass, species) %>%
      filter(species == "Human" | species == "Droid")

    ggplot(data = humans_droids, aes(x = height, y = mass)) +
      geom_point() +
      facet_wrap(~ species)

    Warning: Removed 15 rows containing missing values (`geom_point()`).
b. facet_grid() works slightly differently from facet_wrap(). facet_wrap() is typically given one variable, which goes after the ~, and it ‘wraps’ the panels left to right, top to bottom. In contrast, facet_grid() allows you to facet into just rows or just columns.

columns: facet_grid(. ~ species)
rows: facet_grid(species ~ .)

The . is a placeholder for a variable. Create scatterplots with facet_grid() using both options: row facets and column facets.

    # facet into columns
    ggplot(data = humans_droids, aes(x = height, y = mass)) +
      geom_point() +
      facet_grid(. ~ species)

    Warning: Removed 15 rows containing missing values (`geom_point()`).
    # facet into rows
    ggplot(data = humans_droids, aes(x = height, y = mass)) +
      geom_point() +
      facet_grid(species ~ .)

    Warning: Removed 15 rows containing missing values (`geom_point()`).
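The recurring "Removed ... rows containing missing values" warning comes from individuals whose height or mass is NA. One optional way to handle this, sketched below, is to drop those rows explicitly before plotting so the omission is visible in the code. The names complete_hd, n_dropped and p_rows are illustrative, and the number of dropped rows depends on the version of the starwars data.

```r
library(dplyr)    # provides the starwars tibble
library(ggplot2)

humans_droids <- starwars %>%
  select(name, height, mass, species) %>%
  filter(species %in% c("Human", "Droid"))

# Keep only rows where both plotted variables are present
complete_hd <- humans_droids %>%
  filter(!is.na(height), !is.na(mass))

# How many individuals were left out of the plot
n_dropped <- nrow(humans_droids) - nrow(complete_hd)

# Same row-faceted plot as before, now without the missing-value warning
p_rows <- ggplot(complete_hd, aes(x = height, y = mass)) +
  geom_point() +
  facet_grid(species ~ .)

p_rows
```

Comparing n_dropped with the count reported in the warning is a quick check that the filter removed the same rows ggplot2 would have skipped.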