Lab #2 - Data and Variables - Julia Smithers

School: Texas Tech University
Course: 3314
Subject: Political Science
Date: Apr 3, 2024


POLS 3314 Lab 2: Data and Variables

Download the .Rdata file titled “Lab 2 Data” from Blackboard. Open it to load it into RStudio. You should see nine variables loaded in the right pane. All R commands are in brackets below { }. Do not type the brackets in the R command line (they only separate the commands from the text here).

1. country – the unit of analysis
2. year – the dataset is a cross-section of countries for the year 2013
3. region – the geographic region where each country is located
4. life_expectancy – the total life expectancy at birth, measured in years
5. gdp_growth – annual growth of GDP, measured as a %
6. cellphone_subscriptions – the count of mobile phone subscriptions per 100 people
7. women_businesslaw_score – an additive score ranging from 1 to 10 recording women’s engagement in business and law industries
8. annual_precipitation – average precipitation, measured in depth by discrete millimeters
9. disaster_risk_reduction – a score ranging from 1 (worst) to 5 (best) tracking a country’s progress in reducing risks related to natural disasters

Remember always to load the libraries used at the beginning of your R script file:

{
# Install the packages (if necessary)
install.packages("questionr")
install.packages("ggplot2")
# Load the libraries
library(questionr)
library(ggplot2)
}

1. Identify the nominal variable in the list. What is the most appropriate measure of central tendency for this variable?

Nominal variable: region
Most appropriate measure of central tendency: mode

2. Run a frequency table for the nominal variable { freq(data$varname, cum = TRUE, total = TRUE) }. Which category are you most interested in? What percentage of cases falls into that category?

The North America region: 0.90% of the cases fall into that category.

3. Generate a bar graph for the nominal variable:

{
ggplot(data = data, aes(x = region)) +
  geom_bar() +
  scale_x_discrete(limits = c(1, 2, 3, 4, 5, 6, 7),
                   labels = c('E. Asia / Pacific', 'Europe / C. Asia', 'S. America / Caribbean', 'M. East / N. Africa', 'N. America', 'S. Asia', 'Sub-Saharan Africa')) +
  theme(axis.text.x = element_text(angle = 45, vjust = .5, hjust = .5))
}

Copy / paste the barplot of the nominal variable here:

4. Identify the two ordinal variables in the list. Select one and describe the crucial junctures featured in the rank statistics (min, max, median, IQR). { summary(data$varname) }

Ordinal variables:
   1. Women business law score
   2. Disaster risk reduction

Disaster risk reduction:
   Min: 1.00
   Max: 5.00
   Median: 3.00
   IQR: 1.00 (1st quartile: 3.00, 3rd quartile: 4.00)

5. Generate a bar graph for your ordinal variable. Copy / paste it into this document. Describe what kind of distribution (modality) you find. { ggplot(data, aes(x = varname)) + geom_bar() + scale_x_continuous(breaks = seq(min, max, 1)) }
The distribution is unimodal and negatively skewed.

6. Identify the numerical variables in the list. Select one and report its median and mean. { summary(data$varname) }

Numerical variables:
   1. GDP growth
   2. Life expectancy
   3. Annual precipitation
   4. Cell phone subscriptions

GDP growth:
   Median: 3.360
   Mean: 3.278

7. For the same numerical variable, report the variance and the standard deviation.

{
var(data$varname, na.rm = TRUE)
sd(data$varname, na.rm = TRUE)
}

GDP growth:
   Variance: 25.23
   SD: 5.02
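The spread statistics requested in questions 4 and 7 can be checked by hand. The sketch below uses base R only. The `scores` and `growth` vectors are made-up values, not the Lab 2 data (the scores were chosen so that they happen to reproduce the quartiles reported in question 4). It illustrates that the IQR is the third quartile minus the first, and that the standard deviation is the square root of the variance.

```r
# Hypothetical ordinal scores (NOT the Lab 2 data), with one missing value
scores <- c(1, 2, 3, 3, 3, 3, 4, 4, 4, 5, NA)

# Quartiles and IQR; na.rm = TRUE drops the missing value first
quartiles <- quantile(scores, probs = c(0.25, 0.75), na.rm = TRUE)
iqr <- IQR(scores, na.rm = TRUE)
print(quartiles)
print(iqr)

# Hypothetical numerical values (NOT the Lab 2 data)
growth <- c(2.1, -1.4, 3.6, NA, 7.9, 0.5)
growth_var <- var(growth, na.rm = TRUE)
growth_sd <- sd(growth, na.rm = TRUE)

# sd() is the square root of var()
print(all.equal(growth_sd, sqrt(growth_var)))

# Same check applied to the variance reported in question 7
print(round(sqrt(25.23), 2))
```

Applied to the figures reported above, 4.00 - 3.00 gives an IQR of 1.00, and the square root of 25.23 rounds to 5.02, which matches the reported standard deviation.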
8. Generate a histogram (with kernel density lines if you’re feeling ambitious) for your chosen numerical variable. Copy / paste it here. { ggplot(data, aes(x = varname)) + geom_histogram() } or { hist(data$varname) }

9. Generate a box plot for the variable cellphone_subscriptions. { boxplot(data$cellphone_subscriptions) } Copy / paste it here. Describe what you see. What do the outliers imply (how should they be interpreted)?

Reading the graph, you can interpret:
- the first quartile of countries has between 25 and 75 cell phone subscriptions per 100 people
- the second quartile has between 75 and 100, ending at the median
- the third quartile has between 100 and 125
- the fourth quartile has between 125 and 200

The graph also shows that:
- the minimum is 25
- the median is 100
- the maximum is 200
- the outliers (245 and 300) lie at the extreme high end of the data, so those countries have far more subscriptions per 100 people than the rest

10. Based on your knowledge of measurement metrics, which variable in the list is the most precise and therefore conveys the most information?

Numerical variables are the most precise: they tell us the exact quantity of a characteristic. On that basis, life expectancy is the most precise and conveys the most information.
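The outliers in question 9 can be related to the quartiles read off the box plot. This is a minimal sketch, assuming the approximate readings reported above (75 and 125 for the box edges) and the default whisker rule of R's boxplot(), which draws points more than 1.5 times the IQR beyond the box as outliers. The variable names are illustrative.

```r
# Box edges as read off the box plot in question 9 (approximate readings)
q1 <- 75
q3 <- 125
iqr <- q3 - q1

# boxplot() default (range = 1.5): points beyond these fences are drawn as outliers
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# Values mentioned in the answer: the reported maximum and the two outliers
values <- c(200, 245, 300)
flagged <- values[values < lower_fence | values > upper_fence]

print(c(lower = lower_fence, upper = upper_fence))
print(flagged)
```

With these readings the upper fence is 200, so 245 and 300 are flagged while the reported maximum of 200 is not. On the real data, boxplot.stats(data$cellphone_subscriptions)$out lists the points that boxplot() draws as outliers.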