Lab #2 - Data and Variables - Julia Smithers
School: Texas Tech University
Course: 3314
Subject: Political Science
Date: Apr 3, 2024
POLS 3314
Lab 2
Data and Variables
Download the .Rdata file titled “Lab 2 Data” from Blackboard. Open it to load it into RStudio.
You should see nine variables loaded in the right pane. All R commands below are enclosed in curly brackets { }.
Do not type the brackets in the R command line (they’re just to separate the commands from the text here).
1. country – the unit of analysis
2. year – the dataset is a cross-section of countries for the year 2013
3. region – the geographic region where each country is located
4. life_expectancy – the total life expectancy at birth, measured in years
5. gdp_growth – annual growth of GDP, measured as a %
6. cellphone_subscriptions – the count of mobile phone subscriptions per 100 people
7. women_businesslaw_score – an additive score ranging from 1 to 10 recording women’s engagement in business and law industries
8. annual_precipitation – average precipitation, measured in depth by discrete millimeters
9. disaster_risk_reduction – a score ranging from 1 (worst) to 5 (best) tracking a country’s progress in reducing risks related to natural disasters
Remember always to include the libraries used at the beginning of your R script file:
{
# Install the packages (if necessary)
install.packages("questionr")
install.packages("ggplot2")
# Load the libraries
library(questionr)
library(ggplot2)
}
1. Identify the nominal variable in the list. What is the most appropriate measure of central tendency for this variable?
Nominal variable: region
Most appropriate measure of central tendency: mode
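Base R has no built-in function for the statistical mode (`mode()` reports an object's storage type instead), so one way to find it is to tabulate the categories and take the most frequent one. A minimal sketch, assuming a small made-up `region` vector rather than the lab data:

```r
# Hypothetical region values, for illustration only (not the lab data)
region <- c("Europe / C. Asia", "Sub-Saharan Africa", "Europe / C. Asia",
            "S. Asia", "Sub-Saharan Africa", "Europe / C. Asia")

# Count how many times each category occurs
counts <- table(region)

# The mode is the category (or categories, if tied) with the highest count
region_mode <- names(counts)[counts == max(counts)]
print(region_mode)
```

With the lab data, the same two steps would apply to `data$region`.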
2. Run a frequency table for the nominal variable {freq(data$varname, cum = TRUE, total = TRUE)}. Which category are you most interested in? What percentage of cases falls into that category?
The North America region: 0.90% of cases fall into that category.
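The percentage column in a frequency table is each category's count divided by the total number of cases. A rough base-R sketch of that arithmetic, using made-up region codes (the lab itself uses `freq()` from questionr; the vector below is an assumption for illustration):

```r
# Hypothetical region codes, for illustration only (not the lab data)
region <- c(1, 2, 2, 5, 7, 7, 7, 2, 1, 7)

counts   <- table(region)                       # number of cases per category
percents <- round(100 * prop.table(counts), 2)  # share of all cases, in %

# Combine counts and percentages into one small frequency table
freq_table <- cbind(n = counts, percent = percents)
print(freq_table)
```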
3. Generate a bar graph for the nominal variable {
ggplot(data = data, aes(x = region)) +
  geom_bar() +
  scale_x_discrete(limits = c(1, 2, 3, 4, 5, 6, 7),
                   labels = c('E. Asia / Pacific', 'Europe / C. Asia', 'S. America / Caribbean', 'M. East / N. Africa', 'N. America', 'S. Asia', 'Sub-Saharan Africa')) +
  theme(axis.text.x = element_text(angle = 45, vjust = .5, hjust = .5))
}.
Copy / paste the barplot of the nominal variable here:
4. Identify the two ordinal variables in the list. Select one and describe the crucial junctures featured in the rank statistics (min, max, median, IQR). {summary(data$varname)}
1. Women business law score
2. Disaster risk reduction:
Min: 1.00
Max: 5.00
Median: 3.00
IQR: 1st Qu.: 3.00, 3rd Qu.: 4.00 (IQR = 4.00 − 3.00 = 1.00)
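For reference, the IQR is the distance between the third and first quartiles, so it can be read off the `summary()` output or computed directly with `IQR()`. A small sketch on a made-up 1 to 5 score vector (an assumption for illustration, not the lab data):

```r
# Hypothetical ordinal scores from 1 (worst) to 5 (best), illustration only
score <- c(1, 2, 3, 3, 3, 3, 4, 4, 4, 5, NA)

# Min, quartiles, median, mean, and max (NA values are counted separately)
print(summary(score))

# First and third quartiles and the interquartile range, dropping missing values
q <- quantile(score, probs = c(0.25, 0.75), na.rm = TRUE)
iqr_value <- IQR(score, na.rm = TRUE)
print(q)
print(iqr_value)  # equals q[["75%"]] - q[["25%"]]
```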
5. Generate a bar graph for your ordinal variable. Copy / paste it into this document. Describe what kind of distribution (modality) you find. {
ggplot(data, aes(x = varname)) +
geom_bar() +
scale_x_continuous(breaks=seq(min,max,1))}
The distribution is unimodal and negatively skewed.
6. Identify the numerical variables in the list. Select one and report its median and mean. {summary(data$varname)}
1. GDP growth:
- Median: 3.360
- Mean: 3.278
2. Life expectancy
3. Annual precipitation
4. Cell phone subscriptions
7. For the same numerical variable, report the variance and the standard deviation. {
var(data$varname, na.rm=TRUE)
sd(data$varname, na.rm=TRUE)}
GDP growth: Variance: 25.23; SD: 5.02
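The two statistics are linked: the standard deviation is the square root of the variance, which gives a quick consistency check on the reported numbers (the square root of 25.23 is about 5.02). A minimal sketch with a made-up growth vector, which is an assumption for illustration and not the lab data:

```r
# Hypothetical GDP growth values in %, illustration only (not the lab data)
growth <- c(3.4, -1.2, 5.0, 2.1, 7.8, NA, 0.5)

growth_var <- var(growth, na.rm = TRUE)  # sample variance (divides by n - 1)
growth_sd  <- sd(growth, na.rm = TRUE)   # sample standard deviation

# The standard deviation should equal the square root of the variance
print(all.equal(growth_sd, sqrt(growth_var)))

# Consistency check on the variance and SD reported in the answer
print(round(sqrt(25.23), 2))
```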
8. Generate a histogram (with kernel density lines if you’re feeling ambitious) for your chosen numerical variable. Copy / paste it here. {ggplot(data, aes(x = varname)) + geom_histogram()} or {hist(data$varname)}
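For the kernel density lines, one option is to draw the histogram on the density scale and overlay a `density()` estimate. A base-R sketch, assuming simulated data (the `growth` vector below is made up; with the lab data it would be `data$varname`):

```r
# Simulated values, for illustration only (not the lab data)
set.seed(1)
growth <- rnorm(200, mean = 3, sd = 5)

# freq = FALSE puts the bars on the density scale so the curve is comparable
hist(growth, freq = FALSE, main = "Histogram with kernel density",
     xlab = "growth")

# Kernel density estimate, drawn on top of the histogram
dens <- density(growth, na.rm = TRUE)
lines(dens, lwd = 2)
```

In ggplot2 the same idea would likely be `geom_histogram(aes(y = after_stat(density)))` followed by `+ geom_density()`, assuming a ggplot2 version that provides `after_stat()`.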
9. Generate a box plot for the variable cellphone_subscriptions. {boxplot(data$cellphone_subscriptions)} Copy / paste it here. Describe what you see. What do the outliers imply (how should they be interpreted)?
Reading the graph, you can interpret it as follows:
- the first quartile (values between 25 and 75) contains the countries with the fewest cell phone subscriptions per 100 people
- the second quartile contains values between 75 and 100, the median
- the third quartile contains values between 100 and 125
- the fourth quartile contains values between 125 and 200, excluding outliers
Also, the graph shows that:
- the minimum is 25
- the median is 100
- the maximum, excluding outliers, is 200 (the end of the upper whisker)
- the outliers (245 and 300) lie at the upper extreme of the data: these countries have far more subscriptions per 100 people than the rest, more than two per person
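By default, `boxplot()` draws as outliers the points lying more than 1.5 × IQR beyond the box, and `boxplot.stats()` returns the same five numbers and flagged points without plotting. A sketch on made-up subscription rates chosen to loosely resemble the values read off the plot (an assumption for illustration, not the lab data):

```r
# Hypothetical subscriptions per 100 people, illustration only
subs <- c(25, 40, 60, 75, 80, 90, 100, 105, 110, 125, 130, 150, 200, 245, 300)

stats <- boxplot.stats(subs)  # default coef = 1.5
print(stats$stats)  # lower whisker, lower hinge, median, upper hinge, upper whisker
print(stats$out)    # points that would be drawn individually as outliers

# The same flagging by hand, using the hinges and 1.5 times their spread
hinges <- stats$stats[c(2, 4)]
upper_fence <- hinges[2] + 1.5 * diff(hinges)
print(subs[subs > upper_fence])
```

Points flagged this way are unusual relative to the rest of the sample, not necessarily errors, so they are worth checking against the source before being dropped.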
10. Based on your knowledge of measurement metrics, which variable in the list is the most
precise and therefore conveys the most information?
The numerical variables are the most precise.
They tell us the exact quantity of a characteristic. Of these, life expectancy is the most precise and conveys the most information.