FINAL REPORT
ALY 6015 Intermediate Analytics
Hardh Patel
Date: December 4th, 2023
Instructor: Sergiy Shevchenko
INTRODUCTION
The U.S. Census Bureau conducts a monthly survey, the Current Population Survey (CPS), which collects data on a wide range of demographic and economic characteristics of the U.S. population. CPS data shed light on the nation's social and economic conditions and are a key resource for both policymakers and researchers.
This report presents our initial observations and a detailed examination of the CPS data. Our objective is to better understand the conditions affecting individuals in the U.S. by exploring demographic and economic factors such as age, sex, education, earnings, and employment status. We also compare population subgroups to identify disparities and track changes over time.
Our analysis is based on CPS data for November 2022, covering more than 123,000 individuals. The dataset includes demographic and economic attributes such as age, sex, ethnicity, educational attainment, income, occupation, and industry.
To explore the data, we examined the distributions and summary statistics of the variables. To illustrate our findings, we created visualizations such as histograms and scatter plots, which highlight patterns and trends in the dataset.
EXPLORATORY DATA ANALYSIS DESCRIPTION
Descriptive statistics underscored the need to clean the data and to understand the variables involved. The dataset contains 123,009 entries across 388 variables.
Our first step was to clean the dataset by removing incomplete entries and recoding the variables. For instance, employment status was grouped into categories such as retired, employed, and unable to work (disabled). We also recoded the variables for geographic region and ethnicity to allow a more granular examination, and used summary tables in our exploratory data analysis. Family income, which is recorded as a coded range, was converted to a numeric value by assigning each entry a random number within its range. Educational attainment was also recoded, and a new variable was created to group the detailed education levels into broader categories.
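The random assignment within an income range can be sketched as follows. This is a minimal illustration, not the report's full recoding: the three brackets and the names `lower`, `upper`, and `draw_income` are hypothetical stand-ins for the complete CPS income coding.

```r
# Hypothetical bounds for three income codes (not the full CPS coding).
lower <- c(2000, 5000, 7500)
upper <- c(5000, 7499, 9999)

# Draw one uniform random integer inside the bracket of each coded entry.
draw_income <- function(code) {
  floor(runif(length(code), min = lower[code], max = upper[code] + 1))
}

set.seed(42)  # fix the seed so the imputed incomes are reproducible
codes <- c(1, 2, 3, 3, 1)
income <- draw_income(codes)
```

Because every value is drawn at random, statistics computed from the imputed incomes (including the minimum and maximum of each group) vary from run to run unless a seed is set.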
With the data cleaned, the remainder of this document analyzes the survey data. Our goal is to give a thorough account of the data and to highlight significant patterns and trends. We use summary statistics such as the mean, median, and mode to describe the distributions and central tendencies of the numerical variables. This analysis underpins the conclusions and insights drawn from the data.
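As a small illustration of these three statistics: base R provides `mean()` and `median()`, but its `mode()` returns the storage type of an object rather than the most frequent value, so a helper is needed. The helper `stat_mode` and the `hours` vector below are hypothetical examples, not part of the report's code.

```r
# Most frequent value(s) of a numeric vector.
stat_mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[counts == max(counts)])
}

hours <- c(40, 40, 35, 20, 40, 45)  # made-up weekly hours

hours_mean <- mean(hours)
hours_median <- median(hours)
hours_mode <- stat_mode(hours)
```

When the mean lies well below the median and mode, as it does here, the distribution has a tail of low values, which is the pattern the working-hours table in this report also shows.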
To better understand the differences and relationships among variables, we divided the data into several subsets. This approach produced the insights described in the subsections that follow.
SUBSET 1: Region, Gender, and Metropolitan Status
Table 1: Descriptive summary of the distribution of gender and metropolitan status

Variable              Midwest          Northeast        South            West
                      (n = 19,621)     (n = 15,867)     (n = 36,936)     (n = 27,313)
Gender
  Female              9,901 (50.5%)    8,173 (51.5%)    19,260 (52.1%)   13,765 (50.4%)
  Male                9,720 (49.5%)    7,694 (48.5%)    17,676 (47.9%)   13,548 (49.6%)
Metropolitan Status
  Metropolitan        14,713 (75%)     13,547 (85.4%)   30,034 (81.3%)   22,119 (81%)
  Non-Metropolitan    4,908 (25%)      2,264 (14.3%)    6,355 (17.2%)    4,766 (17.4%)
  Not Identified      0 (0%)           56 (0.4%)        547 (1.5%)       428 (1.6%)
Table 1 shows the distribution of participants by gender and metropolitan status within four major regions: the Midwest, Northeast, South, and West. Female participants slightly outnumber male participants in every region (50.4% to 52.1%). Metropolitan areas are also far more heavily represented than non-metropolitan areas (75% to 85.4% of each region). In addition, the metropolitan status of a small share of records (up to 1.6%) was not identified.
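Counts and within-region percentages of this kind can be produced with base R's `table()` and `prop.table()`. The sketch below uses a made-up six-row data frame (`toy`), not the survey data, purely to show the mechanics.

```r
# Toy data standing in for the cleaned survey columns (values are made up).
toy <- data.frame(
  region = c("South", "South", "West", "West", "West", "Midwest"),
  gender = c("Female", "Male", "Female", "Female", "Male", "Male")
)

# Cross-tabulate gender (rows) by region (columns).
counts <- table(toy$gender, toy$region)

# Column percentages: the share of each gender within a region.
shares <- round(100 * prop.table(counts, margin = 2), 1)
```

Setting `margin = 2` normalizes within each column, so the percentages in every region sum to 100, matching the layout of Table 1.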
SUBSET 2: Employment Status and Net Income
Table 2: Descriptive Statistics of Employment Status and Family Income

Employment Status        Min     Q1         Median     Mean         Q3           Max
Disabled                 2,009   13,873.0   29,049.0   45,083.62    57,154.00    299,936
Employed-Absent          2,025   43,870.0   83,466.0   102,292.02   139,974.50   298,709
Employed-At Work         2,032   52,739.0   89,868.0   110,859.50   148,164.00   299,974
Other                    2,009   38,080.5   73,363.0   95,847.31    134,490.00   299,976
Retired                  2,017   28,881.0   51,678.5   70,097.58    88,759.75    299,852
Unemployed- Looking      2,003   24,120.0   49,802.0   71,111.90    92,639.50    299,961
Unemployed- On Layoff    3,311   28,139.0   50,643.0   68,954.27    94,178.00    292,466
Table 2 presents family income by the employment status of household members. Respondents who are employed and at work report the highest mean income (110,859.50), while disabled respondents report the lowest (45,083.62). However, the maximum income among disabled respondents (299,936) is close to the maximum in every other category, so some disabled respondents still fall in the highest income range.
SUBSET 3: Education Status and Net Income
Table 3: Descriptive Statistics on Education Status and Family Income

Education Status        Min     Q1          Median      Mean         Q3          Max
College                 2,002   34,008.25   59,836.5    78,655.07    99,648.5    299,963
Doctorate               2,017   45,689.00   87,418.0    108,170.59   151,540.5   299,991
Elementary Education    2,043   19,841.00   37,425.0    55,823.03    68,732.0    299,916
Graduation              2,044   63,904.25   107,603.5   125,568.09   179,661.5   299,994
High School             2,014   26,750.00   51,699.0    77,136.54    101,440.0   299,889
Table 3 shows the relationship between educational attainment and family income. The Graduation group (bachelor's and master's degrees) has the highest mean income (125,568.09), followed by the Doctorate group (professional and doctoral degrees, 108,170.59), while the Elementary Education group has the lowest (55,823.03). This pattern suggests that the education level of family members is a useful predictor of household net income.
SUBSET 4: Occupation and Total working hours
Table 4: Descriptive Statistics on Occupation and number of total working hours

Occupation                              Q1      Median   Mean       Q3   Max
Construction and Extraction             40.00   40       34.80478   40   134
Farming, Fishing and Forestry           6.00    40       28.84741   40   99
Installation, Maintenance and Repair    40.00   40       38.30394   40   85
Management and Business                 40.00   40       38.39611   45   120
Office and Administrative Job           35.00   40       34.42317   40   100
Production Occupation                   40.00   40       37.29208   40   99
Professional Occupation                 36.00   40       36.00613   40   140
Sales Department                        25.00   40       33.91423   40   138
Service Occupation                      20.00   40       30.12495   40   127
Transportation                          26.75   40       33.99396   40   139
Table 4 summarizes total working hours by occupation as recorded in the dataset. Installation, maintenance, and repair, along with management and business, have the highest mean working hours (about 38.3 and 38.4). Professional occupations record the greatest maximum working hours (140), although comparatively few individuals fall into that category. These figures help characterize the working conditions of the population.
The data above were transformed from their original coded form into the current format to make them easier to read and to support the descriptive analysis.
Research Questions
1. How are individuals distributed across the regions of the United States by gender and metropolitan status?
2. How does employment status affect net family income in U.S. households?
3. Which variables are good predictors of net family income?
4. What is the relationship between net family income and its predictor variables?
ANALYSIS
Following an in-depth analysis, we gained insight into the relationships between the variables in the dataset. The figures in this section illustrate these relationships.
Figure 1: Visualization of average income by gender.
The average income by gender in the November 2022 CPS data shows that men report higher incomes than women.
Males had an average income of $98,000, while females had an average of $90,000.
This gap of about $8,000 may indicate that substantial gender-based wage disparities persist in employment.
Figure 2: Visualization of the relationship between education level and employment status
Based on the census data, the graph above suggests that education level is strongly associated with employment status. Compared to other degree holders, diploma graduates have the highest employment rates. The most common qualification is a graduate diploma, followed by a doctoral degree.
The two most prevalent employment statuses are employed-at-work and retired. People with higher education levels, such as bachelor's degrees, graduate diplomas, and master's degrees, are more likely to be classified as employed-at-work.
People with doctorates are mostly classified under the "Other" employment status, which is ambiguous and calls for further investigation.
These results suggest that diploma graduates may have more employment opportunities because of their higher level of education.
Figure 3: Visualization of the ethnicity of the population by region
The graph above shows the distribution of ethnicity across the United States in the census data. Whites are the largest group in every region, which may point to a racial disparity, with Whites possibly having greater access to resources and opportunities.
Black people make up a larger percentage of the population in the South than in other parts of the country, which may be due to regional differences in political, cultural, and societal factors.
Further investigation could examine the underlying causes of the concentration of ethnic groups in particular geographic areas and suggest strategies for fostering greater diversity and equity throughout the nation. The data may also inform decision-making in areas such as jobs, healthcare, and education, so that policies and resources are allocated fairly across ethnic groups and locations.
Figure 4: Visualization of average income by marital status and region.
The graph above shows how average income is distributed across the Midwest, Northeast, South, and West by marital status. The plot shows that married people with a spouse present earn more on average than people with any other marital status.
Widowed people have the lowest incomes, which is plausible because households with more than one earner tend to have higher incomes. The pattern is similar across the four regions, and married people with a spouse present outnumber the other categories in every region.
MODELS
LINEAR REGRESSION
To predict the total income of U.S. households, we fitted a linear regression model. The dependent variable was the household's net family income (HEFAMINC), and the independent variables included the highest level of education attained (PEEDUCA), primary occupation (PRMJOCC1), race (PTDTRACE), region (GEREG), metropolitan status (GTMETSTA), and the total number of people in the household (HRNUMHOU), among others.
Hypothesis:
H0: The six factors defined above have no impact on net income.
H1: The six factors defined above have an impact on net income.
Table 5: Linear Regression Results summary

Term                             Estimate     Std. Error   t-value   Pr(>|t|)
(Intercept)                      28254.47     3219.62      8.776     < 2e-16
Edu_LevelHigh School             -4460.11     691.71       -6.448    1.14e-10
occupationSales Department       -877.14      3478.16      -0.252    0.800900
ethnicityAsian                   29151.13     2067.00      14.103    < 2e-16
ethnicityBlack                   1660.39      1988.39      0.835     0.403695
ethnicityHawaiian                9934.45      3497.88      2.840     0.004510
ethnicityWhite                   20524.79     1880.35      10.915    < 2e-16
regionWest                       2280.80      630.19       3.619     0.000296
metro_statusNon-Metropolitian    -17402.73    554.28       -31.397   < 2e-16
metro_statusNot Identified       -16629.58    2092.51      -7.947    1.93e-15
prtage                           -261.78      12.76        -20.523   < 2e-16
genderMale                       3995.88      437.24       9.139     < 2e-16
marital_statusSeparated          -2994.38     2738.78      -1.093    0.274252
marital_statusWidowed            18847.95     2505.96      7.521     5.47e-14
healthNot-Healthy                1926.00      1363.95      1.412     0.157930

R-squared / Adjusted R-squared: 0.2094 / 0.209
F-statistic: 498.3
p-value: < 2.2e-16
The table above summarizes the linear model for predicting the net income of families in the United States.
Most coefficients have t-values greater than 2 in absolute value and p-values below the 0.05 significance threshold, which indicates that those variables contribute to predicting net income. The exceptions in the table are occupationSales Department (p = 0.80), ethnicityBlack (p = 0.40), marital_statusSeparated (p = 0.27), and healthNot-Healthy (p = 0.16), which are not significant at the 0.05 level.
According to the model's R-squared value, these factors account for 20.9% of the variance in net income, which is relatively low given that the dataset contains 382 additional variables. Other factors may therefore also be important in predicting this outcome.
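The way such a coefficient table is read can be sketched with `lm()` on simulated data. Everything below is an illustrative assumption: the data frame `sim`, its three predictors, and the coefficients used to generate the income values are invented and are not the survey data or the report's model.

```r
set.seed(1)
n <- 500

# Simulated predictors (made-up values, not CPS data).
sim <- data.frame(
  age = sample(18:80, n, replace = TRUE),
  gender = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  metro_status = factor(sample(c("Metropolitan", "Non-Metropolitan"), n,
                               replace = TRUE))
)

# Simulated income with arbitrary known effects plus noise.
sim$fam_income <- 60000 - 250 * sim$age +
  4000 * (sim$gender == "Male") -
  17000 * (sim$metro_status == "Non-Metropolitan") +
  rnorm(n, sd = 20000)

fit <- lm(fam_income ~ age + gender + metro_status, data = sim)

# Columns: Estimate, Std. Error, t value, Pr(>|t|).
coef_table <- summary(fit)$coefficients

# Flag the terms whose p-value falls below the 0.05 threshold.
significant <- coef_table[, "Pr(>|t|)"] < 0.05

# Share of the variance in income explained by the model.
r_squared <- summary(fit)$r.squared
```

Each factor level other than the reference level gets its own row (for example `genderMale`), which is why a single categorical variable such as ethnicity can have some significant levels and some non-significant ones.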
RIDGE REGRESSION
We then fitted a ridge regression model to compare its results with those of the linear regression.
First, we split the dataset into training and test sets in a 70:30 ratio.
The predictors were converted to matrix form, because the ridge regression routine accepts its input only as a matrix. We then calculated the optimal lambda value, which penalizes the coefficients and shrinks them toward zero.
Lambda.Min: 2035.914
Lambda.1se: 20838.21
Next, the model was fitted with the glmnet function from the glmnet package, with the alpha parameter set to zero. The model was fitted at the minimum lambda value (Lambda.Min), the value with the lowest cross-validated error.
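The shrinking effect of the ridge penalty can be illustrated with the closed-form ridge estimate in base R. Note that this sketch deliberately does not use glmnet: it solves (X'X + lambda * I) beta = X'y directly on simulated data, and the helper `ridge_coef` is a hypothetical name. glmnet parameterizes its penalty differently, so the lambda values here are not comparable to those reported above.

```r
set.seed(2)
n <- 200

# Simulated, standardized predictors and a centred response (made-up data).
x <- scale(matrix(rnorm(n * 3), ncol = 3))
y <- drop(x %*% c(5, -3, 0.5) + rnorm(n))
y <- y - mean(y)  # centring removes the need for an intercept

# Closed-form ridge estimate: solve (X'X + lambda * I) beta = X'y.
ridge_coef <- function(x, y, lambda) {
  drop(solve(crossprod(x) + lambda * diag(ncol(x)), crossprod(x, y)))
}

ols <- ridge_coef(x, y, lambda = 0)       # no penalty: ordinary least squares
shrunk <- ridge_coef(x, y, lambda = 500)  # heavy penalty: smaller coefficients
```

The penalized coefficients are smaller in magnitude than the unpenalized ones, but none of them is exactly zero, so every predictor stays in the model.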
Figure 5: Lambda Plot for Ridge regression
As the plot above shows, ridge regression does not eliminate variables, so all predictors remain in the fitted model. In this case, all 52 variables are retained, and the results for some of the predictors are shown below.
Table 5: Ridge Regression Results Summary

Term                             s0
(Intercept)                      46738.41475
Edu_LevelHigher Education        30561.13414
OccupationSales Department       16989.40402
EthnicityAsian                   21459.94807
EthnicityHawaiian                1360.60239
EthnicityWhite                   13187.00637
RegionSouth                      -4416.05436
RegionWest                       2416.21647
Metro_statusNon-Metropolitian    -17198.62418
Age                              -264.44731
GenderMale                       4305.75837
Marital_StatusWidowed            8135.14909
DisabilityNot-Disabled           12438.36708
Total_Paid_Employees             1132.38484
HealthNot-Healthy                2317.80400
The R-squared value for the model was 21.4%, which means that these predictors explain 21.4% of the variation in family income. Because the dataset contains 388 variables in total, some of the remaining variation may be explained by variables that are not in the model.
LASSO REGRESSION
We then applied lasso regression to the dataset to compare its results with those of the linear and ridge regression models.
• The dataset was first split into training and test sets in a 70:30 ratio.
• As with ridge regression, the predictors had to be converted to matrix form. We then identified the optimal lambda value, which penalizes the coefficients and shrinks them toward zero.
Lambda.Min: 17.30025
Lambda.1se: 1037.123
To fit the model, training was carried out using the glmnet function from the glmnet package, with the alpha parameter set to 1. The minimum lambda value (Lambda.Min) was used for this fit.
Figure 6: Lambda Plot for Lasso Regression
The plot above shows that lasso regression eliminates non-essential variables and keeps only the most informative predictors. All 52 variables were initially included in the model fitting process, and the results presented below cover a selection of these predictors. At the lambda value one standard error from the minimum (Lambda.1se), the model retained 24 variables. For our analysis, however, we used the minimum lambda value (Lambda.Min).
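Why the lasso can drop variables entirely, while ridge only shrinks them, can be seen in the soft-thresholding rule. The sketch below assumes the simplified textbook case of standardized, uncorrelated predictors, where the lasso estimate is the least-squares estimate moved toward zero by lambda and set to zero if it would cross it. The function `soft_threshold` and the example coefficients are hypothetical; glmnet's actual fitting procedure handles correlated predictors and uses its own penalty scaling.

```r
# Soft-thresholding: shrink each coefficient toward zero by lambda,
# and set it to exactly zero when its magnitude is below lambda.
soft_threshold <- function(beta_ols, lambda) {
  sign(beta_ols) * pmax(abs(beta_ols) - lambda, 0)
}

beta_ols <- c(5, -3, 0.5, -0.2)  # made-up least-squares estimates
beta_lasso <- soft_threshold(beta_ols, lambda = 1)

# Number of predictors that remain in the model.
n_retained <- sum(beta_lasso != 0)
```

With a larger lambda more coefficients fall below the threshold, which matches the report's observation that the Lambda.1se model keeps fewer variables than the Lambda.Min model.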
Table 6: Lasso Regression Results Summary

Term                             s0
(Intercept)                      32977.860
Edu_LevelHigher Education        -5386.74337
OccupationSales Department       -54372.83241
EthnicityAsian                   25929.54673
EthnicityHawaiian                5187.3698
EthnicityWhite                   18013.32298
RegionSouth                      -4385.03818
RegionWest                       2382.04990
Metro_statusNon-Metropolitian    -14493.13258
Age                              -265.44731
GenderMale                       4319.75837
Marital_StatusWidowed            15910.14909
DisabilityNot-Disabled           12385.36708
Total_Paid_Employees             1152.38484
HealthNot-Healthy                2150.80400
The R-squared value for the model is 25.20%, which means that the predictors explain approximately 25.20% of the variation in family income. Because the dataset contains 388 variables in total, variables outside the model may explain some of the remaining variation.
MODEL COMPARISON
Table 7: Model Comparison Table

Metric       Linear Regression Model   Ridge Regression Model   Lasso Regression Model
R-Squared    21.2%                     21.4%                    25.20%
RMSE         66216.98                  66009.61                 63672.01
To determine the most suitable approach for predicting family net income, we computed the R-squared and RMSE (Root Mean Squared Error) values for each of the three models. The Lasso Regression model performed best, with the highest R-squared value (25.20%) and the lowest RMSE (63672.01). These results suggest that, of the three, the Lasso Regression model is the best choice for predicting family net income.
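The two comparison metrics can be computed from held-out predictions as sketched below. The data are simulated and the names (`d`, `train_idx`, `rmse`, `r_squared`) are illustrative assumptions; the sketch uses a plain `lm()` fit rather than any of the report's three models.

```r
set.seed(3)
n <- 1000

# Simulated data with one predictor (made-up values).
d <- data.frame(x = rnorm(n))
d$y <- 2 * d$x + rnorm(n)

# 70:30 split into training and test sets.
train_idx <- sample(seq_len(n), size = round(0.7 * n))
train <- d[train_idx, ]
test <- d[-train_idx, ]

fit <- lm(y ~ x, data = train)
pred <- predict(fit, newdata = test)

# Root mean squared error: typical size of a prediction error.
rmse <- sqrt(mean((test$y - pred)^2))

# R-squared: 1 minus residual sum of squares over total sum of squares.
r_squared <- 1 - sum((test$y - pred)^2) / sum((test$y - mean(test$y))^2)
```

RMSE is expressed in the units of the response (dollars, in the report), whereas R-squared is unitless, so a model that wins on one metric for the same test set will generally win on the other as well.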
CONCLUSION
An examination of the data on ethnic distribution and income differences by gender in the United States reveals inequities in both areas. The gender income gap is pronounced, with men earning on average $8,000 more annually than women, which suggests persistent wage inequalities in the labor market, influenced by factors such as the employment opportunities available to different genders.
Furthermore, the predominant representation of White Americans points to an ethnic gap, with the implication that White individuals may have more access to resources and opportunities. These findings underscore the need for initiatives that address racial and gender-based disparities, enhance diversity, and promote inclusivity nationwide.
Three regression techniques (Linear, Ridge, and Lasso Regression) were used to estimate family net income. Given the R-squared values for each method (21.2% for Linear Regression, 21.4% for Ridge Regression, and 25.20% for Lasso Regression), Lasso Regression provides the most accurate model for predicting family net income.
REFERENCE
United States Census Data. Retrieved on Apr 02, 2023. https://www.census.gov/data/datasets/time-series/demo/cps/cps-basic.html
Top 50 visualizations with ggplot. Retrieved on Apr 02, 2023. http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html
The data tho. How to write descriptive statistics. Retrieved on Apr 02, 2023. http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html
APPENDIX
# Load the packages used for data cleaning, summary tables, plots, and modeling.
library(tidyverse)
library(psych)
library(readr)
library(dplyr)
library(RColorBrewer)
library(ggplot2)
library(GGally)
library(ggpubr)
library(fastDummies)
library(ggiraphExtra)
library(reshape2)
library(corrplot)
library(caTools)
library(car)
library(ggcorrplot)
library(MASS)
library(vtable)
library(officer)
library(flextable)
library(patchwork)
library(stargazer)
library(leaps)
library(furniture)
library(knitr)
library(glmnet)
library(Metrics)
library(sjPlot)
library(sjmisc)
library(sjlabelled)
library(tidyr)
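# Read the November 2022 CPS file and drop rows with missing values.
# Column names are assumed to be lowercase (as with prtage in the regression
# output); adjust them if the CSV uses uppercase headers.
old_census <- read.csv("/Users/kushagrabubna/Downloads/nov22pub.csv")
old_census <- old_census %>% drop_na()
census_data <- read.csv("/Users/kushagrabubna/Downloads/nov22pub.csv")
census_data <- na.omit(census_data)
census_data
ncol <- ncol(census_data)
nrow <- nrow(census_data)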
# Recode the labor-force status code (pemlr) into readable employment labels.
census_data$employment_status <- case_when(
  census_data$pemlr == 1 ~ "Employed-At Work",
  census_data$pemlr == 2 ~ "Employed-Absent",
  census_data$pemlr == 3 ~ "Unemployed- On Layoff",
  census_data$pemlr == 4 ~ "Unemployed- Looking",
  census_data$pemlr == 5 ~ "Retired",
  census_data$pemlr == 6 ~ "Disabled",
  TRUE ~ "Other"
)
# Recode the marital status code (pemaritl) into detailed labels.
census_data$marital_status <- case_when(
  census_data$pemaritl == 1 ~ "Married-Spouse Present",
  census_data$pemaritl == 2 ~ "Married-Spouse Absent",
  census_data$pemaritl == 3 ~ "Widowed",
  census_data$pemaritl == 4 ~ "Divorced",
  census_data$pemaritl == 5 ~ "Separated",
  TRUE ~ "Never Married"
)
# Recode the major occupation code (prmjocc1) into occupation labels.
census_data$occupation <- case_when(
  census_data$prmjocc1 == 1 ~ "Management and Business",
  census_data$prmjocc1 == 2 ~ "Professional Occupation",
  census_data$prmjocc1 == 3 ~ "Service Occupation",
  census_data$prmjocc1 == 4 ~ "Sales Department",
  census_data$prmjocc1 == 5 ~ "Office and Administrative Job",
  census_data$prmjocc1 == 6 ~ "Farming, Fishing and Forestry",
  census_data$prmjocc1 == 7 ~ "Construction and Extraction",
  census_data$prmjocc1 == 8 ~ "Installation, Maintenance and Repair",
  census_data$prmjocc1 == 9 ~ "Production Occupation",
  census_data$prmjocc1 == 10 ~ "Transportation",
  TRUE ~ "Armed Forces"
)
# Collapse marital status into broader groups (this overwrites the detailed
# labels assigned earlier). The original conditions for codes 3 to 5 tested
# the occupation code (prmjocc1); the marital status code (pemaritl) is used
# here instead.
census_data$marital_status <- case_when(
  census_data$pemaritl <= 2 ~ "Married",
  census_data$pemaritl == 3 ~ "Widowed",
  census_data$pemaritl == 4 ~ "Divorced",
  census_data$pemaritl == 5 ~ "Separated",
  TRUE ~ "Never Married"
)
# Recode the binary flags for sex, disability, health, and certification.
census_data$gender <- ifelse(census_data$pesex == 1, "Male", "Female")
census_data$disability <- ifelse(census_data$prdisflg == 1, "Disabled", "Not-Disabled")
census_data$health <- ifelse(census_data$pedisrem == 1, "Healthy", "Not-Healthy")
census_data$certified <- ifelse(census_data$pecert1 == 1, "Certified", "Not-Certified")
# Recode the education code (peeduca) into detailed attainment labels.
census_data$edu_status <- case_when(
  census_data$peeduca == 31 ~ "Less Than 1 Grade",
  census_data$peeduca == 32 ~ "1, 2, 3 or 4 Grade",
  census_data$peeduca == 33 ~ "5 or 6 Grade",
  census_data$peeduca == 34 ~ "7 or 8 Grade",
  census_data$peeduca == 35 ~ "9 Grade",
  census_data$peeduca == 36 ~ "10 Grade",
  census_data$peeduca == 37 ~ "11 Grade",
  census_data$peeduca == 38 ~ "12 Grade",
  census_data$peeduca == 39 ~ "Grad Diploma",
  census_data$peeduca == 40 ~ "College",
  census_data$peeduca == 41 ~ "Associate Degree- Occupational",
  census_data$peeduca == 42 ~ "Associate Degree- Academic",
  census_data$peeduca == 43 ~ "Bachelor's Degree",
  census_data$peeduca == 44 ~ "Master's Degree",
  census_data$peeduca == 45 ~ "Professional School",
  TRUE ~ "Doctorate Degree"
)
# Group the education code (peeduca) into five broad categories.
census_data$edu_newstatus <- case_when(
  census_data$peeduca %in% 31:34 ~ "Elementary Education",
  census_data$peeduca %in% 35:38 ~ "High School",
  census_data$peeduca %in% 39:42 ~ "College",
  census_data$peeduca %in% 43:44 ~ "Graduation",
  TRUE ~ "Doctorate"
)
# Group the education code (peeduca) into three levels for modeling.
census_data$Edu_Level <- case_when(
  census_data$peeduca <= 38 ~ "High School",
  census_data$peeduca %in% 39:42 ~ "College",
  TRUE ~ "Higher Education"
)
# Convert the coded family-income bracket (hefaminc) to a dollar amount by
# drawing one random integer from within each bracket; codes outside 1 to 15
# fall into the top bracket, as in the original else branch.
income_lower <- c(2000, 5000, 7500, 10000, 12500, 15000, 20000, 25000, 30000,
                  35000, 40000, 50000, 60000, 75000, 100000, 150000)
income_upper <- c(5000, 7499, 9999, 12499, 14999, 19999, 24999, 29999, 34999,
                  39999, 49999, 59999, 74999, 99999, 149999, 300000)
income_code <- ifelse(census_data$hefaminc %in% 1:15, census_data$hefaminc, 16)
census_data$fam_income <- mapply(function(lo, hi) sample(lo:hi, 1),
                                 income_lower[income_code],
                                 income_upper[income_code])
# Recode the race code (ptdtrace) into ethnicity labels; position i in the
# vector is the label for code i, and codes outside 1 to 25 use the last label.
ethnicity_labels <- c(
  "White", "Black", "American Indian", "Asian", "Hawaiian",
  "White-Black", "White-American Indian", "White-Asian", "White-Hawaiian",
  "Black-American Indian", "Black-Asian", "Black-Hawaiian",
  "American Indian-Asian", "American Indian-Hawaiian", "Asian-Hawaiian",
  "W-B-AI", "W-B-A", "W-B-H", "W-AI-A", "W-AI-H", "W-A-H", "B-AI-A",
  "W-B-AI-A", "W-AI-A-H", "Other 3 Combinations", "Other 4 Combinations"
)
race_code <- ifelse(census_data$ptdtrace %in% 1:25, census_data$ptdtrace, 26)
census_data$ethnicity <- ethnicity_labels[race_code]
# Recode the region code (gereg) into region names.
census_data$region <- case_when(
  census_data$gereg == 1 ~ "Northeast",
  census_data$gereg == 2 ~ "Midwest",
  census_data$gereg == 3 ~ "South",
  TRUE ~ "West"
)
FOR (I IN 1:NROW){
CENSUS_DATA$METRO_STATUS[I] <- IF(CENSUS_DATA$GTMETSTA[I] == 1){
"METROPOLITAN"
}ELSE IF(CENSUS_DATA$GTMETSTA[I] == 2){
"NON-METROPOLITAN"
}ELSE{
"NOT IDENTIFIED"
}
}
FURNITURE::TABLE1(CENSUS_DATA,
"GENDER" = GENDER, "METROPOLITAN STATUS" = METRO_STATUS,
SPLITBY = ~REGION,
TEST = TRUE,
NA.RM = TRUE,
FORMAT_NUMBER = TRUE
) -> TAB11
TAB11
TAB12 <- CENSUS_DATA %>% GROUP_BY(EMPLOYMENT_STATUS) %>% SUMMARIZE(MIN = MIN(FAM_INCOME),
Q1 = QUANTILE(FAM_INCOME, 0.25),
MEDIAN = MEDIAN(FAM_INCOME),
MEAN = MEAN(FAM_INCOME),
Q3 = QUANTILE(FAM_INCOME, 0.75),
MAX = MAX(FAM_INCOME))
TAB12
TAB13 <- CENSUS_DATA %>% GROUP_BY(EDU_STATUS) %>%
SUMMARIZE(MIN = MIN(FAM_INCOME),
Q1 = QUANTILE(FAM_INCOME, 0.25),
MEDIAN = MEDIAN(FAM_INCOME),
MEAN = MEAN(FAM_INCOME),
Q3 = QUANTILE(FAM_INCOME, 0.75),
MAX = MAX(FAM_INCOME))
TAB13
TAB14 <- CENSUS_DATA %>% GROUP_BY(OCCUPATION) %>% SUMMARIZE(
Q1 = QUANTILE(PEHRUSLT, 0.25),
MEDIAN = MEDIAN(PEHRUSLT),
MEAN = MEAN(PEHRUSLT),
Q3 = QUANTILE(PEHRUSLT, 0.75),
MAX = MAX(PEHRUSLT))
TAB14[-1,]
TAB15 <- CENSUS_DATA %>% GROUP_BY(EDU_NEWSTATUS) %>% SUMMARIZE(MIN = MIN(FAM_INCOME),
Q1 = QUANTILE(FAM_INCOME, 0.25),
MEDIAN = MEDIAN(FAM_INCOME),
MEAN = MEAN(FAM_INCOME),
Q3 = QUANTILE(FAM_INCOME, 0.75),
MAX = MAX(FAM_INCOME))
TAB15
GGPLOT(CENSUS_DATA, AES(X=GENDER, Y= FAM_INCOME))+
GEOM_BAR(STAT = "SUMMARY", WIDTH = 0.5, FILL="TOMATO3") +
THEME(AXIS.TEXT.X = ELEMENT_TEXT(SIZE = 15),
AXIS.TEXT.Y = ELEMENT_TEXT(SIZE = 15),
AXIS.TITLE.X = ELEMENT_TEXT(SIZE = 20),
AXIS.TITLE.Y = ELEMENT_TEXT(SIZE = 20),
TITLE = ELEMENT_TEXT(SIZE = 20)) +
LABS(TITLE = "MALES LEAD THE CENSUS IN AVERAGE INCOME",
CAPTION="SOURCE: CPS SURVEY NOV 2022") +
XLAB("GENDER") +
YLAB("AVERAGE INCOME")
G <- GGPLOT(CENSUS_DATA, AES(EDU_STATUS))
G + GEOM_BAR(AES(FILL=EMPLOYMENT_STATUS), WIDTH = 0.5) + THEME(AXIS.TEXT.X = ELEMENT_TEXT(ANGLE=60, VJUST=0.6, SIZE = 15),
AXIS.TEXT.Y = ELEMENT_TEXT(SIZE = 15),
AXIS.TITLE.X = ELEMENT_TEXT(SIZE = 20),
AXIS.TITLE.Y = ELEMENT_TEXT(SIZE = 20),
TITLE = ELEMENT_TEXT(SIZE = 20),
LEGEND.TEXT = ELEMENT_TEXT(SIZE = 15)) +
LABS(TITLE="DIPLOMA GRADUATES ARE THE MOST EMPLOYED OF ALL EDUCATION LEVELS", CAPTION="SOURCE: CPS SURVEY NOV 2022") +
SCALE_FILL_DISCRETE(NAME = "EMPLOYMENT STATUS") +
XLAB("EDUCATION LEVEL") +
YLAB("POPULATION")
G1 <- GGPLOT(CENSUS_DATA, AES(REGION))
G1 + GEOM_BAR(AES(FILL=ETHNICITY), WIDTH = 0.5) + THEME(AXIS.TEXT.X = ELEMENT_TEXT(SIZE = 15),
AXIS.TEXT.Y = ELEMENT_TEXT(SIZE = 15),
AXIS.TITLE.X = ELEMENT_TEXT(SIZE = 20),
AXIS.TITLE.Y = ELEMENT_TEXT(SIZE = 20),
TITLE = ELEMENT_TEXT(SIZE = 20),
LEGEND.TEXT = ELEMENT_TEXT(SIZE = 15)) +
LABS(TITLE="WHITE IS THE LARGEST ETHNIC GROUP IN ALL REGIONS ACROSS THE UNITED STATES", CAPTION="SOURCE: CPS SURVEY NOV 2022") +
SCALE_FILL_DISCRETE(NAME = "ETHNICITY") +
XLAB("REGION") +
YLAB("POPULATION") +
GUIDES(FILL = GUIDE_LEGEND(NCOL = 1))
GGPLOT(CENSUS_DATA, AES(X=MARITAL_STATUS, Y= FAM_INCOME,FILL=REGION))+
GEOM_BAR(STAT = "SUMMARY", WIDTH = 0.5, POSITION = 'DODGE') +
THEME(AXIS.TEXT.X = ELEMENT_TEXT(ANGLE=45, VJUST=0.6, SIZE = 15),
AXIS.TEXT.Y = ELEMENT_TEXT(SIZE = 15),
AXIS.TITLE.X = ELEMENT_TEXT(SIZE = 20),
AXIS.TITLE.Y = ELEMENT_TEXT(SIZE = 20),
TITLE = ELEMENT_TEXT(SIZE = 20),
LEGEND.TEXT = ELEMENT_TEXT(SIZE = 15)) +
LABS(TITLE = "MARRIED RESPONDENTS WITH SPOUSE PRESENT LEAD THE CENSUS IN AVERAGE INCOME",
CAPTION="SOURCE: CPS SURVEY NOV 2022") +
SCALE_FILL_DISCRETE(NAME = "REGION") +
XLAB("MARITAL STATUS") +
YLAB("AVERAGE INCOME")
TAB11DF <- AS.DATA.FRAME(TAB11)
#FLEXTABLE(TAB11DF) %>% SAVE_AS_DOCX(PATH = 'FLEXTABLE1.DOCX')
#FLEXTABLE(TAB12) %>% SAVE_AS_DOCX(PATH = 'FLEXTABLE2.DOCX')
#FLEXTABLE(TAB13) %>% SAVE_AS_DOCX(PATH = 'FLEXTABLE3.DOCX')
#FLEXTABLE(TAB14) %>% SAVE_AS_DOCX(PATH = 'FLEXTABLE4.DOCX')
FLEXTABLE(TAB15) %>% SAVE_AS_DOCX(PATH = 'FLEXTABLE5.DOCX')
NEWCENSUS12 <- DATA.FRAME(CENSUS_DATA$FAM_INCOME, CENSUS_DATA$PEEDUCA, CENSUS_DATA$PRMJOCC1,
CENSUS_DATA$PTDTRACE, CENSUS_DATA$GEREG, CENSUS_DATA$GTMETSTA, CENSUS_DATA$HRNUMHOU,
CENSUS_DATA$PRTAGE, CENSUS_DATA$PESEX, CENSUS_DATA$PEMARITL, CENSUS_DATA$PRDISFLG,
CENSUS_DATA$PTNMEMP1, CENSUS_DATA$PEDISREM, CENSUS_DATA$PECERT1)
COLNAMES(NEWCENSUS12) <- C("FAM_INCOME","EDU_LEVEL","OCCUPATION","ETHNICITY","REGION",
"METRO_STATUS","TOTAL_PEOPLE","AGE","GENDER","MARITAL_STATUS",
"DISABILITY","TOTAL_PAID_EMPLOYEES","HEALTH","CERTIFIED")
CORRRRR <- COR(NEWCENSUS12)
CORRPLOT(CORRRRR, METHOD = 'CIRCLE') # CORRELATION PLOT FOR THE 14 SELECTED VARIABLES
NEWCENSUS <- DATA.FRAME(CENSUS_DATA$FAM_INCOME, CENSUS_DATA$EDU_LEVEL, CENSUS_DATA$OCCUPATION,
CENSUS_DATA$ETHNICITY, CENSUS_DATA$REGION, CENSUS_DATA$METRO_STATUS, CENSUS_DATA$HRNUMHOU,
CENSUS_DATA$PRTAGE, CENSUS_DATA$GENDER, CENSUS_DATA$MARITAL_STATUS, CENSUS_DATA$DISABILITY,
CENSUS_DATA$PTNMEMP1, CENSUS_DATA$HEALTH, CENSUS_DATA$CERTIFIED)
COLNAMES(NEWCENSUS) <- C("FAM_INCOME","EDU_LEVEL","OCCUPATION","ETHNICITY","REGION",
"METRO_STATUS","TOTAL_PEOPLE","AGE","GENDER","MARITAL_STATUS",
"DISABILITY","TOTAL_PAID_EMPLOYEES","HEALTH","CERTIFIED")
# REGRESSION MODEL
LM1 <- LM(FAM_INCOME ~ EDU_LEVEL + OCCUPATION + ETHNICITY + REGION + METRO_STATUS +
HRNUMHOU + PRTAGE + GENDER + MARITAL_STATUS + DISABILITY + PTNMEMP1 + HEALTH + CERTIFIED,
DATA = CENSUS_DATA)
SUMMARY(LM1)
# RIDGE REGRESSION MODEL
SET.SEED(123)
TRAININDEX <- SAMPLE(X=NROW(NEWCENSUS),SIZE = NROW(NEWCENSUS)*0.7)
TRAINDATA <- NEWCENSUS[TRAININDEX,]
TESTDATA <- NEWCENSUS[-TRAININDEX,]
TRAIN_X <- MODEL.MATRIX(FAM_INCOME~. , TRAINDATA)[,-1]
TEST_X <- MODEL.MATRIX(FAM_INCOME~. , TESTDATA)[,-1]
TRAIN_Y <- TRAINDATA$FAM_INCOME
TEST_Y <- TESTDATA$FAM_INCOME
SET.SEED(123)
LAMBDA <- CV.GLMNET(TRAIN_X, TRAIN_Y, ALPHA=0 , NFOLDS = 10)
PLOT(LAMBDA)
LAMBDAMIN <- LAMBDA$LAMBDA.MIN
LAMBDA1SE <- LAMBDA$LAMBDA.1SE
LAMBDAMIN
LAMBDA1SE
# COEFFICIENT PATHS ACROSS LAMBDA: RIDGE (ALPHA = 0)
MODEL <- GLMNET(TRAIN_X,TRAIN_Y, ALPHA = 0)
PLOT(MODEL, XVAR = "LAMBDA")
# COEFFICIENT PATHS ACROSS LAMBDA: LASSO (ALPHA = 1), FOR COMPARISON
MODEL <- GLMNET(TRAIN_X,TRAIN_Y, ALPHA = 1)
PLOT(MODEL, XVAR = "LAMBDA")
# MODEL FOR LAMBDA MIN
MODELMINRIDGE <- GLMNET(TRAIN_X,TRAIN_Y, ALPHA = 0, LAMBDA = LAMBDAMIN)
COEF(MODELMINRIDGE)
TRAIN_PREDICT_RIDGE <- PREDICT(MODELMINRIDGE, NEWX = TRAIN_X)
TRAIN_RMSE_RIDGE <- RMSE(TRAIN_Y, TRAIN_PREDICT_RIDGE)
TRAIN_RMSE_RIDGE
# LASSO REGRESSION MODEL
SET.SEED(123)
TRAININDEX1 <- SAMPLE(X=NROW(NEWCENSUS),SIZE = NROW(NEWCENSUS)*0.7)
TRAINDATA1 <- NEWCENSUS[TRAININDEX1,]
TESTDATA1 <- NEWCENSUS[-TRAININDEX1,]
TRAIN_X1 <- MODEL.MATRIX(FAM_INCOME~. , TRAINDATA1)[,-1]
TEST_X1 <- MODEL.MATRIX(FAM_INCOME~. , TESTDATA1)[,-1]
TRAIN_Y1 <- TRAINDATA1$FAM_INCOME
TEST_Y1 <- TESTDATA1$FAM_INCOME
SET.SEED(123)
LAMBDA1 <- CV.GLMNET(TRAIN_X1, TRAIN_Y1, ALPHA=1 , NFOLDS = 10)
PLOT(LAMBDA1)
LAMBDAMIN1 <- LAMBDA1$LAMBDA.MIN
LAMBDA1SE1 <- LAMBDA1$LAMBDA.1SE
LAMBDAMIN1
LAMBDA1SE1
# LASSO MODEL FOR LAMBDA MIN, FITTED ON THE LASSO TRAINING SPLIT
MODELMINRIDGE1 <- GLMNET(TRAIN_X1,TRAIN_Y1, ALPHA = 1, LAMBDA = LAMBDAMIN1)
COEF(MODELMINRIDGE1)
TRAIN_PREDICT <- PREDICT(MODELMINRIDGE1, NEWX = TRAIN_X1)
TRAIN_RMSE <- RMSE(TRAIN_Y1, TRAIN_PREDICT)
TRAIN_RMSE
# PREDICT() ON AN LM OBJECT TAKES NEWDATA, NOT NEWX; WITH NO NEWDATA IT RETURNS THE IN-SAMPLE FITTED VALUES
PREDICTV <- PREDICT(LM1)
PREDICT_RMSE <- RMSE(CENSUS_DATA$FAM_INCOME, PREDICTV)
PREDICT_RMSE
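The listing above builds TESTDATA but reports RMSE on training rows only. A minimal sketch of scoring a fitted model on held-out rows follows, written with R's standard lowercase function names. It uses the built-in `mtcars` data as a stand-in rather than the CPS file, and `rmse` is a hand-written helper introduced here for illustration, not the packaged RMSE function the listing calls.

```r
# Root mean squared error, written out in base R so no extra package is assumed.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

set.seed(123)
# mtcars is a built-in stand-in; the report's data are not reproduced here.
train_index <- sample(nrow(mtcars), size = floor(0.7 * nrow(mtcars)))
train_data <- mtcars[train_index, ]
test_data <- mtcars[-train_index, ]

# Fit on the training rows only.
fit <- lm(mpg ~ wt + hp, data = train_data)

# predict() for lm objects takes `newdata`; glmnet models take `newx` instead.
train_rmse <- rmse(train_data$mpg, predict(fit, newdata = train_data))
test_rmse <- rmse(test_data$mpg, predict(fit, newdata = test_data))

print(c(train = train_rmse, test = test_rmse))
```

Comparing the two values gives a rough check on overfitting: a test RMSE far above the training RMSE suggests the model does not generalize well.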