FINAL REPORT
ALY 6015 Intermediate Analytics
Hardh Patel
Date: December 4th, 2023
Instructor: Sergiy Shevchenko
INTRODUCTION

The U.S. Census Bureau conducts a monthly survey known as the Current Population Survey (CPS), which gathers data on a wide range of demographic and economic characteristics of the American population. This information sheds light on the nation's social and economic dynamics and is an indispensable resource for decision-makers and scholars. This report presents our observations and a detailed examination of the CPS data. Our objective is to better understand the conditions affecting individuals in the U.S. by exploring demographic and economic variables, including age, sex, educational background, earnings, and employment circumstances. We also compare population groups to identify disparities and track changes over time.

Our analysis is based on the CPS data for November 2022, which covers more than 123,000 individuals. The dataset includes demographic and economic attributes such as age, sex, ethnicity, educational attainment, income, occupation, and industry. We used exploratory methods, examining the distributions and summary statistics of the variables, and produced visualizations such as histograms and scatter plots to highlight patterns and trends in the data.

EXPLORATORY DATA ANALYSIS

DESCRIPTION

Descriptive statistics underscored the need to clean the data and to understand the variables involved. The dataset contains 123,009 records across 388 variables. Our first step toward drawing conclusions was to clean the dataset.
This process entailed removing incomplete records and recoding the variables of interest. For instance, employment status was divided into several groups, including retired, employed, and unable to work. We also recoded the variables for geographic region and ethnicity to allow a more granular examination, and used summary tables in the exploratory analysis. Family income, which the CPS records as a bracket code, was converted to a numeric value by drawing a random number within the bracket for each record. Educational attainment was also recoded, and a new variable was created that groups the detailed qualifications into broader levels.

With the data cleaned, we proceed to the analysis of the survey data. Our goal is to describe the data thoroughly and highlight any significant patterns and trends. We use summary statistics such as the mean, median, and mode to characterize the distribution and central tendency of the numerical variables. To better understand the differences and relationships among variables, we split the data into several subsets.

SUBSET 1: Region, Gender, and Metropolitan Status

Table 1: Descriptive summary of the distribution of gender and metropolitan status by region.

                        Midwest          Northeast        South            West
                        n = 19621        n = 15867        n = 36936        n = 27313
Gender
  Female                9,901 (50.5%)    8,173 (51.5%)    19,260 (52.1%)   13,765 (50.4%)
  Male                  9,720 (49.5%)    7,694 (48.5%)    17,676 (47.9%)   13,548 (49.6%)
Metropolitan Status
  Metropolitan          14,713 (75%)     13,547 (85.4%)   30,034 (81.3%)   22,119 (81%)
  Non-Metropolitan      4,908 (25%)      2,264 (14.3%)    6,355 (17.2%)    4,766 (17.4%)
  Not Identified        0 (0%)           56 (0.4%)        547 (1.5%)       428 (1.6%)

Table 1 shows the distribution of participants by sex and metropolitan status within four major regions: the Midwest, Northeast, South, and West. The data indicate a higher count of
female participants than male participants in every region. Metropolitan areas are also more heavily represented than non-metropolitan areas, and the metropolitan status of a small share of records is not identified.

SUBSET 2: Employment Status and Net Income

Table 2: Descriptive Statistics of Employment Status and Family Income

Employment Status         Min      Q1         Median     Mean         Q3           Max
Disabled                  2,009    13,873.0   29,049.0   45,083.62    57,154.00    299,936
Employed - Absent         2,025    43,870.0   83,466.0   102,292.02   139,974.50   298,709
Employed - At Work        2,032    52,739.0   89,868.0   110,859.50   148,164.00   299,974
Other                     2,009    38,080.5   73,363.0   95,847.31    134,490.00   299,976
Retired                   2,017    28,881.0   51,678.5   70,097.58    88,759.75    299,852
Unemployed - Looking      2,003    24,120.0   49,802.0   71,111.90    92,639.50    299,961
Unemployed - On Layoff    3,311    28,139.0   50,643.0   68,954.27    94,178.00    292,466

Table 2 presents family income by the employment status of household members. Those who are employed and at work report the highest mean income (110,859.50), while individuals who are disabled report the lowest (45,083.62). The maximum in every category is close to 300,000, the upper bound assigned to the top income bracket, so even the disabled group, despite its low average, includes households at the top of the income range.
SUBSET 3: Education Status and Net Income

Table 3: Descriptive Statistics on Education Status and Family Income

Education Status        Min      Q1          Median      Mean         Q3          Max
College                 2,002    34,008.25   59,836.5    78,655.07    99,648.5    299,963
Doctorate               2,017    45,689.00   87,418.0    108,170.59   151,540.5   299,991
Elementary Education    2,043    19,841.00   37,425.0    55,823.03    68,732.0    299,916
Graduation              2,044    63,904.25   107,603.5   125,568.09   179,661.5   299,994
High School             2,014    26,750.00   51,699.0    77,136.54    101,440.0   299,889

Table 3 summarizes family income by educational attainment. Individuals holding bachelor's, master's, and professional degrees are at the higher end of the income range compared with those at other education levels. This pattern suggests that the education level of family members is a useful predictor of household net income.

SUBSET 4: Occupation and Total Working Hours

Table 4: Descriptive Statistics on Occupation and Total Working Hours

Occupation                              Q1      Median   Mean       Q3   Max
Construction and Extraction             40.00   40       34.80478   40   134
Farming, Fishing and Forestry           6.00    40       28.84741   40   99
Installation, Maintenance and Repair    40.00   40       38.30394   40   85
Management and Business                 40.00   40       38.39611   45   120
Office and Administrative Job           35.00   40       34.42317   40   100
Production Occupation                   40.00   40       37.29208   40   99
Professional Occupation                 36.00   40       36.00613   40   140
Sales Department                        25.00   40       33.91423   40   138
Service Occupation                      20.00   40       30.12495   40   127
Transportation                          26.75   40       33.99396   40   139

Table 4 summarizes total working hours by occupation. Installation, maintenance, and repair, along with management and business, show the highest mean working hours (about 38.3 and 38.4). Professional occupations have the greatest maximum working hours (140), yet the total number of individuals in such professions is comparatively low. These figures help gauge the employment conditions of the population.

The tables above were produced by recoding the raw survey variables into labeled categories, which makes the descriptive analysis easier to read.
Research Questions

1. How are individuals distributed across the regions of the United States according to their gender and metropolitan status?
2. How does employment status affect net family income in U.S. households?
3. Which variables are good predictors of net family income?
4. What is the relationship between net family income and its predictor variables?

ANALYSIS

The analysis gave us insight into the relationships between variables and their effect on other aspects of the dataset. The following figures interpret these relationships in more depth.

Figure 1: Visualization of average income by gender.

The gender-based average income in the November 2022 CPS data shows that men earn more on average. Males averaged $98,000 while females averaged $90,000. The gap may indicate that substantial gender-based wage disparities persist in employment.

Figure 2: Visualization of the relationship between education level and employment status

Based on the census data, the graph above suggests that education level is strongly associated with employment status. Diploma graduates (the "Grad Diploma" category) have the highest employment counts of any education group. The most common education level is the graduate diploma, followed by the doctoral degree. The two most prevalent employment statuses are employed-at work and retired. People with higher education levels, such as bachelor's degrees, graduate diplomas, and master's degrees, are more likely to be classified as employed-at work. People in the doctorate category are mostly classified under the "Other" employment status, which is unclear and calls for more research. One possible explanation is that the recoding in the appendix assigns every education code outside 31 to 45 to the doctorate category, so records without a valid education code may be counted there. These results suggest that diploma graduates may have more employment opportunities because of their level of education.

Figure 3: Visualization of the ethnicity of the population by region

The graph above shows the distribution of ethnicity across the United States and demonstrates that Whites are the largest group in every region. This may point to a racial disparity, with Whites possibly having greater access to resources and opportunities. Black people make up a larger share of the population in the South than in other parts of the country, which may be due to regional differences in political, cultural, and societal factors. Further investigation could examine the underlying causes of the concentration of ethnic groups in particular geographic areas and suggest strategies for fostering greater diversity and equity throughout the nation. The data may also inform decision-making in areas such as jobs, healthcare, and education, so that policies and resources are allocated fairly across ethnic groups and locations.
Figure 4: Visualization of average income by marital status and region.

The graph above shows how average income is distributed across the Midwest, Northeast, South, and West by marital status. Married people with a spouse present earn more on average than people with any other marital status. Widowed people have the lowest incomes, which is plausible because households with more than one earner tend to have the highest incomes. The distribution of marital statuses is similar across the four regions, and married people with a spouse present are the largest group in every region.
MODELS

LINEAR REGRESSION

To predict the total income of U.S. households, we employed a linear regression model. The dependent variable was the household's net income (HEFAMINC). The independent variables included the highest level of education attained (PEEDUCA), primary occupation (PRMJOCC1), racial identity (PTDTRACE), region (GEREG), metropolitan status (GTMETSTA), and the total number of people in the household (HRNUMHOU), among others.

Hypothesis:
H0: The six factors defined above have no impact on net income.
H1: The six factors defined above have an impact on net income.

Table 5: Linear Regression Results Summary

Term                             Estimate     Std. Error   t-value    Pr(>|t|)
(Intercept)                      28254.47     3219.62      8.776      < 2e-16
Edu_LevelHigh School             -4460.11     691.71       -6.448     1.14e-10
occupationSalesDepartment        -877.14      3478.16      -0.252     0.800900
ethnicityAsian                   29151.13     2067.00      14.103     < 2e-16
ethnicityBlack                   1660.39      1988.39      0.835      0.403695
ethnicityHawaiian                9934.45      3497.88      2.840      0.004510
ethnicityWhite                   20524.79     1880.35      10.915     < 2e-16
regionWest                       2280.80      630.19       3.619      0.000296
metro_statusNon-Metropolitian    -17402.73    554.28       -31.397    < 2e-16
metro_statusNot Identified       -16629.58    2092.51      -7.947     1.93e-15
prtage                           -261.78      12.76        -20.523    < 2e-16
genderMale                       3995.88      437.24       9.139      < 2e-16
marital_statusSeparated          -2994.38     2738.78      -1.093     0.274252
marital_statusWidowed            18847.95     2505.96      7.521      5.47e-14
healthNot-Healthy                1926.00      1363.95      1.412      0.157930

R-squared / Adjusted R-squared: 0.2094 / 0.209
F-statistic: 498.3
p-value: < 2.2e-16
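A model of this kind could be fitted with base R's lm(), as in the minimal sketch below. The data frame is simulated purely for illustration: the column names (fam_income, edu_level, metro_status, age) and the effect sizes used to generate it are assumptions, not the CPS extract or the code actually used for Table 5.

```r
# Simulated stand-in for the cleaned survey data (all values are hypothetical).
set.seed(42)
n <- 500
toy <- data.frame(
  edu_level    = factor(sample(c("College", "High School", "Higher Education"),
                               n, replace = TRUE)),
  metro_status = factor(sample(c("Metropolitan", "Non-Metropolitan"),
                               n, replace = TRUE)),
  age          = sample(18:85, n, replace = TRUE)
)
toy$fam_income <- 60000 +
  20000 * (toy$edu_level == "Higher Education") -
  15000 * (toy$metro_status == "Non-Metropolitan") -
  250 * toy$age +
  rnorm(n, sd = 30000)

# Fit the linear model; factor predictors are dummy-coded automatically,
# with the first level of each factor as the reference category.
fit <- lm(fam_income ~ edu_level + metro_status + age, data = toy)

# Coefficient table: estimate, standard error, t value, and p-value per term.
fit_summary <- summary(fit)
coef_table <- fit_summary$coefficients
print(round(coef_table, 3))

# Overall fit: R-squared, adjusted R-squared, and the p-value of the F-test
# for H0 "all slope coefficients are zero".
f_stat <- fit_summary$fstatistic
f_p_value <- pf(f_stat["value"], f_stat["numdf"], f_stat["dendf"],
                lower.tail = FALSE)
cat("R-squared:", fit_summary$r.squared, "\n")
cat("Adjusted R-squared:", fit_summary$adj.r.squared, "\n")
cat("F-test p-value:", f_p_value, "\n")
```

Each row of the coefficient table corresponds to one dummy-coded level or numeric predictor, which is why a single factor such as ethnicity contributes several rows to Table 5.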
Table 5 summarizes the linear model for predicting the net income of families in the United States. Most coefficients have t-values larger than 2 in absolute value and p-values below the 0.05 significance threshold, which indicates that these variables matter for predicting net income. The exceptions among the terms shown are occupation (Sales Department), ethnicity (Black), marital status (Separated), and health (Not-Healthy), whose p-values exceed 0.05 and which are therefore not significant. According to the model's R-squared value, these predictors account for 20.9% of the variance in net income. This is relatively low, and because the dataset contains 382 additional variables, other factors may also be important in predicting this outcome.

RIDGE REGRESSION

We then performed ridge regression on the dataset to compare its results with those of the linear regression. First, we split the dataset into training and test sets in a ratio of 70% to 30%. The predictors were converted to a matrix, because glmnet accepts its input only in matrix form. We then calculated the optimal lambda value, the penalty that shrinks the coefficients.

lambda.min    2035.914
lambda.1se    20838.21

Next, the model was fitted using the glmnet function from the glmnet package with the alpha parameter set to zero. The lambda value at the minimum of the error criterion (lambda.min) was selected for the fit.

Figure 5: Lambda Plot for Ridge regression
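The ridge workflow described above (70/30 split, matrix conversion, lambda selection, fit with alpha = 0) can be sketched as follows. This is an illustration on simulated data, not the code used for the report: the data frame, its column names, and the seed are assumptions, and lambda is chosen here with cv.glmnet, which is the usual source of lambda.min and lambda.1se.

```r
library(glmnet)

# Simulated stand-in for the cleaned survey data (all values are hypothetical).
set.seed(123)
n <- 400
toy <- data.frame(
  region = factor(sample(c("Midwest", "Northeast", "South", "West"),
                         n, replace = TRUE)),
  gender = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  age    = sample(18:85, n, replace = TRUE)
)
toy$fam_income <- 80000 + 4000 * (toy$gender == "Male") -
  260 * toy$age + rnorm(n, sd = 40000)

# 70/30 train/test split.
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
train <- toy[train_idx, ]
test  <- toy[-train_idx, ]

# glmnet needs a numeric matrix: model.matrix() expands factors into dummy
# columns; the intercept column is dropped because glmnet adds its own.
x_train <- model.matrix(fam_income ~ ., data = train)[, -1]
x_test  <- model.matrix(fam_income ~ ., data = test)[, -1]
y_train <- train$fam_income

# alpha = 0 selects the ridge penalty; cross-validation picks lambda.
cv_ridge <- cv.glmnet(x_train, y_train, alpha = 0)
cat("lambda.min:", cv_ridge$lambda.min, "\n")
cat("lambda.1se:", cv_ridge$lambda.1se, "\n")

# Refit at lambda.min and predict on the held-out 30%.
ridge_fit   <- glmnet(x_train, y_train, alpha = 0, lambda = cv_ridge$lambda.min)
ridge_coefs <- coef(ridge_fit)
ridge_pred  <- predict(ridge_fit, newx = x_test)
print(ridge_coefs)
```

Calling plot(cv_ridge) draws the cross-validation error against log(lambda), which is the kind of plot shown in Figure 5.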
The plot above shows that, because ridge regression does not eliminate variables, all predictors are retained in the fitted model. In this case all 52 variables are included, and the results for some of the predictors are shown below.

Table 5: Ridge Regression Results Summary

Term                             s0
(Intercept)                      46738.41475
Edu_LevelHigher Education        30561.13414
OccupationSales Department       16989.40402
EthnicityAsian                   21459.94807
EthnicityHawaiian                1360.60239
EthnicityWhite                   13187.00637
RegionSouth                      -4416.05436
RegionWest                       2416.21647
Metro_statusNon-Metropolitian    -17198.62418
Age                              -264.44731
GenderMale                       4305.75837
Marital_StatusWidowed            8135.14909
DisabilityNot-Disabled           12438.36708
Total_Paid_Employees             1132.38484
HealthNot-Healthy                2317.80400

The R-squared value for the model was 21.4%, which means that these predictors explain 21.4% of the variation in family income. Because the dataset has 388 variables in total, some of the remaining variation may be explained by variables outside the model.

LASSO REGRESSION

We then applied lasso regression to the dataset for comparison with the linear and ridge regression models. The dataset was again partitioned into training and test sets in a 70:30 ratio, and the predictors were converted to matrix format. We then identified the optimal lambda value, which controls how strongly the coefficients are penalized.

lambda.min    17.30025
lambda.1se    1037.123

The model was fitted on the training data using the glmnet function from the glmnet package with the alpha parameter set to 1. The minimum lambda value (lambda.min) was chosen for this fit.

Figure 6: Lambda Plot for Lasso Regression

The plot above indicates that lasso regression can eliminate non-essential variables and keep only the most relevant predictors. In this instance, all 52 variables were initially included in the model fitting process, and the results below cover a selection of these predictors. At the lambda value one standard error from the minimum (lambda.1se), the model retained 24 variables. For our analysis, however, we used lambda.min rather than lambda.1se.

Table 6: Lasso Regression Results Summary
Term                             s0
(Intercept)                      32977.860
Edu_LevelHigher Education        -5386.74337
OccupationSales Department       -54372.83241
EthnicityAsian                   25929.54673
EthnicityHawaiian                5187.3698
EthnicityWhite                   18013.32298
RegionSouth                      -4385.03818
RegionWest                       2382.04990
Metro_statusNon-Metropolitian    -14493.13258
Age                              -265.44731
GenderMale                       4319.75837
Marital_StatusWidowed            15910.14909
DisabilityNot-Disabled           12385.36708
Total_Paid_Employees             1152.38484
HealthNot-Healthy                2150.80400

The R-squared value for the model is 25.20%, meaning that the predictors explain approximately 25.20% of the variation in family income. Because the dataset contains 388 variables in total, variables outside the model may explain additional variation in family income.
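The difference between lambda.min and lambda.1se in terms of retained variables can be illustrated with a small simulation. The sketch below is not the report's code: the predictors x1 to x10, the true coefficients, and the helper count_nonzero are all hypothetical, and lambda is selected with cv.glmnet.

```r
library(glmnet)

# Simulated data: only the first three of ten predictors affect the response.
set.seed(2022)
n <- 300
p <- 10
x <- matrix(rnorm(n * p), nrow = n,
            dimnames = list(NULL, paste0("x", 1:p)))
y <- 5 * x[, 1] - 3 * x[, 2] + 2 * x[, 3] + rnorm(n)

# alpha = 1 selects the lasso penalty, which can set coefficients exactly to zero.
cv_lasso <- cv.glmnet(x, y, alpha = 1)

# Hypothetical helper: number of non-zero slope coefficients at a given lambda.
count_nonzero <- function(cv_fit, s) {
  coefs <- as.matrix(coef(cv_fit, s = s))
  sum(coefs[-1, 1] != 0)  # drop the intercept row
}

n_min <- count_nonzero(cv_lasso, "lambda.min")
n_1se <- count_nonzero(cv_lasso, "lambda.1se")
cat("Predictors kept at lambda.min:", n_min, "\n")
cat("Predictors kept at lambda.1se:", n_1se, "\n")
```

The larger penalty at lambda.1se typically yields a sparser model, which matches the report's observation that fewer variables survive at lambda.1se than at lambda.min.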
MODEL COMPARISON

Table 7: Model Comparison Table

             Linear Regression Model    Ridge Regression Model    Lasso Regression Model
R-Squared    21.2%                      21.4%                     25.20%
RMSE         66216.98                   66009.61                  63672.01

After fitting the three models, we computed the R-squared and RMSE (Root Mean Squared Error) values for each to determine the most suitable approach for predicting family net income. The Lasso Regression model performed best, with the highest R-squared value (25.20%) and the lowest RMSE (63672.01). These results suggest that, of the three, the Lasso Regression model is the best choice for predicting family net income.
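For reference, the two comparison metrics can be computed from actual and predicted values as in the sketch below. The function names rmse and r_squared and the five example values are illustrative assumptions, not figures from the CPS models.

```r
# Root mean squared error: typical size of a prediction error,
# in the same units as the response.
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# R-squared: share of the variance in `actual` explained by `predicted`.
r_squared <- function(actual, predicted) {
  ss_res <- sum((actual - predicted)^2)     # residual sum of squares
  ss_tot <- sum((actual - mean(actual))^2)  # total sum of squares
  1 - ss_res / ss_tot
}

# Hypothetical held-out incomes and model predictions.
actual    <- c(50000, 72000, 31000, 98000, 64000)
predicted <- c(55000, 70000, 40000, 90000, 60000)

model_rmse <- rmse(actual, predicted)       # about 6164.41
model_r2   <- r_squared(actual, predicted)  # 0.924
cat("RMSE:", model_rmse, "\n")
cat("R-squared:", model_r2, "\n")
```

A lower RMSE and a higher R-squared both indicate a better fit, which is the basis on which Table 7 ranks the three models.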
CONCLUSION

An examination of the data on ethnic distribution and income differences by gender in the United States reveals significant inequities in both areas. The gender income gap is pronounced, with men earning, on average, $8,000 more annually than women, which suggests persistent wage inequalities in the labor market influenced by factors such as the employment opportunities available to different genders. Furthermore, the predominant representation of White Americans points to an ethnic gap, with the implication that White individuals may have more access to resources and opportunities. These findings underscore the need for initiatives that address racial and gender-based disparities, enhance diversity, and promote inclusivity nationwide.

Three regression techniques (Linear, Ridge, and Lasso Regression) were used to estimate family net income. Given the R-squared values for each method (21.2% for Linear Regression, 21.4% for Ridge Regression, and 25.20% for Lasso Regression), Lasso Regression provides the best-fitting model for predicting family net income.
REFERENCE United States Census Data. Retrieved on Apr 02, 2023. https://www.census.gov/data/datasets/time-series/demo/cps/cps-basic.html Top 50 visualizations with ggplot. Retrieved on Apr 02, 2023. http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html The data tho. Retrieved on Apr 02, 2023. How to write descriptive statistics. http://r- statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html APPENDIX LIBRARY(TIDYVERSE) LIBRARY(PSYCH) LIBRARY(READR) LIBRARY(DPLYR) LIBRARY(RCOLORBREWER) LIBRARY(GGPLOT2) LIBRARY(GGALLY) LIBRARY(GGPUBR) LIBRARY(FASTDUMMIES) LIBRARY(GGIRAPHEXTRA) LIBRARY(RESHAPE2) LIBRARY(CORRPLOT) LIBRARY(CATOOLS) LIBRARY(CAR) LIBRARY(GGCORRPLOT) LIBRARY(MASS)
LIBRARY(VTABLE) LIBRARY(OFFICER) LIBRARY(FLEXTABLE) LIBRARY(PATCHWORK) LIBRARY(STARGAZER) LIBRARY(MASS) LIBRARY(LEAPS) LIBRARY(FURNITURE) LIBRARY(KNITR) LIBRARY(GLMNET) LIBRARY(METRICS) LIBRARY(SJPLOT) LIBRARY(SJMISC) LIBRARY(SJLABELLED) LIBRARY(TIDYR) OLD_CENSUS <- READ.CSV("/USERS/KUSHAGRABUBNA/DOWNLOADS/NOV22PUB.CSV") OLD_CENSUS <- OLD_CENSUS %>% DROP_NA() CENSUS_DATA <- READ.CSV("/USERS/KUSHAGRABUBNA/DOWNLOADS/NOV22PUB.CSV") CENSUS_DATA <- NA.OMIT(CENSUS_DATA)
CENSUS_DATA NCOL <- NCOL(CENSUS_DATA) NROW <- NROW(CENSUS_DATA) FOR (I IN 1:NROW){ CENSUS_DATA$EMPLOYMENT_STATUS[I] <- IF(CENSUS_DATA$PEMLR[I] == 1){ "EMPLOYED-AT WORK" }ELSE IF(CENSUS_DATA$PEMLR[I] == 2){ "EMPLOYED-ABSENT" }ELSE IF(CENSUS_DATA$PEMLR[I] == 3){ "UNEMPLOYED- ON LAYOFF" }ELSE IF(CENSUS_DATA$PEMLR[I] == 4){ "UNEMPLOYED- LOOKING" }ELSE IF(CENSUS_DATA$PEMLR[I] == 5){ "RETIRED" }ELSE IF(CENSUS_DATA$PEMLR[I] == 6){ "DISABLED" }ELSE{ "OTHER" } } FOR (I IN 1:NROW){
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
CENSUS_DATA$MARITAL_STATUS[I] <- IF(CENSUS_DATA$PEMARITL[I] == 1){ "MARRIED-SPOUSE PRESENT" }ELSE IF(CENSUS_DATA$PEMARITL[I] == 2){ "MARRIED-SPOUSE ABSENT" }ELSE IF(CENSUS_DATA$PEMARITL[I] == 3){ "WIDOWED" }ELSE IF(CENSUS_DATA$PEMARITL[I] == 4){ "DIVORCED" }ELSE IF(CENSUS_DATA$PEMARITL[I] == 5){ "SEPARATED" }ELSE{ "NEVER MARRIED" } } FOR (I IN 1:NROW){ CENSUS_DATA$OCCUPATION[I] <- IF(CENSUS_DATA$PRMJOCC1[I] == 1){ "MANAGEMENT AND BUSINESS" }ELSE IF(CENSUS_DATA$PRMJOCC1[I] == 2){ "PROFESSIONAL OCCUPATION" }ELSE IF(CENSUS_DATA$PRMJOCC1[I] == 3){ "SERVICE OCCUPATION" }ELSE IF(CENSUS_DATA$PRMJOCC1[I] == 4){ "SALES DEPARTMENT"
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
}ELSE IF(CENSUS_DATA$PRMJOCC1[I] == 5){ "OFFICE AND ADMINISTRATIVE JOB" }ELSE IF(CENSUS_DATA$PRMJOCC1[I] == 6){ "FARMING, FISHING AND FORESTRY" }ELSE IF(CENSUS_DATA$PRMJOCC1[I] == 7){ "CONSTRUCTION AND EXTRACTION" }ELSE IF(CENSUS_DATA$PRMJOCC1[I] == 8){ "INSTALLATION, MAINTAINENCE AND REPAIR" }ELSE IF(CENSUS_DATA$PRMJOCC1[I] == 9){ "PRODUCTION OCCUPATION" }ELSE IF(CENSUS_DATA$PRMJOCC1[I] == 10){ "TRANSPORTATION" }ELSE{ "ARMED FORCES" } } FOR (I IN 1:NROW){ CENSUS_DATA$MARITAL_STATUS[I] <- IF(CENSUS_DATA$PEMARITL[I] <= 2){ "MARRIED" }ELSE IF(CENSUS_DATA$PRMJOCC1[I] == 3){ "WIDOWED" }ELSE IF(CENSUS_DATA$PRMJOCC1[I] == 4){ "DIVORCED" }ELSE IF(CENSUS_DATA$PRMJOCC1[I] == 5){
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
"SEPARATED" }ELSE { "NEVER MARRIED" } } CENSUS_DATA$GENDER <- IFELSE(CENSUS_DATA$PESEX == 1, "MALE","FEMALE") CENSUS_DATA$DISABILITY <- IFELSE(CENSUS_DATA$PRDISFLG == 1, "DISABLED","NOT-DISABLED") CENSUS_DATA$HEALTH <- IFELSE(CENSUS_DATA$PEDISREM == 1, "HEALTHY", "NOT-HEALTHY") CENSUS_DATA$CERTIFIED <- IFELSE(CENSUS_DATA$PECERT1 == 1, "CERTIFIED", "NOT-CERTIFIED") FOR (I IN 1:NROW){ CENSUS_DATA$EDU_STATUS[I] <- IF(CENSUS_DATA$PEEDUCA[I] == 31){ "LESS THAN 1 GRADE" }ELSE IF(CENSUS_DATA$PEEDUCA[I] == 32){ "1, 2, 3 OR 4 GRADE" }ELSE IF(CENSUS_DATA$PEEDUCA[I] == 33){ "5 OR 6 GRADE" }ELSE IF(CENSUS_DATA$PEEDUCA[I] == 34){
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
"7 OR 8 GRADE" }ELSE IF(CENSUS_DATA$PEEDUCA[I] == 35){ "9 GRADE" }ELSE IF(CENSUS_DATA$PEEDUCA[I] == 36){ "10 GRADE" }ELSE IF(CENSUS_DATA$PEEDUCA[I] == 37){ "11 GRADE" }ELSE IF(CENSUS_DATA$PEEDUCA[I] == 38){ "12 GRADE " }ELSE IF(CENSUS_DATA$PEEDUCA[I] == 39){ "GRAD DIPLOMA" }ELSE IF(CENSUS_DATA$PEEDUCA[I] == 40){ "COLLEGE" }ELSE IF(CENSUS_DATA$PEEDUCA[I] == 41){ "ASSOCIATE DEGREE- OCCUPATIONAL" }ELSE IF(CENSUS_DATA$PEEDUCA[I] == 42){ "ASSOCIATE DEGREE- ACADEMIC" }ELSE IF(CENSUS_DATA$PEEDUCA[I] == 43){ "BACHELOR'S DEGREE" }ELSE IF(CENSUS_DATA$PEEDUCA[I] == 44){ "MASTER'S DEGREE" }ELSE IF(CENSUS_DATA$PEEDUCA[I] == 45){ "PROFESSIONAL SCHOOL" }ELSE{
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
"DOCTORATE DEGREE" } } FOR (I IN 1:NROW){ CENSUS_DATA$EDU_NEWSTATUS[I] <- IF(CENSUS_DATA$PEEDUCA[I] == 31) { "ELEMENTARY EDUCATION" }ELSE IF(CENSUS_DATA$PEEDUCA[I] == 32){ "ELEMENTARY EDUCATION" }ELSE IF(CENSUS_DATA$PEEDUCA[I] == 33){ "ELEMENTARY EDUCATION" }ELSE IF(CENSUS_DATA$PEEDUCA[I] == 34){ "ELEMENTARY EDUCATION" }ELSE IF(CENSUS_DATA$PEEDUCA[I] == 35){ "HIGH SCHOOL" }ELSE IF(CENSUS_DATA$PEEDUCA[I] == 36){ "HIGH SCHOOL" }ELSE IF(CENSUS_DATA$PEEDUCA[I] == 37){ "HIGH SCHOOL" }ELSE IF(CENSUS_DATA$PEEDUCA[I] == 38){ "HIGH SCHOOL" }ELSE IF(CENSUS_DATA$PEEDUCA[I] == 39){ "COLLEGE" }ELSE IF(CENSUS_DATA$PEEDUCA[I] == 40){
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
"COLLEGE" }ELSE IF(CENSUS_DATA$PEEDUCA[I] == 41){ "COLLEGE" }ELSE IF(CENSUS_DATA$PEEDUCA[I] == 42){ "COLLEGE" }ELSE IF(CENSUS_DATA$PEEDUCA[I] == 43){ "GRADUATION" }ELSE IF(CENSUS_DATA$PEEDUCA[I] == 44){ "GRADUATION" }ELSE IF(CENSUS_DATA$PEEDUCA[I] == 45){ "DOCTORATE" }ELSE{ "DOCTORATE" } } FOR (I IN 1:NROW){ CENSUS_DATA$EDU_LEVEL[I] <- IF(CENSUS_DATA$PEEDUCA[I] <= 38){ "HIGH SCHOOL" }ELSE IF(CENSUS_DATA$PEEDUCA[I] == 39 ){ "COLLEGE" }ELSE IF(CENSUS_DATA$PEEDUCA[I] == 40){ "COLLEGE"
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
}ELSE IF(CENSUS_DATA$PEEDUCA[I] == 41){ "COLLEGE" }ELSE IF(CENSUS_DATA$PEEDUCA[I] == 42){ "COLLEGE" }ELSE { "HIGHER EDUCATION" } } FOR (I IN 1:NROW){ CENSUS_DATA$FAM_INCOME[I] <- IF(CENSUS_DATA$HEFAMINC[I] == 1){ SAMPLE(2000:5000,1) }ELSE IF(CENSUS_DATA$HEFAMINC[I] == 2){ SAMPLE(5000:7499,1) }ELSE IF(CENSUS_DATA$HEFAMINC[I] == 3){ SAMPLE(7500:9999,1) }ELSE IF(CENSUS_DATA$HEFAMINC[I] == 4){ SAMPLE(10000:12499,1) }ELSE IF(CENSUS_DATA$HEFAMINC[I] == 5){ SAMPLE(12500:14999,1) }ELSE IF(CENSUS_DATA$HEFAMINC[I] == 6){ SAMPLE(15000:19999,1) }ELSE IF(CENSUS_DATA$HEFAMINC[I] == 7){ SAMPLE(20000:24999,1) }ELSE IF(CENSUS_DATA$HEFAMINC[I] == 8){
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
SAMPLE(25000:29999,1) }ELSE IF(CENSUS_DATA$HEFAMINC[I] == 9){ SAMPLE(30000:34999,1) }ELSE IF(CENSUS_DATA$HEFAMINC[I] == 10){ SAMPLE(35000:39999,1) }ELSE IF(CENSUS_DATA$HEFAMINC[I] == 11){ SAMPLE(40000:49999,1) }ELSE IF(CENSUS_DATA$HEFAMINC[I] == 12){ SAMPLE(50000:59999,1) }ELSE IF(CENSUS_DATA$HEFAMINC[I] == 13){ SAMPLE(60000:74999,1) }ELSE IF(CENSUS_DATA$HEFAMINC[I] == 14){ SAMPLE(75000:99999,1) }ELSE IF(CENSUS_DATA$HEFAMINC[I] == 15){ SAMPLE(100000:149999,1) }ELSE{ SAMPLE(150000:300000,1) } } FOR (I IN 1:NROW){ CENSUS_DATA$ETHNICITY[I] <- IF(CENSUS_DATA$PTDTRACE[I] == 01){ "WHITE" }ELSE IF(CENSUS_DATA$PTDTRACE[I] == 02){
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
"BLACK" }ELSE IF(CENSUS_DATA$PTDTRACE[I] == 03){ "AMERICAN INDIAN" }ELSE IF(CENSUS_DATA$PTDTRACE[I] == 04){ "ASIAN" }ELSE IF(CENSUS_DATA$PTDTRACE[I] == 05){ "HAWAIIAN" }ELSE IF(CENSUS_DATA$PTDTRACE[I] == 06){ "WHITE-BLACK" }ELSE IF(CENSUS_DATA$PTDTRACE[I] == 07){ "WHITE-AMERICAN INDIAN" }ELSE IF(CENSUS_DATA$PTDTRACE[I] == 08){ "WHITE-ASIAN" }ELSE IF(CENSUS_DATA$PTDTRACE[I] == 09){ "WHITE-HAWAIIAN" }ELSE IF(CENSUS_DATA$PTDTRACE[I] == 10){ "BLACK-AMERICAN INDIAN" }ELSE IF(CENSUS_DATA$PTDTRACE[I] == 11){ "BLACK-ASIAN" }ELSE IF(CENSUS_DATA$PTDTRACE[I] == 12){ "BLACK-HAWAIIAN" }ELSE IF(CENSUS_DATA$PTDTRACE[I] == 13){ "AMERICAN INDIAN-ASIAN" }ELSE IF(CENSUS_DATA$PTDTRACE[I] == 14){
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
"AMERICAN INDIAN-HAWAIIAN" }ELSE IF(CENSUS_DATA$PTDTRACE[I] == 15){ "ASIAN-HAWAIIAN" }ELSE IF(CENSUS_DATA$PTDTRACE[I] == 16){ "W-B-AI" }ELSE IF(CENSUS_DATA$PTDTRACE[I] == 17){ "W-B-A" }ELSE IF(CENSUS_DATA$PTDTRACE[I] == 18){ "W-B-H" }ELSE IF(CENSUS_DATA$PTDTRACE[I] == 19){ "W-AI-A" }ELSE IF(CENSUS_DATA$PTDTRACE[I] == 20){ "W-AI-H" }ELSE IF(CENSUS_DATA$PTDTRACE[I] == 21){ "W-A-H" }ELSE IF(CENSUS_DATA$PTDTRACE[I] == 22){ "B-AI-A" }ELSE IF(CENSUS_DATA$PTDTRACE[I] == 23){ "W-B-AI-A" }ELSE IF(CENSUS_DATA$PTDTRACE[I] == 24){ "W-AI-A-H" }ELSE IF(CENSUS_DATA$PTDTRACE[I] == 25){ "OTHER 3 COMBINATIONS" }ELSE{
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
"OTHER 4 COMBINATIONS" } } FOR (I IN 1:NROW){ CENSUS_DATA$REGION[I] <- IF(CENSUS_DATA$GEREG[I] == 1){ "NORTHEAST" }ELSE IF(CENSUS_DATA$GEREG[I] == 2){ "MIDWEST" }ELSE IF(CENSUS_DATA$GEREG[I] == 3){ "SOUTH" }ELSE{ "WEST" } } FOR (I IN 1:NROW){ CENSUS_DATA$METRO_STATUS[I] <- IF(CENSUS_DATA$GTMETSTA[I] == 1){ "METROPOLITIAN" }ELSE IF(CENSUS_DATA$GTMETSTA[I] == 2){ "NON-METROPOLITIAN" }ELSE{ "NOT IDENTIFIED" }
}

# Table 1: gender and metropolitan status, split by region
furniture::table1(CENSUS_DATA,
                  "GENDER" = GENDER,
                  "METROPOLITAN STATUS" = METRO_STATUS,
                  splitby = ~REGION,
                  test = TRUE,
                  na.rm = TRUE,
                  format_number = TRUE) -> TAB11
TAB11

# Family income summary by employment status
TAB12 <- CENSUS_DATA %>%
  group_by(EMPLOYMENT_STATUS) %>%
  summarize(MIN = min(FAM_INCOME),
            Q1 = quantile(FAM_INCOME, 0.25),
            MEDIAN = median(FAM_INCOME),
            MEAN = mean(FAM_INCOME),
            Q3 = quantile(FAM_INCOME, 0.75),
            MAX = max(FAM_INCOME))
TAB12

# Family income summary by education status
TAB13 <- CENSUS_DATA %>%
  group_by(EDU_STATUS) %>%
  summarize(MIN = min(FAM_INCOME),
            Q1 = quantile(FAM_INCOME, 0.25),
            MEDIAN = median(FAM_INCOME),
            MEAN = mean(FAM_INCOME),
            Q3 = quantile(FAM_INCOME, 0.75),
            MAX = max(FAM_INCOME))
TAB13

# Summary of PEHRUSLT by occupation
TAB14 <- CENSUS_DATA %>%
  group_by(OCCUPATION) %>%
  summarize(Q1 = quantile(PEHRUSLT, 0.25),
            MEDIAN = median(PEHRUSLT),
            MEAN = mean(PEHRUSLT),
            Q3 = quantile(PEHRUSLT, 0.75),
            MAX = max(PEHRUSLT))
TAB14[-1, ]  # print without the first row

# Family income summary by EDU_NEWSTATUS
TAB15 <- CENSUS_DATA %>%
  group_by(EDU_NEWSTATUS) %>%
  summarize(MIN = min(FAM_INCOME),
            Q1 = quantile(FAM_INCOME, 0.25),
            MEDIAN = median(FAM_INCOME),
            MEAN = mean(FAM_INCOME),
            Q3 = quantile(FAM_INCOME, 0.75),
            MAX = max(FAM_INCOME))
TAB15

# Bar chart of average family income by gender
# (stat = "summary" plots the mean of y by default)
ggplot(CENSUS_DATA, aes(x = GENDER, y = FAM_INCOME)) +
  geom_bar(stat = "summary", width = 0.5, fill = "tomato3") +
  theme(axis.text.x = element_text(size = 15),
        axis.text.y = element_text(size = 15),
        axis.title.x = element_text(size = 20),
        axis.title.y = element_text(size = 20),
        title = element_text(size = 20)) +
  labs(title = "MALES DOMINATE THE CENSUS WITH AVERAGE INCOME",
       caption = "SOURCE: CPS SURVEY NOV 2022") +
  xlab("GENDER") +
  ylab("AVERAGE INCOME")

# Stacked bar chart of employment status by education level
G <- ggplot(CENSUS_DATA, aes(EDU_STATUS))
G + geom_bar(aes(fill = EMPLOYMENT_STATUS), width = 0.5) +
  theme(axis.text.x = element_text(angle = 60, vjust = 0.6, size = 15),
        axis.text.y = element_text(size = 15),
        axis.title.x = element_text(size = 20),
        axis.title.y = element_text(size = 20),
        title = element_text(size = 20),
        legend.text = element_text(size = 15)) +
  labs(title = "DIPLOMA GRADUATES ARE MOST EMPLOYED COMPARED TO OTHER DEGREE HOLDERS",
       caption = "SOURCE: CPS SURVEY NOV 2022") +
  scale_fill_discrete(name = "EMPLOYMENT STATUS") +
  xlab("EDUCATION LEVEL") +
  ylab("POPULATION")

# Stacked bar chart of ethnicity by region
G1 <- ggplot(CENSUS_DATA, aes(REGION))
G1 + geom_bar(aes(fill = ETHNICITY), width = 0.5) +
  theme(axis.text.x = element_text(size = 15),
        axis.text.y = element_text(size = 15),
        axis.title.x = element_text(size = 20),
        axis.title.y = element_text(size = 20),
        title = element_text(size = 20),
        legend.text = element_text(size = 15)) +
  labs(title = "WHITES DOMINATE THE ETHNICITY IN ALL REGIONS ACROSS UNITED STATES",
       caption = "SOURCE: CPS SURVEY NOV 2022") +
  scale_fill_discrete(name = "ETHNICITY") +
  xlab("REGION") +
  ylab("POPULATION") +
  guides(fill = guide_legend(ncol = 1))

# Grouped bar chart of average family income by marital status and region
ggplot(CENSUS_DATA, aes(x = MARITAL_STATUS, y = FAM_INCOME, fill = REGION)) +
  geom_bar(stat = "summary", width = 0.5, position = "dodge") +
  theme(axis.text.x = element_text(angle = 45, vjust = 0.6, size = 15),
        axis.text.y = element_text(size = 15),
        axis.title.x = element_text(size = 20),
        axis.title.y = element_text(size = 20),
        title = element_text(size = 20),
        legend.text = element_text(size = 15)) +
  labs(title = "MARRIED COUPLES WITH SPOUSE DOMINATE THE CENSUS WITH AVERAGE INCOME",
       caption = "SOURCE: CPS SURVEY NOV 2022") +
  scale_fill_discrete(name = "REGION") +
  xlab("MARITAL STATUS") +
  ylab("AVERAGE INCOME")

# Export the summary tables to Word (earlier exports left commented out)
TAB11DF <- as.data.frame(TAB11)
# flextable(TAB11DF) %>% save_as_docx(path = 'FLEXTBALE1.DOCX')
# flextable(TAB12) %>% save_as_docx(path = 'FLEXTBALE2.DOCX')
# flextable(TAB13) %>% save_as_docx(path = 'FLEXTBALE3.DOCX')
# flextable(TAB14) %>% save_as_docx(path = 'FLEXTBALE4.DOCX')
flextable(TAB15) %>% save_as_docx(path = 'FLEXTBALE5.DOCX')

# Numeric (coded) columns for the correlation matrix
NEWCENSUS12 <- data.frame(CENSUS_DATA$FAM_INCOME, CENSUS_DATA$PEEDUCA,
                          CENSUS_DATA$PRMJOCC1, CENSUS_DATA$PTDTRACE,
                          CENSUS_DATA$GEREG, CENSUS_DATA$GTMETSTA,
                          CENSUS_DATA$HRNUMHOU, CENSUS_DATA$PRTAGE,
                          CENSUS_DATA$PESEX, CENSUS_DATA$PEMARITL,
                          CENSUS_DATA$PRDISFLG, CENSUS_DATA$PTNMEMP1,
                          CENSUS_DATA$PEDISREM, CENSUS_DATA$PECERT1)
colnames(NEWCENSUS12) <- c("FAM_INCOME", "EDU_LEVEL", "OCCUPATION", "ETHNICITY",
                           "REGION", "METRO_STATUS", "TOTAL_PEOPLE", "AGE",
                           "GENDER", "MARITAL_STATUS", "DISABILITY",
                           "TOTAL_PAID_EMPLOYEES", "HEALTH", "CERTIFIED")
CORRRRR <- cor(NEWCENSUS12)
corrplot(CORRRRR, method = 'circle')

# CORRELATION PLOT FOR SIX IMPORTANT VARIABLES
NEWCENSUS <- data.frame(CENSUS_DATA$FAM_INCOME, CENSUS_DATA$EDU_LEVEL,
                        CENSUS_DATA$OCCUPATION, CENSUS_DATA$ETHNICITY,
                        CENSUS_DATA$REGION, CENSUS_DATA$METRO_STATUS,
                        CENSUS_DATA$HRNUMHOU, CENSUS_DATA$PRTAGE,
                        CENSUS_DATA$GENDER, CENSUS_DATA$MARITAL_STATUS,
                        CENSUS_DATA$DISABILITY, CENSUS_DATA$PTNMEMP1,
                        CENSUS_DATA$HEALTH, CENSUS_DATA$CERTIFIED)
colnames(NEWCENSUS) <- c("FAM_INCOME", "EDU_LEVEL", "OCCUPATION", "ETHNICITY",
                         "REGION", "METRO_STATUS", "TOTAL_PEOPLE", "AGE",
                         "GENDER", "MARITAL_STATUS", "DISABILITY",
                         "TOTAL_PAID_EMPLOYEES", "HEALTH", "CERTIFIED")

# REGRESSION MODEL
LM1 <- lm(FAM_INCOME ~ EDU_LEVEL + OCCUPATION + ETHNICITY + REGION +
            METRO_STATUS + HRNUMHOU + PRTAGE + GENDER + MARITAL_STATUS +
            DISABILITY + PTNMEMP1 + HEALTH + CERTIFIED,
          data = CENSUS_DATA)
summary(LM1)

# RIDGE REGRESSION MODEL
# 70/30 train/test split
set.seed(123)
TRAININDEX <- sample(x = nrow(NEWCENSUS), size = nrow(NEWCENSUS) * 0.7)
TRAINDATA <- NEWCENSUS[TRAININDEX, ]
TESTDATA <- NEWCENSUS[-TRAININDEX, ]

# Design matrices without the intercept column, as glmnet expects
TRAIN_X <- model.matrix(FAM_INCOME ~ ., TRAINDATA)[, -1]
TEST_X <- model.matrix(FAM_INCOME ~ ., TESTDATA)[, -1]
TRAIN_Y <- TRAINDATA$FAM_INCOME
TEST_Y <- TESTDATA$FAM_INCOME

# 10-fold cross-validation over lambda (alpha = 0 is ridge)
set.seed(123)
LAMBDA <- cv.glmnet(TRAIN_X, TRAIN_Y, alpha = 0, nfolds = 10)
plot(LAMBDA)
LAMBDAMIN <- LAMBDA$lambda.min
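LAMBDA1SE <- LAMBDA$lambda.1se
LAMBDAMIN
LAMBDA1SE

# FITTING RIDGE MODEL BASED ON LAMBDA
# Coefficient path for ridge (alpha = 0)
MODEL <- glmnet(TRAIN_X, TRAIN_Y, alpha = 0)
plot(MODEL, xvar = "lambda")
# Coefficient path for lasso (alpha = 1), plotted for comparison
MODEL <- glmnet(TRAIN_X, TRAIN_Y, alpha = 1)
plot(MODEL, xvar = "lambda")

# MODEL FOR LAMBDA MIN
MODELMINRIDGE <- glmnet(TRAIN_X, TRAIN_Y, alpha = 0, lambda = LAMBDAMIN)
coef(MODELMINRIDGE)

# Training-set RMSE. RMSE() is kept as written and is assumed to come from
# a package loaded earlier (caret::RMSE; Metrics::rmse is the lowercase form).
TRAIN_PREDICT_RIDGE <- predict(MODELMINRIDGE, newx = TRAIN_X)
TRAIN_RMSE_RIDGE <- RMSE(TRAIN_Y, TRAIN_PREDICT_RIDGE)
TRAIN_RMSE_RIDGE

# LASSO REGRESSION MODEL
set.seed(123)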
TRAININDEX1 <- sample(x = nrow(NEWCENSUS), size = nrow(NEWCENSUS) * 0.7)
TRAINDATA1 <- NEWCENSUS[TRAININDEX1, ]
TESTDATA1 <- NEWCENSUS[-TRAININDEX1, ]
TRAIN_X1 <- model.matrix(FAM_INCOME ~ ., TRAINDATA1)[, -1]
TEST_X1 <- model.matrix(FAM_INCOME ~ ., TESTDATA1)[, -1]
TRAIN_Y1 <- TRAINDATA1$FAM_INCOME
TEST_Y1 <- TESTDATA1$FAM_INCOME

# 10-fold cross-validation over lambda (alpha = 1 is lasso)
set.seed(123)
LAMBDA1 <- cv.glmnet(TRAIN_X1, TRAIN_Y1, alpha = 1, nfolds = 10)
plot(LAMBDA1)
LAMBDAMIN1 <- LAMBDA1$lambda.min
LAMBDA1SE1 <- LAMBDA1$lambda.1se
LAMBDAMIN1
LAMBDA1SE1

# Lasso fit at lambda.min (the object keeps its "RIDGE" name from the listing).
# The lasso split is used here; with the same seed it matches TRAIN_X/TRAIN_Y.
MODELMINRIDGE1 <- glmnet(TRAIN_X1, TRAIN_Y1, alpha = 1, lambda = LAMBDAMIN1)
coef(MODELMINRIDGE1)
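# Training-set RMSE for the lasso model
TRAIN_PREDICT <- predict(MODELMINRIDGE1, newx = TRAIN_X1)
TRAIN_RMSE <- RMSE(TRAIN_Y1, TRAIN_PREDICT)
TRAIN_RMSE

# RMSE for the linear model. predict() on an lm object takes newdata, not
# newx, and TRAINX was never defined, so the model's own data is used.
PREDICTV <- predict(LM1, newdata = CENSUS_DATA)
PREDICT_RMSE <- RMSE(CENSUS_DATA$FAM_INCOME, PREDICTV)
PREDICT_RMSE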
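The listing builds test partitions (TEST_X, TEST_Y) but reports RMSE on training rows only. The block below is a minimal base-R sketch of scoring both partitions, under stated assumptions: the data are synthetic rather than CPS values, a plain lm() stands in for the glmnet fits, and the names `rmse_manual`, `toy`, `fit`, `train_rmse`, and `test_rmse` are hypothetical and do not appear in the original code.

```r
# Hypothetical helper: root mean squared error between two numeric vectors
rmse_manual <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Synthetic stand-in for the census data (assumption: not real CPS values)
set.seed(123)
toy <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
toy$y <- 3 + 2 * toy$x1 - toy$x2 + rnorm(200)

# Same 70/30 split pattern as the listing; floor() keeps the size an integer
train_index <- sample(x = nrow(toy), size = floor(nrow(toy) * 0.7))
train_data <- toy[train_index, ]
test_data <- toy[-train_index, ]

# Fit on the training rows only, then score both partitions
fit <- lm(y ~ x1 + x2, data = train_data)
train_rmse <- rmse_manual(train_data$y, predict(fit, newdata = train_data))
test_rmse <- rmse_manual(test_data$y, predict(fit, newdata = test_data))

train_rmse
test_rmse
```

For the glmnet models in the listing, the analogous step would presumably be `predict(MODELMINRIDGE, newx = TEST_X)` scored against TEST_Y. A test RMSE well above the training RMSE would point to overfitting.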