Question
I need help with the machine learning question. I need help on trying to use this.
Don't know which code to use.
- Build a simple linear regression model to predict tip from total_bill using scikit-learn.
- Calculate and interpret the R-squared value, and MSE for both the training and testing sets.
- Generate a residual plot (using the testing dataset: y_test - y_pred) and analyze its distribution.
- Interpret the slope and intercept of the regression line.
- Split the dataset into 70% for model training and 30% for testing the model.
- Build a multiple linear regression model to predict tip using total_bill, size, sex, smoker, and day.
- Use the same split as above (70% for training and 30% for testing).
- Evaluate the model's performance using the same metrics as in the simple linear regression (R^2 & MSE).
- Compare the performance of the simple and multiple linear regression models.

Transcribed Image Text:Question 5: Linear Regression with SKlearn
• Build a simple linear regression model to predict tip from total_bill using scikit-learn.
⚫ Calculate and interpret the R-squared value, and MSE for both the training and testing sets.
• Generate a residual plot (using the testing dataset: y_test-y_pred) and analyze its distribution.
Interpret the slope and intercept of the regression line.
⚫ Split the dataset into 70% for model training and 30% for testing the model
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean squared_error, r2_score
# Code goes here
Training Set:
R-squared: 0.4555559076450443
MSE: 1.1627372520616328
H
Testing Set:
R-squared: 0.4291782688312412
MSE: 0.7524510349809156
Residual Plot on the Training Sample
-1
Intercept: 0.8769576391532712
Slope: 0.10889370921420224
2
Residual Plot on the Testing Sample
1
2
-N
![▾ Question 6: Multiple Linear Regression Model
• Build a multiple linear regression model to predict tip using total_bill, size, sex, smoker, and day.
• Use the same split as above (70% for training and 30% for testing).
• Evaluate the model's performance using the same metrics as in the simple linear regression (R^2 & MSB
⚫ Compare the performance of the simple and multiple linear regression models.
[] # Code goes here
total bill tip size sex Female smoker_No day Fri day Sat day Sun time Dinner
0
16.99 1.01
2
1
1
0
0
1
1
1
10.34 1.66
3
0
1
0
0
1
1
2
21.01 3.50
3
0
1
0
0
1
1
Model metrics (MSE & R^2) on Training Dataset:
R-squared: 0.49252348904344123
MSE: 1.0837877609860398
Model metrics (MSE & R^2) on Testing Dataset:
R-squared: 0.2930966744126695
MSE: 0.9318323215911046
▾ Question 7: Regularization
• Apply StandardScaler on the datasets.
⚫ Check which alpha parameter is the optimal value among [0.1, 1, 10, 100, 200]
• Apply Ridge Linear Regression with the best alpha value.
• What is your conclusion?
[ ] from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge
# Code goes here](https://content.bartleby.com/qna-images/question/5038e9fe-a67b-4654-bcd7-3337a3eb6f7a/d6daf184-beb1-484f-8ac1-381f2a85c7c1/cuijwxc_thumbnail.jpeg)
Transcribed Image Text:▾ Question 6: Multiple Linear Regression Model
• Build a multiple linear regression model to predict tip using total_bill, size, sex, smoker, and day.
• Use the same split as above (70% for training and 30% for testing).
• Evaluate the model's performance using the same metrics as in the simple linear regression (R^2 & MSB
⚫ Compare the performance of the simple and multiple linear regression models.
[] # Code goes here
total bill tip size sex Female smoker_No day Fri day Sat day Sun time Dinner
0
16.99 1.01
2
1
1
0
0
1
1
1
10.34 1.66
3
0
1
0
0
1
1
2
21.01 3.50
3
0
1
0
0
1
1
Model metrics (MSE & R^2) on Training Dataset:
R-squared: 0.49252348904344123
MSE: 1.0837877609860398
Model metrics (MSE & R^2) on Testing Dataset:
R-squared: 0.2930966744126695
MSE: 0.9318323215911046
▾ Question 7: Regularization
• Apply StandardScaler on the datasets.
⚫ Check which alpha parameter is the optimal value among [0.1, 1, 10, 100, 200]
• Apply Ridge Linear Regression with the best alpha value.
• What is your conclusion?
[ ] from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge
# Code goes here
Expert Solution

This question has been solved!
Explore an expertly crafted, step-by-step solution for a thorough understanding of key concepts.
Step by stepSolved in 2 steps
