

You are working as a data scientist and have received data on house prices in the Boston region. The data set contains the following variables:
• crim: per capita crime rate by town
• zn: proportion of residential land zoned for lots over 25,000 sq.ft.
• indus: proportion of non-retail business acres per town
• chas: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
• nox: nitric oxides concentration
• rm: average number of rooms per dwelling
• age: proportion of owner-occupied units built prior to 1940
• dis: weighted distances to five Boston employment centers
• rad: index of accessibility to radial highways
• tax: full-value property-tax rate per $10,000
• ptratio: pupil-teacher ratio by town
• b: 1000(Bk - 0.63)^2, where Bk is the proportion of Black residents by town
• lstat: % lower status of the population
• medv: median value of owner-occupied homes in $1000s
Given this information:
1. Download the dataset boston.csv and open it as a pandas DataFrame.
2. Using medv as the response variable and per capita crime rate by town (crim), proportion of owner-occupied units built prior to 1940 (age), and nitric oxides concentration (nox) as predictors, fit a linear model (OLS) and a k-nearest neighbours model (using the 5 nearest neighbours). Which model predicts better under k-fold cross-validation (k = 5)? Explain why.
3. Fit a model to predict the house prices using crim, zn, indus, chas, nox, rm, age, dis, rad, tax, ptratio, b, and lstat, using OLS, Ridge, and Lasso. Show the coefficients. Use lambda = 0.1 for both Ridge and Lasso. Which variable(s) can be eliminated from the analysis based on the Lasso results?
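Parts 1 and 2 of the question could be approached along the following lines. This is only a sketch under stated assumptions, not the posted solution: it assumes scikit-learn is available, that boston.csv sits in the working directory, and that its column names match the variable list. The function name `compare_ols_knn`, the choice of RMSE as the comparison metric, and the shuffle seed are illustrative choices that the question does not specify.

```python
from pathlib import Path

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsRegressor

# Assumed column names, taken from the variable list in the question.
PREDICTORS = ["crim", "age", "nox"]
TARGET = "medv"


def compare_ols_knn(df: pd.DataFrame, seed: int = 0) -> dict:
    """Return the mean 5-fold cross-validated RMSE for OLS and 5-NN."""
    X, y = df[PREDICTORS], df[TARGET]
    # Shuffle rows before splitting; the seed is arbitrary and only makes
    # the folds reproducible, so both models see identical splits.
    folds = KFold(n_splits=5, shuffle=True, random_state=seed)
    models = {
        "ols": LinearRegression(),
        "knn5": KNeighborsRegressor(n_neighbors=5),
    }
    results = {}
    for name, model in models.items():
        # scikit-learn maximises scores, so it reports the negated MSE.
        neg_mse = cross_val_score(
            model, X, y, cv=folds, scoring="neg_mean_squared_error"
        )
        results[name] = float(np.sqrt(-neg_mse).mean())
    return results


# Part 1: open the file as a pandas DataFrame (the path is an assumption).
CSV_PATH = Path("boston.csv")
if CSV_PATH.exists():
    boston = pd.read_csv(CSV_PATH)
    print(compare_ols_knn(boston))
```

The model with the lower mean RMSE predicts better out of sample on these folds. Note that the 5-NN model here works on unscaled predictors, so `age` (0 to 100) dominates the distance over `nox` (below 1); standardising the predictors first is a common variation that the question does not ask for.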



Question

Spreadsheet preview of boston.csv (sheet "Boston"), first 31 data rows:

crim,zn,indus,chas,nox,rm,age,dis,rad,tax,ptratio,b,lstat,medv
0.00632,18,2.31,0,0.538,6.575,65.2,4.09,1,296,15.3,396.9,4.98,24
0.02731,0,7.07,0,0.469,6.421,78.9,4.9671,2,242,17.8,396.9,9.14,21.6
0.02729,0,7.07,0,0.469,7.185,61.1,4.9671,2,242,17.8,392.83,4.03,34.7
0.03237,0,2.18,0,0.458,6.998,45.8,6.0622,3,222,18.7,394.63,2.94,33.4
0.06905,0,2.18,0,0.458,7.147,54.2,6.0622,3,222,18.7,396.9,5.33,36.2
0.02985,0,2.18,0,0.458,6.43,58.7,6.0622,3,222,18.7,394.12,5.21,28.7
0.08829,12.5,7.87,0,0.524,6.012,66.6,5.5605,5,311,15.2,395.6,12.43,22.9
0.14455,12.5,7.87,0,0.524,6.172,96.1,5.9505,5,311,15.2,396.9,19.15,27.1
0.21124,12.5,7.87,0,0.524,5.631,100,6.0821,5,311,15.2,386.63,29.93,16.5
0.17004,12.5,7.87,0,0.524,6.004,85.9,6.5921,5,311,15.2,386.71,17.1,18.9
0.22489,12.5,7.87,0,0.524,6.377,94.3,6.3467,5,311,15.2,392.52,20.45,15
0.11747,12.5,7.87,0,0.524,6.009,82.9,6.2267,5,311,15.2,396.9,13.27,18.9
0.09378,12.5,7.87,0,0.524,5.889,39,5.4509,5,311,15.2,390.5,15.71,21.7
0.62976,0,8.14,0,0.538,5.949,61.8,4.7075,4,307,21,396.9,8.26,20.4
0.63796,0,8.14,0,0.538,6.096,84.5,4.4619,4,307,21,380.02,10.26,18.2
0.62739,0,8.14,0,0.538,5.834,56.5,4.4986,4,307,21,395.62,8.47,19.9
1.05393,0,8.14,0,0.538,5.935,29.3,4.4986,4,307,21,386.85,6.58,23.1
0.7842,0,8.14,0,0.538,5.99,81.7,4.2579,4,307,21,386.75,14.67,17.5
0.80271,0,8.14,0,0.538,5.456,36.6,3.7965,4,307,21,288.99,11.69,20.2
0.7258,0,8.14,0,0.538,5.727,69.5,3.7965,4,307,21,390.95,11.28,18.2
1.25179,0,8.14,0,0.538,5.57,98.1,3.7979,4,307,21,376.57,21.02,13.6
0.85204,0,8.14,0,0.538,5.965,89.2,4.0123,4,307,21,392.53,13.83,19.6
1.23247,0,8.14,0,0.538,6.142,91.7,3.9769,4,307,21,396.9,18.72,15.2
0.98843,0,8.14,0,0.538,5.813,100,4.0952,4,307,21,394.54,19.88,14.5
0.75026,0,8.14,0,0.538,5.924,94.1,4.3996,4,307,21,394.33,16.3,15.6
0.84054,0,8.14,0,0.538,5.599,85.7,4.4546,4,307,21,303.42,16.51,13.9
0.67191,0,8.14,0,0.538,5.813,90.3,4.682,4,307,21,376.88,14.81,16.6
0.95577,0,8.14,0,0.538,6.047,88.8,4.4534,4,307,21,306.38,17.28,14.8
0.77299,0,8.14,0,0.538,6.495,94.4,4.4547,4,307,21,387.94,12.8,18.4
1.00245,0,8.14,0,0.538,6.674,87.3,4.239,4,307,21,380.23,11.98,21
1.13081,0,8.14,0,0.538,5.713,94.1,4.233,4,307,21,360.17,22.6,12.7


Expert Solution


Step 1: Model building - part 2

Step 2: Model building - part 3

Step 3: Python code consolidated

Step 4: Python code execution

Step 5: Explanation of the results
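The step outline above does not include the code itself. For part 3 of the question, a minimal sketch might look like the following. It assumes scikit-learn, and it assumes the question's lambda corresponds to scikit-learn's `alpha` parameter (the two estimators scale their loss terms differently, so the same value does not penalise Ridge and Lasso equally). The helper names `coefficient_table` and `lasso_dropped`, the file path, and the column spelling `lstat` are illustrative assumptions.

```python
from pathlib import Path

import pandas as pd
from sklearn.linear_model import Lasso, LinearRegression, Ridge

# Assumed column names, following the variable list in the question.
FEATURES = ["crim", "zn", "indus", "chas", "nox", "rm", "age",
            "dis", "rad", "tax", "ptratio", "b", "lstat"]
TARGET = "medv"


def coefficient_table(df: pd.DataFrame, lam: float = 0.1) -> pd.DataFrame:
    """Fit OLS, Ridge, and Lasso; return one coefficient column per model."""
    X, y = df[FEATURES], df[TARGET]
    models = {
        "ols": LinearRegression(),
        "ridge": Ridge(alpha=lam),
        # A higher iteration cap helps coordinate descent converge on
        # unscaled predictors.
        "lasso": Lasso(alpha=lam, max_iter=10_000),
    }
    coefs = {name: model.fit(X, y).coef_ for name, model in models.items()}
    return pd.DataFrame(coefs, index=FEATURES)


def lasso_dropped(table: pd.DataFrame) -> list:
    """List the predictors whose Lasso coefficient is exactly zero."""
    return table[table["lasso"] == 0].index.tolist()


CSV_PATH = Path("boston.csv")  # assumed location of the data file
if CSV_PATH.exists():
    table = coefficient_table(pd.read_csv(CSV_PATH))
    print(table.round(4))
    print("Lasso sets to zero:", lasso_dropped(table))
```

Predictors that Lasso shrinks to exactly zero are the candidates for elimination. Because the penalty acts on raw coefficients, which predictors are zeroed depends on the units of each column; this sketch fits on the unscaled data, as the question states no preprocessing.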
