Answered: Let's now split our input data X and labels y into a train and test set using the train_test_split() function (docs). Here we'll use the 80-20 split rule where…

Database System Concepts

7th Edition

ISBN: 9780078022159

Author: Abraham Silberschatz Professor, Henry F. Korth, S. Sudarshan

Publisher: McGraw-Hill Education

See similar textbooks

TODO 12

Let's now split our input data X and labels y into a train and test set using the train_test_split() function (docs). Here we'll use the 80-20 split rule where we use 80% of the data for training and 20% for testing. Lastly, we'll seed our split using the random_state keyword argument which will make sure we create the same split every time we run the function.

Use the train_test_split() function to get a train and test split. Store the output into X_train, X_test, y_train, and y_test.
1. Pass the required arguments X and y.
2. Further specify we want to use 20% of the training data by setting the test_size keyword argument.
3. Lastly, pass the keyword argument random_state=42 to set the random seed so we get the same split every time we run this code.
Print the shape for X_train.
Print the shape for y_train.
Print the shape for X_test.
Print the shape for y_test.

# TODO 12.1
X_train, X_test, y_train, y_test =

todo_check([
(X_train.shape == (413, 29), 'X_train does not have the correct shape (413, 29)'),
(X_test.shape == (104, 29), 'X_test does not have the correct shape (104, 29)'),
(y_train.shape == (413,), 'y_train does not have the correct shape (413,)'),
(y_test.shape == (104,), 'y_test does not have the correct shape (104,)'),
(np.all(np.isclose(X_train.values[-5:, -4], np.array([17.7, 18.2, 21.8, 23.8, 20.1]),rtol=.01)), 'X_train does not contain the correct values! Make sure you used `X` when splitting!'),
(np.all(np.isclose(y_test.values[-5:], np.array([1.25561604, 1.8531681 , 1.15373159, 4.01259206, 3.56558124]),rtol=.01)), 'y_test does not have the correct values! Make sure you used `y` when splitting!')
])

Check to see if you split the data correct you should get 413 data samples for `X_train` and `y_train`. Furthermore, you should get 104 samples for `X_test` and `y_test` as seen below.

# TODO 12.2
print(X_train.shape)

# TODO 12.3
print(Y_train.shape)

# TODO 12.4
print(x_train.shape)

# TODO 12.5
print(y_train.shape)

Expert Solution

Step 1

Code:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(X_train.shape)
print(y_train.shape)
print(X_test.shape)
print(y_test.shape)

Similar questions

SEE MORE QUESTIONS

Recommended textbooks for you

Database System Concepts
Computer Science
ISBN:9780078022159
Author:Abraham Silberschatz Professor, Henry F. Korth, S. Sudarshan
Publisher:McGraw-Hill Education
Starting Out with Python (4th Edition)
Computer Science
ISBN:9780134444321
Author:Tony Gaddis
Publisher:PEARSON
Digital Fundamentals (11th Edition)
Computer Science
ISBN:9780132737968
Author:Thomas L. Floyd
Publisher:PEARSON
C How to Program (8th Edition)
Computer Science
ISBN:9780133976892
Author:Paul J. Deitel, Harvey Deitel
Publisher:PEARSON
Database Systems: Design, Implementation, & Manag...
Computer Science
ISBN:9781337627900
Author:Carlos Coronel, Steven Morris
Publisher:Cengage Learning
Programmable Logic Controllers
Computer Science
ISBN:9780073373843
Author:Frank D. Petruzella
Publisher:McGraw-Hill Education