Let's now split our input data X and labels y into a train and test set using the train_test_split() function (docs). Here we'll use the 80-20 split rule where we use 80% of the data for training and 20% for testing. Lastly, we'll seed our split using the random_state keyword argument which will make sure we create the same split every time we run the function. Use the train_test_split() function to get a train and test split. Store the output into X_train, X_test, y_train, and y_test. Pass the required arguments X and y. Further specify we want to use 20% of the training data by setting the test_size keyword argument. Lastly, pass the keyword argument random_state=42 to set the random seed so we get the same split every time we run this code. Print the shape for X_train. Print the shape for y_train. Print the shape for X_test. Print the shape for y_test. # TODO 12.1 X_train, X_test, y_train, y_test =  todo_check([     (X_train.shape == (413, 29), 'X_train does not have the correct shape (413, 29)'),     (X_test.shape == (104, 29), 'X_test does not have the correct shape (104, 29)'),     (y_train.shape == (413,), 'y_train does not have the correct shape (413,)'),     (y_test.shape == (104,), 'y_test does not have the correct shape (104,)'),     (np.all(np.isclose(X_train.values[-5:, -4], np.array([17.7, 18.2, 21.8, 23.8, 20.1]),rtol=.01)), 'X_train does not contain the correct values! Make sure you used `X` when splitting!'),     (np.all(np.isclose(y_test.values[-5:], np.array([1.25561604, 1.8531681 , 1.15373159, 4.01259206, 3.56558124]),rtol=.01)), 'y_test does not have the correct values! Make sure you used `y` when splitting!') ]) Check to see if you split the data correct you should get 413 data samples for `X_train` and `y_train`. Furthermore, you should get 104 samples for `X_test` and `y_test` as seen below. # TODO 12.2 print(X_train.shape) # TODO 12.3 print(Y_train.shape) # TODO 12.4 print(x_train.shape) # TODO 12.5 print(y_train.shape)

Database System Concepts
7th Edition
ISBN:9780078022159
Author:Abraham Silberschatz Professor, Henry F. Korth, S. Sudarshan
Publisher:Abraham Silberschatz Professor, Henry F. Korth, S. Sudarshan
Chapter1: Introduction
Section: Chapter Questions
Problem 1PE
icon
Related questions
Question

TODO 12

Let's now split our input data X and labels y into a train and test set using the train_test_split() function (docs). Here we'll use the 80-20 split rule where we use 80% of the data for training and 20% for testing. Lastly, we'll seed our split using the random_state keyword argument which will make sure we create the same split every time we run the function.

  1. Use the train_test_split() function to get a train and test split. Store the output into X_train, X_test, y_train, and y_test.
    1. Pass the required arguments X and y.
    2. Further specify we want to use 20% of the training data by setting the test_size keyword argument.
    3. Lastly, pass the keyword argument random_state=42 to set the random seed so we get the same split every time we run this code.
  2. Print the shape for X_train.
  3. Print the shape for y_train.
  4. Print the shape for X_test.
  5. Print the shape for y_test.

# TODO 12.1
X_train, X_test, y_train, y_test = 

todo_check([
    (X_train.shape == (413, 29), 'X_train does not have the correct shape (413, 29)'),
    (X_test.shape == (104, 29), 'X_test does not have the correct shape (104, 29)'),
    (y_train.shape == (413,), 'y_train does not have the correct shape (413,)'),
    (y_test.shape == (104,), 'y_test does not have the correct shape (104,)'),
    (np.all(np.isclose(X_train.values[-5:, -4], np.array([17.7, 18.2, 21.8, 23.8, 20.1]),rtol=.01)), 'X_train does not contain the correct values! Make sure you used `X` when splitting!'),
    (np.all(np.isclose(y_test.values[-5:], np.array([1.25561604, 1.8531681 , 1.15373159, 4.01259206, 3.56558124]),rtol=.01)), 'y_test does not have the correct values! Make sure you used `y` when splitting!')
])

Check to see if you split the data correct you should get 413 data samples for `X_train` and `y_train`. Furthermore, you should get 104 samples for `X_test` and `y_test` as seen below.

# TODO 12.2
print(X_train.shape)

# TODO 12.3
print(Y_train.shape)

# TODO 12.4
print(x_train.shape)

# TODO 12.5
print(y_train.shape)

Expert Solution
Step 1

Code:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(X_train.shape)
print(y_train.shape)
print(X_test.shape)
print(y_test.shape)

trending now

Trending now

This is a popular solution!

steps

Step by step

Solved in 2 steps

Blurred answer
Knowledge Booster
Functions
Learn more about
Need a deep-dive on the concept behind this application? Look no further. Learn more about this topic, computer-science and related others by exploring similar questions and additional content below.
Similar questions
  • SEE MORE QUESTIONS
Recommended textbooks for you
Database System Concepts
Database System Concepts
Computer Science
ISBN:
9780078022159
Author:
Abraham Silberschatz Professor, Henry F. Korth, S. Sudarshan
Publisher:
McGraw-Hill Education
Starting Out with Python (4th Edition)
Starting Out with Python (4th Edition)
Computer Science
ISBN:
9780134444321
Author:
Tony Gaddis
Publisher:
PEARSON
Digital Fundamentals (11th Edition)
Digital Fundamentals (11th Edition)
Computer Science
ISBN:
9780132737968
Author:
Thomas L. Floyd
Publisher:
PEARSON
C How to Program (8th Edition)
C How to Program (8th Edition)
Computer Science
ISBN:
9780133976892
Author:
Paul J. Deitel, Harvey Deitel
Publisher:
PEARSON
Database Systems: Design, Implementation, & Manag…
Database Systems: Design, Implementation, & Manag…
Computer Science
ISBN:
9781337627900
Author:
Carlos Coronel, Steven Morris
Publisher:
Cengage Learning
Programmable Logic Controllers
Programmable Logic Controllers
Computer Science
ISBN:
9780073373843
Author:
Frank D. Petruzella
Publisher:
McGraw-Hill Education