Database System Concepts
7th Edition
ISBN: 9780078022159
Author: Abraham Silberschatz Professor, Henry F. Korth, S. Sudarshan
Publisher: McGraw-Hill Education

Question

TODO 12

Let's now split our input data X and labels y into a training set and a test set using the train_test_split() function (docs). Here we'll use the 80-20 split rule: 80% of the data is used for training and 20% for testing. Lastly, we'll seed the split with the random_state keyword argument, which ensures we get the same split every time we run the function.

  1. Use the train_test_split() function to get a train and test split. Store the output into X_train, X_test, y_train, and y_test.
    1. Pass the required arguments X and y.
    2. Specify that 20% of the data should be held out for testing by setting the test_size keyword argument to 0.2.
    3. Lastly, pass the keyword argument random_state=42 to set the random seed so we get the same split every time we run this code.
  2. Print the shape for X_train.
  3. Print the shape for y_train.
  4. Print the shape for X_test.
  5. Print the shape for y_test.
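
To see what the random_state argument described above does, here is a minimal sketch on toy data. The arrays X_demo and y_demo are hypothetical stand-ins, not the assignment's X and y; the sketch only checks that two calls with the same seed give the same split and that rows and labels stay paired.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data (not the assignment's X and y):
# row i of X_demo is [2*i, 2*i + 1] and its label is i.
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)

# Two calls with the same seed should return identical splits.
first = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)
second = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)
same_split = all(np.array_equal(a, b) for a, b in zip(first, second))
print(same_split)

# The return order is X_train, X_test, y_train, y_test.
X_tr, X_te, y_tr, y_te = first
print(X_tr.shape, X_te.shape, y_tr.shape, y_te.shape)

# Each feature row is still matched with its own label after shuffling.
rows_aligned = np.array_equal(X_tr[:, 0] // 2, y_tr)
print(rows_aligned)
```

Leaving random_state unset would normally give a different shuffle on each run, which is why the assignment fixes it to 42.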

# TODO 12.1
X_train, X_test, y_train, y_test = 

todo_check([
    (X_train.shape == (413, 29), 'X_train does not have the correct shape (413, 29)'),
    (X_test.shape == (104, 29), 'X_test does not have the correct shape (104, 29)'),
    (y_train.shape == (413,), 'y_train does not have the correct shape (413,)'),
    (y_test.shape == (104,), 'y_test does not have the correct shape (104,)'),
    (np.all(np.isclose(X_train.values[-5:, -4], np.array([17.7, 18.2, 21.8, 23.8, 20.1]),rtol=.01)), 'X_train does not contain the correct values! Make sure you used `X` when splitting!'),
    (np.all(np.isclose(y_test.values[-5:], np.array([1.25561604, 1.8531681 , 1.15373159, 4.01259206, 3.56558124]),rtol=.01)), 'y_test does not have the correct values! Make sure you used `y` when splitting!')
])

Check that you split the data correctly: you should get 413 samples for `X_train` and `y_train`, and 104 samples for `X_test` and `y_test`, as seen below.
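
The 413 and 104 counts follow from simple arithmetic. The sketch below assumes the dataset has 413 + 104 = 517 rows (implied by the expected shapes, not stated directly in the question) and assumes that scikit-learn rounds the test count up when test_size is a fraction, with the remaining rows going to training.

```python
import math

# Total row count implied by the expected shapes (an assumption).
n_samples = 413 + 104
test_size = 0.2

# Assumed rounding rule: the fractional test count is rounded up,
# and whatever is left over becomes the training set.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_samples, n_train, n_test)
```

Because 20% of 517 is 103.4, the split cannot be exactly 80-20; rounding up gives 104 test rows and 413 training rows.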

# TODO 12.2
print(X_train.shape)

# TODO 12.3
print(y_train.shape)

# TODO 12.4
print(X_test.shape)

# TODO 12.5
print(y_test.shape)

Expert Solution
Step 1

Code:

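# train_test_split lives in scikit-learn's model_selection module
from sklearn.model_selection import train_test_split

# Hold out 20% of the rows for testing; random_state=42 makes the split repeatable
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Expected shapes, in order: (413, 29), (413,), (104, 29), (104,)
print(X_train.shape)
print(y_train.shape)
print(X_test.shape)
print(y_test.shape)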
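
The solution above depends on X and y already being defined in the notebook. As a self-contained check, the following sketch runs the same call on synthetic data. X_fake and y_fake are hypothetical stand-ins with the row and column counts implied by the expected shapes (517 rows, 29 features), and NumPy arrays are used in place of whatever pandas objects the assignment loads.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins (assumptions): 517 rows and 29 feature columns.
rng = np.random.default_rng(0)
X_fake = rng.normal(size=(517, 29))
y_fake = np.arange(517)  # row ids double as labels so rows can be tracked

X_train_f, X_test_f, y_train_f, y_test_f = train_test_split(
    X_fake, y_fake, test_size=0.2, random_state=42
)

shapes = [X_train_f.shape, y_train_f.shape, X_test_f.shape, y_test_f.shape]
print(shapes)

# Every row should land in exactly one of the two sets.
train_ids = set(y_train_f.tolist())
test_ids = set(y_test_f.tolist())
no_overlap = train_ids.isdisjoint(test_ids)
covers_all = (train_ids | test_ids) == set(range(517))
print(no_overlap, covers_all)
```

This only confirms the shapes and that the two sets partition the rows; the value checks in todo_check still require the real X and y.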
