TODO 12
Let's now split our input data X and labels y into a train and test set using the train_test_split() function (docs). Here we'll use the 80-20 split rule where we use 80% of the data for training and 20% for testing. Lastly, we'll seed our split using the random_state keyword argument which will make sure we create the same split every time we run the function.
- Use the train_test_split() function to get a train and test split. Store the output into X_train, X_test, y_train, and y_test.
- Pass the required arguments X and y.
- Further specify that we want to hold out 20% of the data for testing by setting the test_size keyword argument.
- Lastly, pass the keyword argument random_state=42 to set the random seed so we get the same split every time we run this code.
- Print the shape for X_train.
- Print the shape for y_train.
- Print the shape for X_test.
- Print the shape for y_test.
# TODO 12.1
X_train, X_test, y_train, y_test =
todo_check([
(X_train.shape == (413, 29), 'X_train does not have the correct shape (413, 29)'),
(X_test.shape == (104, 29), 'X_test does not have the correct shape (104, 29)'),
(y_train.shape == (413,), 'y_train does not have the correct shape (413,)'),
(y_test.shape == (104,), 'y_test does not have the correct shape (104,)'),
(np.all(np.isclose(X_train.values[-5:, -4], np.array([17.7, 18.2, 21.8, 23.8, 20.1]),rtol=.01)), 'X_train does not contain the correct values! Make sure you used `X` when splitting!'),
(np.all(np.isclose(y_test.values[-5:], np.array([1.25561604, 1.8531681 , 1.15373159, 4.01259206, 3.56558124]),rtol=.01)), 'y_test does not have the correct values! Make sure you used `y` when splitting!')
])
Check that you split the data correctly: you should get 413 data samples for `X_train` and `y_train`, and 104 samples for `X_test` and `y_test`, as seen below.
# TODO 12.2
print(X_train.shape)
# TODO 12.3
print(y_train.shape)
# TODO 12.4
print(X_test.shape)
# TODO 12.5
print(y_test.shape)
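The expected sample counts can be checked by hand. The sketch below assumes the full dataset has 517 rows (413 + 104, inferred from the expected shapes rather than stated in the exercise) and that the test count is rounded up, with the remainder going to training.

```python
import math

n_samples = 517   # assumed total row count: 413 + 104 from the expected shapes
test_size = 0.2   # the 20% held out for testing

# Assumption: the test count is rounded up and the rest is used for training.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)
```

Under these assumptions, 20% of 517 is 103.4, which rounds up to 104 test samples and leaves 413 for training.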
Code:
from sklearn.model_selection import train_test_split
# Hold out 20% of the rows for testing; random_state=42 makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(X_train.shape)
print(y_train.shape)
print(X_test.shape)
print(y_test.shape)
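To see what random_state does, here is a minimal sketch that runs the same split twice on synthetic data. The arrays `X_demo` and `y_demo` are hypothetical stand-ins for the exercise's `X` and `y`, generated with an assumed shape of 517 rows and 29 columns; they are not the actual dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for X and y (random values, assumed shape 517 x 29).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(517, 29))
y_demo = rng.normal(size=517)

# Split twice with the same seed.
first = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)
second = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

# Compare each returned array from the first call with its counterpart from the second.
same_split = all(np.array_equal(a, b) for a, b in zip(first, second))
print(same_split)

X_train_demo, X_test_demo, y_train_demo, y_test_demo = first
print(X_train_demo.shape, X_test_demo.shape)
```

Changing or omitting random_state would generally shuffle the rows differently, which is why the value checks in todo_check depend on using 42.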