Database System Concepts
7th Edition
ISBN: 9780078022159
Author: Abraham Silberschatz Professor, Henry F. Korth, S. Sudarshan
Publisher: McGraw-Hill Education
Question
Load & check the data:
1. Load the data into a pandas dataframe named data_firstname, where firstname is your name.
2. Carry out some initial investigations:
a. Check the names and types of columns.
b. Check the missing values.
c. Check the statistics of the numeric fields (mean, min, max, median, count, etc.).
d. In your written response, write a paragraph explaining your findings about each column.
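Steps 1 and 2 could be sketched as follows. This is only an illustration: the inline CSV text and the column names `thickness` and `class` are hypothetical stand-ins (the assignment itself only names the ID and ‘bare’ columns), and in practice the data would be read from the actual file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the assignment's data file (assumption);
# with the real file this would be pd.read_csv("<path to your file>").
csv_text = """ID,thickness,bare,class
1001,5,1,2
1002,3,?,2
1003,8,10,4
1004,1,2,2
"""
data_firstname = pd.read_csv(io.StringIO(csv_text))

# 2a. Column names and types; 'bare' is read as a non-numeric (text) column
# because of the '?' placeholders.
print(data_firstname.dtypes)

# 2b. Missing values per column; '?' is not yet counted as missing here.
print(data_firstname.isnull().sum())

# 2c. Summary statistics of the numeric columns; the 50% row is the median.
print(data_firstname.describe())
```

Note that `describe()` skips the ‘bare’ column until it has been converted to a numeric type, which is one of the findings worth recording for step 2d.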
Pre-process and visualize the data
3. Replace the ‘?’ mark in the ‘bare’ column with np.nan and change the type to ‘float’.
4. Fill any missing data with the median of the column.
5. Drop the ID column.
6. Using Pandas, Matplotlib, or seaborn (you can use any or a mix), generate 3-5 plots and add them
to your written response, explaining the key insights and findings from the plots.
7. Separate the features from the class.
8. Split your data into 80% train and 20% test; use the last two digits of your student number
for the seed.
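A minimal sketch of steps 3 to 5 and 7 to 8 is shown below (plots are omitted). The toy DataFrame, the column names `thickness` and `class`, and the seed value 42 are all assumptions made for illustration; the seed should be the last two digits of your own student number.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the real dataset (assumption).
data_firstname = pd.DataFrame({
    "ID": range(1, 11),
    "thickness": [5, 3, 8, 1, 2, 7, 4, 6, 9, 10],
    "bare": ["1", "?", "10", "2", "1", "?", "3", "8", "10", "4"],
    "class": [2, 2, 4, 2, 2, 4, 2, 4, 4, 4],
})

# 3. Replace '?' with NaN, then convert the column to float.
data_firstname["bare"] = data_firstname["bare"].replace("?", np.nan).astype(float)

# 4. Fill missing values with each column's median.
data_firstname = data_firstname.fillna(data_firstname.median(numeric_only=True))

# 5. Drop the ID column, which carries no predictive information.
data_firstname = data_firstname.drop(columns=["ID"])

# 7. Separate the features (X) from the class label (y).
X = data_firstname.drop(columns=["class"])
y = data_firstname["class"]

# 8. 80/20 split; 42 is a placeholder for the last two digits of the student number.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)
```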
Build Classification Models
Support vector machine classifier with linear kernel
9. Train an SVM classifier using the training data, set the kernel to linear, and set the regularization
parameter to C = 0.1. Name the classifier clf_linear_firstname.
10. Print out two accuracy scores: one for the model on the training set (i.e. X_train, y_train) and the
other on the testing set (i.e. X_test, y_test). Record both results in your written response.
11. Generate the accuracy matrix. Record the results in your written response.
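Steps 9 to 11 might look like the sketch below. It makes two assumptions: the data is a synthetic stand-in generated with `make_classification` rather than the assignment's dataset, and the "accuracy matrix" is read as a confusion matrix. The seed 42 is again a placeholder.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the preprocessed features and class (assumption).
X, y = make_classification(n_samples=200, n_features=9, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 9. Linear-kernel SVM with regularization parameter C = 0.1.
clf_linear_firstname = SVC(kernel="linear", C=0.1)
clf_linear_firstname.fit(X_train, y_train)

# 10. Accuracy on the training set and on the testing set.
train_acc = accuracy_score(y_train, clf_linear_firstname.predict(X_train))
test_acc = accuracy_score(y_test, clf_linear_firstname.predict(X_test))
print("train accuracy:", train_acc)
print("test accuracy:", test_acc)

# 11. Confusion matrix on the test set (rows: true class, columns: predicted class).
cm = confusion_matrix(y_test, clf_linear_firstname.predict(X_test))
print(cm)
```

A training accuracy much higher than the testing accuracy would suggest overfitting, which is worth noting in the written response.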
Support vector machine classifier with “rbf” kernel
12. Repeat steps 9 to 11; in step 9, change the kernel to “rbf” and do not set any value for C.
Support vector machine classifier with “poly” kernel
13. Repeat steps 9 to 11; in step 9, change the kernel to “poly” and do not set any value for C.
Support vector machine classifier with “sigmoid” kernel
14. Repeat steps 9 to 11; in step 9, change the kernel to “sigmoid” and do not set any value for C.
(Optional: for steps 9 to 14, you can consider a loop.)
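The optional loop over kernels could be sketched like this. As before, the synthetic data, the seed 42, the `results` dictionary, and the reading of "accuracy matrix" as a confusion matrix are assumptions for illustration only. C is set to 0.1 for the linear kernel and left at the scikit-learn default for the others.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the preprocessed features and class (assumption).
X, y = make_classification(n_samples=200, n_features=9, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

results = {}  # hypothetical container for the per-kernel results
for kernel in ["linear", "rbf", "poly", "sigmoid"]:
    params = {"kernel": kernel}
    if kernel == "linear":
        params["C"] = 0.1  # only the linear model gets an explicit C
    clf = SVC(**params)
    clf.fit(X_train, y_train)
    test_pred = clf.predict(X_test)
    results[kernel] = {
        "train_accuracy": accuracy_score(y_train, clf.predict(X_train)),
        "test_accuracy": accuracy_score(y_test, test_pred),
        "confusion_matrix": confusion_matrix(y_test, test_pred),
    }

# Print the results side by side for comparison.
for kernel, res in results.items():
    print(kernel, res["train_accuracy"], res["test_accuracy"])
    print(res["confusion_matrix"])
```

Collecting the results in one structure makes it easier to compare the four classifiers when writing the recommendation.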
By now you have the results of four SVM classifiers with different kernels recorded in your written
report. Please examine them and write a short paragraph indicating which classifier you would recommend
and why.
Answer questions 7, 8, 9, and 10.