decision_trees.ipynb [Part 2]
1. Pima Indian Diabetes Dataset
The Pima Indians Diabetes Data Set was developed by the United States National Institute of Diabetes and Digestive and Kidney Diseases.
Astonishingly, over 30% of Pima people develop diabetes. In contrast, the diabetes rate in the United States is 8.3% and in China it is 4.2%.
Each instance in the dataset describes a Pima woman over the age of 21 who belongs to one of two classes: she either developed diabetes within five years or she did not. In addition to the column recording whether she developed diabetes, there are eight attributes:
- The number of times the woman was pregnant
- Plasma glucose concentration at 2 hours in an oral glucose tolerance test
- Diastolic blood pressure (mm Hg)
- Triceps skin fold thickness (mm)
- 2-Hour serum insulin (mu U/ml)
- Body mass index (weight in kg/(height in m)^2)
- Diabetes pedigree function
- Age
- Class: whether she developed diabetes (0 = no, 1 = yes)
The goal is to predict the class (diabetes or not) from the eight attributes.
The CSV file is at:
https://raw.githubusercontent.com/yew1eb/machine-learning/master/Naive-bayes/pima-indians-diabetes.data.csv
This file does not have a header row.
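Because the file has no header row, column names have to be supplied when it is loaded. A minimal sketch with pandas, assuming the column order matches the attribute list above; the short names in `PIMA_COLUMNS` and the helper `load_pima` are our own hypothetical labels, not names defined by the dataset:

```python
import pandas as pd

PIMA_URL = (
    "https://raw.githubusercontent.com/yew1eb/machine-learning/"
    "master/Naive-bayes/pima-indians-diabetes.data.csv"
)

# Hypothetical short names, in the order of the attribute list above;
# the file itself has no header row.
PIMA_COLUMNS = [
    "pregnancies",
    "glucose",
    "blood_pressure",
    "skin_thickness",
    "insulin",
    "bmi",
    "diabetes_pedigree",
    "age",
    "diabetes",  # class label: 0 = no, 1 = yes
]


def load_pima(source=PIMA_URL):
    """Read the header-less CSV into a DataFrame with named columns.

    `source` may be a URL, a local path, or any file-like object
    accepted by pandas.read_csv.
    """
    # header=None stops pandas from treating the first data row as names.
    return pd.read_csv(source, header=None, names=PIMA_COLUMNS)
```

Calling `load_pima()` downloads the file; the resulting frame should have nine columns, with the class in `"diabetes"`.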
You will need to:
- load the file into a dataframe
- divide the data into training and test sets (an 80-20 split works well)
- train a decision tree classifier on the training data
- display the tree
- run the classifier on the test data
- compute the accuracy
- write a short paragraph describing the results.
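The steps above can be sketched with scikit-learn. This is a minimal sketch, not a required solution: the use of scikit-learn, the helper name `train_and_evaluate`, the fixed `random_state`, and stratified splitting are all our own assumptions. The tree is rendered as text with `export_text` so the sketch runs without a plotting library:

```python
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text


def train_and_evaluate(df, label_col, test_size=0.2, random_state=42):
    """Fit a decision tree on an 80-20 split and report test accuracy.

    Returns the fitted classifier, a text rendering of the tree,
    and the accuracy on the held-out test set.
    """
    # Separate the features from the class label.
    X = df.drop(columns=[label_col])
    y = df[label_col]

    # Hold out test_size of the rows (20% by default) for testing;
    # stratify keeps the class proportions similar in both sets.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=y
    )

    # Train the tree on the training data only.
    clf = DecisionTreeClassifier(random_state=random_state)
    clf.fit(X_train, y_train)

    # Plain-text view of the learned splits; column names are cast to
    # str because a header-less frame has integer column labels.
    tree_text = export_text(clf, feature_names=[str(c) for c in X.columns])

    # Predict the unseen test rows and compare with the true labels.
    predictions = clf.predict(X_test)
    accuracy = accuracy_score(y_test, predictions)
    return clf, tree_text, accuracy
```

For a frame read with `header=None`, the class is the last column, so for the Pima data the call would be `train_and_evaluate(df, 8)`. `print(tree_text)` displays the tree, and `sklearn.tree.plot_tree(clf)` draws it graphically if matplotlib is installed. An unpruned tree tends to grow very deep; passing a `max_depth` to `DecisionTreeClassifier` gives a smaller tree that is easier to describe in the results paragraph.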
Good luck!
2. The Wisconsin Cancer Dataset
The task is to predict whether a tumor is malignant or benign (the second column of the dataset) based on 30 real-valued features.
The data file is at:
https://raw.githubusercontent.com/zacharski/ml-class/master/data/wdbc.data
A description of the data is at:
https://raw.githubusercontent.com/zacharski/ml-class/master/data/wdbc.names
Follow the same steps as above.
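The wdbc.data file also has no header row. Per the wdbc.names writeup, the first column is a patient ID, the second is the diagnosis (M = malignant, B = benign), and the remaining 30 columns are the real-valued features. A minimal sketch of turning that into features and a 0/1 label with pandas; the helper name `load_wdbc` and the encoding of M as 1 and B as 0 are our own choices:

```python
import pandas as pd

WDBC_URL = "https://raw.githubusercontent.com/zacharski/ml-class/master/data/wdbc.data"


def load_wdbc(source=WDBC_URL):
    """Return (features, labels) from the header-less wdbc.data file.

    Column 0 is an ID and is dropped; column 1 is the diagnosis;
    columns 2-31 are the 30 real-valued features.
    """
    df = pd.read_csv(source, header=None)
    # The ID carries no diagnostic information, so exclude it from the features.
    features = df.iloc[:, 2:]
    # Encode the diagnosis as 1 = malignant, 0 = benign (our own convention).
    labels = df.iloc[:, 1].map({"M": 1, "B": 0})
    return features, labels
```

From there, the same split, fit, display, and accuracy steps as in the first exercise apply to `features` and `labels`.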