Database System Concepts
Database System Concepts
7th Edition
ISBN: 9780078022159
Author: Abraham Silberschatz Professor, Henry F. Korth, S. Sudarshan
Publisher: McGraw-Hill Education
Bartleby Related Questions Icon

Related questions

bartleby

Concept explainers

Question

decision_trees.ipynb [Part 2]

1. Pima Indian Diabetes Dataset

The Pima Indians Diabetes Data Set was developed by the United States National Institute of Diabetes and Digestive and Kidney Diseases.

 

Astonishingly, over 30% of Pima people develop diabetes. In contrast, the diabetes rate in the United States is 8.3% and in China it is 4.2%.

Each instance in the dataset represents information about a Pima woman over the age of 21 and belonged to one of two classes: a person who developed diabetes within five years, or a person that did not. There are eight attributes in addition to the column representing whether or not they developed diabetes:

  1. The number of times the woman was pregnant
  2. Plasma glucose concentration a 2 hours in an oral glucose tolerance test
  3. Diastolic blood pressure (mm Hg)
  4. Triceps skin fold thickness (mm)
  5. 2-Hour serum insulin (mu U/ml)
  6. Body mass index (weight in kg/(height in m)^2)
  7. Diabetes pedigree function
  8. Age
  9. Whether they got diabetes or not (0 = no, 1 = yes)

We are trying to predict whether they got diabetes or not based on the features.

The csv file at is at

https://raw.githubusercontent.com/yew1eb/machine-learning/master/Naive-bayes/pima-indians-diabetes.data.csv

This file does not have a header row

You will need to

  1. load the file into a dataframe
  2. divide the data into training and test sets. (an 80-20 split sounds good)
  3. train a decision tree classifier on the training data
  4. display the tree
  5. run the classifier on the test data
  6. compute the accuracy
  7. Have a small paragraph describing the results.

Good luck!

[].....

[].....

[]....

2. The Wisconsin Cancer Datasett

The task is to predict whether a tumor is malignant or benign (the second column of the dataset based on 30 real values.

The data file is

https://raw.githubusercontent.com/zacharski/ml-class/master/data/wdbc.data

And a writeup about the data is at:

https://raw.githubusercontent.com/zacharski/ml-class/master/data/wdbc.names

Follow the same steps as above.

[]......

[].....

 

Expert Solution
Check Mark
Knowledge Booster
Background pattern image
Computer Science
Learn more about
Need a deep-dive on the concept behind this application? Look no further. Learn more about this topic, computer-science and related others by exploring similar questions and additional content below.
Similar questions
Recommended textbooks for you
Text book image
Database System Concepts
Computer Science
ISBN:9780078022159
Author:Abraham Silberschatz Professor, Henry F. Korth, S. Sudarshan
Publisher:McGraw-Hill Education
Text book image
Starting Out with Python (4th Edition)
Computer Science
ISBN:9780134444321
Author:Tony Gaddis
Publisher:PEARSON
Text book image
Digital Fundamentals (11th Edition)
Computer Science
ISBN:9780132737968
Author:Thomas L. Floyd
Publisher:PEARSON
Text book image
C How to Program (8th Edition)
Computer Science
ISBN:9780133976892
Author:Paul J. Deitel, Harvey Deitel
Publisher:PEARSON
Text book image
Database Systems: Design, Implementation, & Manag...
Computer Science
ISBN:9781337627900
Author:Carlos Coronel, Steven Morris
Publisher:Cengage Learning
Text book image
Programmable Logic Controllers
Computer Science
ISBN:9780073373843
Author:Frank D. Petruzella
Publisher:McGraw-Hill Education