Question
This needs to be solved correctly. I need only the answers, not the code; please just write the answers to all parts and get a thumbs up.
Question 2. (To be done using 'R'). For this question, you will work with the Pima
Indians Diabetes Database data set in R, named 'PimaIndiansDiabetes', from the 'mlbench'
library. The complete data set can be seen by simply typing PimaIndiansDiabetes
into the R console; however, for the sake of this question, we will be working with subsets
of this data frame.
As part of your solutions for this question, you will have to take screenshots of or save some
output, so I suggest submitting the entirety of this question as a Word or LaTeX document (use
\begin{verbatim} Your R output \end{verbatim}), separate from the hand-written
document you scan and submit for the previous questions.
a) The first step in this question is to remove the columns that we will not be using in
the analysis from the original PimaIndiansDiabetes data set, including the 'diabetes'
variable, which contains the outcome (positive/negative), so that all 768 observations in
the sample can be thought of as being from the same population, i.e. assume we did not
know they were separated into diabetes outcomes. To do this, we will create a new data
frame called sample.data, which consists of all rows of the PimaIndiansDiabetes
data frame but only columns 1 to 4, using the following code:
> library(mlbench)
> data("PimaIndiansDiabetes")
> sample.data <- data.frame(PimaIndiansDiabetes[, 1:4])
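Before starting the parts below, it may help to confirm that the subset looks as described. The following is only a minimal sketch, assuming the 'mlbench' package is already installed; it repeats the question's own code and then inspects the result.

```r
library(mlbench)
data("PimaIndiansDiabetes")

# Keep all rows but only the first four columns, as in the question
sample.data <- data.frame(PimaIndiansDiabetes[, 1:4])

dim(sample.data)    # number of rows and columns kept
names(sample.data)  # which measurements columns 1 to 4 hold
head(sample.data)   # first few observation vectors
```

The column names printed here are the ones you would normally reuse as axis labels when plotting the sample profile.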
i) For this new data set, we will assume that the observation vectors are from a
multivariate normal distribution. By creating and analysing Q-Q plots, conclude whether this
is a valid assumption and give reasons for your conclusion.
ii) Calculate the sample mean vector $\bar{\mathbf{x}}$ and sample covariance matrix S for this data;
iii) Regardless of your conclusion in part i), we continue with the assumption of
multivariate normality. Using the HotellingsT2() function, print and analyse the
output to test the null hypothesis $H_0 : \boldsymbol{\mu} = \boldsymbol{\mu}_0$, where
$$\boldsymbol{\mu}_0 = \begin{pmatrix} 4 \\ 120 \\ 70 \\ 20 \end{pmatrix}$$
[Note that you will have to install the packages "mvtnorm" and "ICSNP" to
use the HotellingsT2() function]
iv) Using the plot() function, plot the sample profile for this data set, with solid
points at each value, a dashed line connecting the points, and sensibly named axes
and title.
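Parts i) to iv) above could be approached roughly as in the sketch below. This is an illustration under stated assumptions, not a model answer: it assumes 'mlbench' and 'ICSNP' are installed, it takes the hypothesised mean to be (4, 120, 70, 20) as read from the question, and the object names xbar, S, d2, mu0 and t2.test are chosen here for illustration only. The chi-square Q-Q plot of squared Mahalanobis distances is one common way to check joint normality and may not be the exact check the course intends. No output is shown; the conclusions have to be drawn from what R prints and plots.

```r
library(mlbench)
library(ICSNP)  # provides HotellingsT2(); the question notes mvtnorm is needed too

data("PimaIndiansDiabetes")
sample.data <- data.frame(PimaIndiansDiabetes[, 1:4])
n <- nrow(sample.data)  # number of observations
p <- ncol(sample.data)  # number of variables

# ii) Sample mean vector and sample covariance matrix
#     (computed first because the chi-square plot in i) uses them)
xbar <- colMeans(sample.data)
S <- cov(sample.data)
print(xbar)
print(S)

# i) Marginal normal Q-Q plots, one panel per variable
old.par <- par(mfrow = c(2, 2))
for (v in names(sample.data)) {
  qqnorm(sample.data[[v]], main = paste("Normal Q-Q plot:", v))
  qqline(sample.data[[v]])
}
par(old.par)

# i) Chi-square Q-Q plot: under multivariate normality the squared
#    Mahalanobis distances are approximately chi-square with p degrees
#    of freedom, so the points should lie near the reference line
d2 <- mahalanobis(sample.data, center = xbar, cov = S)
plot(qchisq(ppoints(n), df = p), sort(d2),
     xlab = "Chi-square quantiles",
     ylab = "Ordered squared Mahalanobis distances",
     main = "Chi-square Q-Q plot")
abline(0, 1, lty = 2)

# iii) One-sample Hotelling's T^2 test of H0: mu = mu0
mu0 <- c(4, 120, 70, 20)  # assumption: hypothesised mean as read from the question
t2.test <- HotellingsT2(sample.data, mu = mu0)
print(t2.test)

# iv) Sample profile: solid points joined by a dashed line
plot(seq_len(p), xbar, type = "b", pch = 19, lty = 2, xaxt = "n",
     xlab = "Variable", ylab = "Sample mean",
     main = "Sample profile of the Pima Indians data (columns 1 to 4)")
axis(1, at = seq_len(p), labels = names(xbar))
```

For part i), marked curvature or heavy tails in any panel, or points drifting away from the dashed line in the chi-square plot, would count as evidence against multivariate normality. For part iii), the printed p-value is compared with the chosen significance level to decide whether to reject the null hypothesis.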