Question
Normalize the numeric predictors in R using range (min-max) normalization to the interval -0.5 to 0.5.
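One possible way to approach the range normalization is sketched below. The helper `range_normalize` and the toy data frame with columns `age`, `bmi`, and `gender` are hypothetical, since the question does not show the actual data set or its column names.

```r
# Rescale a numeric vector linearly so that its minimum maps to `lower`
# and its maximum maps to `upper` (here -0.5 and 0.5).
range_normalize <- function(x, lower = -0.5, upper = 0.5) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1]) * (upper - lower) + lower
}

# Hypothetical toy data; the real column names are an assumption.
df <- data.frame(
  age = c(20, 40, 60, 80),
  bmi = c(18.5, 22.0, 30.0, 41.5),
  gender = c("Male", "Female", "Female", "Male"),
  stringsAsFactors = FALSE
)

# Apply the rescaling to every numeric column and leave the rest untouched.
numeric_cols <- vapply(df, is.numeric, logical(1))
df[numeric_cols] <- lapply(df[numeric_cols], range_normalize)

print(df)
```

A constant column would cause a division by zero in this sketch, and a numeric 0/1 target column would be rescaled along with the predictors, so on real data it may be necessary to exclude such columns from `numeric_cols` first.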
Create dummy variables for the categorical variables so that they can be used in models that require numeric predictors. At this point, the data.frame should have dimensions 5110 × 17 (rows × columns).
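Dummy coding could be sketched with base R's `model.matrix()`, as below. The toy columns `age`, `gender`, and `work_type` are hypothetical stand-ins. Whether the 5110 × 17 target is reached on the real data depends on which columns are kept and on whether a reference level is dropped for each factor, which the question does not specify.

```r
# Hypothetical toy data with one numeric and two categorical predictors.
df <- data.frame(
  age = c(20, 40, 60, 80),
  gender = factor(c("Male", "Female", "Female", "Male")),
  work_type = factor(c("Private", "Govt", "Self", "Private"))
)

# model.matrix() expands each factor into 0/1 indicator columns using
# treatment contrasts, so one reference level per factor is omitted.
# The first column is the intercept, which is dropped here.
dummies <- model.matrix(~ ., data = df)[, -1, drop = FALSE]
df_dummy <- as.data.frame(dummies)

print(dim(df_dummy))
print(colnames(df_dummy))
```

With the default `na.action`, `model.matrix()` drops rows containing missing values, so the row count is worth checking with `dim()` after this step.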
Create a random sample of 10 observations that oversamples rows that are positive for stroke with a probability of 95%.
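The oversampling step can be read in more than one way. The sketch below assumes that "a probability of 95%" means each draw should have a 95% chance of being a stroke-positive row, and that the outcome column is named `stroke` with 1 marking a positive case. Both are assumptions, as are the toy data and the seed.

```r
set.seed(42)  # arbitrary seed, used only to make the sketch reproducible

# Hypothetical toy data: 5 positive rows and 95 negative rows.
df <- data.frame(id = 1:100, stroke = c(rep(1, 5), rep(0, 95)))

n_pos <- sum(df$stroke == 1)
n_neg <- sum(df$stroke == 0)

# Per-row weights: the positive rows share 0.95 of the probability mass
# and the negative rows share the remaining 0.05.
w <- ifelse(df$stroke == 1, 0.95 / n_pos, 0.05 / n_neg)

# Sampling with replacement keeps the 95% chance the same on every draw.
idx <- sample(nrow(df), size = 10, replace = TRUE, prob = w)
sampled <- df[idx, ]

print(table(sampled$stroke))
```

Sampling with `replace = FALSE` would also work when there are enough positive rows, but the per-draw probability then shifts as rows are removed from the pool.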