
Database System Concepts
7th Edition
ISBN: 9780078022159
Author: Abraham Silberschatz Professor, Henry F. Korth, S. Sudarshan
Publisher: McGraw-Hill Education
Question
Assume the following simple regression model,
Y = β0 + β1X + ϵ
ϵ ∼ N(0, σ^2)
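For reference, when X is treated as fixed, the least-squares estimators under this model have the standard closed forms below. The symbols n, \bar{x}, \bar{y}, S_{xx}, S_{xy} and SSE are notation introduced here for the reminder; they do not appear in the original question.

```latex
\begin{align*}
S_{xx} &= \sum_{i=1}^{n} (x_i - \bar{x})^2, &
S_{xy} &= \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}), \\
\hat{\beta}_1 &= \frac{S_{xy}}{S_{xx}}, &
\hat{\beta}_0 &= \bar{y} - \hat{\beta}_1 \bar{x}, \\
\operatorname{Var}(\hat{\beta}_1) &= \frac{\sigma^2}{S_{xx}}, &
\operatorname{Var}(\hat{\beta}_0) &= \sigma^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right), \\
\hat{\sigma}^2 &= \frac{\mathrm{SSE}}{n - 2}, &
E[\hat{\sigma}^2] &= \sigma^2 .
\end{align*}
```

Both coefficient estimators are unbiased, so their averages across simulations should settle near the true β0 and β1, and their sample variances can be compared with the two variance expressions above.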
Run the following R code to generate the true values of the parameters σ^2 (sig2), β1 (beta1) and β0 (beta0), and to simulate the predictor X:
Code:
## Simulation ##
set.seed(12345)
beta0 <- rnorm(1, mean = 0, sd = 1) ## The true beta0
beta1 <- runif(n = 1, min = 1, max = 3) ## The true beta1
sig2 <- rchisq(n = 1, df = 25) ## The true value of the error variance sigma^2
## Multiple simulations will require loops ##
nsample <- 10 ## Sample size
n.sim <- 100 ## The number of simulations
sigX <- 0.2 ## The variance of X
## Simulate the predictor variable ##
X <- rnorm(nsample, mean = 0, sd = sqrt(sigX))
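As a minimal sketch of what a single simulation involves, the block below repeats the setup so that it runs on its own, draws one set of errors with X held fixed, and fits the model with lm(). The names eps, fit and est are illustrative choices, not part of the original code.

```r
## Setup repeated from the question so the sketch runs on its own
set.seed(12345)
beta0 <- rnorm(1, mean = 0, sd = 1)
beta1 <- runif(n = 1, min = 1, max = 3)
sig2 <- rchisq(n = 1, df = 25)
nsample <- 10
sigX <- 0.2
X <- rnorm(nsample, mean = 0, sd = sqrt(sigX))

## One simulated data set: X stays fixed, only the errors are drawn
eps <- rnorm(nsample, mean = 0, sd = sqrt(sig2))
Y <- beta0 + beta1 * X + eps

## Least-squares fit and the three estimates of interest
fit <- lm(Y ~ X)
est <- c(beta0 = unname(coef(fit)[1]),
         beta1 = unname(coef(fit)[2]),
         sig2 = summary(fit)$sigma^2)  # residual variance, SSE / (n - 2)
print(est)
```

With only 10 observations a single fit can land far from the true values, which is why the exercise averages over many simulations.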
Q1
- Fix the sample size nsample = 10. Here, the values of X are fixed. You just need to generate ϵ and Y. Execute 100 simulations (i.e., n.sim = 100). For each simulation, estimate the regression coefficients (β0, β1) and the error variance (σ^2). Calculate the mean of the estimates from the different simulations. What did you expect the mean to be?
- Plot the histogram of each of the regression parameter estimates from (b). Explain the pattern of the distributions.
- Obtain the variance of the regression parameter estimators (i.e., β̂0 and β̂1) from the simulations. That is, calculate the sample variances of the regression parameter estimates from the 100 simulations. Is this variance approximately equal to the true variances of the regression parameter estimates?
- Construct the 95% t and z confidence intervals for β0 and β1 during every simulation. What is the proportion of the intervals for each method containing the true value of the parameters? Is this consistent with the definition of confidence interval? Next, what differences do you observe in the t and z confidence intervals? What effect does increasing the number of simulations from 100 have on the confidence intervals?
- For steps (a)-(d) the sample size was fixed at 10. Start increasing the sample size (e.g., 20, 50, 100) and run steps (a)-(d). Explain what happens to the mean, variance and distribution of the estimators as the sample size increases.
- Choose the largest sample size you have used in step (f). Fix the sample size to that and start changing the error variance (sig2). You can increase and decrease the value of the error variance. For each value of error variance execute steps (a) - (d). Explain what happens to the mean, variance and distribution of the estimates as the error variance changes.
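The simulation steps described in the list above could be organised roughly as follows. This is a sketch under stated assumptions rather than a reference solution: one_sim is a hypothetical helper, the "z interval" is read here as the estimated standard error combined with a normal quantile (using the known σ would be another reasonable reading), and the theoretical variances are the usual fixed-X formulas σ^2 (1/n + x̄^2 / Sxx) and σ^2 / Sxx.

```r
## Setup repeated from the question so the sketch runs on its own
set.seed(12345)
beta0 <- rnorm(1, mean = 0, sd = 1)
beta1 <- runif(n = 1, min = 1, max = 3)
sig2 <- rchisq(n = 1, df = 25)
nsample <- 10
n.sim <- 100
sigX <- 0.2
X <- rnorm(nsample, mean = 0, sd = sqrt(sigX))

## Hypothetical helper: one simulation with X held fixed
one_sim <- function(X, beta0, beta1, sig2, level = 0.95) {
  n <- length(X)
  Y <- beta0 + beta1 * X + rnorm(n, mean = 0, sd = sqrt(sig2))
  fit <- lm(Y ~ X)
  tab <- summary(fit)$coefficients           # estimates and standard errors
  b <- tab[, "Estimate"]
  se <- tab[, "Std. Error"]
  tq <- qt(1 - (1 - level) / 2, df = n - 2)  # t critical value
  zq <- qnorm(1 - (1 - level) / 2)           # z critical value (assumption: estimated SE)
  c(b0 = b[[1]], b1 = b[[2]], s2 = summary(fit)$sigma^2,
    t0 = abs(b[[1]] - beta0) <= tq * se[[1]],  # 1 if the t interval covers beta0
    t1 = abs(b[[2]] - beta1) <= tq * se[[2]],
    z0 = abs(b[[1]] - beta0) <= zq * se[[1]],  # 1 if the z interval covers beta0
    z1 = abs(b[[2]] - beta1) <= zq * se[[2]])
}

## n.sim simulations; each column of `sims` is one simulation
sims <- replicate(n.sim, one_sim(X, beta0, beta1, sig2))

## Means of the estimates next to the true values
means <- rowMeans(sims[c("b0", "b1", "s2"), ])
print(rbind(mean_estimate = means, truth = c(beta0, beta1, sig2)))

## Sample variances of the estimates next to the theoretical variances
Sxx <- sum((X - mean(X))^2)
true_var <- c(b0 = sig2 * (1 / nsample + mean(X)^2 / Sxx), b1 = sig2 / Sxx)
sim_var <- apply(sims[c("b0", "b1"), ], 1, var)
print(rbind(simulated = sim_var, theoretical = true_var))

## Proportion of intervals that contain the true parameter
coverage <- rowMeans(sims[c("t0", "t1", "z0", "z1"), ])
print(coverage)

## Histograms of the coefficient estimates
hist(sims["b0", ], main = "Estimates of beta0", xlab = "beta0 hat")
hist(sims["b1", ], main = "Estimates of beta1", xlab = "beta1 hat")
```

To explore larger samples or different error variances, change nsample (regenerating X) or sig2 and rerun the block. Because the t critical value exceeds the z critical value at small degrees of freedom, the z intervals in this sketch are always narrower and can never cover more often than the t intervals.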