Database System Concepts
7th Edition
ISBN: 9780078022159
Author: Abraham Silberschatz Professor, Henry F. Korth, S. Sudarshan
Publisher: McGraw-Hill Education
Question
Assume the following simple linear regression model:
Y = β0 + β1X + ϵ
ϵ ∼ N(0, σ^2 )
Run the following R code to simulate the true values of the parameters: σ^2 (sig2), β1 (beta1), and β0 (beta0).
Code:
## Simulation ##
set.seed(12345)
beta0 <- rnorm(1, mean = 0, sd = 1) ## The true beta0
beta1 <- runif(n = 1, min = 1, max = 3) ## The true beta1
sig2 <- rchisq(n = 1, df = 25) ## The true value of the error variance sigma^2
## Multiple simulations will require loops ##
nsample <- 10 ## Sample size
n.sim <- 100 ## The number of simulations
sigX <- 0.2 ## The variance of X
## Simulate the predictor variable ##
X <- rnorm(nsample, mean = 0, sd = sqrt(sigX))
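As a starting point, a single simulated dataset might be generated and fitted as sketched below. This is only an illustration under stated assumptions, not a prescribed solution: the names `eps`, `fit`, `est`, and `sig2_hat` are not part of the question, and `lm()` is just one way to obtain the least-squares estimates.

```r
## Setup repeated from the question so the sketch runs on its own
set.seed(12345)
beta0 <- rnorm(1, mean = 0, sd = 1)
beta1 <- runif(n = 1, min = 1, max = 3)
sig2 <- rchisq(n = 1, df = 25)
nsample <- 10
sigX <- 0.2
X <- rnorm(nsample, mean = 0, sd = sqrt(sigX))

## One draw of the errors and the response (X stays fixed)
eps <- rnorm(nsample, mean = 0, sd = sqrt(sig2))  # hypothetical name
Y <- beta0 + beta1 * X + eps

## Least-squares fit; coef() returns the intercept and slope estimates
fit <- lm(Y ~ X)
est <- coef(fit)
## Estimate of the error variance: residual sum of squares / (n - 2)
sig2_hat <- summary(fit)$sigma^2
```

Repeating the last two steps inside a loop, with `X` held fixed, gives one set of estimates per simulation.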
Q1
- Fix the sample size at nsample = 10. Here, the values of X are fixed; you only need to generate ϵ and Y. Execute 100 simulations (i.e., n.sim = 100). For each simulation, estimate the regression coefficients (β0, β1) and the error variance (σ^2). Calculate the mean of the estimates across the simulations. What do you expect the mean to be?
- Plot the histogram of each of the regression parameter estimates from (b). Explain the pattern of the distributions.
- Obtain the variance of the regression parameter estimators (i.e., β̂0 and β̂1) from the simulations. That is, calculate the sample variances of the regression parameter estimates from the 100 simulations. Is this variance approximately equal to the true variances of the regression parameter estimates?
- Construct the 95% t and z confidence intervals for β0 and β1 in every simulation. For each method, what proportion of the intervals contains the true value of the parameter? Is this consistent with the definition of a confidence interval? Next, what differences do you observe between the t and z confidence intervals? What effect does increasing the number of simulations beyond 100 have on the confidence intervals?
- For steps (a)-(d) the sample size was fixed at 10. Start increasing the sample size (e.g., 20, 50, 100) and run steps (a)-(d). Explain what happens to the mean, variance and distribution of the estimators as the sample size increases.
- Choose the largest sample size you have used in step (f). Fix the sample size to that and start changing the error variance (sig2). You can increase and decrease the value of the error variance. For each value of error variance execute steps (a) - (d). Explain what happens to the mean, variance and distribution of the estimates as the error variance changes.
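The simulation steps listed above could be organised roughly as in the following sketch. It is a minimal illustration, not the expected answer. Several choices are assumptions that the question does not fix: the object names (`est`, `cover_t`, `cover_z`, `true_var`), the use of `lm()` and `vcov()`, and the form of the z interval, which here reuses the estimated standard error with a normal critical value. A z interval based on the known σ^2 would be an equally reasonable reading.

```r
## Setup repeated from the question so the sketch runs on its own
set.seed(12345)
beta0 <- rnorm(1, mean = 0, sd = 1)
beta1 <- runif(n = 1, min = 1, max = 3)
sig2 <- rchisq(n = 1, df = 25)
nsample <- 10
n.sim <- 100
sigX <- 0.2
X <- rnorm(nsample, mean = 0, sd = sqrt(sigX))

## Storage for the estimates and for interval coverage (hypothetical names)
est <- matrix(NA_real_, nrow = n.sim, ncol = 3,
              dimnames = list(NULL, c("beta0", "beta1", "sig2")))
cover_t <- matrix(FALSE, nrow = n.sim, ncol = 2,
                  dimnames = list(NULL, c("beta0", "beta1")))
cover_z <- cover_t

truth <- c(beta0, beta1)
tcrit <- qt(0.975, df = nsample - 2)  # t critical value, n - 2 df
zcrit <- qnorm(0.975)                 # normal critical value

for (i in seq_len(n.sim)) {
  ## X is fixed; only the errors and the response are regenerated
  Y <- beta0 + beta1 * X + rnorm(nsample, mean = 0, sd = sqrt(sig2))
  fit <- lm(Y ~ X)
  b <- coef(fit)
  se <- sqrt(diag(vcov(fit)))         # estimated standard errors
  est[i, ] <- c(b, summary(fit)$sigma^2)
  ## An interval b +/- crit * se covers the truth iff |b - truth| <= crit * se
  cover_t[i, ] <- abs(b - truth) <= tcrit * se
  cover_z[i, ] <- abs(b - truth) <= zcrit * se
}

## Means of the estimates, to compare with beta0, beta1, and sig2
est_mean <- colMeans(est)

## Sample variances of the coefficient estimates across simulations
est_var <- apply(est[, c("beta0", "beta1")], 2, var)

## Theoretical variances for fixed X under the stated model
Sxx <- sum((X - mean(X))^2)
true_var <- c(beta0 = sig2 * (1 / nsample + mean(X)^2 / Sxx),
              beta1 = sig2 / Sxx)

## Proportion of intervals containing the true values
cov_t <- colMeans(cover_t)
cov_z <- colMeans(cover_z)
```

Histograms can then be drawn with, for example, `hist(est[, "beta1"])`. Changing `nsample` or replacing `sig2` with another value and rerunning the sketch covers the last two steps. Because the normal critical value is smaller than the t critical value, the z intervals in this sketch are always narrower than the t intervals, so their coverage cannot exceed that of the t intervals.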