ass_4_solution

pdf

School

McMaster University

Course

2B03

Subject

Statistics

Date

Jan 9, 2024

Type

pdf

Pages

10


2B03 Assignment 4

Statistical Inference (Chapters 7, 8, & 9)

Angelo Farruggia 400368065

2023-11-16

Instructions: You are to use Quarto Markdown for generating your assignment output file. You begin with the Quarto Markdown script downloaded from A2L, and need to pay attention to information provided via introductory material posted to A2L on working with R and Quarto Markdown. Having added your answers to the Quarto Markdown script, you then are to generate your output file using the “Render” button in the RStudio IDE and, when complete, upload both your Quarto Markdown file and your PDF file to the appropriate folder on A2L.

1. Define the following terms in a sentence (or short paragraph) and state a formula if appropriate (this question is worth 5 marks).

a. Type II Error: In statistical hypothesis testing, a Type II error, also known as a false negative, occurs when one fails to reject a null hypothesis that is actually false.

b. Power of a Test: The power of a statistical test is the probability that the test correctly rejects a null hypothesis that is actually false. Power is the complement of the probability of making a Type II error: $\text{power} = P(\text{reject } H_0 \mid H_1 \text{ is true}) = 1 - \beta$, where $\beta$ is the probability of a Type II error.

c. Goodness of Fit Test: A goodness-of-fit test is a statistical test used to determine whether a set of observed values matches the values expected under a specified model. It is commonly used with categorical data to determine whether the observed data follow a specified probability distribution.

d. $P$-value: In statistical hypothesis testing, a $p$-value is the probability of obtaining results at least as extreme as the ones observed, assuming the null hypothesis is true. A small $p$-value (typically below a predetermined significance level $\alpha$) indicates that the observed data would be unlikely if the null hypothesis were true, leading to the rejection of the null hypothesis. A large $p$-value indicates that the observed data are consistent with the null hypothesis.

e. Simple Regression Analysis: Simple regression analysis is a statistical method used to explore and quantify the relationship between a single independent variable ($X$) and a single dependent variable ($Y$) based on observed data. The simple linear regression model can be expressed by the equation

$Y = \beta_0 + \beta_1 X + \epsilon$

where $Y$ is the dependent variable, $X$ is the independent variable, and $\epsilon$ is the error term representing unobserved factors affecting $Y$. The regression analysis aims to estimate the values of the coefficients $\beta_0$ and $\beta_1$ from the given data.

2. A coin-operated coffee machine is set to pour 8 oz per cup. A random sample of the weights of a number of cups is as follows: 8.40, 8.25, 8.05, 7.84, 7.36, 8.54, 7.56, 7.56, 8.02, 7.39, 8.34, 8.56. Test the hypothesis that the machine is delivering at the level set by the manufacturer. Use a 0.01 level of significance (this question is worth 2 marks).

Let $\mu$ be the population mean weight (in oz) of coffee per cup.

$H_0 : \mu = 8$

$H_1 : \mu \neq 8$

    # Sample weights (oz) and a normality check before using the t-test
    coffee <- c(8.40, 8.25, 8.05, 7.84, 7.36, 8.54, 7.56, 7.56, 8.02, 7.39, 8.34, 8.56)
    qqnorm(coffee)
    qqline(coffee)
[Figure: Normal Q-Q Plot of coffee (x-axis: Theoretical Quantiles; y-axis: Sample Quantiles)]

The points in the Normal Q-Q plot approximately follow the straight line, indicating that the data is approximately normally distributed and thus the one-sample t-test is appropriate.

    t.test(coffee, mu = 8, conf.level = 0.99)

        One Sample t-test

    data:  coffee
    t = -0.085097, df = 11, p-value = 0.9337
    alternative hypothesis: true mean is not equal to 8
    99 percent confidence interval:
     7.593780 8.384554
    sample estimates:
    mean of x
     7.989167

Using the one-sample t-test, we cannot reject the null hypothesis at the 1% level of significance, $t(11) = -0.085$, $p = .934$. The amount of coffee that the machine is delivering is not statistically significantly different from the level set by the manufacturer of 8 oz per cup.
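As a cross-check on question 2, the one-sample t statistic can also be computed directly from its definition, $t = (\bar{X} - \mu_0)/(s/\sqrt{n})$. The following is a minimal sketch rather than part of the original solution; the names `mu0`, `t_stat`, `p_value`, and `t_crit` are illustrative choices and do not appear in the assignment.

```r
# Same data as in question 2
coffee <- c(8.40, 8.25, 8.05, 7.84, 7.36, 8.54, 7.56, 7.56, 8.02, 7.39, 8.34, 8.56)

mu0  <- 8                 # hypothesized mean (oz); illustrative name
n    <- length(coffee)    # sample size
xbar <- mean(coffee)      # sample mean
s    <- sd(coffee)        # sample standard deviation

# Test statistic and two-sided p-value on n - 1 degrees of freedom
t_stat  <- (xbar - mu0) / (s / sqrt(n))
p_value <- 2 * pt(-abs(t_stat), df = n - 1)

# Two-sided critical value at the 0.01 level, for a rejection-region check
t_crit <- qt(1 - 0.01 / 2, df = n - 1)

c(t = t_stat, p = p_value, critical = t_crit)
```

If the sketch is correct, `t_stat` and `p_value` should agree with the `t.test()` output, and `abs(t_stat)` falling below `t_crit` corresponds to not rejecting the null hypothesis.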
3. Two different brands of milk are randomly sampled, and the fat content in each bottle of milk is determined. Twenty-six bottles of Brand A milk yielded an average fat content of $\bar{X}_1 = 25$ grams with $s_1^2 = 4$, and thirty-one bottles of Brand B yielded an average fat content of $\bar{X}_2 = 25.8$ grams with $s_2^2 = 7$ (this question is worth 3 marks). Test the hypothesis that both brands have identical average fat content at the 5% level of significance.

Let $\mu_1$ be the population mean fat content of Brand A milk and let $\mu_2$ be the population mean fat content of Brand B milk.

$H_0 : \mu_1 = \mu_2$

$H_1 : \mu_1 \neq \mu_2$

$\bar{X}_1 = 25$, $s_1^2 = 4$, $n_1 = 26$

$\bar{X}_2 = 25.8$, $s_2^2 = 7$, $n_2 = 31$

Assuming equal population variances, the pooled variance is

$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(26 - 1) \times 4 + (31 - 1) \times 7}{26 + 31 - 2} = \frac{25 \times 4 + 30 \times 7}{26 + 31 - 2} = \frac{310}{55} \approx 5.64$

$s_p = \sqrt{s_p^2} = \sqrt{5.64} \approx 2.37$

$t = \frac{\bar{X}_1 - \bar{X}_2}{s_p \sqrt{1/n_1 + 1/n_2}} = \frac{25 - 25.8}{2.37 \times \sqrt{1/26 + 1/31}} \approx -1.27$

$df = n_1 + n_2 - 2 = 26 + 31 - 2 = 55$

    # p-value (two-sided)
    pt(-1.27, df = 55) * 2

    [1] 0.2094315

Using the two-sample t-test, we cannot reject the null hypothesis at the 5% level of significance, $t(55) = -1.27$, $p = .209$. The fat content does not statistically significantly differ between the two brands of milk.

4. To compare two programs for training industrial workers to perform a skilled job, 20 workers are included in an experiment. Of these, 10 are selected at random and trained by method 1; the remaining 10 are trained by method 2. After completion of training, all the workers are subjected to a time-and-motion test that records the speed of performance of a skilled job. The following times, as measured in minutes, are obtained.

| Method   | Time (minutes)                |
|----------|-------------------------------|
| Method 1 | 15 20 11 23 16 21 18 16 27 24 |
| Method 2 | 23 31 13 19 23 17 28 26 25 28 |
Test the hypothesis that the mean job time is equal before and after training with method 1 and 2 versus the alternative that it is significantly less after training with method 1 than after training with method 2. Use a significance level of $\alpha = 0.05$ (this question is worth 4 marks).

Let $\mu_1$ be the population mean job time after training with method 1 and let $\mu_2$ be the population mean job time after training with method 2.

$H_0 : \mu_1 = \mu_2$

$H_1 : \mu_1 < \mu_2$

    method1 <- c(15, 20, 11, 23, 16, 21, 18, 16, 27, 24)
    method2 <- c(23, 31, 13, 19, 23, 17, 28, 26, 25, 28)
    # Side-by-side normality checks, one panel per method
    par(mfrow = c(1, 2))
    qqnorm(method1, main = "Normal Q-Q Plot: M1")
    qqline(method1)
    qqnorm(method2, main = "Normal Q-Q Plot: M2")
    qqline(method2)

[Figure: two panels, "Normal Q-Q Plot: M1" and "Normal Q-Q Plot: M2" (x-axis: Theoretical Quantiles; y-axis: Sample Quantiles)]
    # Restore the single-panel layout, then test for equal variances
    par(mfrow = c(1, 1))
    var.test(method1, method2)

        F test to compare two variances

    data:  method1 and method2
    F = 0.75117, num df = 9, denom df = 9, p-value = 0.6769
    alternative hypothesis: true ratio of variances is not equal to 1
    95 percent confidence interval:
     0.1865797 3.0242006
    sample estimates:
    ratio of variances
             0.7511686

The points in the Normal Q-Q plot for each method approximately follow the straight line, indicating that the data in both groups is approximately normally distributed and thus the two-sample t-test is appropriate. In addition, the variances of the two groups are not statistically significantly different ($F(9, 9) = 0.75$, $p = .677$), so the assumption of equal variances is reasonable.

    t.test(method1, method2, alternative = "less", var.equal = TRUE)

        Two Sample t-test

    data:  method1 and method2
    t = -1.8055, df = 18, p-value = 0.04387
    alternative hypothesis: true difference in means is less than 0
    95 percent confidence interval:
           -Inf -0.1662568
    sample estimates:
    mean of x mean of y
         19.1      23.3

Using the two-sample t-test, we reject the null hypothesis at the 5% level of significance, $t(18) = -1.81$, $p = .044$. The mean job time is statistically significantly less after training with method 1 ($M = 19.1$) than after training with method 2 ($M = 23.3$).
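The pooled two-sample t statistic reported by `t.test()` in question 4 can be reproduced from the pooled-variance formula, which may help connect the software output to the hand calculation style used in question 3. This is a minimal sketch under the same equal-variance assumption; the names `sp2`, `t_stat`, and `p_value` are illustrative and not taken from the assignment.

```r
# Same data as in question 4
method1 <- c(15, 20, 11, 23, 16, 21, 18, 16, 27, 24)
method2 <- c(23, 31, 13, 19, 23, 17, 28, 26, 25, 28)

n1 <- length(method1)
n2 <- length(method2)

# Pooled variance: weighted average of the two sample variances
sp2 <- ((n1 - 1) * var(method1) + (n2 - 1) * var(method2)) / (n1 + n2 - 2)

# Test statistic for H0: mu1 = mu2
t_stat <- (mean(method1) - mean(method2)) / sqrt(sp2 * (1 / n1 + 1 / n2))

# One-sided (lower-tail) p-value for H1: mu1 < mu2
p_value <- pt(t_stat, df = n1 + n2 - 2)

c(t = t_stat, df = n1 + n2 - 2, p = p_value)
```

The lower-tail probability is used because the alternative is $\mu_1 < \mu_2$; a two-sided test would instead double the tail area, as in question 3.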
5. A Canadian-wide marketing survey found that only one-fifth of Canadians drink beer on a regular basis. A random sample of 36 residents in North York found nine who were regular beer drinkers. Test whether or not North York has a greater than the national proportion of beer drinkers (i.e. test $H_0 : \pi = 0.2$ versus $H_1 : \pi > 0.2$ - this question is worth 3 marks).

$H_0 : \pi = 0.2$

$H_1 : \pi > 0.2$

$p_0 = 1/5 = 0.2$

$n = 36$, $x = 9$, $\hat{p} = x/n = 9/36 = 0.25$

$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} = \frac{0.25 - 0.2}{\sqrt{0.2 \times (1 - 0.2)/36}} = 0.75$

    # p-value (upper tail)
    pnorm(0.75, lower.tail = FALSE)

    [1] 0.2266274

Using the one-sample proportion test, we cannot reject the null hypothesis at the 5% level of significance, $z = 0.75$, $p = .227$. The proportion of regular beer drinkers in North York is not statistically significantly greater than the national proportion.

6. A firm draws a random sample of 18 ball bearings from the day’s output. The sample variance of their diameters is $s^2 = 0.009$ inches (this question is worth 4 marks).

a. Construct a 95% confidence interval for the population variance.

    n <- 18
    s2 <- 0.009
    alpha <- 1 - 0.95
    # Lower and upper chi-squared quantiles on n - 1 degrees of freedom
    a <- qchisq(alpha / 2, df = n - 1)
    b <- qchisq(alpha / 2, df = n - 1, lower.tail = FALSE)
    ci_lwr <- (n - 1) * s2 / b
    ci_upr <- (n - 1) * s2 / a
    ci_lwr; ci_upr

    [1] 0.005067734
    [1] 0.02022689
The 95% confidence interval for the population variance is (0.0051, 0.0202).

b. Construct a 90% confidence interval for the population variance.

    n <- 18
    s2 <- 0.009
    alpha <- 1 - 0.90
    # Lower and upper chi-squared quantiles on n - 1 degrees of freedom
    a <- qchisq(alpha / 2, df = n - 1)
    b <- qchisq(alpha / 2, df = n - 1, lower.tail = FALSE)
    ci_lwr <- (n - 1) * s2 / b
    ci_upr <- (n - 1) * s2 / a
    ci_lwr; ci_upr

    [1] 0.005546068
    [1] 0.01764348

The 90% confidence interval for the population variance is (0.0055, 0.0176).

c. What assumptions underlie the answers in the first two parts of this question?

The assumption that the diameters of the ball bearings are independent and normally distributed.

7. In an agricultural experiment to determine the effects of a particular insecticide, a field was planted with corn. Half the plants were sprayed with the insecticide, and half were unsprayed. Several weeks later, independent random samples of 200 sprayed plants and 200 unsprayed plants were examined. The number of healthy plants in each sample was as follows (this question is worth 4 marks).

| Status    | Sprayed | Unsprayed |
|-----------|---------|-----------|
| Healthy   | 131     | 111       |
| Unhealthy | 69      | 89        |

If the significance level is set at $\alpha = 0.05$, does the evidence indicate that a higher proportion of sprayed than of unsprayed plants were healthy? Use a one-tailed $Z$ test for equality of population proportions (note: since the null is that the proportions are equal, use this information to construct a pooled estimate of the proportion).
Let $\pi_1$ be the population proportion of sprayed plants that are healthy and let $\pi_2$ be the population proportion of unsprayed plants that are healthy.

$H_0 : \pi_1 = \pi_2$

$H_1 : \pi_1 > \pi_2$

$n_1 = 200$, $x_1 = 131$, $\hat{p}_1 = x_1/n_1 = 131/200 = 0.655$

$n_2 = 200$, $x_2 = 111$, $\hat{p}_2 = x_2/n_2 = 111/200 = 0.555$

$p_0 = (x_1 + x_2)/(n_1 + n_2) = (131 + 111)/(200 + 200) = 0.605$

$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{p_0(1 - p_0)(1/n_1 + 1/n_2)}} = \frac{0.655 - 0.555}{\sqrt{0.605 \times (1 - 0.605)(1/200 + 1/200)}} \approx 2.0456$

    # p-value (upper tail)
    pnorm(2.0456, lower.tail = FALSE)

    [1] 0.02039787

Using the one-sided two-sample proportion test, we reject the null hypothesis at the 5% level of significance, $z = 2.05$, $p = .0204$. A statistically significantly higher proportion of sprayed than of unsprayed plants were healthy.

8. The success of a federally funded, locally administered manpower program was measured by the proportion of clients who moved from subsidized employment into unsubsidized (private sector) employment and remained there for a certain length of time. A random sample of $n = 376$ clients of the program produced the following results (this question is worth 4 marks).

| Education        | Success | Failure |
|------------------|---------|---------|
| 8 years or less  | 13      | 19      |
| 9 to 11 years    | 76      | 45      |
| 12 years         | 107     | 65      |
| 13 years or more | 32      | 19      |

a. Estimate the marginal probabilities of success and failure.

$P(\text{Success}) = (13 + 76 + 107 + 32)/376 = 228/376 = 0.606383$

$P(\text{Failure}) = (19 + 45 + 65 + 19)/376 = 148/376 = 0.393617$

The marginal probabilities of success and failure are about 0.6064 and 0.3936, respectively.
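The marginal probabilities in part (a) can also be obtained in R by dividing the column totals of the contingency table by the grand total. The sketch below is an illustration only; the matrix name `outcomes` and the vector names `marginal_outcome` and `marginal_education` are hypothetical and do not appear in the assignment.

```r
# Contingency table from question 8 (rows: education, columns: outcome)
outcomes <- matrix(
  c(13, 76, 107, 32,   # Success column
    19, 45, 65, 19),   # Failure column
  ncol = 2,
  dimnames = list(
    c("<= 8 years", "9-11 years", "12 years", ">= 13 years"),
    c("Success", "Failure")
  )
)

n_total <- sum(outcomes)  # total number of clients

# Marginal probabilities of each outcome (column totals / grand total)
marginal_outcome <- colSums(outcomes) / n_total

# Marginal probabilities of each education level (row totals / grand total)
marginal_education <- rowSums(outcomes) / n_total

round(marginal_outcome, 4)
```

The row margins are not asked for in part (a), but they are computed here because a test of independence combines both sets of margins.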
b. Test the hypothesis that the program outcomes are independent of educational level using $H_0$: program outcomes and educational level are independent; $H_1$: program outcomes and educational level are not independent.

    # Observed counts, filled column by column (Success, then Failure)
    tbl <- matrix(c(13, 76, 107, 32, 19, 45, 65, 19),
                  ncol = 2, byrow = FALSE,
                  dimnames = list(c("<= 8 years", "9-11 years", "12 years", ">= 13 years"),
                                  c("Success", "Failure")))
    tbl

                Success Failure
    <= 8 years       13      19
    9-11 years       76      45
    12 years        107      65
    >= 13 years      32      19

    chisq.test(tbl)

        Pearson's Chi-squared test

    data:  tbl
    X-squared = 5.8817, df = 3, p-value = 0.1175

Using the $\chi^2$ test of independence, we cannot reject the null hypothesis at the 5% level of significance, $\chi^2(3) = 5.88$, $p = .1175$. There is insufficient evidence to conclude that program outcomes depend on educational level.
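To see where the `chisq.test()` statistic in part (b) comes from, the expected counts under independence can be built from the row and column totals and compared with the observed counts. This is a minimal sketch, not part of the original solution; `tbl` follows the assignment's code, while `expected`, `chi_sq`, `df`, and `p_value` are illustrative names.

```r
# Observed counts, as defined in the assignment's code for part (b)
tbl <- matrix(c(13, 76, 107, 32, 19, 45, 65, 19),
              ncol = 2, byrow = FALSE,
              dimnames = list(c("<= 8 years", "9-11 years", "12 years", ">= 13 years"),
                              c("Success", "Failure")))

# Expected count in each cell under independence:
# (row total * column total) / grand total
expected <- outer(rowSums(tbl), colSums(tbl)) / sum(tbl)

# Pearson chi-squared statistic: sum over cells of (O - E)^2 / E
chi_sq <- sum((tbl - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1)
df <- (nrow(tbl) - 1) * (ncol(tbl) - 1)

# Upper-tail p-value
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

round(expected, 2)
c(chi_sq = chi_sq, df = df, p = p_value)
```

Inspecting `expected` also allows a check of the usual rule of thumb that every expected count is at least 5, which supports using the chi-squared approximation here.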