L13-14-15 Ch9 R (sta305-class18-26Nov2019)

University of Toronto, STA305 (Statistics)

Week 8 - Lecture 13-14, 27 & 29 October 2021

Acknowledgment: "This document has been prepared by Professor Nathan Taback. I am grateful to Professor Nathan Taback for providing me this document for presentation and discussion in the class of STA305 Fall 2021." (Murari Singh)

Go to page 16: Blocking ... Also: R-for-NT-Chapter-9.html and run ***.Rmd
STA305/1004-Class 17 Nov. 26, 2019
Today’s Class
- Sample size for ANOVA
- Randomized block designs
- Linear model and ANOVA
- Assumptions
- Other blocking designs
- Latin square
- Graeco-Latin square
- Hyper-Graeco-Latin square
- Balanced incomplete block design
Sample size for ANOVA - Designing a study to compare more than two treatments
- Consider the hypothesis that k means are equal vs. the alternative that at least two differ.
- What is the probability that the test rejects if at least two means differ?
- Power = 1 - P(Type II error) is this probability.
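Sample size for ANOVA - Designing a study to compare more than two treatments

The null and alternative hypotheses are:
$$H_0: \mu_1 = \mu_2 = \cdots = \mu_k \quad \text{vs.} \quad H_1: \mu_i \neq \mu_j \text{ for some } i \neq j.$$
The test rejects at level $\alpha$ if
$$\frac{MS_{Treat}}{MS_E} > F_{k-1,\,N-k,\,\alpha},$$
where $F_{k-1,N-k,\alpha}$ is the upper $\alpha$ quantile of the $F_{k-1,N-k}$ distribution. The power of the test is
$$1 - \beta = P\left(\frac{MS_{Treat}}{MS_E} > F_{k-1,\,N-k,\,\alpha}\right)$$
when $H_0$ is false.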
Sample size for ANOVA - Designing a study to compare more than two treatments
- When $H_0$ is false it can be shown that:
  - $SS_{Treat}/\sigma^2 = (k-1)MS_{Treat}/\sigma^2$ has a non-central chi-square distribution with $k-1$ degrees of freedom and non-centrality parameter $\delta$.
  - $MS_{Treat}/MS_E$ has a non-central F distribution with numerator and denominator degrees of freedom $k-1$ and $N-k$ respectively, and non-centrality parameter
$$\delta = \frac{\sum_{i=1}^{k} n_i(\mu_i - \bar\mu)^2}{\sigma^2},$$
    where $n_i$ is the number of observations in group $i$, $\bar\mu = \sum_{i=1}^{k}\mu_i/k$ when the group sizes are equal (in general $\bar\mu = \sum_{i=1}^{k} n_i\mu_i/N$), and $\sigma^2$ is the within-group error variance.
- This is denoted by $F_{k-1,N-k}(\delta)$.
Direct calculation of power
- The power of the test is $P(F_{k-1,N-k}(\delta) > F_{k-1,N-k,\alpha})$.
- The power is an increasing function of $\delta$.
- The power depends on the true values of the treatment means $\mu_i$, the error variance $\sigma^2$, and the sample sizes $n_i$.
- If the experimenter has some prior idea about the treatment means and the error variance, the formula above gives the power of the test for any proposed sample size (number of replications). The treatment means can be obtained from the table below.
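The bullets above describe a computation that takes assumed treatment means, group sizes, and an error variance and returns the power. Below is a minimal base-R sketch, assuming a level-$\alpha$ one-way ANOVA F test; the function name `anova_power()` is hypothetical and not from any package. It uses the $n$-weighted mean for $\bar\mu$, which reduces to the simple average $\sum_i \mu_i/k$ when all groups have the same size.

```r
# Power of the one-way ANOVA F test via the non-central F distribution.
# means: assumed true treatment means; n: group sizes (one value or one per group);
# sigma2: assumed within-group error variance; alpha: significance level.
anova_power <- function(means, n, sigma2, alpha = 0.05) {
  k <- length(means)
  n <- rep(n, length.out = k)                     # allow a common n for all groups
  N <- sum(n)
  mu_bar <- sum(n * means) / N                    # equals mean(means) for equal n
  delta <- sum(n * (means - mu_bar)^2) / sigma2   # non-centrality parameter
  f_crit <- qf(1 - alpha, df1 = k - 1, df2 = N - k)
  power <- pf(f_crit, df1 = k - 1, df2 = N - k, ncp = delta, lower.tail = FALSE)
  list(delta = delta, f_crit = f_crit, power = power)
}

# Blood coagulation values: 3 animals per diet, MS_E = 5.6 used as sigma^2
coag <- anova_power(means = c(61, 66, 68, 61), n = 3, sigma2 = 5.6)
coag$delta   # 20.35714
coag$power
```

With these inputs the sketch reproduces $\delta = 20.35714$; note that it uses $N - k = 12 - 4 = 8$ error degrees of freedom for the proposed 12-animal study.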
Blood coagulation example - sample size

Suppose that an investigator would like to replicate the blood coagulation study with only 3 animals per diet. In this case $k = 4$ and $n_i = 3$. The treatment means from the initial study are:

| Diet    | A  | B  | C  | D  |
|---------|----|----|----|----|
| Average | 61 | 66 | 68 | 61 |

```r
# tab0401: original blood coagulation data (response y, factor diets), assumed already loaded
lm.diets <- lm(y ~ diets, data = tab0401)
anova(lm.diets)
```

```
## Analysis of Variance Table
##
## Response: y
##           Df Sum Sq Mean Sq F value    Pr(>F)
## diets      3    228    76.0  13.571 4.658e-05 ***
## Residuals 20    112     5.6
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
Blood coagulation example - sample size
- $\mu_1 = 61$, $\mu_2 = 66$, $\mu_3 = 68$, $\mu_4 = 61$, so $\bar\mu = (61 + 66 + 68 + 61)/4 = 64$.
- The error variance $\sigma^2$ was estimated as $MS_E = 5.6$.
- Assuming that the estimated values are the true values of the parameters, the non-centrality parameter of the F distribution is
$$\delta = \frac{3 \times \left((61-64)^2 + (66-64)^2 + (68-64)^2 + (61-64)^2\right)}{5.6} = \frac{3 \times 38}{5.6} = 20.35714.$$
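Blood coagulation example - sample size

If we choose $\alpha = 0.05$ as the significance level then $F_{3,20,0.05} = 3.0983912$. The power of the test is then
$$P(F_{3,20}(20.36) > 3.10) = 0.94.$$
This was calculated using the CDF of the F distribution in R, pf():

```r
1 - pf(q = 3.10, df1 = 3, df2 = 20, ncp = 20.36)
```

```
## [1] 0.9435208
```

These denominator degrees of freedom (20) are those of the original study. With 3 animals per diet the new study would have $N = 12$ observations and $N - k = 8$ error degrees of freedom, so the critical value would be $F_{3,8,0.05}$, which is larger than $F_{3,20,0.05}$, and the power correspondingly lower.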
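To see how much the choice of error degrees of freedom matters, the sketch below recomputes the power for $\delta = 20.35714$ with 20 denominator degrees of freedom (the original study) and with $N - k = 8$ (3 animals per diet). The helper name `power_for_df2()` is hypothetical.

```r
# Non-centrality parameter for the coagulation example (grand mean 64, MS_E = 5.6)
delta <- 3 * sum((c(61, 66, 68, 61) - 64)^2) / 5.6

# Power of the level-alpha F test with 3 numerator df and a given denominator df
power_for_df2 <- function(df2, alpha = 0.05) {
  crit <- qf(1 - alpha, df1 = 3, df2 = df2)
  pf(crit, df1 = 3, df2 = df2, ncp = delta, lower.tail = FALSE)
}

p20 <- power_for_df2(20)  # close to the 0.94 reported above
p8  <- power_for_df2(8)   # smaller: qf(0.95, 3, 8) exceeds qf(0.95, 3, 20)
c(df2_20 = p20, df2_8 = p8)
```

For a fixed non-centrality parameter, fewer error degrees of freedom raise the critical value, so the power with 8 degrees of freedom is the relevant (and lower) figure for the smaller study.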
Calculating power and sample size using the pwr library

There are several packages in R that can calculate power and sample size for statistical tests. The package pwr has a function pwr.anova.test(k = NULL, n = NULL, f = NULL, sig.level = 0.05, power = NULL) for computing power and sample size in a balanced one-way ANOVA. Its arguments include:
- k: number of groups
- n: number of observations (per group)
- f: effect size

The effect size f is Cohen's f, the ratio of the standard deviation of the treatment means to the within-group standard deviation:
$$f = \sqrt{\frac{\sum_{i=1}^{k}(\mu_i - \bar\mu)^2 / k}{\sigma^2}},$$
where $\bar\mu = \sum_{i=1}^{k}\mu_i/k$ and $\sigma^2$ is the within-group error variance. With $n$ observations per group it is related to the non-centrality parameter by $\delta = k\,n\,f^2$, so $f = \sqrt{\delta/(kn)}$ rather than $\sqrt{\delta}$.
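Calculating power and sample size using the pwr library

In the previous example $\delta = 20.35714$, so $\sqrt{\delta} = 4.5118887$. The call below passes $f = 4.5$, i.e. approximately $\sqrt{\delta}$:

```r
library(pwr)
pwr.anova.test(k = 4, n = 3, f = 4.5)
```

```
##
##      Balanced one-way analysis of variance power calculation
##
##               k = 4
##               n = 3
##               f = 4.5
##       sig.level = 0.05
##           power = 1
##
## NOTE: n is number in each group
```

Because pwr.anova.test() uses the non-centrality parameter $k\,n\,f^2$, $f = 4.5$ corresponds to $\delta = 4 \times 3 \times 4.5^2 = 243$, which is why the reported power is 1. The value matching $\delta = 20.35714$ is $f = \sqrt{20.35714/12} \approx 1.30$.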
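The conversion between $\delta$ and Cohen's $f$ can be checked in base R, without loading pwr. This sketch assumes a balanced design and reproduces the power formula pwr.anova.test() uses, namely non-centrality $k\,n\,f^2$ with $k - 1$ and $(n - 1)k$ degrees of freedom; `f_cohen` is an illustrative variable name.

```r
k <- 4; n <- 3
means <- c(61, 66, 68, 61); sigma2 <- 5.6
delta <- n * sum((means - mean(means))^2) / sigma2   # 20.35714, as above

# Cohen's f: standard deviation of the means (divisor k) relative to sigma
f_cohen <- sqrt(sum((means - mean(means))^2) / k / sigma2)
f_cohen                 # approximately 1.3025
k * n * f_cohen^2       # equals delta
k * n * 4.5^2           # non-centrality implied by f = 4.5, i.e. 243

# Power computed the same way as pwr.anova.test() for a balanced design
df1 <- k - 1; df2 <- (n - 1) * k
pf(qf(0.95, df1, df2), df1, df2, ncp = k * n * f_cohen^2, lower.tail = FALSE)
```

If the pwr package is installed, pwr.anova.test(k = 4, n = 3, f = f_cohen) should report the same power as the last line.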
Calculating power and sample size using the pwr library

[Figure: Power vs. Effect Size for k=4, n=3 (x-axis: Effect Size, 0 to 5; y-axis: Power)]
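A curve like the one in the figure can be computed directly with pf(). This sketch assumes the horizontal axis is Cohen's $f$ as used by pwr.anova.test(), so that the non-centrality parameter is $k\,n\,f^2$ with $k = 4$, $n = 3$, and $(n - 1)k = 8$ error degrees of freedom; the grid spacing of 0.1 is arbitrary.

```r
k <- 4; n <- 3; alpha <- 0.05
df1 <- k - 1; df2 <- (n - 1) * k
crit <- qf(1 - alpha, df1, df2)                 # critical value of the F test

f_grid <- seq(0, 5, by = 0.1)                   # effect sizes to evaluate
power <- pf(crit, df1, df2, ncp = k * n * f_grid^2, lower.tail = FALSE)

# Draw the curve when run interactively
if (interactive()) {
  plot(f_grid, power, type = "l", xlab = "Effect Size", ylab = "Power",
       main = "Power vs. Effect Size for k=4, n=3")
}
```

At $f = 0$ the power equals the significance level 0.05, and it rises towards 1 as $f$ grows.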
Calculating power using simulation

The general procedure for simulating power is:
- Use the underlying model to generate random data with (a) specified sample sizes, (b) parameter values that one is trying to detect with the hypothesis test, and (c) nuisance parameters such as variances.
- Run the estimation program (e.g., t.test(), lm()) on these randomly generated data.
- Calculate the test statistic and p-value.
- Repeat the previous steps many times, say N, and save the p-values.

The estimated power for a level $\alpha$ test is the proportion of the N simulated data sets for which the p-value is less than $\alpha$.
Calculating power using simulation - R program

```r
# Simulate the power of a one-way ANOVA with three groups
NSIM <- 1000                            # number of simulations
res <- numeric(NSIM)                    # store p-values in res
mu1 <- 2; mu2 <- 2.5; mu3 <- 2          # true mean values of treatment groups
sigma1 <- 1; sigma2 <- 1; sigma3 <- 1   # standard deviations in each group
n1 <- 40; n2 <- 40; n3 <- 40            # sample size in each group
for (i in 1:NSIM) {                     # repeat the calculations below NSIM times
  # generate sample of size n1 from N(mu1, sigma1^2)
  y1 <- rnorm(n = n1, mean = mu1, sd = sigma1)
  # generate sample of size n2 from N(mu2, sigma2^2)
  y2 <- rnorm(n = n2, mean = mu2, sd = sigma2)
  # generate sample of size n3 from N(mu3, sigma3^2)
  y3 <- rnorm(n = n3, mean = mu3, sd = sigma3)
  y <- c(y1, y2, y3)                    # combine the values from all groups
  # treatment assignment for each observation
  trt <- as.factor(c(rep(1, n1), rep(2, n2), rep(3, n3)))
  m <- lm(y ~ trt)                      # fit the one-way ANOVA model
  res[i] <- anova(m)[1, 5]              # p-value of the F test
}
sum(res <= 0.05) / NSIM                 # estimated power
```

```
## [1] 0.642
```
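The simulated estimate can be checked against the exact power from the non-central F distribution. This sketch uses the same means, standard deviation, and group sizes as the program above. With NSIM = 1000 the Monte Carlo standard error is about $\sqrt{p(1-p)/1000} \approx 0.015$, so the simulated value should fall within a few hundredths of the exact one.

```r
# Settings from the simulation above
mu <- c(2, 2.5, 2); n <- c(40, 40, 40); sigma <- 1
k <- length(mu); N <- sum(n)

mu_bar <- sum(n * mu) / N                     # overall mean of the group means
delta <- sum(n * (mu - mu_bar)^2) / sigma^2   # = 40 * (1/6), about 6.67
crit <- qf(0.95, df1 = k - 1, df2 = N - k)    # 5% critical value of F(2, 117)
exact_power <- pf(crit, k - 1, N - k, ncp = delta, lower.tail = FALSE)
exact_power
```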
Blocking - Example: penicillin yield
- In this example a process for the manufacture of penicillin was investigated, and yield was the primary response of interest.
- There were 4 variants of the process (treatments) to be compared.
- An important raw material, corn steep liquor, varied considerably, and it was thought that it might cause significant differences in yield.
- The experimenters decided to study 5 blends of corn steep liquor (the blocks).
- Within each blend, the order in which the four treatments were run was randomized; randomization was done separately within each block.
- In a fully randomized one-way design, blend differences might not be balanced across the treatments A, B, C, D, which would increase the experimental noise.
- By running all four treatments within each blend, in random order, blend differences were largely eliminated from the treatment comparisons.
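Example: penicillin yield

The results of the experiment for blend 1:

| run | blend | treatment | y  |
|-----|-------|-----------|----|
| 1   | 1     | A         | 89 |
| 3   | 1     | B         | 88 |
| 2   | 1     | C         | 97 |
| 4   | 1     | D         | 94 |

The results of the experiment for blend 2:

| run | blend | treatment | y  |
|-----|-------|-----------|----|
| 4   | 2     | A         | 84 |
| 2   | 2     | B         | 77 |
| 3   | 2     | C         | 92 |
| 1   | 2     | D         | 79 |

Randomization of treatments was done separately within each block.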
The ANOVA identity for randomized block designs

The total sum of squares can be re-expressed by adding and subtracting the treatment and block averages:
$$\sum_{i=1}^{a}\sum_{j=1}^{b}(y_{ij}-\bar y_{\cdot\cdot})^2 = \sum_{i=1}^{a}\sum_{j=1}^{b}\left[(\bar y_{i\cdot}-\bar y_{\cdot\cdot}) + (\bar y_{\cdot j}-\bar y_{\cdot\cdot}) + (y_{ij}-\bar y_{i\cdot}-\bar y_{\cdot j}+\bar y_{\cdot\cdot})\right]^2.$$
After some algebra (the cross-product terms sum to zero), $SS_T = \sum_{i=1}^{a}\sum_{j=1}^{b}(y_{ij}-\bar y_{\cdot\cdot})^2$ is equal to
$$b\sum_{i=1}^{a}(\bar y_{i\cdot}-\bar y_{\cdot\cdot})^2 + a\sum_{j=1}^{b}(\bar y_{\cdot j}-\bar y_{\cdot\cdot})^2 + \sum_{i=1}^{a}\sum_{j=1}^{b}(y_{ij}-\bar y_{i\cdot}-\bar y_{\cdot j}+\bar y_{\cdot\cdot})^2.$$
So, $SS_T = SS_{Treat} + SS_{Blocks} + SS_E$.
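The identity can be checked numerically. The sketch below uses an arbitrary simulated $4 \times 5$ table (rows are treatments $i$, columns are blocks $j$), not real data, and computes each sum of squares from the formulas above.

```r
set.seed(305)                     # arbitrary seed; the data are simulated
a <- 4; b <- 5
y <- matrix(rnorm(a * b, mean = 86, sd = 5), nrow = a, ncol = b)

ybar   <- mean(y)                 # grand mean, y-bar_..
ybar_i <- rowMeans(y)             # treatment means, y-bar_i.
ybar_j <- colMeans(y)             # block means, y-bar_.j

SS_T      <- sum((y - ybar)^2)
SS_Treat  <- b * sum((ybar_i - ybar)^2)
SS_Blocks <- a * sum((ybar_j - ybar)^2)
# residual y_ij - y-bar_i. - y-bar_.j + y-bar_.. for every cell
resid     <- y - outer(ybar_i, rep(1, b)) - outer(rep(1, a), ybar_j) + ybar
SS_E      <- sum(resid^2)

all.equal(SS_T, SS_Treat + SS_Blocks + SS_E)   # TRUE
```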
Degrees of freedom
- There are $N = ab$ observations, so $SS_T$ has $N - 1$ degrees of freedom.
- There are $a$ treatments and $b$ blocks, so $SS_{Treat}$ and $SS_{Blocks}$ have $a - 1$ and $b - 1$ degrees of freedom, respectively.
- The degrees of freedom on the left-hand side of the identity must equal the sum of the degrees of freedom on the right-hand side. Therefore, the error sum of squares has
$$(N-1) - (a-1) - (b-1) = (ab-1) - (a-1) - (b-1) = (a-1)(b-1)$$
degrees of freedom.
Linear Model for Randomized Block Design
- The linear model for the randomized block design is
$$y_{ij} = \mu + \tau_i + \beta_j + \epsilon_{ij},$$
where $\tau_i$ is the effect of treatment $i$, $\beta_j$ is the effect of block $j$, and $E(\epsilon_{ij}) = 0$.
- The model is completely additive.
- It assumes that there is no interaction between blocks and treatments.
- An interaction could occur if an impurity in blend 3 poisoned treatment B and made it ineffective, even though it did not affect the other treatments.
Linear Model for Randomized Block Design

```r
# tab0404: penicillin data with columns y, treatment, blend (assumed already loaded)
pen.model <- lm(y ~ as.factor(treatment) + as.factor(blend), data = tab0404)
anova(pen.model)
```

```
Analysis of Variance Table

Response: y
                     Df Sum Sq Mean Sq F value  Pr(>F)
as.factor(treatment)  3     70  23.333  1.2389 0.33866
as.factor(blend)      4    264  66.000  3.5044 0.04075 *
Residuals            12    226  18.833
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

Calculation of the p-values assumes that the $\epsilon_{ij}$ are independent $N(0, \sigma^2)$, so that, under the respective null hypotheses of no treatment and no block effects,
$$\frac{MS_{Treat}}{MS_E} \sim F_{a-1,\,(a-1)(b-1)}, \qquad \frac{MS_{Blocks}}{MS_E} \sim F_{b-1,\,(a-1)(b-1)}.$$
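tab0404 is not defined in these notes. The sketch below builds an equivalent data frame (the name `pen` is hypothetical): blends 1 and 2 come from the tables above, while blends 3 to 5 are assumed to be the remaining rows of the Box, Hunter & Hunter (2005) penicillin table. With these values the fit reproduces the sums of squares in the ANOVA table above (70, 264, 226).

```r
# Penicillin yields: 5 blends (blocks) x 4 treatments (A-D).
# Blends 1-2 are from the tables above; blends 3-5 are assumed from BHH (2005).
yield <- matrix(c(89, 88, 97, 94,
                  84, 77, 92, 79,
                  81, 87, 87, 85,
                  87, 92, 89, 84,
                  79, 81, 80, 88),
                nrow = 5, byrow = TRUE,
                dimnames = list(blend = 1:5, treatment = c("A", "B", "C", "D")))

pen <- data.frame(
  y         = as.vector(t(yield)),           # blend 1 A..D, then blend 2 A..D, ...
  blend     = factor(rep(1:5, each = 4)),
  treatment = factor(rep(c("A", "B", "C", "D"), times = 5))
)

pen.model <- lm(y ~ treatment + blend, data = pen)
anova(pen.model)   # Sum Sq: treatment 70, blend 264, Residuals 226
```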
Penicillin example - interpretation
- There is no evidence that the four treatments produce different yields.
- How could this information be used in optimizing yield in the manufacturing process?
- Is one of the treatments less expensive to run? If so, an analysis of cost rather than yield might reveal important information.
- The differences between the blocks might be informative. In particular, the investigators might speculate about why blend 1 has such a different influence on yield.
- Perhaps the experimenters should now study the characteristics of the different blends of corn steep liquor.

(Box, Hunter, Hunter, 2005)
Other blocking designs
- Latin squares
- Graeco-Latin squares
- Hyper-Graeco-Latin squares
- Balanced incomplete block designs
The Latin Square Design
- Other designs that use the blocking principle include the Latin square design.
- If there are two sources of nuisance variation that can be blocked (for example, the rows and columns of a layout), a Latin square design might be appropriate.
Latin Square Design - Automobile Emissions
- An experiment to test the feasibility of reducing air pollution.
- A gasoline mixture was modified by changing the amounts of certain chemicals.
- This produced four different types of gasoline: A, B, C, D.
- These four treatments were tested with four different drivers and four different cars.
Latin Square Design - Automobile Emissions
- Two blocking factors: cars and drivers.
- The Latin square design was used to help eliminate possible differences between drivers I, II, III, IV and cars 1, 2, 3, 4.
- Randomly allocate treatments, drivers, and cars.

| Driver     | Car 1 | Car 2 | Car 3 | Car 4 |
|------------|-------|-------|-------|-------|
| Driver I   | A     | B     | D     | C     |
| Driver II  | D     | C     | A     | B     |
| Driver III | B     | D     | C     | A     |
| Driver IV  | C     | A     | B     | D     |
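One common way to carry out the randomization is to start from a standard (cyclic) Latin square and randomly permute its rows, its columns, and the treatment labels; each step preserves the Latin square property. A sketch under that assumption (this scheme is not necessarily the one used in the study, and it cannot reach every possible $4 \times 4$ Latin square):

```r
set.seed(2019)   # arbitrary seed so the randomization can be reproduced
p <- 4
# Cyclic Latin square with entries 1..p: cell (i, j) holds ((i + j) mod p) + 1
cyclic <- outer(0:(p - 1), 0:(p - 1), function(i, j) (i + j) %% p) + 1

rows   <- sample(p)               # random order of rows (drivers)
cols   <- sample(p)               # random order of columns (cars)
labels <- sample(LETTERS[1:p])    # random assignment of gasolines to symbols

design <- matrix(labels[cyclic[rows, cols]], nrow = p,
                 dimnames = list(Driver = c("I", "II", "III", "IV"),
                                 Car = paste("Car", 1:p)))
design
```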
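Latin Square Design - Automobile Emissions
- The data from the experiment (each cell gives the gasoline and the response y):

| Driver     | Car 1 | Car 2 | Car 3 | Car 4 |
|------------|-------|-------|-------|-------|
| Driver I   | A 19  | B 24  | D 23  | C 26  |
| Driver II  | D 23  | C 24  | A 19  | B 30  |
| Driver III | B 15  | D 14  | C 15  | A 16  |
| Driver IV  | C 19  | A 18  | B 19  | D 16  |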
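Latin Square Design - Automobile Emissions
- Why not standardize the conditions and make the 16 experimental runs with a single car and a single driver for the four treatments?
- That design would also be valid, but the Latin square provides a wider inductive basis: conclusions about the gasolines then hold over a range of cars and drivers rather than for one particular car and driver.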
Latin Square Design - Automobile Emissions

```r
# tab0408: emissions data with columns y, additive, cars, driver (assumed already loaded)
latinsq.auto <- lm(y ~ additive + as.factor(cars) + as.factor(driver), data = tab0408)
anova(latinsq.auto)
```

```
Analysis of Variance Table

Response: y
                  Df Sum Sq Mean Sq F value   Pr(>F)
additive           3     40  13.333     2.5 0.156490
as.factor(cars)    3     24   8.000     1.5 0.307174
as.factor(driver)  3    216  72.000    13.5 0.004466 **
Residuals          6     32   5.333
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

$$SS_T = SS_{cars} + SS_{drivers} + SS_{Additives} + SS_E$$
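tab0408 is not defined in these notes. The sketch below enters the emissions data from the table shown earlier into a data frame (the name `emis` is hypothetical) with the column names used in the call above, fits the same additive model, and checks the sum-of-squares identity.

```r
# Emissions data from the table above, entered row by row (drivers I-IV, cars 1-4)
emis <- data.frame(
  driver   = factor(rep(c("I", "II", "III", "IV"), each = 4)),
  cars     = factor(rep(1:4, times = 4)),
  additive = factor(c("A", "B", "D", "C",
                      "D", "C", "A", "B",
                      "B", "D", "C", "A",
                      "C", "A", "B", "D")),
  y        = c(19, 24, 23, 26,
               23, 24, 19, 30,
               15, 14, 15, 16,
               19, 18, 19, 16)
)

latinsq.auto <- lm(y ~ additive + cars + driver, data = emis)
anova(latinsq.auto)   # Sum Sq: additive 40, cars 24, driver 216, Residuals 32

# SS_T = SS_Additives + SS_cars + SS_drivers + SS_E
sum((emis$y - mean(emis$y))^2)   # 312 = 40 + 24 + 216 + 32
```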
Latin Square Design - Automobile Emissions
- Assuming that the residuals are independent and normally distributed, and that the null hypothesis of no treatment differences is true, the ratio of the mean squares for treatments and residuals has an $F_{3,6}$ distribution.
- This analysis assumes that the effects of treatments, cars, and drivers are additive (no interactions).
- Replicating the design would increase the residual degrees of freedom, giving a more precise estimate of the error variance and more sensitive tests.
General Latin Square
- A $p \times p$ Latin square for $p$ treatments is a square containing $p$ rows and $p$ columns.
- Each of the $p^2$ cells contains one of the $p$ letters that correspond to the treatments.
- Each letter occurs once and only once in each row and each column.
- There are many possible $p \times p$ Latin squares.
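General Latin Square

Which of the following is a Latin square?

|       | Col1 | Col2 | Col3 |
|-------|------|------|------|
| Row 1 | B    | A    | C    |
| Row 2 | A    | C    | B    |
| Row 3 | C    | B    | A    |

|       | Col1 | Col2 | Col3 |
|-------|------|------|------|
| Row 1 | A    | B    | C    |
| Row 2 | C    | A    | B    |
| Row 3 | B    | B    | A    |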
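The definition can be turned into a small check. `is_latin_square()` is a hypothetical helper, applied here to the two squares above.

```r
# TRUE if the square has p distinct symbols and each appears exactly once
# in every row and every column.
is_latin_square <- function(m) {
  p <- nrow(m)
  symbols <- unique(as.vector(m))
  ncol(m) == p && length(symbols) == p &&
    all(apply(m, 1, function(r) setequal(r, symbols) && !anyDuplicated(r))) &&
    all(apply(m, 2, function(cc) setequal(cc, symbols) && !anyDuplicated(cc)))
}

sq1 <- matrix(c("B", "A", "C",
                "A", "C", "B",
                "C", "B", "A"), nrow = 3, byrow = TRUE)
sq2 <- matrix(c("A", "B", "C",
                "C", "A", "B",
                "B", "B", "A"), nrow = 3, byrow = TRUE)

is_latin_square(sq1)  # TRUE
is_latin_square(sq2)  # FALSE: B repeats in row 3 and in column 2
```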
Misuse of the Latin Square
- It is inappropriate to use a Latin square to study factors that can interact.
- The effects of one factor can then be mixed up (confounded) with interactions between other factors.
- Outliers can occur as a result of these interactions.
- When interactions between factors are likely, a factorial design should be used instead.
Graeco-Latin Square

A Graeco-Latin square is a $k \times k$ pattern that permits the study of $k$ treatments simultaneously with three different blocking variables, each at $k$ levels.

| Driver     | Car 1 | Car 2 | Car 3 | Car 4 |
|------------|-------|-------|-------|-------|
| Driver I   | A α   | B β   | C γ   | D δ   |
| Driver II  | B δ   | A γ   | D β   | C α   |
| Driver III | C β   | D α   | A δ   | B γ   |
| Driver IV  | D γ   | C δ   | B α   | A β   |
Graeco-Latin Square
- This is a Latin square in which each Greek letter appears once and only once with each Latin letter (and the Greek letters themselves also form a Latin square).
- It can be used to control three sources of extraneous variability (i.e., to block in three different directions).
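The defining property, that each Latin letter meets each Greek letter exactly once, can be checked by superimposing the two squares and counting pairs. A sketch for the driver-by-car square above; Greek letters are spelled out and `are_orthogonal()` is a hypothetical helper.

```r
latin <- matrix(c("A", "B", "C", "D",
                  "B", "A", "D", "C",
                  "C", "D", "A", "B",
                  "D", "C", "B", "A"), nrow = 4, byrow = TRUE)
greek <- matrix(c("alpha", "beta",  "gamma", "delta",
                  "delta", "gamma", "beta",  "alpha",
                  "beta",  "alpha", "delta", "gamma",
                  "gamma", "delta", "alpha", "beta"), nrow = 4, byrow = TRUE)

# Two p x p Latin squares are orthogonal if superimposing them gives
# all p^2 (Latin, Greek) pairs, each exactly once.
are_orthogonal <- function(m1, m2) {
  pairs <- paste(as.vector(m1), as.vector(m2))
  !anyDuplicated(pairs) && length(pairs) == length(unique(as.vector(m1)))^2
}

are_orthogonal(latin, greek)               # TRUE
table(as.vector(latin), as.vector(greek))  # every cell equals 1
```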
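Graeco-Latin Square

To generate a $3 \times 3$ Graeco-Latin square design, superimpose two orthogonal $3 \times 3$ Latin squares, writing the second one with Greek letters.

|       | Col1 | Col2 | Col3 |
|-------|------|------|------|
| Row 1 | B    | A    | C    |
| Row 2 | A    | C    | B    |
| Row 3 | C    | B    | A    |

|       | Col1 | Col2 | Col3 |
|-------|------|------|------|
| Row 1 | A    | B    | C    |
| Row 2 | C    | A    | B    |
| Row 3 | B    | C    | A    |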
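The superimposition can be done mechanically. This sketch relabels the second square's letters A, B, C as α, β, γ (spelled out; `greek_of` is an illustrative name) and pastes the two squares cell by cell. The result is a valid Graeco-Latin square only because the two squares are orthogonal, which the final line checks.

```r
first  <- matrix(c("B", "A", "C",
                   "A", "C", "B",
                   "C", "B", "A"), nrow = 3, byrow = TRUE)
second <- matrix(c("A", "B", "C",
                   "C", "A", "B",
                   "B", "C", "A"), nrow = 3, byrow = TRUE)

# Relabel the second square with Greek letters, then combine cell by cell
greek_of <- c(A = "alpha", B = "beta", C = "gamma")
second_greek <- matrix(greek_of[second], nrow = 3)
graeco <- matrix(paste(first, second_greek), nrow = 3,
                 dimnames = list(paste("Row", 1:3), paste0("Col", 1:3)))
graeco

# Each (Latin, Greek) pair should appear exactly once
anyDuplicated(as.vector(graeco)) == 0   # TRUE
```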
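hyper-Graeco-Latin Square

Three mutually orthogonal Latin squares (every pair of them, when superimposed, produces each ordered pair of letters exactly once) can be superimposed to form a hyper-Graeco-Latin square. Such a design can be used to control 4 nuisance factors (i.e., to block on 4 factors). The three $4 \times 4$ squares shown below are each Latin squares, but they are not mutually orthogonal (superimposing the first two gives the pair (B, C) twice), so they illustrate the layout rather than a valid hyper-Graeco-Latin square; the wear-testing design below is a valid one.

| Row   | Col1 | Col2 | Col3 | Col4 |
|-------|------|------|------|------|
| Row 1 | B    | A    | D    | C    |
| Row 2 | C    | D    | A    | B    |
| Row 3 | D    | B    | C    | A    |
| Row 4 | A    | C    | B    | D    |

| Row   | Col1 | Col2 | Col3 | Col4 |
|-------|------|------|------|------|
| Row 1 | D    | A    | C    | B    |
| Row 2 | A    | D    | B    | C    |
| Row 3 | B    | C    | A    | D    |
| Row 4 | C    | B    | D    | A    |

| Row   | Col1 | Col2 | Col3 | Col4 |
|-------|------|------|------|------|
| Row 1 | A    | D    | B    | C    |
| Row 2 | C    | A    | D    | B    |
| Row 3 | B    | C    | A    | D    |
| Row 4 | D    | B    | C    | A    |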
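Whether Latin squares can be superimposed in this way can be checked pair by pair. The sketch below (`are_orthogonal()` is a hypothetical helper; Greek letters are abbreviated a, b, g, d) applies the check to the three squares as they appear above and, for comparison, to the three component squares of the wear-testing design shown below. For the squares as reproduced above it reports that no pair is orthogonal, whereas the wear-testing components are mutually orthogonal.

```r
# Two p x p Latin squares are orthogonal when superimposing them yields
# p^2 distinct (symbol, symbol) pairs, i.e. no pair is repeated.
are_orthogonal <- function(m1, m2) anyDuplicated(paste(m1, m2)) == 0

# The three squares above, entered row by row
s1 <- matrix(c("B","A","D","C",  "C","D","A","B",  "D","B","C","A",  "A","C","B","D"), 4, byrow = TRUE)
s2 <- matrix(c("D","A","C","B",  "A","D","B","C",  "B","C","A","D",  "C","B","D","A"), 4, byrow = TRUE)
s3 <- matrix(c("A","D","B","C",  "C","A","D","B",  "B","C","A","D",  "D","B","C","A"), 4, byrow = TRUE)
shown <- c(s1_s2 = are_orthogonal(s1, s2), s1_s3 = are_orthogonal(s1, s3),
           s2_s3 = are_orthogonal(s2, s3))

# Components of the first replicate of the wear-testing design (rows C1-C4, cols P1-P4):
# cloth (Latin letters), emery paper (a, b, g, d for alpha..delta), specimen holder
cloth  <- matrix(c("A","B","C","D",  "C","D","A","B",  "D","C","B","A",  "B","A","D","C"), 4, byrow = TRUE)
paper  <- matrix(c("a","b","g","d",  "b","a","d","g",  "g","d","a","b",  "d","g","b","a"), 4, byrow = TRUE)
holder <- matrix(c(1,2,3,4,  4,3,2,1,  2,1,4,3,  3,4,1,2), 4, byrow = TRUE)
wear <- c(cloth_paper  = are_orthogonal(cloth, paper),
          cloth_holder = are_orthogonal(cloth, holder),
          paper_holder = are_orthogonal(paper, holder))

shown   # all FALSE for the squares as reproduced above
wear    # all TRUE
```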
hyper-Graeco-Latin Square
- A machine used for testing the wear on types of cloth.
- Four pieces of cloth can be compared simultaneously on one machine.
- The response is weight loss, in tenths of a milligram, when the cloth is rubbed against a standard grade of emery paper for 1000 revolutions of the machine.
hyper-Graeco-Latin Square
- Specimens of 4 different cloths (A, B, C, D) are compared.
- A specimen can be mounted in any one of 4 positions $P_1, P_2, P_3, P_4$ on the machine.
- Each sheet of emery paper ($\alpha, \beta, \gamma, \delta$) was cut into four quarters, and each quarter was used to complete a cycle $C_1, C_2, C_3, C_4$ of 1000 revolutions.
- The objective was to compare the treatments (cloths).
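hyper-Graeco-Latin Square

The four blocking factors were:

i) type of specimen holder: 1, 2, 3, 4
ii) position on the machine: $P_1, P_2, P_3, P_4$
iii) emery paper sheet: $\alpha, \beta, \gamma, \delta$
iv) machine cycle: $C_1, C_2, C_3, C_4$

The design was replicated. The first replicate is shown in the table below; each cell gives the cloth (Latin letter), emery paper sheet (Greek letter), specimen holder (digit), and the observed weight loss.

| Cycle | P1         | P2         | P3         | P4         |
|-------|------------|------------|------------|------------|
| C1    | A α 1: 320 | B β 2: 297 | C γ 3: 299 | D δ 4: 313 |
| C2    | C β 4: 266 | D α 3: 227 | A δ 2: 260 | B γ 1: 240 |
| C3    | D γ 2: 221 | C δ 1: 240 | B α 4: 267 | A β 3: 252 |
| C4    | B δ 3: 301 | A γ 4: 238 | D β 1: 243 | C α 2: 290 |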
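A sketch of how the first replicate could be entered in long format for lm(); the data frame and column names are hypothetical (the full two-replicate data set tab0412 used below is not reproduced in these notes). Because each pair of factors is balanced, every cloth meets every cycle, position, paper, and holder exactly once, so the raw treatment means are not distorted by those factors.

```r
# First replicate, entered cycle by cycle (rows C1-C4), positions P1-P4 within each cycle
wear1 <- data.frame(
  cycle     = factor(rep(paste0("C", 1:4), each = 4)),
  position  = factor(rep(paste0("P", 1:4), times = 4)),
  treatment = factor(c("A","B","C","D", "C","D","A","B", "D","C","B","A", "B","A","D","C")),
  paper     = factor(c("alpha","beta","gamma","delta", "beta","alpha","delta","gamma",
                       "gamma","delta","alpha","beta", "delta","gamma","beta","alpha")),
  holder    = factor(c(1,2,3,4, 4,3,2,1, 2,1,4,3, 3,4,1,2)),
  y         = c(320, 297, 299, 313,
                266, 227, 260, 240,
                221, 240, 267, 252,
                301, 238, 243, 290)
)

tapply(wear1$y, wear1$treatment, mean)   # A 267.50, B 276.25, C 273.75, D 251.00

# With one replicate, the intercept plus 3 df for each of the five factors uses
# all 16 observations, leaving no residual degrees of freedom.
fit1 <- lm(y ~ treatment + position + cycle + holder + paper, data = wear1)
df.residual(fit1)   # 0
```

This is one reason the replicated design is needed to obtain an error estimate for the ANOVA.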
hyper-Graeco-Latin Square

A linear model can be fit so that the ANOVA table and the treatment effect estimates can be calculated.

```r
# tab0412: wear data for both replicates (columns y, treatment, rep, position,
# cycle, holder, paper), assumed already loaded
wear.hypsq <- lm(y ~ treatment + as.factor(rep) + as.factor(position) +
                   as.factor(cycle) + as.factor(holder) + as.factor(paper),
                 data = tab0412)
anova(wear.hypsq)
```

```
Analysis of Variance Table

Response: y
                    Df  Sum Sq Mean Sq F value    Pr(>F)
treatment            3  1705.3  568.45  5.3908  0.021245 *
as.factor(rep)       1   603.8  603.78  5.7259  0.040366 *
as.factor(position)  3  2217.3  739.11  7.0093  0.009925 **
as.factor(cycle)     6 14770.4 2461.74 23.3455 5.273e-05 ***
as.factor(holder)    3   109.1   36.36  0.3449  0.793790
as.factor(paper)     6  6108.9 1018.16  9.6555  0.001698 **
Residuals            9   949.0  105.45
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
Balanced incomplete block design
- Suppose that instead of four samples on each 1000-revolution cycle only three could be included, but the experimenter still wanted to compare four treatments.
- The block size is now 3, too small to accommodate all treatments simultaneously.
- A balanced incomplete block design has the property that every pair of treatments occurs together in a block the same number of times.

| Cycle (block) | Treatments |
|---------------|------------|
| 1             | A B C      |
| 2             | A B D      |
| 3             | A C D      |
| 4             | B C D      |

| Cycle (block) | A | B | C | D |
|---------------|---|---|---|---|
| 1             | x | x | x |   |
| 2             | x | x |   | x |
| 3             | x |   | x | x |
| 4             |   | x | x | x |
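Balance can be verified from the incidence matrix: its cross-product gives, on the diagonal, the number of blocks containing each treatment ($r$) and, off the diagonal, the number of blocks in which each pair of treatments occurs together ($\lambda$). A sketch for the four-cycle design above (variable names are illustrative); for a balanced incomplete block design with $t$ treatments in blocks of size $k$, $\lambda = r(k-1)/(t-1)$, here $3 \times 2/3 = 2$.

```r
# Blocks (cycles) and the treatments each one contains
blocks <- list(c("A","B","C"), c("A","B","D"), c("A","C","D"), c("B","C","D"))
trts <- c("A","B","C","D")

# Incidence matrix: rows = cycles, columns = treatments; 1 = treatment run in that cycle
inc <- t(sapply(blocks, function(b) as.integer(trts %in% b)))
dimnames(inc) <- list(cycle = 1:4, treatment = trts)

# Concurrence matrix: diagonal = replications r, off-diagonal = pair counts lambda
conc <- crossprod(inc)
conc
```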