Online Test Notes
School: The University of Sydney
Course: 5018
Subject: Statistics
Date: Jan 9, 2024
Type: docx
Pages: 12
Uploaded by JudgeAntPerson1041
PUBH5018 – Quizzes
Quiz 1
Question 1 – The table below was created from the enrolment information of PUBH5018 Introductory Biostatistics students, 2016. Which of the following statements is true?
         Public Health   International Public Health   Clinical Epidemiology   Surgery   Other   Total
Male     60              27                            53                      50        26      216
Female   188             81                            42                      17        31      359
Total    248             108                           95                      67        57      575
a.
There are more males than females studying PUBH5018 Introductory Biostatistics
b.
Half of international public health students enrolled in PUBH5018 are female
c.
24% of male students enrolled in PUBH5018 are studying public health
d.
27% of international public health students studying PUBH5018 are male
e.
76% of public health students are female and 25% of surgery students studying PUBH5018 are female
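The percentages quoted in the options can be checked directly from the table. The sketch below is a minimal Python illustration, not part of the original quiz; the dictionary layout and the helper names `row_percent` and `column_percent` are assumptions made for this example.

```python
# Enrolment counts copied from the Quiz 1, Question 1 table.
courses = ["Public Health", "International Public Health",
           "Clinical Epidemiology", "Surgery", "Other"]
counts = {
    "Male": dict(zip(courses, [60, 27, 53, 50, 26])),
    "Female": dict(zip(courses, [188, 81, 42, 17, 31])),
}

def row_percent(sex, course):
    """Percentage of all students of one sex who are enrolled in `course`."""
    return 100 * counts[sex][course] / sum(counts[sex].values())

def column_percent(sex, course):
    """Percentage of the students in `course` who are of the given sex."""
    course_total = sum(counts[s][course] for s in counts)
    return 100 * counts[sex][course] / course_total

# Option e: share of public health and of surgery students who are female.
print(round(column_percent("Female", "Public Health")))
print(round(column_percent("Female", "Surgery")))
# Option c: share of male students who are studying public health.
print(round(row_percent("Male", "Public Health")))
```

The distinction that matters is the denominator: a row percentage divides by the total for that sex, a column percentage by the total for that course.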
Question 2 – What type of variable is country of birth?
a.
Ordinal
b.
Continuous
c.
Nominal
d.
Dichotomous
e.
Discrete
Question 3 – Which of the following best describes cumulative frequency if describing grouped data such as age groups in ascending order?
a.
The sum of all individuals that are aged greater than, or equal to, a defined age, expressed as a percentage
b.
The sum of all individuals that are aged greater than, or equal to, a defined age, expressed as a number
c.
The sum of all individuals that are aged less than, or equal to, a defined age, expressed as a number
d.
The sum of all individuals that are aged less than, or equal to, a defined age, expressed as a percentage
Question 4 – Which of the following is a table to present results for the distribution of a nominal variable?
a.
Histogram
b.
Bar chart
c.
Frequency distribution
d.
Pie chart
Question 5 – Which grouping would be preferred for a frequency distribution of height for people with heights ranging from 141 to 203cm? (Hint: there may be more than one way of grouping these but only one of the options meets the guidelines for the unit)
a.
140 - 144cm, 145 - 149 cm, 150 - 154 cm, 155 - 159 cm, 160 - 164 cm, 165 - 169 cm, 170 - 174 cm, 175 - 179 cm, 180 - 184 cm, 185 - 189 cm, 190 - 194 cm, 195 - 199 cm, 200 - 204 cm.
b.
140 - 145cm, 146 - 150 cm, 151 - 155 cm, 156 - 160 cm, 161 - 165 cm, 166 - 170 cm, 171 - 175 cm, 176 - 180 cm, 181 - 185 cm, 186 - 190 cm, 191 - 195 cm, 196 - 200 cm, 201 - 205 cm.
c.
< 145 cm, 145 - 149 cm, 150 - 154 cm, 155 - 159 cm, 160 - 164 cm, 165 - 169 cm, 170 - 174 cm, 175 - 179 cm, 180 - 184 cm, 185 - 189 cm, 190 - 194 cm, 195 - 199 cm, > 199 cm.
d.
< 145 cm, 145 - 150 cm, 150 - 155 cm, 155 - 160 cm, 160 - 165 cm, 165 - 170 cm, 170 - 175 cm, 175 - 180 cm, 180 - 185 cm, 185 - 190 cm, 190 - 195 cm, 195 - 200 cm, > 200 cm.
e.
< 150 cm, 150 - 179 cm, > 180 cm.
Question 8 – If the right of your histogram is obviously longer than the left, the distribution can be described as?
a.
Negatively skewed
b.
Normal distribution
c.
Positively skewed
d.
Bimodal distribution
Quiz 2
Question 1 – Assuming a boxplot is displayed vertically as in the course notes, what does the y axis represent?
a.
The percent of the variable
b.
The relative frequency of the variable
c.
The values of the variable
d.
The frequency of the variable
Question 2 – Based on the following figure, the most appropriate statistic for measuring central location (central tendency) is:
a.
Median
b.
Mode
c.
Average
d.
Mean
Question 3 – Based on the following figure, which is the most appropriate statistic for the measure of variability?
a.
Range
b.
IQR
c.
Confidence interval
d.
Standard deviation
Question 4 – Which of the following is a correct description of this figure?
a.
As age increases the proportion with dementia increases for both sexes
b.
As age increases the difference in the proportion with dementia between males and females becomes smaller
c.
As age increases the proportion with dementia decreases for both sexes
d.
As age increases the difference in the proportion with dementia between males and females remains fairly consistent
Question 5 – You are presenting an analysis of height in the adult population. The data values for height in cm were recorded as integers. SPSS output shows a mean height of 172.46789 cm. Assuming the height was normally distributed, what would be the most appropriate way to report the mean?
a.
172 cm
b.
172
c.
172.4 cm
d.
172.5
e.
172.47 cm
Quiz 3
Question 1 – A study has been carried out that has determined that the probability of contracting disease X is 0.77. Reported as a percentage, what is the probability of NOT contracting disease X?
a.
77%
b.
0.23
c.
0.77
d.
0.23%
e.
23%
Question 2 – In a normal distribution, there is a 99% chance that an observation lies
a.
Within 2.58 standard errors from the mean
b.
Within 1 standard deviation from the mean
c.
Within 1.96 standard deviations from the mean
d.
Within 1.96 standard errors from the mean
e.
Within 2.58 standard deviations from the mean
Question 3 – Using Table F1 – Normal Distribution, what is the area in the tail above the z statistic of 2.00?
a.
0.02275
b.
0.4920
c.
0.02222
d.
0.2275
Question 4 – The ages of a population of students are normally distributed with a mean of 27 years and a standard deviation of 3.4 years. What proportion of students is older than 30 years?
a.
0.882 or 88.2%
b.
0.1894 or 18.9%
c.
0.01894 or 1.9%
d.
0.8106 or 81.1%
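The tail areas used in Questions 3 to 5 can be reproduced without a printed table. The following is a minimal Python sketch under the assumption that Table F1 lists upper-tail areas of the standard normal distribution; the function name `upper_tail` is chosen for this example only. The z statistic is rounded to two decimals to mimic a table lookup.

```python
import math

def upper_tail(z):
    """Area under the standard normal curve above z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Question 4: mean 27 years, SD 3.4 years, proportion older than 30 years.
z = round((30 - 27) / 3.4, 2)      # 0.88, as it would be looked up in a table
print(z, round(upper_tail(z), 4))

# Question 3: area above z = 2.00.
print(round(upper_tail(2.00), 5))

# Question 5: by symmetry, the area above -1.52 is 1 minus the area above 1.52.
print(round(upper_tail(-1.52), 4), round(1 - upper_tail(1.52), 4))
```

Using the unrounded z of about 0.882 gives a slightly smaller tail area than the tabulated value, which is why the rounding step is kept explicit.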
Question 5 – From Table F1, the area in the tail above the z-statistic of 1.52 is 0.0643. What is the area above -1.52?
a.
1.000
b.
0.9357
c.
0.0152
d.
0.0643
Quiz 4
Question 1 – Assuming a population SD of 0.8, what would the standard error (SE) of the sample mean be if n = 125? Would the SE be higher or lower than the number you estimated if n was increased to 150?
a.
0.0716 and if n increased to 150 the SE would be lower
b.
0.0064 and if n is increased to 150 the SE would be lower
c.
139.75 and if n increased to 150 the SE would be lower
d.
Unable to calculate as have not been provided with the sample mean
e.
0.0716 and if n is increased to 160 the SE would be higher
Question 2 – A paper reports a p-value of 0.031, what strength of evidence does this equate to?
a.
Little evidence
b.
Weak evidence
c.
Evidence
d.
No evidence
e.
Weak evidence
Question 3 – The level of a fictitious substance in the blood, Pysarium, was measured in a sample of 30 biostatistics students. The mean level was estimated at 148.2mg. From earlier work, the standard deviation (SD) of Pysarium in the population is known to be 24.66mg. You are planning on calculating 95% confidence intervals. Which of the following is the most correct equation to calculate these?
a.
148.2 +/- 1.96 x 4.5
b.
148.2 +/- 2.045 x 24.66
c.
148.2 +/- 1.96 x 4.5023
d.
148.2 +/- 1.96 x 24.66
e.
148.2 +/- 2.045 x 4.5023
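When the population SD is known, the standard error is SD / sqrt(n) and the 95% confidence interval uses the normal multiplier 1.96. The sketch below is a minimal Python illustration of that arithmetic; the function names `standard_error` and `z_interval` are assumptions made for this example.

```python
import math

def standard_error(sd, n):
    """Standard error of a sample mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)

def z_interval(mean, sd, n, z=1.96):
    """Confidence interval for a mean when the population SD is known."""
    se = standard_error(sd, n)
    return mean - z * se, mean + z * se

# Question 3: Pysarium, n = 30, mean 148.2 mg, known population SD 24.66 mg.
print(round(standard_error(24.66, 30), 4))
print(z_interval(148.2, 24.66, 30))

# Question 1: population SD 0.8; the SE shrinks as n grows from 125 to 150.
print(round(standard_error(0.8, 125), 4), round(standard_error(0.8, 150), 4))
```

Because n sits under a square root in the denominator, any increase in sample size lowers the standard error.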
Question 4 – The level of fictitious substance in the blood, Killedarite, was measured in a sample of 61 biostatistics students. The mean level was estimated at 12.11 mg with a sample standard deviation of 2.03mg. The researchers were unable to find any evidence of the expected standard deviation of Pysarium in the population. 1. Calculate 95% confidence intervals for the mean of Killedarite, 2. Is the sample consistent with the sample being drawn from the general population with a known mean of 12.61mg? Why? From the following which one is more correct?
a.
95% CI 11.60 to 12.62mg. Yes, it is consistent as the population mean falls within the confidence intervals for the mean
b.
95% CI 11.6 to 12.6. Yes, it is consistent as the population mean falls within the confidence intervals for the mean.
c.
95% CI 11.59 to 12.63 mg. Yes, it is consistent as the population mean falls within the confidence intervals for the mean
d.
95% CI 11.59 to 12.63. No, it is not consistent as the population mean is lower than the mean and most of the confidence interval.
e.
95% CI 11.60 to 12.62. No, it is not consistent as the population mean is lower than the mean and most of the confidence interval
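When only the sample SD is available, the interval uses a multiplier from the t distribution with n - 1 degrees of freedom. The sketch below is a minimal Python illustration; the critical value 2.000 for 60 degrees of freedom is taken from a standard t table and hard-coded as an assumption, and the function name `t_interval` is chosen for this example.

```python
import math

def t_interval(mean, sample_sd, n, t_crit):
    """Confidence interval for a mean using a supplied t critical value."""
    se = sample_sd / math.sqrt(n)
    return mean - t_crit * se, mean + t_crit * se

# Question 4: Killedarite, n = 61, mean 12.11 mg, sample SD 2.03 mg.
T_CRIT_60_DF = 2.000   # assumed two-sided 5% value for 60 DF, from a t table
lower, upper = t_interval(12.11, 2.03, 61, T_CRIT_60_DF)
print(round(lower, 2), round(upper, 2))

# The sample is consistent with a population mean that lies inside the interval.
print(lower <= 12.61 <= upper)
```

Using 1.96 instead of the t multiplier would give a slightly narrower interval, which is the difference between the 11.60 to 12.62 and 11.59 to 12.63 options.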
Question 5 – Below is some output from a one-sample t-test. Using the information from the table, find the values for the blank cells.
              N         Mean      Std. Deviation   Std. Error Mean
Weight (kg)   (blank)   (blank)   12.3759          0.9382

Test Value = 64.5
                                                                  95% confidence interval of the difference
              t         df    Sig. (2-tailed)   Mean difference   Lower     Upper
Weight (kg)   (blank)   173   0.003             2.9               (blank)   (blank)
a.
N – 174, mean – 61.6, t – 1.04, CI – 1.0 to 4.7
b.
N – 174, mean – 67.4, t – 3.05, CI – 1.0 to 4.7
c.
N – 173, mean – 61.6, t – 2.12, CI - -0.5 to 2.7
d.
N – 173, mean – 67.4, t – 2.86, CI – 1.0 to 4.7
e.
N – 173, mean – 61.6, t – 4.50, CI – 3.1 to 4.7
Quiz 5
Question 1 – Results from a study of whether the mean height of a sample of 92 university students differs from the published mean height of the Australian adult population are summarised below. “There is very strong evidence that the mean height of the university students is higher compared with the known mean of the adult Australian population (t=4.68, DF 91, p < 0.001). The mean height of the students was 169.7cm compared to the population mean of 165.0 cm, the difference was estimated at 4.7cm (95% CI: 2.7 – 6.3)”. Which SPSS output did they use to produce this summary?
a.
              N    Mean     Std. Deviation   Std. Error Mean
Height (cm)   92   169.65   9.539            0.995

Test Value = 165.0
                                                             95% confidence interval of the difference
              t       df   Sig. (2-tailed)   Mean difference   Lower   Upper
Height (cm)   4.678   91   0.000             4.652             -2.68   6.63
b.
              N    Mean     Std. Deviation   Std. Error Mean
Height (cm)   92   169.65   9.539            0.995

Test Value = 165.0
                                                             95% confidence interval of the difference
              t       df   Sig. (2-tailed)   Mean difference   Lower   Upper
Height (cm)   4.678   91   0.000             4.652             2.68    6.63
c.
              N    Mean     Std. Deviation   Std. Error Mean
Height (cm)   92   169.65   9.539            0.995

Test Value = 165.0
                                                             95% confidence interval of the difference
              t    df      Sig. (2-tailed)   Mean difference   Lower   Upper
Height (cm)   91   4.678   0.000             4.652             2.68    6.63
d.
              N    Mean     Std. Deviation   Std. Error Mean
Height (cm)   92   169.65   9.539            0.995

Test Value = 166.5
                                                             95% confidence interval of the difference
              t       df   Sig. (2-tailed)   Mean difference   Lower   Upper
Height (cm)   4.678   91   0.000             4.652             2.68    6.63
Question 2 – You are analysing results from a paired t-test (N=16), the t-statistic is 3.555. What are the numerical values for DF and p and what strength of evidence does this equate to?
a.
DF 15, p < 0.001. Strong evidence.
b.
DF 15, p < 0.001. Significant
c.
DF 15, p > 0.001. Strong evidence
d.
DF 15, p > 0.01. Strong evidence
Question 3 – A 1 year crossover study of a weight loss drug compared to placebo produced the following statistics (t = 4.89, DF 30, p < 0.001). The difference was estimated at 6.5kg (95% CI: 5.1 to 7.9) greater weight loss in favour of the drug. The researchers regarded a difference of 5.0kg as clinically important. Which of the following is the most correct conclusion?
a.
The results were statistically significant but are not of practical or clinical significance/importance
b.
The results were not statistically significant and are not of practical or clinical significance/importance
c.
The results were statistically significant and are possibly of practical or clinical significance/importance
d.
The results were statistically significant and are definitely of practical or clinical significance/importance
e.
The results were not statistically significant and are inconclusive regarding practical or clinical significance/importance
Question 4 – What is missing from this conclusion? “There was strong evidence of a difference in mean pain score between a new drug and standard care (t = 2.68, DF 120, p < 0.01). The mean pain score whilst on drug was 3.5 points compared with 5.6 points for standard care. The mean difference in pain score was lower in favour of the drug (95% CI: 1.8 to 2.4)”
a.
Significance of effect
b.
Point estimate of effect
c.
Test of effect
d.
Certainty of effect
e.
Direction of effect
Question 5 – Below is some edited SPSS output from a paired t test. There are 2 places highlighted where the value is blank. Using the available information in the output, what are the missing values of: 1) mean heart rate (post) and 2) DF
                                 N    Mean      Std. Deviation   Std. Error Mean
Pair 1   Heart rate pre (bpm)    75   76.16     10.916           1.544
         Heart rate post (bpm)   75   (blank)   11.033           1.560

Paired differences
                                                                                                         95% confidence interval of the difference
                                                        Mean    Std. Deviation   Std. Error mean   t       df        Lower    Upper
Pair 1   Heart rate pre (bpm) - Heart rate post (bpm)   0.400   2.060            0.176             1.373   (blank)   -0.186   0.986
a.
76.56 beats, DF 74
b.
76.56 beats, DF 75
c.
75.76 beats, DF 75
d.
75.76 beats, DF 74
e.
75.76 beats, DF 149
Question 7 – If there was no p-value reported at the output for a paired t-test, what two results would confirm if this was a statistically significant difference or not?
a.
1) Confirm whether the CI’s contain the null value of 0 nights. 2) Look at the standard error (SE) of the mean difference. If this is similar to the standard deviation then the missing p-value will be > 0.05
b.
1) Confirm whether the CI’s contain the null value of 0 nights. 2) Find the significance of the t-value from a t-table
c.
1) Confirm whether the CI’s contain the null value of 0 nights. 2) Compare the standard deviation for the drug and the placebo. If similar the missing p-value will be <0.05.
d.
1) Confirm whether the CI’s contain the null value of 0 nights. 2) Use the p-value from the correlations table. If it is significant then the missing p-value will also be significant.
Quiz 6
Question 1 – A researcher conducts a study to compare the effectiveness of two skin lotions (lotion A and lotion B) used to relieve the symptoms of an allergic skin rash. A total of 50 people with the skin rash on both arms are included in the study. For each subject, lotion A is applied to one arm (either left or right, chosen at random) and lotion B is applied to the other arm. Below is a 2 x 2 table of the results. What is the value of chi-square (χ²)?
                       Lotion B
                       Better   No better   Total
Lotion A   Better      15       10          25
           No better   8        17          25
Total                  23       27          50
a.
0.06 with 1DF
b.
0.50 with 49DF
c.
0.22 with 1DF
d.
0.50 with 1DF
e.
0.11 with 1DF
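Because both lotions are applied to the same person, the data are paired and only the discordant cells (10 and 8) carry information. The sketch below is a minimal Python illustration of McNemar's statistic; it assumes the continuity-corrected form, which is consistent with the listed answer of 0.06, and the function name `mcnemar_chi2` is chosen for this example.

```python
def mcnemar_chi2(b, c, continuity=True):
    """McNemar's chi-square (1 DF) from the two discordant cell counts."""
    diff = abs(b - c)
    if continuity:
        # Continuity correction: shrink the absolute difference by 1.
        diff = max(diff - 1, 0)
    return diff ** 2 / (b + c)

# Discordant pairs: A better / B no better = 10, A no better / B better = 8.
print(round(mcnemar_chi2(10, 8), 2))                     # with correction
print(round(mcnemar_chi2(10, 8, continuity=False), 2))   # without correction
```

The uncorrected form gives 0.22, which also appears among the options, so it is worth checking which form the unit expects.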
Question 2 – Researchers analysed data from 30 paired samples using McNemar’s test for paired proportions. The chi-squared statistic was estimated at 46.70. What would be the p-value for this test?
a.
P < 0.001
b.
P < 0.05
c.
P < 0.025
d.
P > 0.75
Question 3 – A study was conducted to determine if the mean weight of the PUBH5018 students was consistent with the known mean weight of the entire USYD student population. What test would have been used to test this?
a.
McNemar’s test
b.
Paired t-test
c.
One sample t-test
d.
The binomial test
Question 4 – Two different swab types, ‘flocked’ and traditional, were assessed for their ability to detect Shigella (a common pathogen) in children with gastroenteritis in Botswana. The number of children studied was 236. Shigella was detected by both swab types in 39 children, only by flocked swabs in 18, only by traditional swabs in 6 and not detected by either in 173. From the following, which table accurately displays these data?
a.
                          Traditional Swab
                          Positive   Negative   Total
Flocked swab   Positive   173        6          179
               Negative   18         39         45
Total                     191        45         236
b.
                          Traditional Swab
                          Positive   Negative   Total
Flocked swab   Positive   39         18         57
               Negative   6          173        179
Total                     45         191        236
c.
                          Traditional Swab
                          Positive   Negative   Total
Flocked swab   Positive   6          173        57
               Negative   39         18         179
Total                     45         191        236
d.
                          Traditional Swab
                          Positive   Negative   Total
Flocked swab   Positive   18         39         57
               Negative   173        6          179
Total                     191        45         236
Question 5 – In a study examining anxiety among students, the proportion who had anxiety in a sample of 20 was 0.45. What is the standard error of the proportion?
a.
0.05534
b.
0.02487
c.
0.11124
d.
0.01238
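The standard error of a sample proportion p from n observations is sqrt(p(1 - p) / n). A minimal Python check of the arithmetic follows; the function name `se_proportion` is an assumption made for this example.

```python
import math

def se_proportion(p, n):
    """Standard error of a sample proportion."""
    return math.sqrt(p * (1 - p) / n)

# Question 5: proportion with anxiety 0.45 in a sample of 20 students.
print(round(se_proportion(0.45, 20), 5))
```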
Quiz 7
Question 1 – A cohort study was conducted by Chao C. et al (2009). “Correlates for Completion of 3-Dose Regimen of HPV Vaccine in Female Members of a Managed Care Organisation”. Women who attended the first of a series of 3 vaccinations for HPV had their health insurance status recorded. At the end of 12 months, whether they had completed all 3 vaccinations or not was recorded. The exposure of interest is having health insurance and the outcome of interest is having completed all the vaccinations. From the output below, calculate an appropriate estimate of effect and choose the most correct one from the four options given
                         Completed vaccinations
                         Yes    No     Total
Health insurance   Yes   414    724    1138
                   No    55     220    275
Total                    469    944    1413
a.
RR = 0.55
b.
RR = 1.82
c.
RR = 0.80
d.
OR = 2.29
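Both the relative risk and the odds ratio can be computed from this 2 x 2 table, and in a cohort study the relative risk is usually the natural estimate. The sketch below is a minimal Python illustration assuming the layout used above (exposure in rows, outcome in columns, cells a, b, c, d read left to right, top to bottom); the function names are chosen for this example.

```python
def relative_risk(a, b, c, d):
    """Risk in the exposed row divided by risk in the unexposed row."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """Cross-product ratio ad / bc."""
    return (a * d) / (b * c)

# Health insurance (rows) by completed vaccinations (columns).
a, b, c, d = 414, 724, 55, 220
print(round(relative_risk(a, b, c, d), 2))
print(round(odds_ratio(a, b, c, d), 2))
```

Both values appear among the options, so the choice between them rests on the study design rather than on the arithmetic.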
Question 2 – A case control study was performed examining predictors of neonatal mortality in rural Northern Ethiopia. History of neonatal mortality was one of the predictors (exposures) of interest. From the output below, calculate
the most appropriate test statistic and p-value and provide the strength of evidence.
                                      Neonatal mortality
                                      Case   Control   Total
History of neonatal mortality   Yes   26     20        46
                                No    49     130       179
Total                                 75     150       225
a.
Very strong evidence, χ² with 1 DF = 13.99, p < 0.001
b.
Very strong evidence, χ² = 5.67 x 10^3, p < 0.001
c.
Very strong evidence, χ² with 1 DF = 46.17, p < 0.001
d.
No evidence, χ² with 1 DF = 0.01, p < 0.98
e.
No evidence, χ² with 1 DF = 0.06, p < 0.9
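The Pearson chi-square for a 2 x 2 table can be computed with the shortcut n(ad - bc)² / (row and column totals multiplied together). The sketch below is a minimal Python illustration without a continuity correction; the function names are assumptions made for this example, and the p-value uses the fact that a chi-square variable with 1 DF is the square of a standard normal variable.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

def p_value_1df(chi2):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

# History of neonatal mortality (rows) by case/control status (columns).
chi2 = chi_square_2x2(26, 20, 49, 130)
print(round(chi2, 2), p_value_1df(chi2) < 0.001)
```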
Question 3 – Assuming exposure is displayed in rows and outcome in columns and both the exposure and outcome of interest appear first in a 2 x 2 table, what is the correct formula for the odds ratio (OR)?
a.
(a/n1) / (c/n2)
b.
n1/n
c.
ad/bc
d.
(a/n1) – (c/n2)
Question 4 – You are performing a chi-square test and wish to report relative risk. Why do you need to take the log of RR (ln) in order to calculate your confidence intervals?
a.
The mean of relative risk is right skewed and needs to be transformed to be ‘normalised’
b.
The distribution of relative risk is left skewed and needs to be transformed to be ‘normalised’
c.
The mean of relative risk is left skewed and needs to be transformed to be ‘normalised’
d.
The distribution of relative risk is right skewed and needs to be transformed to be ‘normalised’
Question 5 – You have calculated the following risk estimate and confidence intervals in terms of ln(OR): 1.238078 (95% CI: 0.568924 to 1.907233). How should they appear in your conclusion?
a.
OR = 1.24 (95% CI: 0.57 to 1.91)
b.
OR = 3.45 (95% CI: 1.77 to 6.73)
c.
OR = 0.09 (95% CI: - 0.24 to 0.28)
d.
OR = 0.21 (95% CI: -0.56 to 0.65)
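Estimates calculated on the natural-log scale are reported by exponentiating the point estimate and each confidence limit. A minimal Python check of the back-transformation follows; the variable names are chosen for this example.

```python
import math

# ln(OR) and its 95% confidence limits, as given in Question 5.
ln_or, ln_lower, ln_upper = 1.238078, 0.568924, 1.907233

# Back-transform each value to the odds ratio scale.
odds_ratio, lower, upper = (math.exp(v) for v in (ln_or, ln_lower, ln_upper))
print(f"OR = {odds_ratio:.2f} (95% CI: {lower:.2f} to {upper:.2f})")
```

An odds ratio and its limits can never be negative after back-transformation, which rules out any option with a negative confidence limit.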
Question 6 – A study was conducted in Nigeria on the effect of HIV and malaria parasites co-infection on immune-hematological profiles among patients attending anti-retroviral treatment clinic in Infectious’ Disease Hospital Kano, Nigeria. All patients were known to have HIV and the risk factors for co-infection with malaria were studied. One of the risk factors to be studied was whether there was an association between sleeping with insecticide treated mosquito nets and contracting malaria. Seven hundred and sixty one (761) people were studied. Of the 211 that contracted malaria, 127 slept with treated bed nets. Of the 550 who did not contract malaria, 438 slept with treated bed nets. Overall, what was the prevalence of malaria?
a.
42.9%
b.
22.5%
c.
27.7%
d.
100.0%
Question 7 – Analyse these data using a chi-square test and provide a conclusion. For this exercise please use netting as the exposure of interest and contracting malaria as the outcome of interest for the calculation and conclusion.
a.
There is very strong evidence of an association between sleeping with insecticide treated bed netting and contracting malaria (χ² with 1 DF = 30.16, p<0.001). Compared to those with no netting (42.9%), those with netting (22.5%) have a 36% increased risk of contracting malaria, RR = 1.36 (95%CI: 1.19 to 1.54).
b.
There is very strong evidence of an association between sleeping with insecticide treated bed netting and contracting malaria (χ² with 1 DF = 30.16, p<0.001). Compared to those with no netting (42.9%), those with netting (22.5%) have an increased risk of approximately 90%, RR = 1.91 (95%CI: 1.53 to 2.38).
c.
There is very strong evidence of an association between sleeping with insecticide treated bed netting and contracting malaria (χ² with 1 DF = 30.16, p<0.001). Compared to those with no netting (42.9%), those with netting (22.5%) have approximately 40% the risk of contracting malaria, RR = 0.39 (95%CI: 0.28 to 0.55).
d.
There is very strong evidence of an association between sleeping with insecticide treated bed netting and contracting malaria (χ² with 1 DF = 30.16, p<0.001). Compared to those with no netting (42.9%), those with netting (22.5%) have approximately half the risk of contracting malaria, RR = 0.52 (95%CI: 0.42 to 0.66).
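The figures in Questions 6 and 7 follow from the 2 x 2 table implied by the question text: 127 with netting and malaria, 438 with netting and no malaria, and by subtraction 84 and 112 without netting. The sketch below is a minimal Python illustration; the confidence interval uses the usual large-sample standard error of ln(RR), which is assumed here to be the formula taught in the unit, and the function name `rr_with_ci` is chosen for this example.

```python
import math

def rr_with_ci(a, b, c, d, z=1.96):
    """Relative risk with a confidence interval computed on the log scale."""
    rr = (a / (a + b)) / (c / (c + d))
    se_ln_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(rr) - z * se_ln_rr)
    upper = math.exp(math.log(rr) + z * se_ln_rr)
    return rr, lower, upper

# Rows: netting yes / no. Columns: malaria yes / no.
a, b = 127, 438                  # slept with treated nets
c, d = 211 - 127, 550 - 438      # did not: 84 with malaria, 112 without

prevalence = (a + c) / (a + b + c + d)
risk_netting = a / (a + b)
risk_no_netting = c / (c + d)
rr, lower, upper = rr_with_ci(a, b, c, d)

print(f"Prevalence of malaria: {100 * prevalence:.1f}%")
print(f"Risk with netting {100 * risk_netting:.1f}%, without {100 * risk_no_netting:.1f}%")
print(f"RR = {rr:.2f} (95% CI: {lower:.2f} to {upper:.2f})")
```

Taking netting as the exposure puts the netting group in the numerator, so a relative risk below 1 indicates a lower risk among those who slept with treated nets.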
Question 8 – When performing a chi-square test in SPSS, when do you NOT need to ‘weight’ cases by ‘n’?
a.
When data are paired
b.
When it is an RCT
c.
When you have observations for exposure and outcome available for each individual subject
d.
When it is a case control study
Quiz 8
Question 1 – In isolation, from which of the following is it reasonable to assume normality for a sample of continuous observations?
a.
The sample size is very large
b.
The median lies directly in the middle between the lower and upper quartile
c.
A plot of the data is bell-shaped and symmetrical
d.
The mean and the median are exactly the same
Question 2 – In a crossover study of a continuous variable, the most appropriate test to use is
a.
The two sample t-test if normality can be reasonably assumed or the Wilcoxon’s Rank Sum Test if non-normal or normality cannot be reasonably assumed
b.
The paired t-test if normality can be reasonably assumed or the Wilcoxon’s Signed Rank Test if non-normal or normality cannot be reasonably assumed
c.
The paired t-test if normality can be reasonably assumed or the Wilcoxon’s Rank Sum Test if non-normal or normality cannot be reasonably assumed
d.
The two sample t-test if normality can be reasonably assumed or the Wilcoxon’s Signed Rank Test if non-normal or normality cannot be reasonably assumed
Question 3 – The Wilcoxon’s Rank Sum Test is the most appropriate test to use when:
a.
The samples are paired and it is not appropriate to use McNemar’s test as cannot be assumed to be normally distributed
b.
The samples are independent and are non-normal or cannot be assumed to be normally distributed
c.
The samples are paired and are non-normal or cannot be assumed to be normally distributed
d.
The samples are independent and it is not appropriate to use the Chi-square test as cannot be assumed to be normally distributed
Question 4 – In a multiple choice question with 4 possible answers the probability of randomly selecting the correct answer is 0.25. If a multiple choice quiz has 8 questions, what is the probability of a person randomly selecting the correct answer to ALL questions?
a.
0.10 (10%)
b.
0.000015 (0.002%)
c.
1.0 (100%)
d.
0.00015 (0.02%)
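For independent guesses the probabilities multiply, so the chance of guessing all 8 questions correctly is 0.25 raised to the power 8. A minimal Python check follows; the variable names are chosen for this example.

```python
p_single = 0.25      # chance of guessing one four-option question correctly
n_questions = 8

# Independent events: multiply the single-question probability n times.
p_all = p_single ** n_questions
print(p_all)                     # as a proportion
print(round(100 * p_all, 3))     # as a percentage, to 3 decimal places
```

The result is 1 in 65,536, or roughly 0.000015 as a proportion and 0.002% as a percentage.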
Question 5 – This plot displays the distribution of the differences in quality of life (QOL) scores (measured on a scale from 0 to 10) between two paired samples. Based on this, what is the most appropriate test to use?
a.
Wilcoxon’s Signed Rank Test
b.
Two sample t-test
c.
Sign test
d.
Wilcoxon’s Rank Sum Test
e.
Paired t-test
Answers
Quiz 1 – e, c, c, c, a, -, -, c
Quiz 2 – c, a, d, a, a, -, -, -
Quiz 3 – e, e, a, b, b, -, -, -
Quiz 4 – a, c, c, c, b, -, -, -
Quiz 5 – b, a, d, b, d, -, b, -
Quiz 6 – a, a, c, b, c, -, -, -
Quiz 7 – b, a, c, d, b, c, d, c
Quiz 8 – c, b, b, b, e, -, -, -
Past Exams
2011
Question 1 (Short Answer, 5 marks) (final exam only – not for online test)
As part of a study of women with breast cancer, the maximum size of the tumour (mm) was estimated based on an MRI film prior to surgery. After surgery, the actual maximum size of the tumour was measured based on histology. Measurements were available for 32 tumours. Simple linear regression was used to examine the relationship between the estimated size of each tumour using MRI, and the actual size measured at histology. The resulting regression equation is given below. The R² for this model is 0.376.
MRI size = 0.823 + 0.527 × Histology size
a.
Interpret the intercept, and the slope of the regression line.
b.
What values would you expect the intercept and the slope to take if the measurements agreed perfectly?
Question 2 (Short Answer, 5 marks)
Clark et al (Clinical and Experimental Ophthalmology, 2010) reported the results of a study of vision loss (partial or complete) in Aboriginals in remote Western Australia. The tables below are based on 920 study participants who were assessed for vision loss and diabetes.
Diabetes * Vision loss crosstabulation
                                      Vision loss
                                      Yes      No       Total
Diabetes   Yes   Count                121      208      329
                 % within Diabetes    36.8%    63.2%    100.00%
           No    Count                38       553      591
                 % within Diabetes    6.4%     93.6%    100.00%
Total            Count                159      761      920
                 % within Diabetes    17.3%    82.7%    100.00%

Risk Estimate
                                            95% confidence interval
                                   Value    Lower    Upper
Odds Ratio for Diabetes (Yes/No)   8.466    5.687    12.602
For cohort Vision loss = Yes       5.720    4.077    8.025
For cohort vision loss = No        .626     .621     .736
N of valid cases                   920
a.
Based on the first table:
i.
What proportion of people with diabetes have vision loss?
ii.
What proportion of people without diabetes have vision loss?
b.
Using only one estimate (and its 95% confidence interval) from the second table, comment very briefly on the association between vision loss and diabetes (Do NOT do any calculations to answer this question)
Question 3 (20 marks)
Wong et al (Internal Medicine Journal, 2011) compared two clinical prediction rules for the diagnosis of pulmonary embolism (PE) in patients presenting to the Emergency Department with clinically suspected PE. The commonly used Wells Rule which requires a subjective assessment of the patient was compared with the Revised Geneva Scores which is based entirely on objective variables. A total of 98 patients were assessed using both rules. Of these, 36 patients were classified as “low risk” by both rules, and 28 patients were classified as “intermediate/high risk” by both rules. 6 patients were classified as “intermediate/high” risk by the Wells Rule, but “low risk” by the Revised Geneva Scores; whereas 28 patients were classified as “low risk” by the Wells Rule but “intermediate/high risk” by the Revised Geneva Scores.
a.
Display these data in a 2 x 2 table
b.
Fully analyse these data and write a brief report stating your results and conclusions
Question 4 (20 marks)
Obstructive sleep apnoea is associated with raised blood pressure which is associated with an increased risk of cardiovascular disease. Pepperell et al (The Lancet, 2002) conducted a study to assess the use of nasal continuous positive airway pressure (nCPAP) over a 4 week period on blood pressure in sleep apnoea patients. The effects of two interventions were compared: (i) therapeutic nCPAP and (ii) subtherapeutic nCPAP (Control). The outcome was change in each participant’s 24-hour average blood pressure over the 4-week study period. The results for the two groups are shown in the table below.
Change in 24 hour average blood pressure (mmHg)
                       N    Mean   Standard deviation
Therapeutic nCPAP      59   -2.5   6.1
Subtherapeutic nCPAP   59   0.8    5.4
a.
Fully analyse these data and write a brief summary of your results and conclusions
b.
What assumptions have you made in conducting your analysis?
Question 5 (20 marks) (final exam only – not for online test)
Rotavirus diarrhea is a major cause of hospital admissions and mortality amongst infants in developing countries. A study is planned to investigate the association between breast feeding and rotavirus diarrhea among infants admitted to hospital. The study will be conducted in a Pediatrics Emergency Department. The infants who test positive for rotavirus will be the cases. The controls will be sampled from the admitted children who do not have diarrhea. Breast feeding status (any versus none) prior to admission will be ascertained for all infants. It is expected that about 50% of the control infants will be breast fed.
a.
How many cases and controls will be required for this study to detect an odds ratio of at least 2.0, with 80% power at the 5% (two-sided) significance level?
b.
How many cases and controls will be required for this study to detect an odds ratio of at least 2.0, with 80% power at the 5% (two-sided) significance level if 3 controls are used per case?
Approximately 500 infants are admitted to the Paediatrics Emergency Department each month. Of these, 3% will test positive for rotavirus diarrhea.
c.
How many months will it take to complete the study if
i.
The study design and sample size in (a) are used?
ii.
The study design and sample size in (b) are used?
d.
For logistical reasons, the study must be completed in 4 months. With the number of cases that will be available during that 4 months, and assuming that 3 controls will be used per case, and that a 5% (two-sided) significance level will be used:
i.
what is the revised minimum effect size that can be detected with 80% power?
ii.
what is the revised power to detect an odds ratio of 2.0?
e.
The few previous studies that examined the effect of breast feeding on rotavirus diarrhea have produced conflicting results. Do you think that conducting this study over a 4 month period will resolve the debate? (State very briefly one major statistical reason for your conclusion.)
2012
Question 1 (Short Answer, 5 marks)
Twenty-two patients with arthritis were enrolled in a randomized trial to compare two pain relief medications (drug A and drug B). Ten patients received drug A and twelve patients received drug B. After one week on medication, the patients were asked to rate their pain due to arthritis on a scale of 0 (no pain) to 10 (worst pain ever). The boxplot and summary statistics below describe the distribution of the pain scores for each of the two groups.
The researchers used a nonparametric test to compare the pain scores in the two groups. The p-value for this test was 0.11.
Table 1 – Summary of pain scores for patients who received drug A or drug B
                      Drug A (n = 10)   Drug B (n = 12)
Mean                  3.2               4.4
Standard deviation    2.0               2.2
Median                2.5               3.5
Interquartile range   2.3               2.8
a.
Which nonparametric test would be appropriate to analyse these data?
b.
Why did the researchers choose a nonparametric test to analyse the data?
c.
Write a very brief conclusion summarising the most relevant results of this trial.
Question 2 (Short Answer, 5 marks)
A case control study was conducted to examine the association between hormone therapy and breast ductal carcinoma in situ (DCIS) among post-menopausal women in Connecticut, USA (Calvocressi et al, Cancer Epidemiology 2012). Women in the study were categorised as either “Yes, have used” or “No, have not used” hormone therapy. The results of the analysis are given in the SPSS output below:
Hrt_use * case_or_control Crosstabulation
                              Case or control
                              Case   Control   Total
Hrt_use   Yes, have used      222    238       460
          No, have not used   326    342       668
Total                         548    580       1128

Chi square tests
                               Value   Df   Asymp. Sig (2-sided)   Exact Sig. (2-sided)   Exact Sig. (1-sided)
Pearson Chi-square             .032    1    .858
Continuity Correction          .014    1    .906
Likelihood Ratio               .032    1    .858
Fisher’s Exact Test                                                .904                   .453
Linear-by-linear Association   .032    1    .858
N of valid cases               1128

Risk Estimate
                                               95% confidence interval
                                       Value   Lower   Upper
Odds Ratio for hrt_use (Yes/No)        .979    .772    1.241
For cohort case_or_control = Case      .989    .875    1.118
For cohort case_or_control = Control   1.011   .901    1.134
N of valid cases                       1128
Using the above output (NOTE: no calculations are required for this question):
a.
Choose the appropriate risk estimate. Interpret this estimate and its confidence interval
b.
State very briefly why you chose this estimate
c.
Is there evidence of an association? Report the part of the SPSS output that supports your answer.
Question 3 (20 marks)
Between December 2006 and March 2007, residents of Esperance in Western Australia reported that there had been large numbers of unexplained bird deaths. Testing by environmental authorities identified high levels of airborne lead contamination. A survey was then carried out to compare blood lead levels among children aged less than 5 years in Esperance and among children in another community, Fremantle, that had not been affected. (Rossi et al, ANZ J Public Health 2012). In the town of Esperance, 333 children aged less than 5 years had their blood tested and 82 of these children had blood lead levels ≥ 5μg/dL. In the town of Fremantle, 100 children were tested and 8 of these children had blood lead levels ≥ 5μg/dL.
a.
Display these data in a 2 x 2 table
b.
Fully analyse these data to assess whether there is a difference between the two towns in the prevalence of blood lead levels ≥ 5μg/dL among children aged less than 5 years
c.
Write a brief summary of your results and conclusions Question 4 (20 marks)
The effect of two drugs (formoterol and salbutamol) on peak expiratory flow among children with asthma was examined in a randomised cross-over trial (Senn, Stat Med 1990). Thirteen children were included in the trial. All children received both drugs, with the order of administration being random. Seven children were treated with formoterol on their first visit to the clinic and had their peak expiratory flow (in litres per minute) measured eight hours later. On their second visit to the clinic, these seven children were treated with salbutamol and had their peak expiratory flow measured eight hours later. A similar process was followed for the remaining six children, except that they received salbutamol at the first visit and formoterol at the second visit. Summary statistics for the measurements of peak expiratory flow for all thirteen children are given below.
Measurement | N | Mean (l/min) | Standard deviation (l/min) | Median (l/min) | IQR (l/min)
Peak expiratory flow after treatment with formoterol | 13 | 341.2 | 59.66 | 340 | 82.5
Peak expiratory flow after treatment with salbutamol | 13 | 295.8 | 82.86 | 300 | 107.5
Difference between treatments in peak expiratory flow (formoterol – salbutamol) | 13 | 45.4 | 40.59 | 40 | 57.5
a. Analyse these data and write a brief report to summarise your results and conclusions
b. State very briefly the assumption you have made in the statistical test you chose to analyse these data
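One possible approach to part (a) is a paired t-test on the within-child differences. The sketch below works from the summary statistics in the table and assumes the differences are approximately Normally distributed and that period and carry-over effects can be ignored. The critical value 2.179 is an assumption taken from standard t tables (12 degrees of freedom, two-sided 5%).

```python
import math

# Summary statistics for the differences (formoterol - salbutamol).
n = 13
mean_diff = 45.4   # l/min
sd_diff = 40.59    # l/min

# Assumption: two-sided 5% critical value of t with n - 1 = 12 df, from tables.
t_crit = 2.179

se = sd_diff / math.sqrt(n)       # standard error of the mean difference
t_stat = mean_diff / se           # test statistic for H0: mean difference = 0
ci = (mean_diff - t_crit * se, mean_diff + t_crit * se)

print(f"SE = {se:.2f}, t = {t_stat:.2f} on {n - 1} df")
print(f"95% CI: {ci[0]:.2f} to {ci[1]:.2f} l/min")
```

The P-value would then be read from t tables using the computed statistic and 12 degrees of freedom.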
Question 5 (20 marks) (final exam only – not for online test)
The prostate specific antigen (PSA) test has been proposed as a screening test for prostate cancer. However, there has been a lot of debate about whether the benefits of the test outweigh the harms. This makes it difficult for men to decide whether to have a PSA test or not. Researchers plan to develop and evaluate a decision aid to assist men to decide whether or not to be screened. The research program will have two stages:
In stage 1 the researchers intend to carry out a survey in their local health district to determine the percentage of men aged between 50 and 75 who have not had a PSA test.
In stage 2 an interactive website will be developed to provide information to men about the benefits and harms of the PSA test. The website will be evaluated using a randomised controlled trial in which men will (a) be given access to the website, or (b) receive the currently available printed leaflet which provides information on the PSA test (control group). After one week each participant’s knowledge about the PSA test and prostate cancer will be measured using a validated questionnaire. This provides a score out of 100 with higher values indicating more knowledge.
a. From a review of the literature the researchers believe the percentage of men who have not had a PSA test is roughly 40%. How many men would they need for the survey in stage 1 if they wanted the resulting confidence interval for their estimate to be no wider than 10%?
b. The researchers inform you that the percentage of men who have not had a PSA test could be 40%, 50% or as high as 60%. How many men would you now advise them to include in their survey for stage 1? Explain very briefly your reason for this choice.
c. For stage 2, how many men are required in each group to detect a difference of 5 in mean knowledge scores between the groups, assuming a power of 80% and a two-sided significance level of 0.05? (In a pilot study the researchers have estimated the standard deviation of the knowledge score to be 15.)
d. The researchers only have enough resources to recruit a total of 200 men for the study. What difference could be detected with this sample size assuming the same power and significance level given in (c)?
e. In order to further reduce costs, the researchers wish to simplify the administration of the stage 2 randomised trial. They suggest that the 200 men be recruited from twenty general practices across the local health district. They propose that the twenty general practices be randomly allocated to either receive access to the website or to receive the printed leaflet. Ten men from each practice will then be recruited to participate in the study. Will this achieve the statistical design requirements assumed in (d)? Give one reason for your answer.
2013
Question 1 (Short Answer, 5 marks) (final exam only – not for online test)
A study was carried out using data from the Lothian Birth Cohort 1921 to investigate the factors that are associated with quality of life among older people
(Qual Life Res 2012). One of the factors that the researchers investigated was depression. Four hundred and fifty people who were aged about 80 years and living independently in Edinburgh completed a questionnaire which measured quality
of life and also depression. Both variables were measured on a continuous scale. Higher values on the quality of life scale indicate better quality of life, whereas higher values on the depression scale indicate an increased depressed state. The researchers assessed the association between quality of life and depression using simple linear regression. Some of the results from the SPSS output are provided in the table below.
Variable | Unstandardised coefficient: B | Std. Error | Sig. | R Square
Depression score | -0.453 | 0.055 | <0.001 | 0.215
a. Calculate a 95% confidence interval for the slope of the fitted regression line
b. Based on the estimated slope, summarise the relationship between quality of life and depression
c. State briefly one of the assumptions made in this analysis
Question 2 (Short answer, 5 marks)
Researchers in the United States conducted a study to determine whether participation in cooking classes improved the diets of people with type II diabetes
(J Education Nutrition and Behavior, 2012). One hundred and seventeen people with type II diabetes were enrolled in the study. Each participant recorded the food that they ate for three days before the class and for three days one month after the class. The researchers compared the amount of fat consumed before the class and one month after the class using a non-parametric (distribution-free) test. The results are summarized in the table below:
Statistic | Before class (n = 117) | One month after class (n = 117) | Difference (after – before) | P-value
Median | 76 | 63 | -8 | <0.001
Interquartile range | 42 | 33 | 30 |
a. Which nonparametric test do you think has been used to analyse these data?
b. What test would be appropriate to use if the researchers had assumed the data were Normally distributed?
c. Write a very brief conclusion summarising the most relevant results of the trial
Note: no calculations are required to answer (a), (b), (c)
Question 3 (20 marks)
A family planning organisation updated their pamphlet which provides information to young adults on the different types of contraception. They then conducted a randomised controlled trial to assess whether the knowledge of young adults about contraception differed according to which pamphlet they had
read. Sixty-two people aged 14 to 25 were recruited to the study. Each person was then randomly allocated to either read the new pamphlet or the old pamphlet. Thirty-three people were allocated to the new pamphlet and twenty-nine people were allocated to the old pamphlet. After reading the pamphlet each individual completed a questionnaire that assessed their knowledge about contraception. Their questionnaire answers were
used to construct a knowledge score which could take any value between 0 and 100. Higher values of the knowledge score indicate greater knowledge regarding contraception. You can assume that the knowledge scores are Normally distributed. The data from the trial are summarised below.
Group | N | Mean knowledge score | Standard deviation
Old pamphlet | 29 | 60 | 15
New pamphlet | 33 | 72 | 14
a. Fully analyse these data to assess whether there is a difference in knowledge scores between people who read the new pamphlet compared to people who read the old pamphlet
b. Write a brief summary of your results and conclusions
c. What assumption, other than that the knowledge scores are Normally distributed, have you made in this analysis?
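A common way to approach part (a) is a two-sample t-test with a pooled variance. The following is a minimal sketch under that assumption (equal population variances in the two groups); the function name `pooled_t` is hypothetical, and the critical value 2.000 is an assumed table value for 60 degrees of freedom at the two-sided 5% level.

```python
import math

def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Difference in means (group 2 - group 1), its SE, t statistic and df,
    assuming a common variance in the two groups."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = mean2 - mean1
    return diff, se, diff / se, df

# Old pamphlet: n = 29, mean 60, SD 15. New pamphlet: n = 33, mean 72, SD 14.
diff, se, t_stat, df = pooled_t(29, 60, 15, 33, 72, 14)

t_crit = 2.000  # assumption: t table value for 60 df, two-sided 5%
ci = (diff - t_crit * se, diff + t_crit * se)

print(f"difference = {diff}, SE = {se:.3f}, t = {t_stat:.2f} on {df} df")
print(f"95% CI: {ci[0]:.2f} to {ci[1]:.2f}")
```

The two standard deviations (15 and 14) are similar, which is one informal check on the equal-variance assumption used here.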
Question 4 (20 Marks)
Barrett’s oesophagus (BO) is an abnormal change in the oesophagus (the tube that runs from the mouth to the stomach). BO is known to be a rare condition in the general population. A case control study was carried out to determine whether Helicobacter pylori is associated with the risk of developing BO (Int. J. Cancer, 2011). Helicobacter pylori (HP) is a bacterium that is found in the stomach. People with BO were identified from the records of pathology laboratories based in metropolitan Brisbane. Two hundred and ninety-six people with BO agreed to take part in the study. Twenty-eight of these people with BO were found to have the HP bacterium. Three hundred and ninety controls (people who do not have BO) were also randomly selected from metropolitan Brisbane. Seventy-three of the controls were found to have the HP bacterium.
a. Display these data in a 2 x 2 table
b. Fully analyse these data to assess whether there is an association between the presence of Helicobacter pylori (HP) and Barrett’s oesophagus (BO)
c. Write a brief summary of your results and conclusions
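One way to set up the analysis is a Pearson chi-square test on the 2 x 2 table together with the odds ratio. The sketch below builds the table from the counts in the question and computes the statistic from observed and expected counts. It uses no continuity correction (an assumption), and obtains the P-value for 1 degree of freedom through the complementary error function, since a chi-square variable with 1 df is the square of a standard Normal variable.

```python
import math

# Rows: cases (BO), controls. Columns: HP present, HP absent.
table = [[28, 296 - 28],
         [73, 390 - 73]]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Pearson chi-square: sum of (observed - expected)^2 / expected.
chi2 = 0.0
for i in range(2):
    for j in range(2):
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (table[i][j] - expected) ** 2 / expected

# For 1 df, P(chi-square > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

# Odds of HP among cases relative to odds of HP among controls.
odds_ratio = (table[0][0] * table[1][1]) / (table[0][1] * table[1][0])

print(f"chi-square = {chi2:.2f}, P = {p_value:.4f}, OR = {odds_ratio:.2f}")
```

A confidence interval for the odds ratio would still be needed for a full analysis.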
Question 5 (20 marks) (final exam only – not for online test)
Diets high in salt can lead to high blood pressure and cardiovascular disease. Researchers have developed a smartphone application (app) that they hope will reduce the amount of salt that people consume. The smartphone app allows people to scan the barcodes of food to find out whether the food is high in salt. The app also provides recommendations on alternative healthier food choices. The researchers have designed a randomised controlled trial to evaluate the app. Participants will be randomly allocated to one of two groups:
Group 1: will receive the smartphone app
Group 2: will be provided a leaflet explaining the impact of a high salt diet on health.
Each participant’s blood pressure will be measured at the end of four weeks. The standard deviation of blood pressure is known to be 20mmHg. The primary aim of the study is to assess the impact of the smartphone app on blood pressure.
a. Calculate the required sample size for the study if equal numbers are allocated to Group 1 and Group 2 and a difference of 5mmHg is required to be detected with 80% power and a two-sided significance level of 0.05
A secondary aim of the study is to compare the proportion of people in each group who use other health related smartphone apps during the 4 week period of the trial.
b. The researchers believe that 40% of people in Group 1 will use other health related smartphone apps compared to 20% in Group 2. Assuming that equal numbers are allocated to Group 1 and Group 2, how many people would be required for the study to detect this difference with a power of 80% and a two-sided significance level of 0.05?
c. Based on your answers to (a) and (b), how many people would you recommend the researchers should include to achieve both the primary and secondary aims of the study? Briefly explain your answer.
d. The researchers have decided to keep the number of people allocated to Group 2 the same as computed in (a), but to allocate twice that number to Group 1. What is the revised power of the study assuming the difference to be detected remains at 5mmHg and the significance level remains as 0.05 (two-sided)?
e. Using the new sample size calculated in (d), what is the approximate difference between groups in the proportion using other health related apps that can be detected assuming 20% usage in Group 2, a power of 80% and a two-sided significance level of 0.05? Group 2 is assumed to have the lower proportion.
2014
Question 1 (Short Answer, 5 marks)
When electronic appliances (for example, computers and televisions) reach the end of their life they are often sent to be recycled. If this electronic waste (e-waste) is not handled appropriately, the people involved in processing this e-waste can be exposed to toxic substances. Newly born children of female workers may also be affected. A study was carried out in China to assess the impact of exposure to e-waste on the amount of lead (a toxic substance) in the blood of newborn children (Ni, Science of the Total Environment, 2014). Blood was analysed for 126 children from a town where e-waste was processed, and 75 children from a town where e-waste was not processed. The results are summarised below.
Town | Number of children | Median lead level (ng/L) | Minimum lead level (ng/L) | Maximum lead level (ng/L)
Processes e-waste | 126 | 110 | 28 | 379
Does not process e-waste | 75 | 57 | 12 | 285
The researchers compared the lead levels in the two towns using a non-parametric (distribution free) test as the lead levels were skewed to the right. The result of this test was P<0.001.
a. What non-parametric test did the authors use to compare the lead levels between the two towns?
b. If the lead levels were Normally distributed, what test do you think the researchers would have used to compare the lead levels between the two towns?
c. Briefly summarise the results of the study. (Note: no calculations are required.)
Question 2 (Short Answer, 5 marks)
Studies have demonstrated that women aged between 15 and 74 who participate in breast cancer screening have lower breast cancer mortality than women who do not participate in screening. However, no studies have included women 75 or older and so the benefits are not as clear for these women. A decision aid was created for women 75 or older to help them decide whether to continue with breast cancer screening (Schonberg, JAMA Intern Med, 2014). The decision aid describes the possible benefits and harms of continuing with breast cancer screening. Researchers conducted a study to determine the impact of the decision aid on a woman’s intention to continue with breast cancer screening. Forty-five women were recruited to the study. Each woman was asked whether they intended to continue with breast cancer screening before they read the decision aid, and then again after they read the decision aid. The data were analysed using SPSS and the output is given below.
Before reading decision aid * After reading decision aid Crosstabulation
Before reading decision aid | After reading decision aid: Yes, will continue | No, will not continue | Total
Yes, will continue | 23 | 14 | 37
No, will not continue | 2 | 6 | 8
Total | 25 | 20 | 45
Chi-Square Tests
Test | Value | Exact Sig. (2-sided)
McNemar Test | | .004
N of Valid Cases | 45 |
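The exact significance reported for the McNemar test depends only on the discordant pairs in the crosstabulation (14 women who changed from yes to no, and 2 who changed from no to yes). A minimal sketch of the exact two-sided calculation is given below, assuming the usual binomial form with probability 0.5 under the null hypothesis; the function name `mcnemar_exact` is hypothetical.

```python
from math import comb

def mcnemar_exact(b, c):
    """Exact two-sided McNemar P-value from the two discordant counts.

    Under H0 each discordant pair is equally likely to fall in either cell,
    so the smaller count follows a Binomial(b + c, 0.5) distribution.
    """
    n = b + c
    k = min(b, c)
    one_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * one_tail)

# Discordant pairs: 14 (yes before, no after) and 2 (no before, yes after).
p = mcnemar_exact(14, 2)
print(f"Exact two-sided P = {p:.3f}")
```

The 29 concordant pairs (23 and 6) do not enter the calculation, which is why the test is suited to paired before and after responses.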
a. What proportion of women in the study said yes, they will continue with breast cancer screening before reading the decision aid?
b. What proportion of women in the study said yes, they will continue with breast cancer screening after reading the decision aid?
c. Briefly explain why the researchers used McNemar’s test to compare these proportions.
d. Is there evidence of a difference in these proportions? Briefly explain your answer. (Note: no calculations are required.)
Question 3 (20 marks)
In Australia, infant feeding guidelines recommend that children should be introduced to drinking from a cup at 6 months of age. A randomised controlled trial was conducted to determine whether six visits from a specially trained nurse in the first year of a baby’s life increased the proportion of babies who were drinking from a cup at the age of one (Wen, Arch Pediatric Adolesc Med, 2011). Two hundred and sixty-eight babies were randomised to receive the visits from the nurse and two hundred and fifty-nine babies were randomised to the control group that received no visits. At one year of age, twenty-two babies in the intervention group and forty babies in the control group were not drinking from a cup.
a. Display these data in a 2x2 table.
b. Fully analyse these data to assess whether there is a difference in the proportion of babies drinking from a cup between the intervention and control groups.
c. Write a brief summary of your results and conclusions.
Question 4 (20 marks)
Vitamin D is produced by the skin when it is exposed to sunlight. It is important for maintaining healthy bones and muscles and particularly important for pregnant women as Vitamin D deficiency is associated with poorer health outcomes for the child. In Australia most people require only a few minutes of sun exposure a day on their face, hands and arms to maintain adequate levels of Vitamin D. However, during winter months in the southern areas of Australia more exposure may be needed. A study was carried out among pregnant women living in northern Victoria, Australia, to determine whether Vitamin D levels in pregnant women were different in summer compared to winter (Teale, ANZJOG, 2010). Pregnant women who attended a clinic for the first time were recruited to the study. The time of their first visit was categorised as either winter or summer. For each woman the level of Vitamin D was measured. You can assume that Vitamin D levels reflect the sun exposure of the previous week. The results are summarised below.
Vitamin D (nmol/L)
Time period of first visit to clinic | N | Mean | Standard deviation
Winter | 174 | 57.3 | 21.4
Summer | 156 | 76.8 | 28.6
a. Fully analyse these data and write a brief summary of your results and conclusions.
b. What assumptions have you made in conducting your analysis?
c. Briefly describe what methods you could use to assess these assumptions.
Question 5 (20 marks) (final exam only – not for online test)
It has been suggested that women working in certain industries may be at a greater risk of developing breast cancer than other women. Identifying high risk occupations may lead to the detection of substances that cause cancer within the workplace that could be eliminated. A case control study is planned to estimate the association between working in the plastics industry and breast cancer. Researchers want to be able to detect an odds ratio of 2.0 using a two-sided significance test at the 5% level. The researchers have estimated that 5% of controls work in the plastics industry.
a. How many cases and how many controls are needed for this study to achieve 80% power, assuming that an equal number of cases and controls will be included?
b. If less than 5% of controls worked in the plastics industry, would the sample size required to achieve the researchers’ design aims be larger or smaller? Briefly explain your answer.
c. If only 350 cases and 350 controls can be recruited, what would be the revised approximate power of the study if the researchers still want to detect an odds ratio of 2.0 at the 5% significance level? (Note: Interpolation is not required.)
d. If five controls will be recruited for every case, how many cases and how many controls would be required to achieve the researchers’ design aims? (Note: the researchers’ design aims are a 5% significance level, able to detect an odds ratio of 2.0 with a power of 80%.)
e. The researchers believe that 10% of the cases and controls that are contacted will refuse to take part in the study. How many cases and how many controls should the researchers contact to achieve the sample size in (a), where an equal number of cases and controls will be included?
The researchers are also interested in estimating the proportion of women in the general population who have attended breast screening (had a mammogram) in the last two years. The proportion of controls who have attended breast screening will provide the estimate of the proportion in the general population. The researchers will also calculate a 95% confidence interval for this proportion.
f. The researchers expect that 50% of controls will have attended breast screening. Calculate the resulting 95% confidence interval for the proportion of women who have attended breast screening using:
i. the number of controls calculated in (a)
ii. the number of controls calculated in (d).
g. To determine the proportion in the general population who have attended breast screening, would you recommend that the researchers recruit the number of controls calculated in (a) or in (d)? Briefly explain your answer.
2015
Question 1 (Short Answer, 5 marks) (final exam only – not for online test)
Vitamin D is important for maintaining healthy bones. A study was carried out in Sweden to assess the association between levels of vitamin D in the blood and body mass index among 61 women (Björk, BMC Family Practice 2013). The researchers analysed the association using simple linear regression. Vitamin D (nmol/L) was the outcome variable and body mass index (kg/m²) was the explanatory variable.
The estimate of the slope from this analysis was -1.19 with a standard error of 0.408 and R² = 0.13.
a. State the null hypothesis for this study.
b. Calculate a test statistic to test the null hypothesis, obtain the P-value and briefly summarise these results. (Note: calculation of the confidence interval is not required for this question).
c. Calculate the correlation between Vitamin D and body mass index.
Question 2 (Short answer, 5 marks)
Which statistical test would you apply to analyse the data from each of the studies described below?
a. Researchers are interested in the association between sun exposure and skin cancer. Cases of skin cancer were identified from the register of all cancers diagnosed in New South Wales. A similar number of controls were randomly selected from all people living in New South Wales. Cases and controls were then classified by whether they had high or low sun exposure.
b. Researchers are interested in whether the use of a smartphone app can reduce weight gain in adolescents. Participants in the study were randomly allocated to either receive a smartphone app or to receive paper based information on nutrition. All participants were measured at the start of the study and 3 months later. The difference between their weight at 3 months and at the beginning of the study was calculated.
i. Assuming that the difference in weight at 3 months and the beginning of the study is Normally distributed.
ii. Assuming that the difference in weight at 3 months and the beginning of the study is not Normally distributed.
c. Researchers are interested in whether there is a difference in the prevalence of obesity among adults in two different suburbs of Sydney. Adults from both suburbs were randomly selected and invited to have their weight and height measured. Body mass index was then calculated and classified as either obese or not obese.
d. Researchers are interested in whether the average height of Scottish men is the same as that for Australian men. The average height of Australian men is known to be 176cm. The researchers surveyed a random sample of 1314 Scottish men and measured their heights (in cm). (Assume that height is Normally distributed.)
Question 3 (20 marks)
A new drug has been developed to manage epilepsy among patients who have a brain tumour. In a study to assess this new drug each patient recorded whether they had experienced an epileptic seizure in the month before the study began. They were then instructed to take one pill of the new drug each day for one month. At the end of this month they were asked whether they had experienced an epileptic seizure in the month when they were taking the new drug. Thirty-three patients were recruited to the study. Twenty-one patients reported that they had an epileptic seizure in the month when they were not using the new drug. Six patients reported that they had an epileptic seizure in both months. Eight patients reported that they did not have an epileptic seizure in either month.
a. Display the data in a 2x2 table.
b. Fully analyse these results and write a brief conclusion.
Question 4 (20 marks)
Alpha-linolenic acid (ALA) is a nutrient that is found in foods such as flaxseed oil, soybean oil and walnuts. It is thought that diets high in ALA may reduce the level of cholesterol in the blood. (Note: high cholesterol is a risk factor for cardiovascular disease.) Researchers conducted a randomised trial to assess the impact of ALA on cholesterol levels. The trial was split up into two periods. In Period 1, participants were randomised to consume 25ml of flaxseed oil (which is high in ALA) each day for twelve weeks or to consume 25ml of corn oil (which is low in ALA) each day for twelve weeks. At the end of Period 1 each participant’s cholesterol was measured. Participants then returned to their usual diet for 8 weeks to allow their cholesterol levels to return to their usual level. In Period 2, participants who consumed flaxseed oil in Period 1 now consumed corn oil for twelve weeks, and those who consumed corn oil in Period 1 now consumed flaxseed oil. At the end of Period 2 each participant’s cholesterol level was again measured. The results from the 13 participants are given below.
Participant number | Period 1 oil | Cholesterol level at end of Period 1 (mg/dl) | Period 2 oil | Cholesterol level at end of Period 2 (mg/dl)
1 | Corn | 224 | Flaxseed | 199
2 | Corn | 172 | Flaxseed | 219
3 | Corn | 210 | Flaxseed | 181
4 | Corn | 235 | Flaxseed | 176
5 | Corn | 206 | Flaxseed | 185
6 | Corn | 213 | Flaxseed | 198
7 | Corn | 193 | Flaxseed | 235
8 | Flaxseed | 211 | Corn | 222
9 | Flaxseed | 228 | Corn | 215
10 | Flaxseed | 210 | Corn | 254
11 | Flaxseed | 212 | Corn | 207
12 | Flaxseed | 246 | Corn | 216
13 | Flaxseed | 200 | Corn | 255
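Because each participant received both oils, the natural unit of analysis is the within-participant difference. The sketch below is one possible first step: it pairs each participant's two measurements, forms the difference (flaxseed minus corn, a direction chosen here as an assumption) and summarises it. It ignores any period effect, which a fuller cross-over analysis might consider.

```python
import math
import statistics

# Participants 1-7 took corn oil first: (corn, flaxseed) in mg/dl.
corn_first = [(224, 199), (172, 219), (210, 181), (235, 176),
              (206, 185), (213, 198), (193, 235)]
# Participants 8-13 took flaxseed oil first: (flaxseed, corn) in mg/dl.
flax_first = [(211, 222), (228, 215), (210, 254),
              (212, 207), (246, 216), (200, 255)]

# Within-participant differences, flaxseed minus corn.
diffs = [flax - corn for corn, flax in corn_first]
diffs += [flax - corn for flax, corn in flax_first]

n = len(diffs)
mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)        # sample SD (n - 1 denominator)
se = sd_diff / math.sqrt(n)
t_stat = mean_diff / se                  # paired t statistic on n - 1 df

print(f"n = {n}, mean difference = {mean_diff:.2f} mg/dl, SD = {sd_diff:.2f}")
print(f"SE = {se:.2f}, t = {t_stat:.2f} on {n - 1} df")
```

The P-value and confidence interval would then be obtained from t tables with 12 degrees of freedom.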
a. Fully analyse these results and write a brief conclusion. (Note: assume that the assumption of Normality holds).
b. If it was not reasonable to assume Normality what would be the most appropriate statistical test to use to analyse these data?
c. Name one plot that could be used to assess whether the Normality assumption was reasonable.
Question 5 (20 marks) (final exam only – not for online test)
Recent research has suggested that increased amounts of time spent sitting each day may be related to poorer health. A two part study is being planned to investigate this in more detail.
Part 1: Researchers aim to recruit participants to the study from the staff members of a local university. Each participant will be asked to keep a diary noting how much time they spend sitting each day. At the end of the survey the time spent sitting will be calculated for each participant. From previous studies, the researchers believe that the standard deviation of the time spent sitting is 2 hours per day.
a. How many participants would the researchers need to recruit if they wanted to estimate the mean time spent sitting per day with a 95% confidence interval no wider than ±0.25 hours per day?
Part 2: The researchers also intend to carry out a randomised controlled trial among those people who sit for 4 hours or more per day. They expect 50% of the Part 1 study participants to sit for 4 hours or more per day. These participants will be invited to take part in the randomised controlled trial. Participants who consent will be randomised to either have their desk modified to allow them to stand while working (GROUP A), or no modifications to their desk (GROUP B). The researchers believe that a mean difference of 0.75 hours per day sitting would be an important difference to detect between the two groups. Note: from previous studies the standard deviation of time spent sitting is 2 hours per day.
b. How many people are required for this randomised controlled trial assuming 80% power, a significance level of 5% and equal numbers are randomised to Group A and Group B?
c. How many people should be recruited for the Part 1 study to achieve the sample size calculated in (b) if 80% of those who sit for 4 or more hours per day consent to take part in the randomised controlled trial?
d. The researchers have resources to modify 60 desks in the randomised controlled trial. What would the approximate power of the study be if there were 60 modified desks and assuming a 5% significance level, the difference to be detected was 0.75 hours per day and equal numbers were randomised to Group A and Group B? (Note: interpolation is not required.)
e. Briefly comment on the power calculated in (d).
f. What difference could be detected if there were 60 modified desks and assuming 80% power, 5% significance level, and 3 times as many participants are randomised to Group B as to Group A?
g. One of the researchers has suggested that people who work in the same office should be randomised to the same Group so that they all either do, or do not, have modified desks. Briefly state what impact this would have on the sample size required in (b) and provide a brief justification for your answer.
2016
Question 1 (Short Answer, 5 marks)
A group of 20 men participated in a study to investigate the effect of an intervention on their systolic blood pressure (SBP). For each participant, their blood pressure was measured before the intervention and again after the intervention. (All data were checked and are correct.) The SPSS output below shows the distribution of the differences in SBP (after – before).
The SPSS outputs below show the results of two analyses (Method A and Method B) used to investigate whether there is evidence that the intervention affected SBP. Note: Only refer to the SPSS outputs to answer this question. No further calculations are required.
a. Which of the two statistical methods (A or B) applied to these data do you think is more appropriate? Give the name of this test.
b. Very briefly state why you think that the method you chose is the more appropriate one.
c. Are the results of the two methods consistent in terms of the interpretation of their p-values?
Method A: Paired Samples Test
Pair | Paired differences: Mean | Std. Deviation | Std. Error Mean | 95% Confidence Interval of the Difference: Lower | Upper | T | Df | Sig. (2-tailed)
Pair 1: After - before | -4.15000 | 9.71312 | 2.17192 | -8.69588 | .39588 | -1.911 | 19 | .071
Method B:
Ranks
After – before | N | Mean Rank | Sum of Ranks
Negative Ranks | 11 (a) | 12.95 | 142.50
Positive Ranks | 9 (b) | 7.50 | 67.50
Ties | 0 (c) | |
Total | 20 | |
a. After < before
b. After > before
c. After = before
Test statistics (a)
Statistic | After – before
Z | -1.40 (b)
Asymp. Sig. (2-tailed) | .160
a. Wilcoxon Signed Ranks Test
b. Based on negative ranks
Question 2 (Short Answer, 5 marks)
The association between polycystic ovary syndrome (PCOS) and sleep quality was
investigated in a community based, cross-sectional sample of women (Moran et al, Human Reproduction, 2015). Women born during 1973-1975 at a large maternity hospital in Adelaide (South Australia) were eligible to participate in the study. Participants were assessed for PCOS and also classified according to the outcome of interest, sleep quality (a “good sleeper” or “not a good sleeper”). The SPSS output below summarises the results for the 724 women for whom complete data were available.
PCOS * good_sleeper crosstabulation
PCOS | Statistic | Good_sleeper: Yes | No | Total
Yes | Count | 41 | 46 | 87
Yes | % within PCOS | 47.1% | 52.9% | 100.00%
No | Count | 334 | 303 | 637
No | % within PCOS | 52.4% | 47.6% | 100.00%
Total | Count | 375 | 349 | 724
Total | % within PCOS | 51.8% | 48.2% | 100.00%
Risk Estimate
Estimate | Value | 95% confidence interval: Lower | Upper
Odds Ratio for PCOS (Yes/No) | .809 | .516 | 1.267
For cohort good_sleeper = Yes | .899 | .711 | 1.136
For cohort good_sleeper = No | 1.112 | .897 | 1.377
N of valid cases | 724 | |
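To see how the "For cohort good_sleeper = Yes" row relates to the crosstabulation, the sketch below recomputes it as a ratio of two proportions with a log-based 95% confidence interval. This is an illustration under the assumption that the output uses that standard formula; the function name `risk_ratio_ci` is hypothetical.

```python
import math

def risk_ratio_ci(a, n1, c, n2, z=1.96):
    """Ratio of proportions (a/n1) / (c/n2) with a log-based CI.

    Assumption: z = 1.96 gives an approximate 95% interval.
    """
    ratio = (a / n1) / (c / n2)
    se_log = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)  # SE of ln(ratio)
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper

# Row totals implied by the cell counts in the crosstabulation.
pcos_total = 41 + 46       # women with PCOS
no_pcos_total = 334 + 303  # women without PCOS

# Proportion of good sleepers with PCOS relative to without PCOS.
rr, lower, upper = risk_ratio_ci(41, pcos_total, 334, no_pcos_total)
print(f"ratio = {rr:.3f}, 95% CI {lower:.3f} to {upper:.3f}")
```

The same function applied to the "No" column (46 and 303) gives the "For cohort good_sleeper = No" row.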
Based only on the SPSS output above (no further calculations required):
a. Which point estimate do you think is most appropriate to report? State very briefly why you chose this.
b. Provide a very brief interpretation of this estimate and its 95% confidence interval.
Question 3 (20 marks)
Cervical cancer is a preventable disease caused by sexual transmission of certain genotypes of the human papillomavirus (HPV) infection. Boggan et al (Sexually Transmitted Diseases, 2015) report the results of a study to assess the feasibility of women collecting a vaginal sample (self-obtained) for HPV as a primary cervical cancer screening tool in a low resource, Haitian population. For each woman who participated in the study, a vaginal sample (which was self-obtained) and a cervical sample (which required a clinician) were collected. Both samples were then sent to a laboratory and tested for the presence of relevant HPV genotypes. A positive test result indicated that at least one of the relevant HPV genotypes was found. A negative test result indicated that none of the relevant HPV genotypes were found. Of the 1836 women who had both a vaginal sample and a cervical sample taken, both samples were HPV positive for 288 women. The vaginal sample was HPV positive and the cervical sample was HPV negative for 105 women. The cervical sample was HPV positive and the vaginal sample was HPV negative for 53 women. The remaining women were HPV negative for both samples.
a. Display the data in a 2x2 table.
b. Fully analyse these data and write a brief conclusion.
c. All women who had a positive HPV test were further investigated for the presence of cancer and a biopsy was taken. Of the women who had a positive vaginal sample, 9.7% were found to have cervical cancer. How many cases of cancer were detected based on a positive vaginal sample?
Question 4 (20 marks)
Mant et al (BMJ, 2016) conducted a randomised controlled trial to assess whether using intensive systolic blood pressure targets leads to lower systolic blood pressure in a community population of people with a history of stroke or transient ischaemic attack. All participants were recruited from general practices in England. Patients were individually randomised to one of two groups, either:
i.
an intensive systolic blood pressure target (<130 mm Hg); or
ii.
the usual systolic blood pressure target (<140 mm Hg).
Apart from the different target, patients in both arms of the study were actively managed in the same way with regular reviews by the general practice care team. The primary outcome for the study was the change in systolic blood pressure between baseline and 12 months. A total of 379 participants (182 in the intensive arm and 197 in the standard arm) had their systolic blood pressure measured at baseline and also 12 months later. The data are summarised in the table below:
Table 4 – Systolic Blood Pressure (mm Hg) for intensive target and standard target groups
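Measurement | Intensive Target Group (n = 182): Mean | Intensive Target Group (n = 182): Standard Deviation | Standard Target Group (n = 197): Mean | Standard Target Group (n = 197): Standard Deviation
Baseline | 143.5 | 13.5 | 142.2 | 12.9
12 months | 127.4 | 14.8 | 129.4 | 14.8
Change in systolic blood pressure | 16.1 | 15.0 | 12.8 | 17.2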
a.
Fully analyse the results for the primary outcome to assess the effect of the intervention on systolic blood pressure. Write a brief conclusion to summarise and interpret your results.
b.
Very briefly state two statistical assumptions you have made in your analysis.
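One way to approach part (a) is an independent-samples comparison of the mean change in systolic blood pressure, computed from the summary statistics in Table 4. The sketch below is a study aid under stated assumptions, not a model answer: the function name is made up, equal variances are assumed (pooled standard deviation), the intensive group size is taken as 182 as given in the question text, and the standard Normal distribution is used in place of the t distribution with 377 degrees of freedom, which gives only slightly narrower limits at this sample size.

```python
import math
from statistics import NormalDist

def two_sample_from_summary(mean1, sd1, n1, mean2, sd2, n2, level=0.95):
    """Hypothetical helper: pooled-variance comparison of two means from summary data."""
    diff = mean1 - mean2
    # Pooled variance assumes the two groups share a common variance.
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t_stat = diff / se
    # Normal approximation to the t distribution (assumption; df = n1 + n2 - 2 is large).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p_approx = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    return {"diff": diff, "se": se, "t": t_stat, "df": n1 + n2 - 2,
            "ci": (diff - z * se, diff + z * se), "p_approx": p_approx}

# Change in systolic blood pressure: intensive (n = 182) vs standard (n = 197).
result = two_sample_from_summary(16.1, 15.0, 182, 12.8, 17.2, 197)
print(f"Difference in mean change: {result['diff']:.1f} mm Hg")
print(f"t = {result['t']:.2f} on {result['df']} df, approximate P = {result['p_approx']:.3f}")
print(f"Approximate 95% CI: {result['ci'][0]:.2f} to {result['ci'][1]:.2f} mm Hg")
```

With statistical tables or software available, the t critical value for 377 degrees of freedom would replace the Normal quantile used here.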
Question 5 (20 marks) (final exam only – not for online test)
A randomised controlled trial is planned to assess the effect of a diet-and-exercise program on the prevention of osteoarthritis of the knee in overweight women aged 50-60 years. Women in the required age range with a body mass index (BMI) greater than 27 kg/m² who do not have osteoarthritis of the knee or other knee complaints are eligible for inclusion in the study. Women will be recruited through general practitioners. The primary outcome of interest is the proportion of women who develop osteoarthritis of the knee during the 2.5 years after entry to the study. Women who agree to participate in the study will be randomised to one of two groups:
Intervention Group: A dietician will develop a “tailor-made” nutritional and exercise program for each woman using motivational interviewing techniques. Women in this group will also be invited to participate in a small group low-impact exercise program supervised by a therapist for 20 weeks.
Control Group: No intervention will be offered to women in this group.
a.
It is expected that 20% of women in the control group will develop osteoarthritis of the knee during the 2.5 year follow-up period. Assuming equal numbers in the control and intervention groups, how many women would be required per group to detect a reduction of 10% in the proportion who develop osteoarthritis of the knee in the intervention group, with 80% power and using a two-sided significance level of 5%?
b.
Approximately 15% of women are expected to be lost to follow-up during the study period. How many women should be randomised to each group to achieve the number of women specified in (a) at the end of the follow-up period?
c.
Because of the high cost of the intervention, the researchers decide to have twice as many women in the control group relative to the intervention group. How many women would be required for each group assuming the same effect size, power, and significance level as specified in (a)? (Do not adjust for the expected loss to follow-up.)
d.
If the cost of the intervention is $1200 per woman, how much would be saved in intervention costs if the numbers per arm calculated in (c) are used rather than the numbers calculated in (a)?
e.
A secondary outcome for this study is BMI at the end of 2.5 years. Based on the numbers per group calculated in (a), is there adequate power to detect a difference of 2 kg/m² between the two groups using a two-sided significance level of 5%? (The standard deviation of BMI is expected to be 5 kg/m².)
2017
Question 1 (Short Answer, 5 marks)
Which statistical test would you apply to analyse the data from each of the studies described below?
a.
In Korea, a study was conducted to estimate the difference in severity of depression among people who experienced night-eating syndrome (NES) compared to those who did not experience NES (Kim, Public Health 2016). Participants were asked five questions about NES. If a participant answered ‘yes’ to all five questions, they were considered to experience NES. Depressive symptoms were measured using the Patient Health Questionnaire (PHQ-9), a self-rated measurement of the severity of depressive symptoms with total scores ranging from 0 to 27. (You can assume that the depressive symptoms score is Normally distributed.)
b.
Researchers in China assessed the association between the level of computer skills among primary healthcare workers and their attitudes towards web-based training on basic public health services (BPHS) (Zhan, Public Health 2016). Healthcare workers were asked about their level of computer knowledge and then researchers classified each worker as either being skilled or unskilled. Healthcare workers were also asked about their attitudes towards web-based BPHS training. Each worker’s overall attitude to web-based BPHS training was then classified as either positive or negative.
c.
A randomised controlled trial was conducted on men who were newly diagnosed with prostate cancer and who had chosen surgery as their treatment. Participants (n=23) were randomly assigned to receive either robot-assisted laparoscopic prostatectomy, or radical retropubic prostatectomy. Urinary function scores, which can take any value between 0 and 100, were compared between the two treatment groups. (Assume that the urinary function scores are not Normally distributed.)
d.
A case-control study was carried out to assess the association between sugar intake and colon cancer. Cases of colon cancer were identified from a cancer registry and controls were randomly selected from the same population from which the cases arose. Participants in the study completed a dietary questionnaire and then each participant was classified as having a high or low sugar intake.
e.
Researchers assessed the impact of a new smartphone-based information resource on contraception knowledge. They recruited 75 young adults to the study and each participant completed a questionnaire assessing their knowledge about contraception. A contraception knowledge score was then calculated for each participant (% of questions answered correctly in the questionnaire). After completing the questionnaire each participant was given access to the smartphone-based information resource. One month later they again completed the same questionnaire to assess their knowledge of contraception and a second knowledge score was calculated. (Assume that the contraceptive knowledge scores are Normally distributed.)
Question 2 (Short Answer, 5 marks) (final exam only – not for online test)
Researchers in Canada carried out a study to examine the factors associated with cognitive decline among people with Alzheimer’s disease (Hager, American Journal of Alzheimer’s Disease & Other Dementias 2016). Each participant (n=82) had their cognitive ability measured at baseline and after one year of follow-up using the Mini-Mental State Examination (MMSE). The difference in MMSE score (baseline – follow-up) was then calculated and used as the outcome variable in a linear regression analysis. One of the factors that the researchers investigated was the quality of life of participants at baseline. Quality of life was measured using a continuous scale with higher values indicating better quality of life. The quality of life score was the explanatory variable in the linear regression analysis. The researchers obtained the following results:
Table 1 – Simple Linear Regression Analysis
Variable | Estimate of slope | Standard error
Quality of life | 0.137 | 0.099
a.
What type of plot would you use to show the relationship between quality of life at baseline and change in MMSE score?
b.
Calculate a 95% confidence interval for the slope and interpret the results
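For part (b), the interval has the usual form of estimate plus or minus a critical value times the standard error. The sketch below is a rough check under an explicit assumption: it uses the standard Normal quantile (about 1.96) rather than the t critical value with n - 2 = 80 degrees of freedom (about 1.99 from t tables), so the limits it prints are very slightly narrower than a t-based interval. The variable names are illustrative.

```python
from statistics import NormalDist

slope = 0.137       # estimate of slope from Table 1
std_error = 0.099   # standard error of the slope from Table 1

# Assumption: Normal approximation in place of the t distribution with 80 df.
critical = NormalDist().inv_cdf(0.975)

lower = slope - critical * std_error
upper = slope + critical * std_error
print(f"Approximate 95% CI for the slope: {lower:.3f} to {upper:.3f}")

# If the interval contains zero, the data are compatible with no linear
# association between baseline quality of life and change in MMSE score.
print("Interval includes zero:", lower < 0 < upper)
```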
Question 3 (20 marks)
Researchers conducted a randomised cross-over trial to determine whether the type of contraception used had an impact on the frequency of migraine (severe headache) attacks. The researchers recruited 28 women who had experienced at least one migraine in the previous three months. The women were randomised to one of two groups:
Group A: took the combined hormonal contraceptive (CHC) pill for 6 months and then switched to the progestin only pill (POP) for 6 months.
Group B: took the progestin only pill (POP) for 6 months and then switched to the combined hormonal contraceptive (CHC) pill for 6 months.
The number of days with migraine per month was recorded for each woman while on each treatment. Also, for each woman the difference in the number of days with migraine per month while on CHC and while on POP (CHC - POP) was calculated. Data from the study are summarised in the table below.
Measurement | Group A (n = 14): Mean | Group A (n = 14): Standard deviation | Group B (n = 14): Mean | Group B (n = 14): Standard deviation | All women (n = 28): Mean | All women (n = 28): Standard deviation
While taking CHC | 4.8 | 1.7 | 4.4 | 1.7 | 4.6 | 1.7
While taking POP | 4.0 | 1.3 | 3.7 | 1.2 | 3.9 | 1.3
Difference (CHC – POP) | 0.8 | 1.1 | 0.7 | 1.2 | 0.7 | 1.1
a.
Fully analyse these results and write a brief conclusion. (Note: assume that the assumption of Normality holds)
b.
To assess the Normality assumption for the current study, which set of measurements would you plot and what type of plot would you use?
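Because each woman received both treatments, one reasonable approach to part (a) is a paired analysis of the within-woman differences (CHC – POP) for all 28 women. The sketch below works from the summary row of the table and is not a model answer: the function name is made up, and the critical value 2.052 is the two-sided 5% point of the t distribution with 27 degrees of freedom as read from standard t tables, passed in by hand because the Python standard library has no t distribution.

```python
import math

def paired_from_summary(mean_diff, sd_diff, n, t_critical):
    """Hypothetical helper: paired t statistic and CI from summary statistics."""
    se = sd_diff / math.sqrt(n)          # standard error of the mean difference
    t_stat = mean_diff / se              # test of H0: mean difference = 0
    ci = (mean_diff - t_critical * se, mean_diff + t_critical * se)
    return {"se": se, "t": t_stat, "df": n - 1, "ci": ci}

# Difference (CHC - POP) for all women: mean 0.7, SD 1.1, n = 28.
# t_critical = 2.052 is taken from t tables for 27 df (assumption stated above).
paired = paired_from_summary(0.7, 1.1, 28, t_critical=2.052)
print(f"SE = {paired['se']:.3f}, t = {paired['t']:.2f} on {paired['df']} df")
print(f"95% CI: {paired['ci'][0]:.2f} to {paired['ci'][1]:.2f} days per month")
```

The P value would then be looked up in t tables or obtained from statistical software using the t statistic and 27 degrees of freedom.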
Question 4 (20 marks)
A study was carried out at eight health facilities in Ethiopia to assess whether the treatment success of tuberculosis (TB) differed between patients who did or did not have HIV. The study included 529 TB patients, of whom 360 did not have HIV and 169 did have HIV. Among participants who did not have HIV, 337 were successfully treated for TB. There were 145 participants with HIV whose treatment for TB was successful.
a.
Display these data in a 2x2 table
b.
Fully analyse these data to assess whether there is a difference in the proportion of treatment successes between participants who had HIV and those who did not have HIV.
c.
Write a brief summary of your results and conclusions.
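The counts in this question are enough to fill the 2x2 table and compare the two proportions. The sketch below shows one possible analysis, a difference in proportions with a 95% confidence interval and a Pearson chi-square test without continuity correction; the choice of test and all variable names are assumptions made for illustration, and the unit's own guidelines may call for a different presentation.

```python
import math
from statistics import NormalDist

# Counts given in the question; failures are obtained by subtraction.
success_no_hiv, n_no_hiv = 337, 360
success_hiv, n_hiv = 145, 169
fail_no_hiv = n_no_hiv - success_no_hiv
fail_hiv = n_hiv - success_hiv

# Proportion successfully treated in each group and their difference.
p_no_hiv = success_no_hiv / n_no_hiv
p_hiv = success_hiv / n_hiv
diff = p_no_hiv - p_hiv

# 95% CI for the difference in proportions (unpooled standard error).
se = math.sqrt(p_no_hiv * (1 - p_no_hiv) / n_no_hiv + p_hiv * (1 - p_hiv) / n_hiv)
z = NormalDist().inv_cdf(0.975)
ci = (diff - z * se, diff + z * se)

# Pearson chi-square for a 2x2 table: n(ad - bc)^2 / (product of the margins).
n = n_no_hiv + n_hiv
chi2 = (n * (success_no_hiv * fail_hiv - fail_no_hiv * success_hiv) ** 2
        / (n_no_hiv * n_hiv * (success_no_hiv + success_hiv) * (fail_no_hiv + fail_hiv)))

# With 1 degree of freedom, chi-square is the square of a standard Normal deviate.
p_value = 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))

print(f"Treatment success: no HIV {p_no_hiv:.1%}, HIV {p_hiv:.1%}")
print(f"Difference {diff:.1%} (95% CI {ci[0]:.1%} to {ci[1]:.1%})")
print(f"Chi-square = {chi2:.2f} on 1 df, P = {p_value:.4f}")
```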
Question 5 (20 marks) (final exam only – not for online test)
Researchers are planning a study to investigate polypharmacy (currently taking 5 or more medications) among older people. Polypharmacy among older people may impact upon quality of life due to adverse drug interactions and increasing the risk of hospital admission. The researchers have planned two phases of the study. In the first phase they will conduct a survey of people aged 75 or older living in Sydney to determine the proportion of older people who are currently taking 5 or more medications. Survey participants will be randomly selected.
a.
The researchers expect that 60% of participants will be taking 5 or more medications. How many participants would the researchers need to include to obtain a 95% confidence interval for the proportion of participants taking 5 or more medications that is not wider than ± 5%?
b.
The researchers decided to base their sample size calculation on an expected proportion of 50% of participants currently taking 5 or more medications. Provide one reason why they have done this.
In the second phase the researchers are planning a randomised controlled trial. This trial will assess the effectiveness of pharmacist visits in reducing the number of older people who are currently taking 5 or more medications. In the trial, participants aged 75 or older who are currently taking 5 or more medications will be randomised to either (A) receive a visit from a pharmacist to discuss their medications or (B) receive a printed leaflet. Participants will be surveyed 3 months later to assess the number of medications they are currently taking. The researchers expect that among those who receive a visit from a pharmacist the proportion of participants who are still taking 5 or more medications will be 80%, and among those who receive only the printed leaflet the proportion who are still taking 5 or more medications will be 95%.
c.
How many participants would be required for this randomised controlled trial to detect this difference with 80% power and a 5% two-sided significance level?
d.
The researchers believe that 80% of participants who begin the randomised controlled trial will complete the follow-up survey. How many people will the researchers need to recruit to achieve the statistical design aims of (c)?
e.
Would there be enough eligible participants from the survey described in (a) to meet the sample size required for (d)? Explain your answer.
Each participant in the randomised controlled trial will also have their Drug Burden Index (DBI) calculated. The DBI is a continuous measurement and measures a participant’s exposure to particular medications.
f.
How many participants in the randomised controlled trial would be required to detect a mean difference of 0.25 in the DBI assuming the standard deviation of the DBI is 0.6 and that 80% power and 5% significance level are required?
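Parts (a), (c) and (f) each rest on a standard Normal-approximation sample size formula. The sketch below uses common textbook versions of these formulas; the function names are invented for illustration, and the formulas are an assumption, because the unit's notes or software may use a variant (for example a pooled proportion or a continuity correction) that gives slightly different numbers. Results for (c) and (f) are per group.

```python
import math
from statistics import NormalDist

def z_value(prob):
    """Standard Normal quantile, e.g. z_value(0.975) is about 1.96."""
    return NormalDist().inv_cdf(prob)

def n_for_proportion_ci(p, margin, level=0.95):
    """Sample size so that the CI for one proportion is no wider than +/- margin."""
    z = z_value(0.5 + level / 2)
    return math.ceil(z**2 * p * (1 - p) / margin**2)

def n_per_group_two_proportions(p1, p2, power=0.80, alpha=0.05):
    """Per-group size to detect p1 vs p2 (unpooled variance form, equal groups)."""
    z_sum = z_value(1 - alpha / 2) + z_value(power)
    return math.ceil(z_sum**2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2)

def n_per_group_two_means(delta, sd, power=0.80, alpha=0.05):
    """Per-group size to detect a mean difference delta with common SD sd."""
    z_sum = z_value(1 - alpha / 2) + z_value(power)
    return math.ceil(2 * (sd / delta) ** 2 * z_sum**2)

survey_n = n_for_proportion_ci(0.60, 0.05)             # part (a)
survey_n_p50 = n_for_proportion_ci(0.50, 0.05)         # proportion used in part (b)
trial_n = n_per_group_two_proportions(0.95, 0.80)      # part (c), per group
dbi_n = n_per_group_two_means(0.25, 0.6)               # part (f), per group

print("Survey, p = 0.60:", survey_n)
print("Survey, p = 0.50:", survey_n_p50)
print("Trial, per group:", trial_n, "total:", 2 * trial_n)
print("DBI, per group:", dbi_n, "total:", 2 * dbi_n)
```

For part (d), the number to recruit follows by dividing the required number by the expected completion proportion (0.80) and rounding up.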