1. The data in the table are a sample from a larger data set collected on people discharged from a selected Pennsylvania hospital as part of a retrospective chart review of antibiotic usage in hospitals.
Hospital-stay data
ID number  Duration of hospital stay  Received antibiotics?
1 5 no
2 10 no
3 6 no
4 11 no
5 5 no
6 14 yes
7 30 yes
8 11 no
9 17 no
10 3 no
11 9 no
12 3 no
13 8 yes
14 8 yes
15 5 no
16 5 no
17 7 yes
18 4 no
19 3 yes
20 7 no
21 9 no
22 11 yes
23 11 no
24 9 no
25 4 no
Compute the mean and median for the duration of hospitalization for the 25 patients.
2. Using the table from Problem 1, compute the standard deviation and range for the duration of hospitalization for the 25 patients.
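As a rough check on Problems 1 and 2, the four summary statistics can be computed directly from the table. This is a minimal sketch using Python's standard library; the variable names are illustrative, and the standard deviation is assumed to be the sample version (divisor n - 1).

```python
import statistics

# Durations of hospital stay for IDs 1..25, copied from the table in Problem 1.
durations = [5, 10, 6, 11, 5, 14, 30, 11, 17, 3, 9, 3, 8,
             8, 5, 5, 7, 4, 3, 7, 9, 11, 11, 9, 4]

mean_stay = statistics.mean(durations)
median_stay = statistics.median(durations)
sd_stay = statistics.stdev(durations)          # sample sd, divisor n - 1
range_stay = max(durations) - min(durations)   # largest minus smallest value

print(f"mean = {mean_stay:.2f}, median = {median_stay}")
print(f"sd = {sd_stay:.2f}, range = {range_stay}")
```

The single 30-day stay pulls the mean above the median, which is worth keeping in mind when choosing a measure of location for skewed data.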
3. Select all types of measures of location:
Mode
Median
Standard Deviation
Arithmetic Mean
Variance
4. Suppose 6 of 15 students in a grade-school class develop influenza, whereas 20% of grade-school students nationwide develop influenza. Is there evidence of an excessive number of cases in the class? That is, what is the probability of obtaining at least 6 cases in this class if the nationwide rate holds true?
5. Using the data from Problem 4, what is the expected number of students in the class who will develop influenza?
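Problems 4 and 5 can be sketched with a binomial model, assuming the 15 students develop influenza independently at the nationwide rate. The helper name `binom_pmf` is illustrative, not part of the problem.

```python
from math import comb

n, p = 15, 0.20    # class size and nationwide influenza rate (Problem 4)
observed = 6       # cases seen in the class

def binom_pmf(k, n, p):
    """Probability of exactly k cases among n students."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Upper tail: probability of 6 or more cases if the nationwide rate holds.
p_at_least_6 = sum(binom_pmf(k, n, p) for k in range(observed, n + 1))

# Expected number of cases under the same model (Problem 5).
expected_cases = n * p

print(f"P(X >= 6) = {p_at_least_6:.4f}, expected cases = {expected_cases:.1f}")
```

Independence is a strong assumption for an infectious disease in one classroom, so the tail probability is best read as a screening figure rather than a final answer.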
6. Two events A and B are ___________________________ if they cannot both happen at the same time. (Hint: the answer is two words).
7. Serum cholesterol is an important risk factor for coronary disease. We can show that serum cholesterol is approximately normally distributed, with mean = 219 mg/dL and standard deviation = 50 mg/dL.
What proportion of the general population has borderline high-cholesterol levels, that is, > 200 but < 250 mg/dL?
0.380
0.648
0.7324
0.620
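One way to check the choices above is to evaluate the normal distribution function at the two cutoffs. This sketch takes the stated mean and standard deviation at face value and uses `statistics.NormalDist` from the standard library.

```python
from statistics import NormalDist

# Serum cholesterol model stated in Problem 7.
cholesterol = NormalDist(mu=219, sigma=50)

# Proportion between 200 and 250 mg/dL is the difference of two cdf values.
p_borderline = cholesterol.cdf(250) - cholesterol.cdf(200)

print(f"P(200 < X < 250) = {p_borderline:.3f}")
```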
8. Much discussion has taken place concerning possible health hazards from exposure to anesthetic gases. In one study conducted in 1972, 525 Michigan nurse anesthetists were surveyed by mail questionnaires and telephone interviews to determine the incidence rate of cancer. Of this group, 7 women reported having a new malignancy other than skin cancer during 1971.
What is the best estimate of the 1971 incidence rate from these data?
9. Using the data from Problem 8, provide a 95% confidence interval for the true incidence rate.
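For Problems 8 and 9, a point estimate and a large-sample interval can be sketched as follows. The normal approximation is an assumption here (n·p·q is about 6.9, just above the usual threshold of 5); an exact binomial interval would be a reasonable alternative.

```python
from math import sqrt
from statistics import NormalDist

cases, n = 7, 525      # new malignancies and nurses surveyed (Problem 8)
p_hat = cases / n      # point estimate of the 1971 incidence

# Normal-theory 95% interval for a binomial proportion.
z = NormalDist().inv_cdf(0.975)          # about 1.96
se = sqrt(p_hat * (1 - p_hat) / n)
ci_low, ci_high = p_hat - z * se, p_hat + z * se

print(f"incidence = {p_hat:.4f}, 95% CI = ({ci_low:.4f}, {ci_high:.4f})")
```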
10. Choose the correct word for the following sentence: If the result of a previous trial does not affect the result of the following trial, then the results are __________________ (dependent or independent).
11. Approximately ___% of the probability mass falls within two standard deviations (2σ) of the mean of a random variable.
95%
50%
68%
99.7%
12. Iron-deficiency anemia is an important nutritional health problem in the United States. A dietary assessment was performed on 51 boys 9 to 11 years of age whose families were below the poverty level. The mean daily iron intake among these boys was found to be 12.50 mg with a standard deviation of 4.75 mg. Suppose the mean daily iron intake among a large population of 9- to 11-year-old boys from all income strata is 14.44 mg. We want to test whether the mean iron intake among the low-income group is different from that of the general population.
State the hypotheses that we can use to consider this question. Perform the hypothesis test using the critical-value method with an alpha of 0.05, and summarize your findings. Report the p-value for this test (you may give a range that includes the p-value).
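The critical-value method for Problem 12 can be sketched as a one-sample t test of H_0: mu = 14.44 against a two-sided alternative. The critical value 2.009 is an assumption read from a t table for 50 degrees of freedom, since the Python standard library has no t distribution.

```python
from math import sqrt

n, xbar, s = 51, 12.50, 4.75   # low-income sample (Problem 12)
mu0 = 14.44                    # general-population mean under H0

# One-sample t statistic with n - 1 degrees of freedom.
t_stat = (xbar - mu0) / (s / sqrt(n))
df = n - 1

# Two-sided 5% critical value for 50 df, taken from a t table (assumed).
t_crit = 2.009
reject_h0 = abs(t_stat) > t_crit

print(f"t = {t_stat:.3f} on {df} df, reject H0 at 5%: {reject_h0}")
```

A range for the p-value then follows from bracketing |t| between neighboring columns of the same t-table row.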
13. A clinical trial is called _____________________ if neither the physician nor the patient knows what treatment he or she is getting. (Hint: the blank is two words)
14. In a study, 28 adults with mild periodontal disease are assessed before and 6 months after implementation of a dental-education program intended to promote better oral hygiene. After 6 months, periodontal status improved in 15 patients, declined in 8, and remained the same in 5.
Assess the impact of the program statistically (use a two-sided test).
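One possible approach to Problem 14 is a sign test on the patients whose status changed. The sketch below assumes that approach and computes an exact two-sided p-value from the binomial distribution; a normal-theory version with continuity correction gives a very similar figure for this sample size.

```python
from math import comb

improved, declined, unchanged = 15, 8, 5   # counts from Problem 14
n = improved + declined                    # unchanged patients are dropped

# Exact two-sided p-value under H0: improvement and decline equally likely.
k = max(improved, declined)
upper_tail = sum(comb(n, j) for j in range(k, n + 1)) / 2**n
p_value = min(1.0, 2 * upper_tail)

print(f"n = {n}, two-sided p = {p_value:.3f}")
```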
15. Two drugs (A, B) are compared for the medical treatment of duodenal ulcer. For this purpose, patients are carefully matched with regard to age, gender, and clinical condition. The treatment results based on 200 matched pairs show that for 89 matched pairs both treatments are effective; for 90 matched pairs both treatments are ineffective; for 5 matched pairs drug A is effective, whereas drug B is ineffective; and for 16 matched pairs drug B is effective, whereas drug A is ineffective.
What test procedure can be used to assess the results?
· McNemar’s test for correlated proportions
· Fisher’s Exact Test
· The Paired t Test
· The Sign Test
16. The standard screening test for Down’s syndrome is based on a combination of maternal age and the level of serum alpha-fetoprotein. Using this test, 80% of Down’s syndrome cases can be identified, while 5% of normals are detected as positive.
What are the sensitivity and specificity of the test?
17. Lutein, an important carotenoid in the maintenance of ocular health, has been found postmortem in the macula of eyes. Hence, a study is planned to supplement patients with high doses of lutein in capsule form to possibly prevent age-related macular degeneration, an important eye disease that can cause partial or total blindness in large numbers of elderly people.
To assess compliance in study participants, a blood sample will be drawn. It is estimated that a serum lutein would indicate that a participant is taking study medication.
The study began in 1999. A test sample of 9 participants had their lutein level measured in 1999 and again in 2003. The researchers found a calibration error in the 1999 assays, but the 2003 assays were correct. The data are shown in the table below.
Serum-lutein data analyzed in 1999 and 2003
Sample  1999 serum-lutein level  2003 serum-lutein level
1 3.5 6.4
2 2.9 7.5
3 4.1 8.4
4 5.1 9.6
5 6.4 12.0
6 1.9 4.2
7 1.3 3.1
8 4.1 6.3
9 2.3 4.4
Mean 3.511 6.878
sd 1.616 2.839
Using regression methods, derive a calibration formula predicting the 2003 level as a function of the 1999 level.
y=0.999+1.674x
y=0.999+1.674x
y=0.999−1.674x
y=0.999x+1.674
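The calibration line can be checked against the choices above with an ordinary least-squares fit of the 2003 level (y) on the 1999 level (x). This is a minimal sketch using the nine pairs from the table; the names `l_xy` and `l_xx` are illustrative labels for the corrected sums of cross-products and squares.

```python
# (1999 level, 2003 level) for the 9 participants in the table.
x = [3.5, 2.9, 4.1, 5.1, 6.4, 1.9, 1.3, 4.1, 2.3]
y = [6.4, 7.5, 8.4, 9.6, 12.0, 4.2, 3.1, 6.3, 4.4]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Corrected sum of cross-products and corrected sum of squares of x.
l_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
l_xx = sum((xi - x_bar) ** 2 for xi in x)

slope = l_xy / l_xx
intercept = y_bar - slope * x_bar

print(f"y = {intercept:.3f} + {slope:.3f}x")
```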
18. The probability of a _________________ (Type 1 error/Type 2 error) is the probability of accepting the null hypothesis when H_1 is true.
19. In a ________________ study the same group of people is followed over time.
longitudinal
cross-sectional
paired
double-blind
20. A recent article by Kenfield et al. studied the relationship between various aspects of smoking and mortality among 104,519 women in the Nurses’ Health Study (NHS) from 1980-2004. One issue is whether there is a mortality benefit from quitting smoking vs. continuing to smoke and, if so, how long it takes for the mortality experience of former smokers to approximate that of never smokers. The data in the table below were presented comparing former smokers with current smokers.
Relationship of time since quitting to total mortality
Group  Number of deaths  Number of person-years of follow-up
Current smokers  3,602  420,761
Former smokers
  Quit <5 yrs  889  124,095
  Quit 5-9 yrs  669  113,056
  Quit 10-14 yrs  590  111,701
  Quit 15-19 yrs  541  117,914
  Quit 20+ yrs  1,707  336,177
What are the estimated mortality rate and its 95% confidence interval per 1000 person-years among current smokers?
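A sketch of the calculation for current smokers follows. It assumes the number of deaths is Poisson distributed, so that with a count this large a normal approximation with standard error sqrt(deaths) is reasonable; other interval methods exist.

```python
from math import sqrt
from statistics import NormalDist

deaths, person_years = 3602, 420761   # current smokers, from the table

rate_per_1000 = deaths / person_years * 1000

# Large-sample interval: put limits on the death count, then rescale
# to a rate per 1000 person-years.
z = NormalDist().inv_cdf(0.975)
half_width = z * sqrt(deaths)
ci_low = (deaths - half_width) / person_years * 1000
ci_high = (deaths + half_width) / person_years * 1000

print(f"rate = {rate_per_1000:.2f} per 1000 PY, "
      f"95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```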
Chapter 14 Homework
Hypothesis Testing: Person-Time Data
1. The data relating oral-contraceptive (OC) use and the incidence of breast cancer in the age group 40-44
in the NHS are given in the table below.
Relationship between breast cancer incidence and OC use among 40- to 44-year-old women in the
NHS
OC-use group  Number of cases  Number of person-years
Current users  19  4,761
Past users  164  121,091
Never users  113  98,091
(a) Compare the incidence density of breast cancer in current users vs. never users, and report a
p-value.
(b) Compare the incidence density of breast cancer in past users vs. never users, and report a p-value.
(c) Estimate the rate ratio comparing current users vs. never users, and provide a 95% confidence
interval about this estimate.
(d) Estimate the rate ratio comparing past users vs. never users, and provide a 95% confidence
interval.
(e) How much power did the study have for detecting an IRR for breast cancer of 1.5, comparing current
OC users vs. never OC users among 40- to 44-year-old women if
i. the true incidence rate of breast cancer among never users and the amount of person-time for
current and never users are the same as in the table above,
ii. the expected number of events for never OC users is the same as the observed number of
events in the table above,
iii. the average follow-up time per subject is the same for both current and never OC users?
(f) What is the expected number of events that need to be realized in each group to achieve 80%
power to detect an IRR for breast cancer of 1.5 for current OC users vs. never OC users under
the same assumptions as in part (e)?
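Part (c) can be sketched as below: the rate ratio is the ratio of the two incidence densities, and an approximate interval is built on the log scale. The standard error sqrt(1/a1 + 1/a2) is the usual large-sample formula and is an assumption about the intended method.

```python
from math import exp, log, sqrt
from statistics import NormalDist

a1, t1 = 19, 4761      # current users: cases, person-years
a2, t2 = 113, 98091    # never users: cases, person-years

rate_ratio = (a1 / t1) / (a2 / t2)

# Large-sample 95% interval on the log scale, then back-transformed.
se_log_rr = sqrt(1 / a1 + 1 / a2)
z = NormalDist().inv_cdf(0.975)
ci_low = exp(log(rate_ratio) - z * se_log_rr)
ci_high = exp(log(rate_ratio) + z * se_log_rr)

print(f"RR = {rate_ratio:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

Part (d) follows the same pattern with the past-user row in place of the current-user row.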
2. Suppose we wish to study the association between aspirin intake and the incidence of colon cancer.
We find that 10% of women take 7 aspirin tablets per week (ASA group), while 50% of women never
take aspirin (control group). The ASA group is followed for 50,000 person-years, during which 34 new
colon cancers occurred over a 20-year period. The control group is followed for 250,000 person-years,
during which 251 new colon cancers developed over a 20-year period.
(a) What are the estimated incidence rates in the ASA and control groups?
(b) Is there a significant difference between these incidence rates? Report a p-value (two-tailed).
(c) What is the estimated rate ratio for colon cancer between the ASA and placebo groups?
(d) Provide a 95% confidence interval for the rate ratio in part (c).
(e) Suppose we look at the subset of women with a family history of colon cancer. Aspirin might
be more beneficial in this high-risk subgroup. We have a total of 5000 person-years among ASA
women and 2 events. We have a total of 20,000 person-years among control women and 20 events.
Is there a significant difference in the incidence rates of colon cancer between these 2 groups?
Provide a p-value (two-tailed).
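Parts (b) and (e) can both be sketched by conditioning on the total number of events: under H_0 each event falls in the ASA group with probability equal to that group's share of person-time. The two helper functions below are illustrative names; the large-sample version (with a 0.5 continuity correction) is assumed for part (b), and the exact binomial version for part (e), where the expected ASA count is below 5.

```python
from math import comb, sqrt
from statistics import NormalDist

def large_sample_test(a1, t1, a2, t2):
    """Normal-theory two-sided test comparing two incidence rates."""
    total = a1 + a2
    p0 = t1 / (t1 + t2)                    # person-time share of group 1
    expected = total * p0
    variance = total * p0 * (1 - p0)
    z = (abs(a1 - expected) - 0.5) / sqrt(variance)
    if a1 < expected:
        z = -z                             # sign shows the direction
    return z, 2 * (1 - NormalDist().cdf(abs(z)))

def exact_test(a1, t1, a2, t2):
    """Exact two-sided binomial test, for small expected counts."""
    total = a1 + a2
    p0 = t1 / (t1 + t2)
    def pmf(k):
        return comb(total, k) * p0**k * (1 - p0)**(total - k)
    lower = sum(pmf(k) for k in range(0, a1 + 1))
    upper = sum(pmf(k) for k in range(a1, total + 1))
    return min(1.0, 2 * min(lower, upper))

# Part (a): incidence rates per 100,000 person-years.
asa_rate = 34 / 50_000 * 100_000
control_rate = 251 / 250_000 * 100_000

z, p_large = large_sample_test(34, 50_000, 251, 250_000)   # part (b)
p_exact = exact_test(2, 5_000, 20, 20_000)                 # part (e)

print(f"rates per 100,000 PY: ASA {asa_rate:.1f}, control {control_rate:.1f}")
print(f"(b) z = {z:.2f}, p = {p_large:.3f};  (e) exact p = {p_exact:.3f}")
```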
Chapter 13 Homework
Design and Analysis Techniques for Epidemiologic Studies
1. In a 1985 study of the relationship between contraceptive use and infertility, 89 of 283 infertile women,
compared with 640 of 3833 control (fertile) women, had used an intrauterine device (IUD) at some
time in their lives.
(a) Use the normal-theory method to test for significant differences in contraceptive-use patterns
between the two groups.
(b) Use the contingency-table method to perform the test in part (a).
(c) Compare your results from parts (a) and (b).
(d) Compute a 95% confidence interval for the difference in the proportion of women who have ever
used IUDs between the infertile and fertile women in part (a).
(e) Compute the OR in favor of ever using an IUD for infertile women vs. fertile women.
(f) Provide a 95% confidence interval for the true OR corresponding to your answer in part (e).
(g) What is the relationship between your answers to parts (b) and (f)?
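Parts (a), (e), and (f) can be sketched as follows. The test statistic assumes the two-sample binomial test with a continuity correction, and the odds-ratio interval assumes the Woolf (log-scale) method; both are common textbook choices rather than the only options.

```python
from math import exp, log, sqrt
from statistics import NormalDist

x1, n1 = 89, 283       # infertile women: ever used an IUD, total
x2, n2 = 640, 3833     # fertile controls: ever used an IUD, total

# (a) Normal-theory test of equal proportions with continuity correction.
p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)
correction = 1 / (2 * n1) + 1 / (2 * n2)
z = (abs(p1 - p2) - correction) / sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
p_value = 2 * (1 - NormalDist().cdf(z))

# (e) Odds ratio from the 2 x 2 table.
a, b, c, d = x1, n1 - x1, x2, n2 - x2
odds_ratio = (a * d) / (b * c)

# (f) Woolf interval: symmetric on the log-odds-ratio scale.
se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
z975 = NormalDist().inv_cdf(0.975)
or_low = exp(log(odds_ratio) - z975 * se_log_or)
or_high = exp(log(odds_ratio) + z975 * se_log_or)

print(f"z = {z:.2f}, p = {p_value:.2g}")
print(f"OR = {odds_ratio:.2f}, 95% CI = ({or_low:.2f}, {or_high:.2f})")
```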
2. The data in the table below were presented relating body mass index (BMI) to progression of advanced
age-related macular degeneration (AMD), a common eye disease in the elderly that results in significant
visual loss.
Association between BMI and progression of AMD
BMI Progression Nonprogression
< 25 72 423
≥ 25 209 762
(a) What is the attributable risk (AR) of high BMI (≥ 25) for progression of AMD?
(b) Provide a 95% confidence interval about this estimate.
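A point estimate for part (a) can be sketched under two assumptions that should be checked against the course text: that AR means the population attributable risk, AR = (RR - 1)p / [(RR - 1)p + 1], and that the proportion exposed, p, can be estimated from this sample.

```python
prog_high, nonprog_high = 209, 762   # BMI >= 25
prog_low, nonprog_low = 72, 423      # BMI < 25

n_high = prog_high + nonprog_high
n_low = prog_low + nonprog_low

risk_high = prog_high / n_high       # risk of progression, BMI >= 25
risk_low = prog_low / n_low          # risk of progression, BMI < 25
relative_risk = risk_high / risk_low

# Assumed: the sample proportion with BMI >= 25 stands in for the
# population proportion exposed.
p_exposed = n_high / (n_high + n_low)

excess = (relative_risk - 1) * p_exposed
attributable_risk = excess / (excess + 1)

print(f"RR = {relative_risk:.2f}, AR = {attributable_risk:.3f}")
```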
Chapter 12 Homework
Multisample Inference
1. Some common strategies for treating hypertensive patients by nonpharmacologic methods include (1)
weight reduction and (2) trying to get the patient to relax more by meditational or other techniques.
Suppose these strategies are evaluated by randomizing hypertensive patients to four groups who receive
the following types of nonpharmacologic therapy:
Group 1: Patients receive counseling for both weight reduction and meditation.
Group 2: Patients receive counseling for weight reduction but not for meditation.
Group 3: Patients receive counseling for meditation but not for weight reduction.
Group 4: Patients receive no counseling at all.
Suppose 20 hypertensive patients are assigned at random to each of the four groups, and the change in
diastolic blood pressure (DBP) is noted in these patients after a 1-month period. The results are given
in the table below.
Change in DBP among hypertensive patients who receive different kinds of nonpharmacologic therapy
Group  Mean change in DBP (baseline − follow-up) (mm Hg)  sd change  n
1 8.6 6.2 20
2 5.3 5.4 20
3 4.9 7.0 20
4 1.1 6.5 20
(a) Test the hypothesis that mean change in DBP is the same among the four groups.
(b) Analyze whether counseling for weight reduction has a significant effect on reducing blood pressure.
(c) Analyze whether meditation instruction has a significant effect on reducing blood pressure.
(d) Is there any relationship between the effects of weight-reduction counseling and meditation counseling
on blood-pressure reduction? That is, does weight-reduction counseling work better for people who receive
meditational counseling or for people who do not receive meditation counseling, or is there no difference
in effect between these two subgroups?
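Part (a) only needs the group means, standard deviations, and sizes, so the one-way ANOVA F statistic can be sketched from the summary table. The function name `anova_from_summary` is illustrative, and equal within-group variances are assumed, as in the standard fixed-effects ANOVA.

```python
def anova_from_summary(means, sds, ns):
    """One-way ANOVA F statistic from group means, sds, and sizes."""
    k, n_total = len(means), sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / n_total
    between_ss = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    within_ss = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))
    df_between, df_within = k - 1, n_total - k
    f_stat = (between_ss / df_between) / (within_ss / df_within)
    return f_stat, df_between, df_within

# Groups 1 to 4 from the DBP table.
f_stat, df1, df2 = anova_from_summary(
    means=[8.6, 5.3, 4.9, 1.1],
    sds=[6.2, 5.4, 7.0, 6.5],
    ns=[20, 20, 20, 20],
)

print(f"F = {f_stat:.2f} on ({df1}, {df2}) df")
```

The statistic is then compared with a tabulated 5% critical value for F with 3 and 76 degrees of freedom, which is roughly 2.7.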
2. A randomized trial examined the effects of lipid-modifying therapy (simvastatin plus niacin) and
antioxidants (vitamins E and C, beta-carotene, and selenium) on cardiovascular protection in patients
with clinical coronary disease, low HDL cholesterol, and normal LDL cholesterol. A total of 160
patients were randomized into four groups: placebo lipid-lowering and placebo antioxidants, active
lipid-lowering and placebo antioxidants, placebo lipid-lowering and active antioxidants, or active
lipid-lowering and active antioxidants.
All participants had substantial stenoses (blockages) of the coronary arteries quantified by catheterization
at baseline, and the primary endpoint was the percent change in a person’s stenoses after 3 years
of treatment, with a positive change indicating an increased amount of stenosis, as shown in the table
below. Because some patients did not complete the study, the primary endpoint was assessed in 146
participants.
Mean changes (±sd), per patient, in the percentage of stenosis by treatment group
Group  n  Mean change in stenosis (% of diameter)
Placebo  34  3.9 (±5.2)
Simvastatin-niacin  33  −0.4 (±2.8)
Antioxidants  39  1.8 (±4.2)
Simvastatin-niacin plus antioxidants  40  0.7 (±3.2)
Source: Based on The New England Journal of Medicine, 312(6), 329-334, 1985
(a) Perform a one-way ANOVA to assess whether there are significant differences in mean change in
percent stenosis among the four groups.
(b) Using the LSD method, identify which pairs of groups are significantly different.
(c) Are there significant interaction effects between simvastatin-niacin and antioxidants? What does
an interaction effect mean in the context of this trial? (Hint: Use the linear contrast
\bar{y}_4 - \bar{y}_2 - \bar{y}_3 + \bar{y}_1, where the group numbers are in the same order as in the table above.)
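The contrast in the hint for part (c) can be evaluated from the summary statistics alone. This sketch assumes the usual pooled within-group variance from the one-way ANOVA and a t statistic with n - k degrees of freedom; the coefficient list is written in table order (placebo, simvastatin-niacin, antioxidants, both).

```python
from math import sqrt

means = [3.9, -0.4, 1.8, 0.7]   # groups 1..4 in table order
sds = [5.2, 2.8, 4.2, 3.2]
ns = [34, 33, 39, 40]
coefs = [1, -1, -1, 1]          # y1 - y2 - y3 + y4

# Pooled within-group variance (within mean square of the ANOVA).
df_within = sum(ns) - len(ns)
pooled_var = sum((n - 1) * s ** 2 for n, s in zip(ns, sds)) / df_within

# Contrast estimate, its standard error, and the t statistic.
contrast = sum(c * m for c, m in zip(coefs, means))
se = sqrt(pooled_var * sum(c ** 2 / n for c, n in zip(coefs, ns)))
t_stat = contrast / se

print(f"L = {contrast:.1f}, se = {se:.2f}, t = {t_stat:.2f} on {df_within} df")
```

A contrast near zero would mean that the effect of simvastatin-niacin is about the same with or without antioxidants; a value far from zero suggests the two treatments do not simply add.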
3. Suppose we have separately analyzed the effects of 10 SNPs comparing people with type I diabetes vs.
controls. The p-values from these separate analyses are given in the table below.
Effects of 10 SNPs on type I diabetes
SNP  p-value  SNP  p-value
1 .04 6 .62
2 .10 7 .001
3 .40 8 .01
4 .55 9 .80
5 .34 10 .005
(a) Use the Bonferroni method to correct for multiple comparisons. Which SNPs show statistically
significant effects?
(b) Use the FDR method to correct for multiple comparisons using an FDR=.05. Which SNPs show
statistically significant effects? How do the results compare with those in part (a)?
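Both corrections can be sketched in a few lines. The p-values are stored as exact fractions because SNP 10 sits exactly on the Bonferroni cutoff of .05/10 = .005; whether it counts as significant depends on using a strict or non-strict inequality, and the strict form used below is an assumption. The FDR part assumes the Benjamini-Hochberg step-up procedure.

```python
from fractions import Fraction

# p-values for SNPs 1..10 from the table, as exact fractions.
raw = ["0.04", "0.10", "0.40", "0.55", "0.34",
       "0.62", "0.001", "0.01", "0.80", "0.005"]
p_values = {snp: Fraction(p) for snp, p in enumerate(raw, start=1)}

alpha = Fraction("0.05")
m = len(p_values)

# Bonferroni: significant if p < alpha / m (strict inequality assumed).
bonferroni = sorted(s for s, p in p_values.items() if p < alpha / m)

# Benjamini-Hochberg: find the largest rank i with p_(i) <= i * alpha / m,
# then declare the i smallest p-values significant.
ranked = sorted(p_values.items(), key=lambda item: item[1])
cutoff_rank = max((i for i, (_, p) in enumerate(ranked, start=1)
                   if p <= i * alpha / m), default=0)
fdr = sorted(s for s, _ in ranked[:cutoff_rank])

print("Bonferroni:", bonferroni)
print("FDR (BH):  ", fdr)
```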
1. In a study, 28 adults with mild periodontal disease are assessed before and 6 months after the implementation of a dental-education program intended to promote better oral hygiene. After 6 months, periodontal status improved in 15 patients, declined in 8, and remained the same in 5.
Assess the impact of the program statistically (use a two-sided test).
Choose one answer:
We do not reject H_0 at the 5% level and conclude that patients have not significantly changed on the program.
We reject H_0 at the 5% level and conclude that patients have significantly changed on the program.
We reject H_0 at the 5% level and conclude that patients have not significantly changed on the program.
We do not reject H_0 at the 5% level and conclude that patients have significantly changed on the program.
2. ________ data can be ordered but do not have specific numeric values. Thus, common arithmetic cannot be performed on ordinal data in a meaningful way.
Choose one answer:
ordinal
interval
ratio
nominal
3. A ______________ design is a type of randomized clinical trial in which each participant is randomized to either group A or group B, which receive different treatments, and the groups later switch treatments.
Choose one answer:
crossover
casecontrol
prospective
retrospective
4. Suppose researchers do an epidemiologic investigation of people entering a sexually transmitted disease clinic. They find that 160 of 200 patients who are diagnosed as having gonorrhea and 50 of 105 patients who are diagnosed as having nongonococcal urethritis (NGU) have had previous episodes of urethritis.
Are the present diagnosis and prior episodes of urethritis associated? (Hint: use a chi-square test with Yates’ correction.)
Choose one answer:
Gonorrhea patients are significantly more likely to have prior episodes of urethritis than NGU patients.
Gonorrhea patients are not significantly more likely to have prior episodes of urethritis than NGU patients.
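The test named in the hint can be sketched from the 2 x 2 table of diagnosis by prior episodes. The code uses the shortcut form of the Yates-corrected statistic and, for the p-value, the identity that a chi-square upper tail with 1 degree of freedom equals erfc(sqrt(x / 2)).

```python
from math import erfc, sqrt

# Rows: diagnosis; columns: prior urethritis yes / no.
a, b = 160, 200 - 160    # gonorrhea
c, d = 50, 105 - 50      # nongonococcal urethritis (NGU)
n = a + b + c + d

# Yates-corrected chi-square, shortcut form for a 2 x 2 table.
numerator = n * (abs(a * d - b * c) - n / 2) ** 2
denominator = (a + b) * (c + d) * (a + c) * (b + d)
chi2_yates = numerator / denominator

# Upper-tail probability for 1 degree of freedom.
p_value = erfc(sqrt(chi2_yates / 2))

print(f"chi-square = {chi2_yates:.2f}, p = {p_value:.2g}")
```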
5. The _______ rate is defined as the proportion of participants in the placebo group who actually receive the active treatment outside the study protocol.
9. For any sample point (x_i, y_i), the ___________________ of that point about the regression line is defined by (\hat{y}_i - \bar{y}). (Hint: the blank is two words)
10. The following statistics are taken from an article by Burch relating cigarette smoking to lung cancer. The article presents data relating mortality from lung cancer to average cigarette consumption (lb/person) for females in England and Wales over a 40-year period. The data are given in the table below.
Cigarette consumption and lung-cancer mortality in England and Wales, 1930-1969
Period  log_{10} mortality (over 5 years), y  log_{10} annual cigarette consumption (lb/person), x
1930-1934  2.35  0.26
1935-1939  2.20  0.03
1940-1944  2.12  0.30
1945-1949  1.95  0.37
1950-1954  1.85  0.40
1955-1959  1.80  0.50
1960-1964  1.70  0.55
1965-1969  1.58  0.55
Source: Based on the Journal of the Royal Statistical Society, A., 141, 437-477, 1978.
Compute the correlation between 5-year lung-cancer mortality and annual cigarette consumption when each is expressed in the log_{10} scale.
11. A ____________ variable is a variable that is associated with both the disease and the exposure variable. Such a variable must usually be controlled for before looking at a disease-exposure relationship.
Choose one answer:
confounding
random
discrete
continuous
12. Researchers compare protein intake among three groups of postmenopausal women: (1) women eating a standard American diet (STD), (2) women eating a lacto-ovo-vegetarian diet (LAC), and (3) women eating a strict vegetarian diet (VEG). The mean ± 1 sd for protein intake (mg) is presented in the table below.
Protein intake (mg) among three dietary groups of postmenopausal women 
Group Mean sd n 
STD 75 9 10 
LAC 57 13 10 
VEG 47 17 6 
Perform a statistical procedure to compare the means of the three groups using the critical-value method. Report the p-value from this test.
13. In a 1985 study of the relationship between contraceptive use and infertility, 89 of 283 infertile women, compared with 640 of 3833 control (fertile) women, had used an intrauterine device (IUD) at some time in their lives.
Use the normal-theory method to test for significant differences in contraceptive-use patterns between the two groups.
Choose one answer:
The difference in previous IUD usage rates is highly significant, with cases significantly more likely to have previously used an IUD than controls, with p<0.001.
The difference in previous IUD usage rates is highly significant, with cases significantly more likely to have previously used an IUD than controls, with p<0.01.
The difference in previous IUD usage rates is not significant: cases are not significantly more likely to have previously used an IUD than controls, with p<0.001.
The difference in previous IUD usage rates is not significant: cases are not significantly more likely to have previously used an IUD than controls, with p<0.01.
17. A negative confounder is a variable that either
Choose one answer:
is positively associated with both exposure and disease
is negatively associated with disease and positively associated with exposure
is positively associated with disease and negatively associated with exposure
is negatively associated with both exposure and disease
19. Consider the information from Problem 18. Two hundred subjects are selected randomly from the population and followed for various lengths of time. The average length of follow-up is 1.5 years. Suppose that at the end of the study, the estimated rate is 4 per 100 person-years. How many events must have been observed in order to yield the estimated rate of 4 per 100 person-years?
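The arithmetic behind Problem 19 can be sketched as follows: total person-time is the number of subjects times the average follow-up, and the event count is the rate multiplied by that person-time. Only the figures stated in this problem are used; nothing from Problem 18 is assumed.

```python
subjects = 200
mean_follow_up_years = 1.5
rate_per_100_py = 4          # estimated events per 100 person-years

# Total person-years accumulated by the cohort.
person_years = subjects * mean_follow_up_years

# Events implied by the estimated rate.
events = rate_per_100_py / 100 * person_years

print(f"person-years = {person_years:.0f}, events = {events:.0f}")
```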
