Dept. of Biostatistics and Epidemiology at the :

BioEpi 691F: Practical Data Management and Statistical Computing

Final Exam

Due: 12/22/98
(9 AM)


Introduction

The prevalence of cigarette smoking has been reported to be increasing among high school age children in recent years. Various surveys have been conducted to document this pattern. While data from national surveys describe trends in the US, these surveys do not contain the detail or immediacy of surveys in local communities. In part for this reason, two surveys were conducted among teenagers in Western Massachusetts. One survey was conducted by the UMASS Donahue Institute in Franklin, Hampshire, and Berkshire Counties in 1995 using the Michigan Alcohol and Other Drugs School Survey. The other survey was conducted by the health education class at Amherst Regional High School in the fall of 1997.


The 1995 Franklin, Hampshire, and Berkshire Survey

This survey was conducted using a multistage cluster-stratified sampling design. Junior high schools and high schools were enumerated in each county. Six schools were selected from each of Franklin and Hampshire Counties and four schools from Berkshire County, using probabilities proportional to the number of students. Within each selected school, a random sample of classrooms was selected from each of three grades (treated as strata), with the probability of selection proportional to the number of children in the classroom. The survey was administered to classrooms during homeroom. Amherst Regional High School was not selected as one of the schools in this survey.
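The first-stage selection described above (schools chosen with probability proportional to the number of students) can be illustrated with a minimal Python sketch. It uses systematic PPS sampling, which is an assumption: the source does not say which PPS method was used. The enrollment figures and the function name `pps_systematic` are hypothetical and serve only as an illustration.

```python
import random
from itertools import accumulate

def pps_systematic(sizes, n, rng):
    """Pick n units with probability proportional to size (systematic PPS).

    Assumes no single unit is larger than the sampling interval, so that
    no unit can be selected twice.
    """
    total = sum(sizes)
    step = total / n                      # sampling interval
    start = rng.random() * step           # random start in [0, step)
    cum = list(accumulate(sizes))         # cumulative enrollment
    chosen = []
    i = 0
    for k in range(n):
        point = start + k * step
        # Advance to the unit whose cumulative range contains this point.
        while i < len(cum) - 1 and cum[i] <= point:
            i += 1
        chosen.append(i)
    return chosen

# Hypothetical enrollments for ten schools in one county (not from the survey).
enrollments = [420, 310, 650, 280, 510, 390, 720, 260, 450, 340]
selected = pps_systematic(enrollments, 6, random.Random(1))
print("Selected school indices:", selected)
```

Larger schools cover a wider stretch of the cumulative enrollment scale, so they are more likely to contain one of the equally spaced selection points.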

Data were collected on mark-sense scannable forms. The instrument included 205 variables, and completed questionnaires were collected from 2644 students. A copy of the instrument is available. Data were read into a SAS data set for processing with variables named V1-V205. A subset of 11 variables from this data set has been abstracted for use in this exam. The subset was obtained by BEHS3.SAS, with the resulting data given in hbf1.sd2, and formats given in hbf1f.sd2.


The 1997 Amherst Regional High School Smoking Survey

In the fall of 1997, the Peer Health Education class of ARHS conducted a survey of smoking prevalence among high school students. The survey was conducted just prior to the Great American Smokeout in November using a simple instrument. Surveys were distributed to homerooms and completed by all students prior to the start of the school day. An EpiInfo program was written to enter and verify the data. The resulting questionnaire contained 23 variables. The questionnaire was completed by 977 students. The EpiInfo data entry file is given by ss1.qes, with entered data contained in the file ss0.rec.


Objectives

Health educators are interested in comparing results between the two surveys, particularly between the Hampshire County results from the 1995 survey and the results from the 1997 survey. Unfortunately, the timing of the surveys differed, as did the survey instruments. However, both surveys asked respondents whether or not they had smoked cigarettes during the past 30 days, and how many cigarettes they usually smoked per day. These two questions were in some sense comparable, but there were important differences. In the ARHS study, the question about number of cigarettes smoked per day was asked only of those self-reporting as current smokers. It may seem obvious that only these subjects would be smoking, but in fact this is not likely to be true. In the Hampshire County data, a comparison of Question 10 with Question 11 shows that a substantial number of teenagers do not consider themselves "regular smokers" but smoke an appreciable number of cigarettes per day. Such smokers may not classify themselves as "current smokers" in the Amherst study, and hence the rate of cigarette smoking may be under-reported.

With these considerations in mind, the most comparable measure of cigarette smoking between the ARHS and Hampshire County survey data appeared to be a categorization of "smoke more than 5 cigarettes per day." It is possible to construct such a YES/NO variable in both data sets. Furthermore, when cross-classifying Question 10 with Question 11 in the Hampshire County data, it appears that nearly all respondents reporting over 5 cigs/day also reported themselves to be "regular smokers". For this reason, the conditional nature of the questioning in the ARHS survey (where only self-identified "current smokers" were asked to report the number of cigarettes smoked per day) should not lead to substantial bias.
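As an illustration of the YES/NO variable discussed above, here is a minimal Python sketch of the logic. It assumes cigarettes per day is available as a number and that missing answers are represented by `None`; the actual coding in the two data sets may differ (for example, categorical responses), and the function names are hypothetical.

```python
def over5_arhs(current_smoker, cigs_per_day):
    """ARHS-style item: cigarettes/day was asked only of current smokers."""
    if current_smoker is None:
        return None        # smoking status missing, indicator undefined
    if not current_smoker:
        return False       # question skipped: treated as "not over 5/day"
    if cigs_per_day is None:
        return None        # current smoker who left the count blank
    return cigs_per_day > 5

def over5_hampshire(cigs_per_day):
    """Hampshire-style item: cigarettes/day was asked of every respondent."""
    if cigs_per_day is None:
        return None
    return cigs_per_day > 5

print(over5_arhs(False, None))   # non-smoker, question skipped
print(over5_arhs(True, 10))      # current smoker, 10 per day
print(over5_hampshire(3))        # 3 per day
```

The one design decision worth noting is the treatment of skipped questions in the ARHS data: respondents who were not asked the question are counted as "NO" rather than as missing, which is what the argument above about limited bias relies on.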

Both surveys also recorded the gender of the student, and the current grade. You have been asked to prepare a set of charts that illustrate the results of these studies as indicated below:

  1. A single chart that compares the prevalence of smoking (more than 5 cigarettes/day) between Hampshire County and Amherst students by grade.
  2. Two charts (similar to chart 1) that make the comparison separately for males and females.
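The numbers behind such charts are prevalences within each survey-by-grade cell. The Python sketch below shows one way to tabulate them, under the assumption that each record has already been reduced to a survey label, a grade, and the over-5-cigarettes indicator. The records are made up for illustration and are not survey data; the function name `prevalence_by_group` is hypothetical.

```python
from collections import defaultdict

def prevalence_by_group(records):
    """Return {(survey, grade): (n_yes, n_total, percent)}.

    Records whose indicator is None (missing) are left out of the denominator.
    """
    counts = defaultdict(lambda: [0, 0])   # key -> [n_yes, n_total]
    for survey, grade, over5 in records:
        if over5 is None:
            continue
        counts[(survey, grade)][1] += 1
        if over5:
            counts[(survey, grade)][0] += 1
    return {key: (yes, n, 100.0 * yes / n)
            for key, (yes, n) in sorted(counts.items())}

# Made-up records: (survey, grade, smokes more than 5 cigarettes/day).
records = [
    ("Hampshire 1995", 9, True), ("Hampshire 1995", 9, False),
    ("Hampshire 1995", 9, False), ("Hampshire 1995", 9, False),
    ("ARHS 1997", 9, False), ("ARHS 1997", 9, True),
    ("ARHS 1997", 9, None),
    ("ARHS 1997", 10, False), ("ARHS 1997", 10, False),
]

table = prevalence_by_group(records)
for (survey, grade), (yes, n, pct) in table.items():
    print(f"{survey:15s} grade {grade:2d}: {yes}/{n} = {pct:.1f}%")
```

Adding gender to the grouping key would give the cells for the male and female charts. This is an unweighted tally; it does not account for the multistage design of the 1995 survey.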

Since more analyses may be required, you are also asked to do the following:

a. (20 pts) Create a permanent SAS data set for the ARHS survey data, including labels for all variables.
b. (20 pts) Create a permanent SAS format data set for the variables in the ARHS study.
c. (40 pts) Provide a brief write-up for the Principal of ARHS (1 page or less) that describes the two studies and any issues the reader should be aware of when evaluating the charts, along with the charts themselves. Do not repeat the summary explaining the rationale for using "over 5 cigarettes/day" to define smokers.
d. (20 pts) An appendix that includes:

i. A listing of the contents of the SAS data set.
ii. A listing of the SAS programs.


Last Update: 3/12/99
Comments: Ed Stanek
Email: stanek@schoolph.umass.edu