Practical Data Management and Statistical Computing (BioEp691F)




Homework 17

Due:


The Central Limit Theorem.

The Central Limit Theorem is an important theorem in statistics. It states that when simple random samples are selected repeatedly from a population, and the average of the observations in each sample is calculated, the distribution of these sample averages will be approximately "normal" (as long as the sample size is sufficiently large). How large "sufficiently large" is depends on the particular setting. In some populations, a very large sample size is needed before the distribution of the sample averages is approximately normal. In other settings, a relatively small sample size (say n=5 or 10) is adequate. Properties of the normal distribution can then be used to help draw conclusions about the population. Drawing such conclusions is called inference.
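The statement above can be written compactly. The notation here is an assumption for illustration, not taken from the course materials: the observations y_1, ..., y_n are treated as independent draws from a population with mean \mu and finite variance \sigma^2.

```latex
\bar{y}_n = \frac{1}{n}\sum_{i=1}^{n} y_i, \qquad
\frac{\sqrt{n}\,(\bar{y}_n - \mu)}{\sigma} \xrightarrow{d} N(0, 1)
\quad\Longrightarrow\quad
\bar{y}_n \;\overset{\text{approx.}}{\sim}\; N\!\left(\mu, \frac{\sigma^2}{n}\right)
\text{ for large } n .
```

The spread of the sample averages shrinks as n grows (variance \sigma^2/n), which is why larger samples give estimates that cluster more tightly around \mu.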

It is possible to illustrate the Central Limit Theorem with a simple example. We consider a population in which each observation is a 0/1 outcome, so that the number of 1's in a sample is binomially distributed. For example, suppose that the probability that a person is absent from school (due to illness) on a given day is "P". Let us define a random variable y that indicates whether or not the person is absent from school, where

y = 1 if the person is absent on a day, or

y = 0 if the person is present on a day,

and P = Prob(person is absent).

If we keep track of attendance over a week (n=7), then the total number of absent days, which we will denote by x, is simply the sum of y over the days in the week. The proportion of days absent, x/n, is an estimate of Prob(person is absent). The estimate will differ from week to week, since the person may be absent for different numbers of days in different weeks. Over a year, we might construct 52 such estimates. The distribution of these estimates is the subject of the Central Limit Theorem.
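The weekly bookkeeping described above can be sketched in a few lines. This is a minimal illustration in Python, not the course's SAS program. The value P = 0.1, the seed, and the function name `weekly_estimate` are assumptions chosen only for the example.

```python
import random


def weekly_estimate(p_absent, n_days, rng):
    """Simulate n_days of attendance and return x/n, the proportion of days absent."""
    # y = 1 if the person is absent on a day, 0 if present
    y = [1 if rng.random() < p_absent else 0 for _ in range(n_days)]
    x = sum(y)  # total number of absent days in the week
    return x / n_days


rng = random.Random(691)  # fixed seed (arbitrary) so the run is repeatable
P = 0.1                   # assumed value of Prob(person is absent)

# One estimate per week, 52 weeks in a year, n = 7 days per week
estimates = [weekly_estimate(P, 7, rng) for _ in range(52)]

print("distinct weekly estimates:", sorted(set(estimates)))
print("average of the 52 estimates:", sum(estimates) / len(estimates))
```

With n=7 each estimate can only take the values 0, 1/7, ..., 7/7, so the 52 estimates pile up on a handful of points. That coarseness is one reason a small n gives a distribution that looks far from normal.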

The Central Limit Theorem says that if the sample size is large enough (if estimates are based on enough days, i.e., large enough n) then the distribution of the estimates will be approximately "normal". We will examine how large "n" must be for this statement to be approximately true. To do so, we will simulate selecting samples of size "n" from a population where Prob(person is absent)=P is known. A total of 100 samples will be selected of a given size. We will then examine the distribution of the sample means (which are estimates of Prob(person is absent)) for the samples. By changing the sample size (n), we will see how the shape of the distribution approaches a common shape, which is a "normal" distribution.
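A minimal sketch of this simulation follows, written in Python rather than SAS. For a 0/1 variable with Prob(y=1)=P, the sample means have mean P and standard deviation sqrt(P(1-P)/n), so the sketch prints the simulated values next to those. The value P = 0.3, the seed, the list of sample sizes, and the function names are assumptions made for the example.

```python
import math
import random
import statistics


def sample_means(p_absent, n, n_samples, rng):
    """Return n_samples sample means, each from n simulated 0/1 observations."""
    return [
        sum(rng.random() < p_absent for _ in range(n)) / n
        for _ in range(n_samples)
    ]


def summarize(p_absent, n, n_samples=100, seed=1):
    """Compare the simulated sample means with the values theory predicts."""
    rng = random.Random(seed)
    means = sample_means(p_absent, n, n_samples, rng)
    return {
        "n": n,
        "mean": statistics.mean(means),                        # should be near P
        "sd": statistics.stdev(means),                         # simulated spread
        "theory_sd": math.sqrt(p_absent * (1 - p_absent) / n)  # sqrt(P(1-P)/n)
    }


P = 0.3  # assumed value of Prob(person is absent)
for n in (5, 25, 100):
    s = summarize(P, n)
    print(f"n={s['n']:4d}  mean={s['mean']:.3f}  sd={s['sd']:.3f}  "
          f"theory sd={s['theory_sd']:.3f}")
```

Matching the mean and standard deviation is only a first check. Whether the shape is close to normal is better judged from a histogram of the sample means.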

 

Conduct a simulation using a program similar to LEC24P6.SAS for one value of P (assigned in class from P=.5, .3, .1, .05, .01, .005) and samples of size n=5, 10, 25, 50, 100, and 500. For each sample, calculate the estimated Prob(person is absent). Repeat this process for 1000 samples at each sample size, then plot a histogram of the distribution of the sample estimates for each sample size, using an x-axis with values ranging from 0 to 5P, and compare the result with what one would expect from a normal distribution. Write a 1/2 page summary of the results on a Web page that lists the group members and links to a sample program and output. Email me the Web links.
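One way the steps above could be organized is sketched below in Python. It is not LEC24P6.SAS and makes no claim about how that program works. The choice of 10 bins, the seed, P = 0.1, and all function names are assumptions. The histogram is printed as text, with the count expected under a normal distribution with mean P and standard deviation sqrt(P(1-P)/n) shown beside each bin.

```python
import math
import random
from statistics import NormalDist

N_BINS = 10  # assumed number of histogram bins on the axis from 0 to 5P


def simulate_estimates(p_absent, n, n_samples, rng):
    """Return n_samples estimates x/n, each from n simulated 0/1 days."""
    return [
        sum(rng.random() < p_absent for _ in range(n)) / n
        for _ in range(n_samples)
    ]


def histogram(estimates, p_absent, n_bins=N_BINS):
    """Count estimates in equal-width bins on [0, 5P); return (counts, overflow)."""
    width = 5 * p_absent / n_bins
    counts = [0] * n_bins
    overflow = 0  # estimates at or above 5P fall off the requested axis
    for est in estimates:
        idx = int(est / width + 1e-9)  # small nudge so bin edges go to the upper bin
        if idx < n_bins:
            counts[idx] += 1
        else:
            overflow += 1
    return counts, overflow


def normal_expected(p_absent, n, n_samples, n_bins=N_BINS):
    """Expected count per bin if the estimates were exactly normal."""
    width = 5 * p_absent / n_bins
    dist = NormalDist(mu=p_absent, sigma=math.sqrt(p_absent * (1 - p_absent) / n))
    return [
        n_samples * (dist.cdf((i + 1) * width) - dist.cdf(i * width))
        for i in range(n_bins)
    ]


def run(p_absent, sizes=(5, 10, 25, 50, 100, 500), n_samples=1000, seed=17):
    """Simulate every sample size; return {n: (counts, overflow, expected)}."""
    rng = random.Random(seed)
    results = {}
    for n in sizes:
        estimates = simulate_estimates(p_absent, n, n_samples, rng)
        counts, overflow = histogram(estimates, p_absent)
        results[n] = (counts, overflow, normal_expected(p_absent, n, n_samples))
    return results


if __name__ == "__main__":
    P = 0.1  # assumed; use the value of P assigned in class
    for n, (counts, overflow, expected) in run(P).items():
        print(f"n={n}  (estimates at or above 5P: {overflow})")
        for i, (c, e) in enumerate(zip(counts, expected)):
            lo = i * 5 * P / N_BINS
            print(f"  {lo:6.3f} | {'*' * (c // 20):<50} observed={c:4d} normal={e:7.1f}")
```

For small n the observed counts sit in a few isolated bins and disagree sharply with the normal column. As n increases the two columns should move closer together, more slowly for small P.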

Solutions:

P=0.5

P=0.3

P=0.1

P=0.05

P=0.01

P=0.005

 



Last Update: 12/9/99
Comments: Ed Stanek
Email: stanek@schoolph.umass.edu