A common experience when walking into a large group of people is to look around and see how well you fit in. For example, if you walk into a nightclub in a town you have never visited before, you might note the age and dress of the people around you to gauge how well you match the clientele. If everyone seems to be at least 20 years older than you and dressed much more formally, you might guess that this is not your kind of club and leave in search of one more to your liking. What you are doing here, although you probably never looked at it in quite this way, is looking at the distribution of patrons on variables such as age or dress and seeing where you fall in that distribution.
A frequency distribution organizes a large number of scores by counting how many people have each of the possible scores. It is the first step in organizing data. In this section, you will learn how to create both a frequency distribution and a grouped frequency distribution. The next section will use the data from these distributions to produce visual representations of the distributions, called graphs.
Frequency distributions count the number of people or objects that have each possible score. For example, we could create a frequency distribution of the number of pages in each of the books in the public library or of the ages of the people who buy a particular product. A hypothetical age distribution is shown in the table below. It might represent the ages of people purchasing a given product, although for the purpose of this discussion the source of the data does not matter. In such a frequency distribution, the age is listed in the first column and the frequency of people at each age is listed in the next column. For small samples, you can literally count the number of people at each age, perhaps by using a tally system, in which you make a mark for each person next to that person's age and then count up the tally marks. For the distribution below, that would be a tedious, but workable, way to construct the frequency distribution. With larger samples, it is much easier to have the scores stored electronically and then have a statistical computer program, such as SPSS for Windows, do the counting for you.
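The tally procedure described above can be sketched in a few lines of Python. This is only an illustration: the ages are invented for the example and are not the data in the table, and the function name frequency_distribution is a hypothetical helper, not part of any statistical package.

```python
from collections import Counter

# Hypothetical ages, invented for illustration (not the data in the table).
ages = [9, 10, 10, 11, 11, 11, 12, 12, 12, 12, 13, 13, 14]


def frequency_distribution(scores):
    """Return (score, frequency) pairs, ordered from lowest to highest score."""
    counts = Counter(scores)       # one tally per score, like the tally marks
    return sorted(counts.items())  # sort so the table reads in score order


freq = frequency_distribution(ages)

# Print a two-column table: score in the first column, frequency in the next.
print("Age  Frequency")
for age, count in freq:
    print(f"{age:>3}  {count:>9}")
```

Counter plays the role of the tally sheet here: each score gets one mark per person, and the marks are totaled at the end.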
Although a frequency distribution is only a table, it is relatively easy to visualize the distribution by glancing at the table. For example, in the table above, you can see that the most frequent age is 12, with 85 people being 12 years old. You can also see that most of the sample is within two or three years of 12 and that, as you move away from 12, the frequency tends to drop off. Many distributions in psychology show this kind of shape, with most of the people clustered toward the middle of the distribution and smaller numbers of people as you move away from the middle. You will see this more clearly when we move on to graphs in the next section.
Also in the table above is a third column, labeled cumulative frequency. For each score, this column lists the number of people who have that score or a lower score. So there are 21 people who are 9 years old (the frequency of that age) and 32 people (21 at age 9 and 11 younger than age 9) who are 9 or younger (the cumulative frequency). You will see shortly that the cumulative frequency column is handy for later statistical computations, although if the computer is doing your computations, the column adds very little. The cumulative frequency column is computed by adding the frequency for each score to the cumulative frequency for the score below it. For example, the cumulative frequency for a score of 13 is 85 (the number of people age 13) plus 214 (the cumulative frequency for a score of 12).
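The rule for the cumulative frequency column, adding each frequency to the cumulative frequency of the score below it, amounts to a running total. A minimal sketch in Python follows. The frequencies are invented for illustration and do not reproduce the table, and cumulative_frequencies is a hypothetical helper name.

```python
from itertools import accumulate

# Hypothetical (score, frequency) pairs, invented for illustration.
freq = [(9, 1), (10, 2), (11, 3), (12, 4), (13, 2), (14, 1)]


def cumulative_frequencies(freq_table):
    """Return (score, frequency, cumulative frequency) rows, lowest score first."""
    ordered = sorted(freq_table)            # the running total needs ascending scores
    scores = [score for score, _ in ordered]
    counts = [count for _, count in ordered]
    running = accumulate(counts)            # each entry = own frequency + total below it
    return list(zip(scores, counts, running))


table = cumulative_frequencies(freq)

print("Age  Frequency  Cumulative")
for age, count, cumulative in table:
    print(f"{age:>3}  {count:>9}  {cumulative:>10}")
```

The last cumulative frequency always equals the total sample size, which is a quick check that no score was skipped.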
When there are a large number of potential scores, it is useful to group them into a manageable number of intervals (around 10 to 20) by creating intervals of equal width and computing the frequency of scores that fall into each interval. Such a distribution is called a grouped frequency distribution. Technically, the distribution of ages in the frequency distribution above is a grouped frequency distribution in that each age is actually a 1-year-wide interval. For example, anyone from 12 years, 0 days to 12 years, 364 days old would be said to be 12 using our conventional age notation (age at last birthday). Other distributions are more obviously grouped frequency distributions. For example, if you wanted a distribution of income for a sample you are working with, you might well have some people making as little as $15,000 a year and others making as much as $250,000. So you might decide to create intervals that are $20,000 wide. The first interval might be $0 to $20,000, the next $20,001 to $40,000, and so on. The advantage of a grouped frequency distribution is that it is small enough for you to get a pretty good idea at a glance of how the scores are distributed. The disadvantage is that you are lumping scores together, thus losing some of the information in the original scores.
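Grouping scores into equal-width intervals can be sketched in Python as follows. The incomes are invented for illustration, grouped_frequency_distribution is a hypothetical helper name, and the sketch assumes nonnegative whole-dollar scores. It also assumes a slightly different boundary convention from the one in the text: each interval includes its lower bound and excludes its upper bound (so $20,000 falls in the second interval), which keeps every interval exactly the same width.

```python
from collections import Counter

# Hypothetical annual incomes in dollars, invented for illustration.
incomes = [15_000, 22_500, 38_000, 41_000, 59_999, 60_000, 87_000, 130_000, 250_000]


def grouped_frequency_distribution(scores, width):
    """Count scores in equal-width intervals [lower, lower + width)."""
    # Integer division maps each score to the lower bound of its interval.
    counts = Counter((score // width) * width for score in scores)
    lowest, highest = min(counts), max(counts)
    # List empty intervals too, so gaps in the distribution stay visible.
    return [
        (lower, lower + width, counts.get(lower, 0))
        for lower in range(lowest, highest + width, width)
    ]


grouped = grouped_frequency_distribution(incomes, 20_000)

print("Interval                Frequency")
for lower, upper, count in grouped:
    print(f"${lower:>7,} to < ${upper:>7,}  {count:>9}")
```

With a width of $20,000 these incomes fall into 13 intervals, within the 10 to 20 suggested above. The table no longer shows whether a person in the second interval earned $22,500 or $38,000, which is the loss of information that grouping entails.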