
Working through the Data



6.6 Using statistics to describe data

Analytical tools

Basic data analysis can be accomplished

  • with a computer software package (necessary for a large database)
  • by putting data into table form using a spreadsheet such as Excel
  • with the table function in a word processor

Other software at varying levels of complexity is available free or for purchase on the web.

Types of analysis

Analysis can take two forms. The first describes the data, shaping the results to bring out patterns and trends that may be hidden. This process uses fairly basic mathematics, is easy for non-statisticians, and is the logical follow-up to data collection. The second, more complex type of analysis applies mathematical tests to give a statistical estimate of the level of confidence one can have in the accuracy of the findings. That level of statistical analysis is only touched on here.

Descriptive statistics

  • percentage (the number of responses for each option/variable as a percentage of all responses)
  • mean or arithmetic average (the sum of scores divided by the number of responses to the question)
  • median (the mid-point response, with an equal number of responses above and below it)
  • mode (the most frequently occurring response)
  • range (the amount data are dispersed: difference between the highest and lowest values, or range of categories with at least one response)
  • standard deviation (a measure of how widely values are spread around the mean, calculated as the square root of the average squared distance between each value and the mean. See Glossary.)

How to use descriptive statistics

Most data used in descriptive statistics will consist of either

  • named categories: individual items, either one thing or another (e.g., male/female or English/French/Italian)
  • a choice that falls somewhere on a quantifiable spectrum of options (e.g., smaller to larger, less to more)

Categorical or nominal data

Categorical data are best described by counting how many informants’ responses fall within each category (a frequency distribution). The following example shows how basic descriptive statistics can allow programs to see, and show, such patterns.

Example:

Consider the following question:

Which activity in the community recreation program do you most like participating in?

a) art activities
b) computer skills
c) music workshops
d) pottery
e) reading buddies
f) gym games

Number of responses (N=23, 12 girls and 11 boys)
Responses: a = 2, b = 4, c = 5, d = 3, e = 3, f = 6

No one would ask, “What is the average gender of participants?” or “What is the average language spoken in the class?” You cannot create an average for discrete, named categories. Nominal or categorical data can best be described by counting how many informants’ responses fall within each category (a frequency distribution).

Since the responses are different from one another, but don’t lend themselves to any order, counting or measurement, they are categorical (or nominal). Calculating an average (mean) or median would be meaningless for these responses, but the program can examine the range (which responses were selected) and distribution (how frequently responses were selected) by calculating percentages or proportions (the number of times each response was given, divided by the total number of responses).

The results (arranged in descending order, with each count shown as a percentage of the 23 responses) are:
f = 6 (26.1%)
c = 5 (21.7%)
b = 4 (17.4%)
d = 3 (13.0%)
e = 3 (13.0%)
a = 2 (8.7%)
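
The percentages above can be reproduced with a few lines of Python. This is a minimal sketch rather than a prescribed method: the counts come from the example, while the function name `frequency_table` and the rounding to one decimal place are assumptions made for illustration.

```python
from collections import Counter

# Counts for each option, taken from the example question
# (a = 2, b = 4, c = 5, d = 3, e = 3, f = 6; N = 23).
counts = Counter({"a": 2, "b": 4, "c": 5, "d": 3, "e": 3, "f": 6})


def frequency_table(counts):
    """Return (option, count, percentage) rows, most frequent first.

    The function name and one-decimal rounding are illustrative choices.
    """
    total = sum(counts.values())
    rows = [(option, n, round(100 * n / total, 1)) for option, n in counts.items()]
    # Sort by count (descending), then by option letter to break ties.
    return sorted(rows, key=lambda row: (-row[1], row[0]))


table = frequency_table(counts)
for option, n, pct in table:
    print(f"{option} = {n} ({pct}%)")
```

Because each percentage is rounded separately, the column may not add up to exactly 100.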

Although readers can determine the relative position of responses from a list like this, plotting a bar graph of percentages will make it easier to show staff the differences in response levels and ask for feedback. Initial results show that f) is the modal response (the most frequently selected option), meaning that gym games is the most enjoyed activity. However, this initial analysis may raise other questions. For example, did boys and girls have different preferences? Since there are almost matching numbers of girls and boys, looking at those results could provide further information.
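
Where no charting software is at hand, a rough text bar graph can be printed from the same counts. The sketch below is one possible approach, not part of the original example: the helper name `bar_chart_lines` and the 40-character chart width are assumptions.

```python
# Counts and option labels from the example question.
counts = {"a": 2, "b": 4, "c": 5, "d": 3, "e": 3, "f": 6}
labels = {
    "a": "art activities",
    "b": "computer skills",
    "c": "music workshops",
    "d": "pottery",
    "e": "reading buddies",
    "f": "gym games",
}


def bar_chart_lines(counts, labels, width=40):
    """Build one text bar per option, longest first (width is an assumed scale)."""
    total = sum(counts.values())
    lines = []
    for option, n in sorted(counts.items(), key=lambda item: -item[1]):
        pct = 100 * n / total
        bar = "#" * round(pct * width / 100)  # bar length proportional to percentage
        lines.append(f"{labels[option]:<16} {bar} {pct:.1f}%")
    return lines


lines = bar_chart_lines(counts, labels)
print("\n".join(lines))

# The modal response is the option with the highest count.
modal_option = max(counts, key=counts.get)
print("Modal response:", labels[modal_option])
```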

Ordinal data

When the options given for a question can be arranged in some order (one is bigger, better or more of something than another), the question uses an ordinal scale. An example would be questions with word options like: very happy, happy, neither happy nor unhappy, unhappy, very unhappy, which have a definite order but no equal or even definite distance from one option to the next. Because the options lack a measurable, mathematical interval between them, calculating a mean or average level of happiness for the group is also not really appropriate.

Median, mode and range

Instead of the mean, a programmer can ask about the median: the response of the person in the middle of the group in terms of attitude, with half of the responses on one side and half on the other. This is useful because it shows the general tendency of the responses.

Consider the following example:

8. How do you feel about playing with other children in the recreation program?

Response option              Code
very unhappy                 1
unhappy                      2
neither happy nor unhappy    3
happy                        4
very happy                   5

Q8, N = 15, Responses from coding sheet

To find the median when there are not many responses:

  • put all responses in order, then count to find the mid-point.

For this question and program, the median response is “happy.”

The mode for this data set (the most frequent response) is “very happy.”

If two options had tied for the highest number of responses, they would both be modes and the data would be called bi-modal. (If all categories have the same number of responses, there is no mode.)

Both the median and mode for these results would be encouraging for programmers.
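
The steps for finding the median and mode can be sketched in Python using the 1 to 5 codes for the question. The coding-sheet responses themselves are not reproduced in this section, so the list below is HYPOTHETICAL: it was made up for illustration and chosen only to match the summary given here (N = 15, median “happy”, mode “very happy”). `median_low` is used because it always returns a value that actually occurs in the data, which suits an ordinal scale.

```python
from statistics import median_low, multimode

# Codes for the response options of question 8.
SCALE = {
    1: "very unhappy",
    2: "unhappy",
    3: "neither happy nor unhappy",
    4: "happy",
    5: "very happy",
}

# HYPOTHETICAL coded responses (N = 15), invented for this sketch only;
# they are not the actual coding-sheet data.
responses = [5, 4, 1, 5, 2, 4, 5, 3, 2, 4, 5, 1, 4, 2, 5]

# Put all responses in order, then take the mid-point.
ordered = sorted(responses)
median_code = median_low(ordered)

# multimode returns every option tied for most frequent, so a
# bi-modal data set would yield two codes here.
mode_codes = multimode(ordered)

print("Median:", SCALE[median_code])
print("Mode(s):", [SCALE[code] for code in mode_codes])
```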

Distribution of responses

As in this example, responses may not be evenly distributed across all the response options. Often they will be clustered at one or more typical responses, with only a few people giving quite different responses.

The pattern of distribution has more effect on the mean value than on the median. A very few responses that are quite different from the majority can skew the mean either up or down and give a less than accurate picture of results. Because of this, the median is often the more useful statistic, and it can be worth comparing all three: mean, median and mode.
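
A small numerical illustration shows how one unusual response moves the mean but not the median. The values below are HYPOTHETICAL numeric answers invented for this sketch (for example, answers to a "how many" question); they do not come from the program data.

```python
from statistics import mean, median

# HYPOTHETICAL numeric responses, invented for illustration.
typical = [2, 3, 3, 4, 4, 4, 5]

# The same responses plus one answer far from the rest.
with_outlier = typical + [40]

print("Without outlier: mean =", round(mean(typical), 2), "median =", median(typical))
print("With outlier:    mean =", mean(with_outlier), "median =", median(with_outlier))
```

The single extreme value more than doubles the mean, while the median stays at 4.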

The range, another useful measure, can be easily illustrated in a table or bar chart. In the example given, the majority of respondents are happy playing with others in the program. However, results show a broad range, with fully one-third of children unhappy to some extent. A narrower range of responses, with no child selecting the bottom two or three response options, would have been preferable. Since the data refer to a specific and small group of children, the results may provide some clues about the operation of the program, relationships with volunteers or the dynamics of that particular group. They may also raise questions about cliques or bullying that could be more fully explained by qualitative data from observations.

Quantifiable data

The full range of descriptive statistics already described can be used for quantifiable (interval or ratio) data. In this type of data, each response option is ‘so many units more than another’ or ‘so many times more than another’ on the scale being used.

Examples from community programming data collection would be:

  1. quantitative questions (how much, how many, how often?) that provide a scale with numbers or
  2. questions that ask for measurements like height, weight, or test scores.

It is common to divide data into quartiles, marked by the responses at the 25th and 75th percentiles, and then plot the responses on a curve. This can be done simply by first finding the median, then finding the median of each group of responses on either side. See Glossary for a more detailed formula.
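
The median-of-each-half procedure just described can be sketched as follows. The function name `quartiles` and the sample values are hypothetical, and the sketch assumes one common convention: when the number of responses is odd, the median itself is left out of both halves. Other conventions, including the Glossary formula or a spreadsheet's built-in quartile function, may give slightly different values.

```python
from statistics import median


def quartiles(values):
    """Return (lower quartile, median, upper quartile) by the median-of-halves method.

    Assumption: for an odd number of values, the middle value is
    excluded from both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values to form two halves")
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)


# HYPOTHETICAL data: number of sessions attended by nine participants.
sessions = [3, 7, 8, 5, 12, 14, 21, 13, 18]
q1, q2, q3 = quartiles(sessions)
print("Lower quartile:", q1, "Median:", q2, "Upper quartile:", q3)
```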

These types of data can provide more precise information and can be described in more ways, by finding:

  • the mean response
  • the median response (and quartiles)
  • the mode
  • the range (determined by subtracting the smallest value from the largest value)
  • the standard deviation (a measure of variability of responses not often used in community programming evaluations)
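
All of the measures in this list can be computed together with Python's standard `statistics` module. The sketch below uses HYPOTHETICAL scores invented for illustration, and the function name `describe` is likewise an assumption. It treats the respondents as the whole group of interest and so uses the population standard deviation (`pstdev`); if the respondents were a sample from a larger group, `stdev` would be the usual choice.

```python
from statistics import mean, median, multimode, pstdev


def describe(values):
    """Summarize a list of numeric responses (illustrative helper)."""
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "modes": multimode(values),
        "range": max(values) - min(values),  # largest value minus smallest
        "std_dev": pstdev(values),           # population standard deviation
    }


# HYPOTHETICAL test scores, invented for illustration.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
summary = describe(scores)
for name, value in summary.items():
    print(f"{name}: {value}")
```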

Handling ‘extreme’ responses

Statisticians often treat results that lie at the extreme ends of a distribution as expendable, especially when working with large numbers of responses. In smaller-scale community evaluations, results that lie at the bottom of a distribution, reflecting dissatisfaction or less positive results, may also provide information about program areas needing improvement. They may need to be viewed as red flags, or as prompts to look further for explanations.

Cross-tabulation

Answers to the original evaluation questions are often found with simple tabulation of responses and descriptive statistics as explained above. However, there may also be a need to look at subgroups among respondents and compare results from certain questions. Cross-tabulation, which examines the relationship between responses from two questions, allows for more complex ways of looking at the data. For example, a cross-tabulation might look at the attendance/participation records of respondents in a skills development program compared to changes in skill levels before and after a program. This is an easy operation for simple statistical software, some of which can be downloaded without cost from the Internet (See References and Resources.)
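
For a small data set, a cross-tabulation can also be built by counting pairs of responses. The following is a minimal sketch under stated assumptions: the records are HYPOTHETICAL (an attendance level paired with a change in skill level for each respondent), and the function name `cross_tabulate` is introduced here for illustration only.

```python
from collections import Counter

# HYPOTHETICAL records, one per respondent:
# (attendance level, change in skill level after the program).
records = [
    ("high", "improved"), ("high", "improved"), ("high", "no change"),
    ("low", "improved"), ("low", "no change"), ("low", "no change"),
    ("high", "improved"), ("low", "no change"),
]


def cross_tabulate(pairs):
    """Count how often each (row, column) combination occurs."""
    cells = Counter(pairs)
    rows = sorted({row for row, _ in pairs})
    cols = sorted({col for _, col in pairs})
    return {row: {col: cells[(row, col)] for col in cols} for row in rows}


table = cross_tabulate(records)

# Print the table with attendance as rows and skill change as columns.
columns = list(next(iter(table.values())))
print(f"{'':<6}" + "".join(f"{col:>11}" for col in columns))
for row, cells in table.items():
    print(f"{row:<6}" + "".join(f"{cells[col]:>11}" for col in columns))
```

With so few cases, a table like this can only suggest a pattern worth exploring; it does not show that attendance caused the change.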

 

Last updated: July 2004

© 2004