- Ryan Brooks
Simple Statistics for Nonprofits
Updated: Nov 9
When we examine our organization’s data, it’s great to have one number, one value, to summarize the big picture and help us tell our story. A measure of central tendency is a single value that represents the central or typical value of a set of data. There are three main measures of central tendency: the mean, the median, and the mode.
Measures of central tendency help you understand your nonprofit's data and make simple comparisons. For example, you could compare the average income of participants over the past 5 years to determine if they are better off, worse off, or about the same.
In this post you will learn:
Three measures of central tendency and how to calculate them
The ideal data to use with mean, median, and mode
Data that is not ideal for the mean, median, and mode
The Mean
The mean is the most common measure of central tendency. Most of us already know the mean and call it the “average”.
Example Data:
2, 4, 6, 8, 10
Mean Calculated By:
Step 1: Add all the values in the dataset.
Step 2: Divide the total by the number of values in the dataset.
Step 1: Add all values: 2+4+6+8+10 = 30
Step 2: Divide the total by the number of values: 30 / 5 = 6
Mean:
6
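If you ever want to check the arithmetic outside of a spreadsheet, the two steps above can be sketched in a few lines of Python. This is only an illustration using the example data from this post; the variable names are our own.

```python
# Example data from this post
values = [2, 4, 6, 8, 10]

# Step 1: add all the values in the dataset
total = sum(values)

# Step 2: divide the total by the number of values in the dataset
mean = total / len(values)

print(total)  # 30
print(mean)   # 6.0
```

Python's built-in statistics module also offers a mean function that does both steps at once.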
Ideal Data for Means
The mean is used to describe interval and ratio data - basically data that can be put on a number line and the numbers have a true meaning. Examples of interval and ratio data include age, height, number of family members, and years of education.
The mean is a good measure of central tendency when there are no outliers and the data is evenly distributed around the mean.
An outlier is a value that is much different than the rest. For example, if the members of your group have 4 apples, 3 apples, 5 apples, and 75 apples, then the person with 75 apples is an outlier.
The mean can be misleading when data is not evenly distributed around the mean. For example, the dataset {2, 4, 6, 8, 10, 50} has a mean of about 13.3. However, this mean is not very representative of the data, since five of the data points are below the mean and only one data point (50) is above it.
If you know you have skewed data or you often have outliers, then it’s better to rely on the median as your measure of central tendency.
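To see how much a single outlier can move the mean, here is a small Python sketch that compares the mean and median of the example dataset with and without the value 50. It uses Python's standard statistics module; the variable names are our own.

```python
from statistics import mean, median

without_outlier = [2, 4, 6, 8, 10]
with_outlier = [2, 4, 6, 8, 10, 50]   # same data plus one outlier

for label, data in [("without outlier", without_outlier),
                    ("with outlier", with_outlier)]:
    print(f"{label}: mean = {mean(data):.2f}, median = {median(data):.2f}")

# Count how many values fall below the mean once the outlier is included
below_mean = [v for v in with_outlier if v < mean(with_outlier)]
print(len(below_mean))  # 5
```

Adding the outlier roughly doubles the mean (6.00 to 13.33), while the median only moves from 6.00 to 7.00.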
Data That is Not Ideal For Means
In addition to data with outliers and data that's unevenly distributed, we must be careful trying to use means with categorical data (i.e. nominal and ordinal data). We typically don't use means with categorical data because the choices do not have a meaningful underlying value. A couple of examples might help us understand this limitation.
It makes sense that we wouldn't calculate a mean of favorite colors (Red, Blue, Green). The categories Red, Blue, and Green don't have a meaningful numeric value. Of course, we could assign them values (Red = 1, Blue = 2, Green = 3), collect data, and calculate a mean based on those values. But what would a "mean favorite color" of 2.3 actually tell us?
In Table 1 below, you see an example of categorical data, Educational Attainment, with the numeric values you've assigned to each category. So, Less than High School is assigned a value of 1, while Advanced Degree is assigned a value of 5.
Table 1: Count of Participants with Each Level of Educational Attainment, including the Value Associated with Each Level.
Educational Attainment | Count of Participants |
Less Than High School (value = 1) | 45 |
High School Graduate (value = 2) | 60 |
Some College (value = 3) | 60 |
College Graduate (value = 4) | 15 |
Advanced Degree (value = 5) | 5 |
If we calculate the mean from this table, we get 2.32. The problem with this example is less obvious than in the favorite colors example, but it's still not ideal data for calculating a mean.
Why?
It seems logical for the values to go up as you increase levels of educational attainment, but the values we assign to each educational attainment level could be anything. There is no meaningful or ideal numeric value that we can assign to "College Graduate" or "Less Than High School". We could just as easily assign the value of Less Than High School to be 0 or the value of Advanced Degree to be 25 or 50. There's no correct answer for these values, so the mean we calculate is somewhat meaningless. (pun intended)
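A quick way to see the problem is to recalculate the mean from Table 1 with a different set of assigned values. The sketch below uses the counts from Table 1; the alternate values are made up for illustration, and the function name is our own.

```python
# Counts from Table 1, ordered from Less Than High School to Advanced Degree
counts = [45, 60, 60, 15, 5]

def coded_mean(codes, counts):
    """Mean of the assigned values, weighted by the number of participants at each level."""
    total = sum(code * count for code, count in zip(codes, counts))
    return total / sum(counts)

original_codes = [1, 2, 3, 4, 5]     # the values shown in Table 1
alternate_codes = [0, 2, 3, 4, 25]   # hypothetical recoding: same order, different numbers

print(f"{coded_mean(original_codes, counts):.2f}")   # 2.32
print(f"{coded_mean(alternate_codes, counts):.2f}")  # 2.62
```

The participants did not change, but the "mean educational attainment" did, simply because we picked different numbers for two of the categories.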
Calculating a Mean on Categorical Data - Likert Scale Exception
Even with logically ordered categories (i.e. ordinal data), we typically should not calculate a mean because its value (and how we interpret it) is full of potential problems. However, you will see exceptions. It's common to see means calculated on data from Likert scales. You've probably seen Likert scales many times. Here's an example:
Please state how much you agree with the following statement: "Kittens are cute."
Strongly Disagree (Value = 1)
Disagree (Value = 2)
Neutral (Value = 3)
Agree (Value = 4)
Strongly Agree (Value = 5)
Since each response is assigned a value, we can easily calculate a mean.
It can be useful to calculate the mean on a Likert scale, especially if you want to compare responses to the same question over time. For example, you might collect feedback about a home ownership workshop you offer several times per year and are constantly trying to improve. You might ask something like:
Please state how much you agree with the following statement: "I am more prepared for home ownership because of what I learned."
Over time, you can evaluate how the mean changes.
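As a rough sketch of what that comparison could look like, the Python below calculates the mean response for three workshop sessions. The session names and responses are hypothetical and exist only to illustrate the calculation; they are not real survey results.

```python
from statistics import mean

# Hypothetical responses (1 = Strongly Disagree ... 5 = Strongly Agree)
responses_by_session = {
    "Spring": [3, 4, 2, 4, 3],
    "Summer": [4, 4, 3, 5, 4],
    "Fall": [4, 5, 4, 5, 4],
}

# Mean agreement for each session, so the sessions can be compared over time
session_means = {
    session: mean(responses)
    for session, responses in responses_by_session.items()
}

for session, value in session_means.items():
    print(f"{session}: mean agreement = {value:.1f}")
```

With this made-up data, the mean rises from 3.2 to 4.0 to 4.4, which is the kind of trend you would look for after changing the workshop.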
The Median
The median is the middle value in a set of data, when the data is arranged in order from least to greatest. The way you determine the median is slightly different depending on whether you have an even or odd number of values. Both approaches are explained below.
Example Data
2, 2, 3, 4, 5, 5, 7, 8, 9, 10, 30
n=11 values
Median Calculated By: (Odd Number of Values)
Step 1:
Order all of your data points from least to greatest
Step 2:
Count the number of data points
Step 3:
Add 1 to the number of data points and divide by 2, or (n+1) / 2
There are 11 data points in our dataset, so (11 Data Points + 1) = 12
12 / 2 = 6
Step 4:
Starting from the lowest value, count up to the 6th value, which is 5.
{2, 2, 3, 4, 5, 5, 7, 8, 9, 10, 30}
Median Odd Number of Values
5
Median Calculated By: (Even Number of Values)
Example Data
2, 3, 4, 5, 5, 7, 8, 9, 10, 30
n=10 values
Step 1:
Order all of your data points from least to greatest
Step 2:
Count the number of data points
Step 3:
Add 1 to the number of data points and divide by 2, or (n+1) / 2
(10 Data Points + 1) = 11
11 / 2 = 5.5
Step 4:
Starting from the lowest value, count up 5.5 values. You will land between the 5th value (5) and the 6th value (7) in our data set. You need both to calculate the median.
{2, 3, 4, 5, 5, 7, 8, 9, 10, 30}
Step 5:
To get the median, you take the average (i.e. mean) of the two values
(5+7) / 2 = 6
Median Even Number of Values
6
Note, the median value of 6 is not in our data set. That seems odd, but remember that our goal is to determine a single value that represents our entire data set. The mean that we calculate is often not in our dataset either.
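The odd and even cases above can be sketched as one small Python function. This is only an illustration of the steps in this post; the function name is our own, and the result is checked against the median function in Python's standard statistics module.

```python
from statistics import median

def median_by_steps(values):
    """Find the median using the (n + 1) / 2 steps described above."""
    ordered = sorted(values)          # Step 1: order from least to greatest
    n = len(ordered)                  # Step 2: count the data points
    position = (n + 1) / 2            # Step 3: find the middle position
    if position.is_integer():
        # Odd number of values: the middle position is a single value
        return ordered[int(position) - 1]
    # Even number of values: average the two values around the middle position
    lower = ordered[int(position) - 1]
    upper = ordered[int(position)]
    return (lower + upper) / 2

odd_data = [2, 2, 3, 4, 5, 5, 7, 8, 9, 10, 30]
even_data = [2, 3, 4, 5, 5, 7, 8, 9, 10, 30]

print(median_by_steps(odd_data))   # 5
print(median_by_steps(even_data))  # 6.0

# Both results match Python's built-in calculation
print(median_by_steps(odd_data) == median(odd_data))    # True
print(median_by_steps(even_data) == median(even_data))  # True
```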
Ideal Data for Medians
The median can be used with interval and ratio data (e.g. height, age, years of education). Medians can also be used with categorical data with a meaningful order (i.e. ordinal data) such as educational attainment or scores on a Likert scale.
The median is not sensitive to outliers, so you often see it used when there are very large values that can skew your results. For example, it’s common for researchers to use medians when they analyze wealth in the United States. Wealth can be skewed by very large values (e.g. Bill Gates and Elon Musk), and the mean wealth would misrepresent “average” wealth in the population.
Credit Suisse’s Global Wealth Data Book (automatic PDF Download) finds that the median wealth in the US is $93,271 per adult, while the mean wealth is $579,051 per adult. Extremely wealthy individuals push the mean wealth much higher than median wealth.
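The median also works for ordered categories, as noted at the start of this section. As a sketch, the Python below finds the median level of educational attainment using the counts from Table 1. It walks through the levels in order and stops at the one that contains the middle participant. It assumes an odd number of participants (185 here); with an even number, the two middle participants could fall in different categories.

```python
# Levels in their meaningful order, with the counts from Table 1
levels = [
    ("Less Than High School", 45),
    ("High School Graduate", 60),
    ("Some College", 60),
    ("College Graduate", 15),
    ("Advanced Degree", 5),
]

n = sum(count for _, count in levels)   # 185 participants
middle_position = (n + 1) / 2           # the 93rd participant

# Add up the counts level by level until we reach the middle participant
running_total = 0
median_level = None
for level, count in levels:
    running_total += count
    if running_total >= middle_position:
        median_level = level
        break

print(median_level)  # High School Graduate
```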
Data That is Not Ideal for Medians
The median should not be used with categorical data that has no meaningful order, such as language spoken (English, Spanish, French, Mandarin) or favorite color. Remember, the first step in finding the median is to sort your data from lowest to highest value. If these categories have no meaningful order, then you can't sort them.
The Mode
The mode is the most frequently occurring value in a set of data. For example, the mode of the set of data {4, 4, 4, 6, 6, 10} is 4.
Mode is Calculated By:
Step 1:
Count the number of times each value occurs.
Step 2:
Select the value that occurs the greatest number of times.
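The two steps above can be sketched in Python with the Counter class from the standard collections module, using the example data from this section. The variable names are our own.

```python
from collections import Counter

values = [4, 4, 4, 6, 6, 10]

# Step 1: count the number of times each value occurs
counts = Counter(values)

# Step 2: select the value that occurs the greatest number of times
mode_value, mode_count = counts.most_common(1)[0]

print(mode_value, mode_count)  # 4 3
```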
The mode is a good measure of central tendency when the data is categorical. For example, data about educational attainment is often categorical: High School Graduate, Some College, College Graduate.
In Table 2 below, we can see that High School Graduate is the mode because it occurs the most frequently (n=3) in our data.
Table 2: Educational Attainment for Five Program Participants
Name | Educational Attainment |
Diane | Some College |
Maria | High School Graduate |
Lee | High School Graduate |
Antonio | College Graduate |
TJ | High School Graduate |
We can use this data to say something like “Most of our participants completed high school but never attended or completed college. Therefore, our skills training program includes XYZ content to best prepare them for jobs.”
One challenge you might encounter with the mode is that there are sometimes multiple modes. That is, two or more categories can tie for the highest count.
Looking at educational attainment data again in Table 3 below, we see the number of participants with different levels of educational attainment. In Table 3, there are two modes: High School Graduate (n=60) and Some College (n=60), so our “central tendency” is two categories.
Table 3: Count of Program Participants with Selected Levels of Educational Attainment
Educational Attainment | Number of Participants |
Did Not Complete High School | 45 |
High School Graduate | 60 |
Some College | 60 |
College Graduate | 15 |
Advanced Degree | 5 |
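When you already have counts like those in Table 3, one way to find every mode (including ties) is to look for the highest count and keep each category that matches it. The Python sketch below does this with the Table 3 data; the variable names are our own.

```python
# Counts from Table 3
table_3 = {
    "Did Not Complete High School": 45,
    "High School Graduate": 60,
    "Some College": 60,
    "College Graduate": 15,
    "Advanced Degree": 5,
}

# Find the highest count, then keep every category that reaches it
highest_count = max(table_3.values())
modes = [level for level, count in table_3.items() if count == highest_count]

print(highest_count)  # 60
print(modes)          # ['High School Graduate', 'Some College']
```

If your data is a raw list of responses rather than a table of counts, the multimode function in Python's statistics module returns all tied modes in a similar way.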
Ideal Data for Modes
The mode is best used with categorical data. The categories can have a logical order such as educational attainment, or they might not have a logical order, such as gender, race, or primary language spoken at home.
Data That is Not Ideal for Modes
In general, interval and ratio data are not great for modes. One of the reasons is that there are too many possible values. For example, if you tried to find the mode of income at a university with 10,000 students, then you'd probably get nearly 10,000 different responses. The mode, in this case, would probably not be very meaningful.
You could, however, create income categories after the data is collected (e.g. $0-$5,000, $5,001 to $10,000, etc.), and then determine the mode from that data. This could be a useful way to categorize and understand your students as "very low income", "low income", "moderate income", etc.
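Here is a rough Python sketch of that idea. The incomes are hypothetical whole-dollar amounts made up for illustration, and the function name and the $5,000 bracket width are our own assumptions. Each income is placed in a bracket, and then the most common bracket is selected.

```python
from collections import Counter

# Hypothetical incomes in whole dollars, for illustration only
incomes = [1200, 4800, 5200, 7400, 9100, 9900, 12500, 300, 8800, 15000]

def income_category(income, width=5000):
    """Label the bracket an income falls into, e.g. '$0-$5,000' or '$5,001-$10,000'."""
    if income <= width:
        return f"$0-${width:,}"
    lower = ((income - 1) // width) * width + 1
    upper = lower + width - 1
    return f"${lower:,}-${upper:,}"

# Count how many incomes fall in each bracket, then pick the most common one
category_counts = Counter(income_category(income) for income in incomes)
mode_category, mode_count = category_counts.most_common(1)[0]

print(mode_category, mode_count)  # $5,001-$10,000 5
```

None of the individual incomes repeat, so the raw data has no useful mode, but the brackets do.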
Which Measure of Central Tendency Should You Use?
If it's not clear by now, the measure of central tendency that you use largely depends on the type of data you have and the purpose of your analysis. If you have evenly distributed data, the mean is a good choice. If you have skewed data, the median is a good choice. And if you have categorical data, the mode is a good choice.
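The rule of thumb in the paragraph above can be written as a tiny decision helper. This is a simplified sketch, not a complete guide: the function name and its inputs are our own, and it ignores the purpose of your analysis, which also matters.

```python
def suggest_measure(data_type, skewed_or_outliers=False):
    """Return the rule-of-thumb measure of central tendency for a set of data.

    data_type: "categorical" (nominal or ordinal) or "numeric" (interval or ratio).
    skewed_or_outliers: True if the numeric data is skewed or has outliers.
    """
    if data_type == "categorical":
        return "mode"
    if data_type == "numeric":
        return "median" if skewed_or_outliers else "mean"
    raise ValueError(f"Unknown data type: {data_type!r}")

print(suggest_measure("numeric"))                           # mean
print(suggest_measure("numeric", skewed_or_outliers=True))  # median
print(suggest_measure("categorical"))                       # mode
```

Remember that for ordered categories, such as educational attainment or Likert scale responses, the median is also an option.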
Examples:
Here are some examples of how measures of central tendency can be used in real life:
Your nonprofit might use the mean to calculate the average amount of financial assistance you provide.
You might use the median to estimate the typical income of your donors.
You might use the mode to identify the most common employment status among your participants.
Focus on Understanding the Concepts
You will rarely have to calculate the mean, median, or mode yourself. Your spreadsheet software should have built-in functions to help you do this task. For example, in Google Sheets, you can determine the median value in your data by typing the following:
=median(StartCell:EndCell)*. Google Sheets will identify the median for you even in unsorted data.
You can do the same thing in Google Sheets for mean =average(StartCell:EndCell) and mode =mode(StartCell:EndCell). Other spreadsheet applications like Excel will include this type of functionality.
*StartCell is the letter and number of the cell where your data starts (e.g. A2) and EndCell is the letter and number of the cell where your data ends (e.g. A25)
If your nonprofit uses specialized data tracking and case management software, that software probably allows you to produce reports with measures of central tendency.
Since you won't need to calculate these measures of central tendency, the most important things for you to understand are the ideal data and limitations of each measure of central tendency, rather than focusing on how to calculate them.
Remember to Look at Counts and Percentages
In another post, we talked about using counts and percentages to understand your nonprofit’s data. Even with measures of central tendency in your toolkit, you should still examine your data using counts and percentages to understand it more completely.
For example, race is typically collected as categorical data. Because it is categorical, you could identify the most commonly selected race (i.e. the mode race) among your participants.
But, it’s probably better to simply look at the number and percent of participants that fall within each racial group rather than just looking at the most common race. This would give you a more complete understanding of your participants.
Also, remember your audience and your goal(s) with your data analysis. Measures of central tendency simplify a lot of data into a single number. That number is meant to represent something important. For example, you can analyze the average income (mean or median) of your participants when they enter and exit your program. Those are great, simple numbers to tell your donors. “Our job program participants increased median income by $1,000”.
But, that very simple value hides all of the nuances and special cases, areas of struggle and strength, within the program.
What percentage of participants increased their income by $500, by $5,000, or not at all?
Of those participants, can we see anything interesting about them? Do they tend to have prior work experience, criminal justice system involvement, health problems, (un)reliable transportation, etc.?
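As a sketch of how you might look behind a single summary number, the Python below calculates the median income change alongside a few percentages. The income changes are hypothetical values made up for illustration, and the helper function is our own.

```python
from statistics import median

# Hypothetical income changes in dollars between program entry and exit
income_changes = [0, 0, 0, 500, 1000, 1000, 1000, 1500, 5000, 5000]

def percent_where(values, condition):
    """Percentage of values that meet a condition."""
    matching = sum(1 for value in values if condition(value))
    return 100 * matching / len(values)

median_change = median(income_changes)
pct_no_increase = percent_where(income_changes, lambda change: change == 0)
pct_at_least_500 = percent_where(income_changes, lambda change: change >= 500)
pct_at_least_5000 = percent_where(income_changes, lambda change: change >= 5000)

print(f"Median change: ${median_change:,.0f}")        # Median change: $1,000
print(f"No increase: {pct_no_increase:.0f}%")          # No increase: 30%
print(f"At least $500: {pct_at_least_500:.0f}%")       # At least $500: 70%
print(f"At least $5,000: {pct_at_least_5000:.0f}%")    # At least $5,000: 20%
```

In this made-up example, the median change of $1,000 is a fair headline, but it hides the fact that 30% of participants saw no increase at all.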
So, use measures of central tendency to help you understand your programs and your data. Use those values when you want to tell a simple story to your staff and your donors. But make sure they are just one piece of data you use to understand your programs and participants.
Learn More About Nonprofit Data Management
This post is part of our nonprofit data bootcamp series. Check out the complete list of nonprofit data bootcamp topics with links to other published posts.
Founder, CountBubble, LLC