Measuring Nonprofit Success is Hard Part 2: How To Compare Pre- and Post-Program Data

Ryan Brooks
Nov 9, 2022
Updated: Jan 29, 2024

How to Measure Nonprofit Success: Comparing Pre- and Post-Program Data

It can be difficult for human service nonprofits to show that their programs are “successful” because they deal with complex issues. Outcomes like literacy, wellness, or stability are challenging to achieve. Nonprofits have to define what those things mean (what is “wellness” after all?), and we also have to find a way to show that our work moves the needle.

Nonprofits struggle to show that their work “works” because it is, in fact, hard to show that convincingly. In this post we’ll talk about an approachable way to improve how you collect, analyze, and present your results. Then, we'll go a little farther.

Post Program Data Has Limitations

Let’s say you have an after school tutoring program that serves 100 children, and you want to show how effective it is.

You ask the school teachers of every child in the program to complete a survey near the end of the year and here are the results:

Table 1: Post Program Results

Metric	Result (Post Program)
Child Enjoys Learning	92%
Child has Improved Their Handwriting	85%
Child Reads at Grade Level	72%

You might say something like: The children in our program are doing really well. This shows that the tutoring program works. Let’s expand it to 200 children!

Unfortunately, we can’t be sure for multiple reasons.

First, self-selection bias could be present. Maybe these children have the most motivated and involved parents or parents who can pick them up because there's no transportation. We can’t know with just this data, and we can't dismiss these concerns.

Second, these results have no context or comparison group for us to consider. The values look high, but how can we know if they are good? For example, 72% of children reading at grade level looks pretty good to me. Is that higher than the school average? What percent of them could read at grade level at the end of last year? Were they struggling with reading at the beginning of the year? With this data alone, it’s impossible for us to make a meaningful judgment about the impact of the program.

Third, even if the listed results are great, we can’t be certain that the program contributed to any of these results. For example, we would expect handwriting to improve between the beginning and end of the school year for most elementary school students. How can we treat 85% of children improving their handwriting as evidence of the tutoring program’s effectiveness?

Providing only post-program data leaves a lot of room for people to question the program's effectiveness, and we can't be sure where the program is doing really well and where it has room for improvement.

Showing Pre-Program and Post Program Results (i.e. a Pre-Post Comparison) Improves Understanding

One way to have more convincing results is to provide pre-program and post-program results. This is more work for your program because you have twice as much data to collect. And, your analysis is a bit trickier because you now have to compare data rather than simply summarize it.

Table 2: Comparison of Pre-Program and Post-Program Values

Metric	Pre-Program	Post-Program	Change*
Child Enjoys Learning	72%	92%	20%
Child Has Grade-Level Appropriate Hand-Writing	45%	85%	40%
Child Reads at Grade Level	62%	72%	10%

*Change is calculated as Post-Program % - Pre-Program % (e.g. 92% - 72% = 20%) to get the raw number of percentage points of change.

Change could also be calculated as (Post-Program % - Pre-Program %) / (Pre-Program %) to get the percentage of change relative to the original value (e.g. (92% - 72%) / 72% is about 28%).
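To make the two calculations concrete, here is a minimal sketch that applies both to the Table 2 values. The function names are my own, chosen for illustration, and the relative version assumes the pre-program value is not zero.

```python
def point_change(pre_pct: float, post_pct: float) -> float:
    """Percentage-point change: post-program value minus pre-program value."""
    return post_pct - pre_pct


def relative_change(pre_pct: float, post_pct: float) -> float:
    """Change relative to the starting value, expressed as a percent."""
    if pre_pct == 0:
        raise ValueError("relative change is undefined when the pre-program value is 0")
    return (post_pct - pre_pct) / pre_pct * 100


# Pre- and post-program percentages from Table 2.
table_2 = {
    "Child Enjoys Learning": (72, 92),
    "Child Has Grade-Level Appropriate Hand-Writing": (45, 85),
    "Child Reads at Grade Level": (62, 72),
}

for metric, (pre, post) in table_2.items():
    print(
        f"{metric}: {point_change(pre, post)} points, "
        f"{relative_change(pre, post):.1f}% relative change"
    )
```

The two methods can tell quite different stories from the same data. Handwriting rises 40 points, which is roughly an 89% relative increase because the starting value was low, so it helps to say which calculation you are reporting.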

With Pre-Post comparison, you can show how the children you serve have changed (and hopefully improved!) by the end of the year. Here, higher percentages of children enjoy learning and meet grade-level expectations for writing and reading at the end compared to the beginning.

In addition, we now know areas where students are improving the most and the least. We can see that a lot of students in our program are making progress in hand-writing (a 40 percentage point increase in grade-level proficiency) but not nearly as many (only 10 percentage points) are making progress in reading.

This additional data helps us reflect on what we are doing well and what we can change to support further growth. We can talk with parents, teachers, and students to figure out areas for growth. We can talk with them to understand how important each metric is and if we need to double our efforts to achieve it more frequently. We can also dig deeper to figure out if students in our program are simply farther behind in reading compared to writing.

Simply put, having pre- and post-program data drives us to ask more, better-informed questions. And it gives us better insight into how well our program works. Also, donors can view this data and see that children in your program are better off after they’ve been in your program. Presenting this type of data takes away a layer of potential skepticism because you are showing that you understand your data and the impact you are making.

Limitations of a Pre-Post Comparison

Even with our better approach, we still have issues that make it hard to conclude that our program is effective. There's still a possibility of self-selection bias.

We now have meaningful context about the students we are tutoring, so that’s great, but we don’t have a comparison group (other than the students at an earlier time). We don’t know how the students we tutor compare to other students, so that makes our claim of effectiveness weaker.

Finally, we still can’t be certain that the program contributed to any of these results. Maybe some of the kids who struggled in school had great teachers. Maybe parents that enroll their kids in after school tutoring are also more likely to read or write with their kids an extra 15 minutes to help them catch up.

Before You Take the Next Step

Definitely read the rest of this post, because the information is (I hope) useful. But, in practice, you might decide to stop here. If you do, you’ve taken a big leap forward. However, the next step (creating a comparison group) takes significant effort, and it has a big payoff: very convincing results. Read on... but carefully consider the investment for your program.

Pre-Post Data with a Comparison Group

We’re still dealing with self-selection bias, we still don’t have a comparison group (other than the students themselves at an earlier time), and we still don’t know if our program actually contributed to the success of these students.

The next thing we can do is create a comparison group. But, there's no easy path to creating a good comparison group and gathering its data. So, let’s talk about the hard path.

What’s a comparison group and where do I get one?

The comparison group is a group of people who are similar to the people in your program. You will collect data about your comparison group just like you do with your program participants. In this scenario, the comparison group would be a set of students that you aren't tutoring, but you will get the same Pre & Post data about enjoying learning, handwriting, and reading.

You can’t just compare your participants to anyone. You must have a reasonable comparison group. The goal is to create a comparison group that’s similar to your participants in every way except the program that you are offering.

The easiest way to create your comparison group is to use people on your waiting list. If 150 people signed up for our tutoring program, which has 100 spots, then we have a 50-student waiting list. We can randomly select 100 students to receive tutoring, and the other 50 students become our comparison group.

(Let's pause right here and acknowledge that you probably don't view your program as a social experiment. Randomly assigning people to receive services or not might not feel right to you. You might want to target participants with the greatest need or use other criteria, and that's reasonable. However, randomly assigning people to the program or the comparison group is pretty important for this approach to be valid. If you aren't interested in that, then this might not be the right approach for you. Before you dismiss this approach, please read to the end anyway.)
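If you do go the random route, the selection itself is simple. Here is a minimal sketch of splitting 150 applicants into 100 participants and a 50-student comparison group. The applicant IDs, the function name, and the seed are hypothetical; the seed is only there so the draw can be reproduced and documented later.

```python
import random


def assign_groups(applicants, program_size, seed=None):
    """Randomly split applicants into program participants and a comparison group."""
    if program_size > len(applicants):
        raise ValueError("program_size cannot exceed the number of applicants")
    rng = random.Random(seed)
    shuffled = list(applicants)  # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    return shuffled[:program_size], shuffled[program_size:]


# Hypothetical applicant IDs standing in for the 150 students who signed up.
applicants = [f"student-{i:03d}" for i in range(1, 151)]
tutored, comparison = assign_groups(applicants, program_size=100, seed=2022)
print(len(tutored), "tutored,", len(comparison), "in the comparison group")
```

Every applicant has the same chance of landing in either group, which is what keeps the two groups comparable.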

Those 50 students are an ideal comparison group. Using our waiting list addresses self-selection bias. The 100 students in the program and the 50 on the waiting list all signed up for tutoring, so they’ve demonstrated they are interested in tutoring and probably improving academically.

Of course, we now have a comparison group, so we can examine whether the students in our program grew more between the beginning and end of the year than the students who weren't tutored. This is powerful data to show where our program is making an impact and where we have room for growth.

Finally, we have greater confidence that our program contributed to any of these results. If our results indicate that the students we tutored grew more than students not in our tutoring program, then we have strong evidence that our program makes an impact.

So why is the comparison group approach hard? It doesn't seem all that hard.

Creating a good comparison group can be challenging. If you have a waiting list, then you have a natural comparison group. Otherwise, you have to find a way to find a group of people that's similar to your participants. That can be a big challenge and you need to get it right for the comparison to be fair.

Also, you have even more data to collect now. Not only are you collecting outcomes for people in your program, you are collecting that data about people who are not in your program. That data can be harder to collect.

Table 3: Pre-Post Comparison of Students in Tutoring Program

Metric	Pre-Program	Post-Program	Change
Child Enjoys Learning	72%	92%	20%
Child Has Grade-Level Appropriate Hand-Writing	45%	85%	40%
Child Reads at Grade Level	62%	72%	10%

Table 4: Pre-Post Comparison of Students Not In Tutoring Program

Metric	Pre-Program	Post-Program	Change
Child Enjoys Learning	75%	90%	15%
Child Has Grade-Level Appropriate Hand-Writing	50%	75%	25%
Child Reads at Grade Level	55%	60%	5%

Pre-Program Values Comparison

Now that we have results for our tutored students and our comparison group of students, we can compare them. If we look at the pre-program values, we can see that our two groups were pretty similar, so that's helpful to know. They aren't exactly the same, but there aren't huge differences either.
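One way to check how similar the groups were at the start is to subtract the pre-program values metric by metric. The sketch below uses the pre-program columns from Tables 3 and 4. The 10-point cutoff is an assumption I picked for illustration, not a standard, so choose a threshold that makes sense for your program.

```python
# Pre-program percentages from Tables 3 and 4.
pre_tutored = {
    "Child Enjoys Learning": 72,
    "Child Has Grade-Level Appropriate Hand-Writing": 45,
    "Child Reads at Grade Level": 62,
}
pre_comparison = {
    "Child Enjoys Learning": 75,
    "Child Has Grade-Level Appropriate Hand-Writing": 50,
    "Child Reads at Grade Level": 55,
}

# Hypothetical cutoff: treat a gap of more than 10 points as worth a closer look.
MAX_GAP_POINTS = 10


def baseline_gaps(program, comparison):
    """Tutored minus comparison pre-program value, in percentage points."""
    return {metric: program[metric] - comparison[metric] for metric in program}


gaps = baseline_gaps(pre_tutored, pre_comparison)
flagged = [metric for metric, gap in gaps.items() if abs(gap) > MAX_GAP_POINTS]

for metric, gap in gaps.items():
    print(f"{metric}: {gap:+d} points")
print("Metrics needing a closer look:", flagged)
```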

Comparing Change

If we look at how they changed between the beginning and end of the program, we can see that the students in our tutoring program demonstrated larger percentage point increases across all 3 of our metrics.
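The comparison of changes can be sketched in a few lines: compute each group's change from its pre and post values, then subtract the comparison group's change from the tutored group's change. This is sometimes called a difference-in-differences comparison. The function and variable names are mine, and the sketch only does the arithmetic; it doesn't test whether the gaps are statistically significant.

```python
# (pre %, post %) for each metric, taken from Tables 3 and 4.
tutored = {
    "Child Enjoys Learning": (72, 92),
    "Child Has Grade-Level Appropriate Hand-Writing": (45, 85),
    "Child Reads at Grade Level": (62, 72),
}
not_tutored = {
    "Child Enjoys Learning": (75, 90),
    "Child Has Grade-Level Appropriate Hand-Writing": (50, 75),
    "Child Reads at Grade Level": (55, 60),
}


def extra_gain(program, comparison):
    """Program change minus comparison change, in percentage points, per metric."""
    gains = {}
    for metric, (pre, post) in program.items():
        comp_pre, comp_post = comparison[metric]
        gains[metric] = (post - pre) - (comp_post - comp_pre)
    return gains


gains = extra_gain(tutored, not_tutored)
for metric, gain in gains.items():
    print(f"{metric}: tutored students gained {gain} more points")
```

A positive number means the tutored students improved more than the comparison group on that metric.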

This is really strong evidence that our tutoring program works!!!

As a staff member, I can be confident that our program matters. As a donor, I can see pretty clearly that this tutoring program helps students enjoy school and succeed. The careful data collection and analysis remove the concerns I might have about the program’s effectiveness. What's more, I can use these results to continue asking questions about where we are doing really well and where we can improve.

How to Get Started Measuring Nonprofit Success with Pre-Post Comparisons

These approaches are useful for practically any type of human services program, not just tutoring. For example, workforce development nonprofits can examine how much they improve incomes over time for their participants, and they can even compare that growth to similar people who didn’t complete their program.

The easiest way to get started is to collect a little bit of pre-program data on all of your participants. Maybe you don’t need to collect data on everything, but you can look at just a few key pieces of data (income, reading scores, financial literacy, etc). Just having a few key pieces of data to compare helps you learn about the value of this type of comparison, and it shows you areas where you can get a lot of value with additional data.
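If you record a pre-program and post-program value for each participant, turning those records into the percentages used in the tables above is straightforward. Here is a minimal sketch; the record layout, field names, and the four sample participants are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ReadingRecord:
    participant_id: str
    at_grade_level_pre: bool   # measured when the participant enrolls
    at_grade_level_post: bool  # measured at the end of the program


def percent_true(values):
    """Percent of values that are True."""
    values = list(values)
    if not values:
        raise ValueError("no records to summarize")
    return 100 * sum(values) / len(values)


# Made-up records for four hypothetical participants.
records = [
    ReadingRecord("A", False, True),
    ReadingRecord("B", True, True),
    ReadingRecord("C", False, False),
    ReadingRecord("D", True, True),
]

pre_pct = percent_true(r.at_grade_level_pre for r in records)
post_pct = percent_true(r.at_grade_level_post for r in records)
print(f"Pre: {pre_pct:.0f}%  Post: {post_pct:.0f}%  Change: {post_pct - pre_pct:.0f} points")
```

Keeping both values on the same record for each participant also lets you see who changed, not just how the overall percentage moved.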

Creating a comparison group is a sizable additional effort. And, maybe you don’t need to do that every year or every cohort. Maybe you can do that every other year because it costs more time and money, but the payoff is better data, stronger conclusions, and better insights into your program.

While you're here, check out our first post on this topic - Measuring Nonprofit Success is Hard Part 1, where we talk about counting outcomes, success rates, and calculating cost per outcome.

Reporting your impact is hard when you’re juggling spreadsheets. countbubble makes it easy so you can focus on your mission.

Ryan Brooks

Founder, CountBubble, LLC
