How Many Participants Are Enough?

Bill Hackos, PhD
Vice President, Comtech Services, Inc.
www.comtech-serv.com

One question I am often asked in my consulting practice is “How many participants are required in a customer study or usability test to ensure a scientifically valid result?” Some of my clients tell me they have heard figures as high as 20 or 30 participants. Statistics texts will tell you the same thing. We also know that Gallup and Harris use a rather large number of participants in their political polling.

The problem for information developers is that customer studies and usability testing are very labor-intensive and, therefore, costly. Typically, at least three people are involved in a usability test or customer study: an administrator, a recorder, and the participant. Fully loaded costs for these people typically range from about $80 to $100 per hour each. Assuming a usability test duration of about three hours, the labor cost of testing one participant ranges from $720 to $900. If we were to use 20 or 30 participants, the testing cost would be about $14,000 to $27,000. Add to that the cost of planning, analysis, and reporting, and you’re looking at a total cost of $20,000 to $35,000 for a single usability test. Most of us would consider these figures prohibitive. Remember that the psychologists who write the textbooks use graduate students, who basically work for free, as their labor!
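The labor arithmetic can be checked with a short calculation. This is only a sketch: the function names are illustrative, and it assumes everyone in the room, including the participant, is costed at the same hourly rate for the whole session. The $720 to $900 per-participant range corresponds to three people costed for three hours each; the session length is a parameter, so a longer session can be costed the same way.

```python
def labor_cost_per_participant(people, hourly_rate, hours):
    """Labor cost of one test session, with every person costed for its full length."""
    return people * hourly_rate * hours


def labor_cost_for_study(people, hourly_rate, hours, participants):
    """Labor cost of running the same session once per participant."""
    return labor_cost_per_participant(people, hourly_rate, hours) * participants


# Three people (administrator, recorder, participant), three-hour sessions.
low_session = labor_cost_per_participant(3, 80, 3)    # 720
high_session = labor_cost_per_participant(3, 100, 3)  # 900

# 20 participants at the low rate, 30 at the high rate.
low_study = labor_cost_for_study(3, 80, 3, 20)    # 14400
high_study = labor_cost_for_study(3, 100, 3, 30)  # 27000

print(low_session, high_session, low_study, high_study)
```

Planning, analysis, and reporting are not included in these figures; they come on top of the session labor.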

The basic issue in statistical studies is not using a lot of participants but picking your sample very carefully. It’s important that your sample reflect the total population in terms of the data you want to collect. If you were to obtain data from everyone in the population, as the US census attempts to do every ten years, you could be confident that your results are accurate. (Even the US census is biased, though, because it misses a disproportionate number of aliens and homeless people.) For any sample smaller than the entire population, you need to be concerned about bias. One of the reasons some researchers use very large samples is so they can be sloppy about how they pick their samples.

What does all this tell information developers about usability testing?

  • Before you begin usability testing, study the users who use or are likely to use the product you’re testing. Study their workflow and look at their education and experience. Determine whether there are classes of users that should be tested independently. Then, carefully select a small group of users that you are certain are representative of the class of users you want to test. Remember that the selection of your participants is important to the success of your testing.
  • Rather than guess how many participants you need for your test, start with a small number, such as three to five. After each participant is tested, determine how many independent pieces of data you were able to extract from the test that did not surface in previous tests. At first, each participant is likely to provide a lot of new data, but by the time you are on your fourth or fifth participant, you will find diminishing returns, because many of the issues that surface were already found in earlier sessions. Eventually, new participants provide little if any new information. At this point, stop your testing. The number of participants you need to achieve this result will vary from test to test, but I have found that the number is lower than you might think.
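
The stopping rule in the second point can be sketched as a simple tally of findings that have not been seen before. This is a minimal illustration, not a prescribed method: the function names, the `max_new` and `min_sessions` parameters, and the issue labels in the sample data are all hypothetical, and the cutoff of zero new findings after at least three sessions is an assumption you would adjust to your own test.

```python
def new_findings_per_session(sessions):
    """For each session, count the findings that no earlier session produced."""
    seen = set()
    counts = []
    for findings in sessions:
        fresh = set(findings) - seen
        counts.append(len(fresh))
        seen |= fresh
    return counts


def should_stop(counts, max_new=0, min_sessions=3):
    """Stop once enough sessions have run and the latest one added little or nothing."""
    return len(counts) >= min_sessions and counts[-1] <= max_new


# Made-up findings from four participants, one set per session.
sessions = [
    {"cannot find index", "skips warning", "misreads step 3", "unclear term"},
    {"skips warning", "wrong menu path", "overlooks glossary"},
    {"cannot find index", "wrong menu path", "ignores figure"},
    {"skips warning", "unclear term"},
]

counts = new_findings_per_session(sessions)
print(counts)                   # [4, 2, 1, 0]
print(should_stop(counts[:3]))  # False: the third session still found something new
print(should_stop(counts))      # True: the fourth session found nothing new
```

Deciding whether two observations are the same finding is a judgment call that the tally cannot make for you; the code only does the counting once the findings have been labeled consistently.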

Why is it possible to get definitive results with so few participants? Unlike a survey or Gallup poll, usability testing gives you information from a wide variety of sources: you have good demographic information about your participants, you control their activities through the use of scenarios, you watch them in action, you talk to them while they work, and you interview them after the testing. In short, you have multiple sources for the same information. Each participant in your usability test is worth many survey data points! Additionally, you can control your costs by letting the test results determine when you should stop rather than following some statistician’s rule or making an uneducated guess.