Even if you disagree on the exact number of people you need to test with, the key point here is that you don't need to have access to an expensive usability lab to get good feedback on your product. Just looking over someone's shoulder as they fumble their way through the interface you designed can be a huge eye opener.
My partners and I have been building a family intranet service (Kinverge.com). As soon as we had a working prototype together, we each sat down with various members of our family and asked them to perform several tasks on the site (register, invite others, add photos, post a message, etc.)
After the initial instructions, we kept our mouths shut and simply took notes. And this is important -- don't jump in and try to help when the tester gets confused or lost. Seeing the actions users make when they're lost is just as important as seeing how they got lost in the first place.
The notes we gathered during these sessions later helped us decide what changes to make to the site and in what priority.
We've also found that a brief interview afterwards, where you can ask testers what they thought of certain things, is very helpful.
We also do usability on the cheap, using ScreenFlow (varasoftware.com) which is only $99 compared to $1500 or so for Morae, which seemed to be the industry leader in usability testing. I think the analytics side, which was the big difference, is overrated and unnecessary.
We're planning on writing to the ScreenFlow folks to show them the results of our tests, since they didn't seem to know of any users using it for this yet and it could be a good secondary market for them to get into...
I'm also curious about Silverback (silverbackapp.com) but wasn't able to get into an early beta spot in time for our testing. It actually looks really similar to ScreenFlow in any case.
I had wanted to write an app to do usability testing for a while, and then ScreenFlow came out and my first thought was "This is exactly what I wanted, and even better to boot." I have a hard time believing no one else is using it for this.
I'm glad it's working out well for you. We'll be starting to use it for this in the near future.
First of all, I love the fact that Nielsen thinks you can quantify "usability problems found" as being on a measurable scale. To think you can identify "100%" of usability problems with 15 users has always cracked me up.
I completely agree with his suggestions of small groups of users and iterative design. However, the problem with his approach is that it's like a shotgun: as he admits, you bring in several users to make sure you cover the non-overlapping areas.
What I found is that while it's good to run sessions with groups of 'naive' users, a few targeted, repeated sessions with individual users who match the personas you are designing for are incredibly useful, and produce results that are surprisingly universal. Not only does the usability engineer discover new things about users each time someone comes in, but each time the user comes in they find something new too. Repeated testing with individuals is a method that is rarely covered except perhaps in user-centered and participatory design approaches.
Why can't you quantify usability problems found? Isn't it just a matter of counting? Adding more users yields diminishing returns in the number of problems found, and with 15 users you basically find all the problems that you would find using any larger number of users. That sounds sensible to me. What am I missing?
The issue is with false negatives (not finding real problems). It is hard to say that 100% of issues were found, because how can you quantify the number of problems that weren't found?
The claim of this article that you can somehow quantify the percentage of usability problems is rather absurd. How can you quantify the total number of usability issues in a program a priori?
Also, I don't think that graph ever reaches 100%, I think it approaches 100% (but then I'm not a math-whiz, so I will gladly accept correction).
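To make that concrete, here is a minimal sketch of the curve as I understand it from Nielsen's article: the share of problems found by n users is 1 - (1 - L)^n, with L = 31% as the average chance that a single user hits a given problem. The function name and the printed user counts are my own choices for illustration, and note that the formula takes the total number of problems as a given, which is exactly the assumption being questioned here.

```python
def proportion_found(n_users, detect_rate=0.31):
    """Expected share of problems found by n_users under 1 - (1 - L)^n.

    detect_rate is L, the assumed chance that one user exposes a given
    problem (0.31 is the average quoted in the article).
    """
    return 1.0 - (1.0 - detect_rate) ** n_users


# Print the curve at a few sample sizes to show the diminishing returns.
for n in (1, 5, 15, 50):
    print(f"{n:>2} users: {proportion_found(n):.4f}")
```

On paper the value gets arbitrarily close to 1 but never equals it for any finite number of users, so "approaches 100%" is the right reading of the graph.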
They've successfully performed statistical sleight of hand. The assumption that you can quantify the total number of defects is untenable, but they leave that part out of their article and just show the nice charts and math to provide strength to their broken hypothesis.
I think this is unfortunate because there are probably some great insights in this article and the intent is good.
Usability and Human-Centered Computing/Human-Computer Interaction have a tendency to suffer from this type of "pseudo-science", using fuzzy statistics to present great-sounding findings. I was made aware of this trend by the following article:
Wayne D. Gray and Marilyn C. Salzman, "Damaged merchandise? A review of experiments that compare usability evaluation methods", Human-Computer Interaction, vol. 13, pp. 203-261, 1998.
Well, the idea of a usability problem that is never found sounds pretty metaphysical to me. I mean, if no one actually experiences the problem, then how is it a problem? Also, how can a problem experienced by a user be a false positive?
Nielsen just claims that if you test with 15 users, you can be pretty sure you have found all problems you will ever find - you most likely won't find any new problems by using an additional 15 or 50 tests. (But obviously you cannot be 100% sure.)
But whether his numbers are based on sound research I won't judge.
The problem isn't with Nielsen's recommendations, it is with his methods. He presents his findings as being based on scientific research, but unfortunately his methods are deceitful.
Have you ever seen the infomercial for "Dual Action Cleanse" with Klee Irwin? Irwin and Nielsen use very similar methods. They make claims that pass the common sense test, then they use something resembling science to "prove" that they are spouting fact. They are both trying to sell you something; Nielsen has just found a much more profitable customer.
I imagine selling super-laxatives to homebodies and the sleep deprived isn't anywhere near as lucrative as selling consulting services to businessmen.
The problem with Nielsen is his "research" is dangerous to the field of HCI and usability. Anybody that uses his "findings" to support their own research is building a house on a broken foundation.
For his business clients this "academic navel gazing" doesn't matter, they probably see the results they were looking for (since he already told them exactly what to look for). Irwin's customers probably get the results they were after too, and whether or not the FDA monitors the claims made by "herbal supplements" probably doesn't bother them much either.
Firstly, with the simplistic view that there are a finite number of problems - how do you know that you've reached 100%? Isn't the point of getting more people in because you don't know about all the problems? So how can you ever say "I've found 100%"?
Next problem is how do you define a universal usability issue? What might cause problems for one person is a feature for the next. I suppose you could argue you've at least identified a concern, but even then you still need a very large n to say with significant certainty you've found most problems. But it's only ever certainty, unless it's the world's simplest interface.
From our experience, this has been correct. It may be better said that a lot of time with one user walking through and testing the UX of a site is worth more than several shorter tests with more people. If you start multiplying that deep dive with fewer users, you'll quickly see the major UX fixes that are needed.
Additionally, the most obvious UX fixes are the ones that will greatly improve your site anyway. As more UX fixes are suggested, you'll find that they are more a product of differing user preferences than needed fixes. That's not a UX fix. It's having certain parts of your site customizable in look and feel.
I think this is incorrect for social software, where the dominant user interactions are between users, mediated by the culture. Traditional usability testing bears mainly upon user interaction with software. There is usually a limited set of goals for users, and so a sample of five will test many of them. That's not true for social software.
Google uses screeners to target people who have existing social networks and brings them in to trial new applications that make use of the existing network.
It probably depends on how "focused" your software is. If you have a very specific user in mind, you can test against as few as 3 users that fit the description, just to see if your specific user actually works the way you thought. If your software is trying to appeal to a wide range of users you might have to go up to 15 or even more.
First let me say I've never done usability testing.
But from what I've read, the right way to do it would be multiple distinct tests, each with a group of 3-5 users, with one group for each major user category. Lots of tests with small user groups is much better than fewer tests with more users.
I don't think 5 users is the right sample size. Let's face it: if you are a techie, chances are you'll get other techies to test your stuff. So what's going to happen when an 80-year-old man logs onto your site and does something to crash it?
You know that combination of inputs that you would never suspect a sane person would ever want to do?