Imagine being able to hire someone to wrangle up a herd of users to provide you with critical information about your target demographic, or having enough funds for your own private department doing the same. Imagine being able to track details as minute as individual bullets in-game, or having a lab where you could learn exactly how people are interacting with something you created, and how you could make it even better.
Odds are, if you’re an indie developer, you have exactly none of this.
There are a number of articles which go into detail about why all developers, not just large companies, should always perform user testing (see “Further Reading” below). In short: user testing is the investigation of user experiences, and user tests are experiments in which the scientific method is applied to objective experiences. By treating design decisions as hypotheses, we can design experiments that give us actionable data for creating the best user experience possible.
Independent developers are limited by a distinct lack of several critical resources. Namely: space (literally or virtually) in which to perform user testing, time (a limited resource when it seems as if the time being used could be “better spent” programming) and access to certain types of hardware, testing software, and test subjects. However, while these limitations may seem daunting, they are not impossible to overcome. It is possible to design and run efficient user tests so long as those performing them are capable of objectively answering three critical questions:
- What resources do you have available?
- What information do you need?
- How will the information be used?
Only then should you move on to choosing a testing method.
Before jumping into the testing options themselves, a note on finding participants: You do not need as many people as you think you do.
Possibly one of the most intimidating aspects of conducting user testing is finding test participants in the first place. The idea of finding people who are both willing and able to provide your team with the information you need can be daunting, but it turns out you can begin testing, and gaining valid insights from the results, with as few as five people. According to research conducted by Robert Virzi and subsequent investigations by Jakob Nielsen and Thomas K. Landauer, the number of usability problems found by n test subjects can be modeled as follows, where N is the total number of usability problems in the design and L is the proportion of those problems found by a single user:
$$N\left(1 - (1 - L)^{n}\right)$$
Nielsen and Landauer found L to be, on average, ~31%, which means a test group of 5 should be able to identify ~85% of usability issues present at the time of testing. It is worth noting, however, that while Nielsen has claimed this means one should only test with 5 users (but do multiple tests), this statement is based on the assumption that those five users are both an accurate representation of your target population and actually capable of doing their “job”. Unfortunately, this assumption does not always hold true, and it is the opinion of this author that developers should always strive for as much input, and as many test participants, as possible. (See also: The Sample Size Calculator For Discovering Problems In a User Interface)
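To make the arithmetic concrete, here is a minimal sketch of the model above. The function names are made up for this example, and the default rate of 0.31 is simply the average reported by Nielsen and Landauer; your own value of L may differ.

```python
import math

def fraction_found(n_users, single_user_rate=0.31):
    """Expected share of all usability problems found by n_users testers."""
    return 1 - (1 - single_user_rate) ** n_users

def users_needed(target_fraction, single_user_rate=0.31):
    """Smallest number of testers whose expected coverage reaches target_fraction."""
    return math.ceil(math.log(1 - target_fraction) / math.log(1 - single_user_rate))

# Print the expected coverage for a few group sizes.
for n in (1, 3, 5, 10):
    print(f"{n:>2} users: {fraction_found(n):.1%} of problems expected")
```

With L at 31%, five testers are expected to surface roughly 84% of the problems, and each additional tester adds less than the one before. If your testers are a poor match for your audience, L will be lower and the same calculation will call for more people.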
To find them, start with the people you already have available: friends, family, associates, and even strangers to whom you may only be connected via friends-of-friends or social media. When it comes to things like surveys, a large social network of people who are invested in you, or your company, can bring in far more participants than advertising alone. As for in-person testing, consider incentivizing participation through short-term incentives (e.g. pizza), the promise of future rewards (e.g. their name in game credits for return-participants), and the proper application of classical conditioning (e.g. “Participants will be entered to win [insert some prize here]”).
Designing User Tests
Now that that’s out of the way, let’s move on to the fun stuff. The following is a basic overview of five traditional testing options: Heuristic Evaluation, Paper Prototyping, Card Sorting, Surveys, and Direct Observation. Which is most appropriate for a limited budget? That depends entirely on your answers to those three questions from earlier.
Heuristic Evaluation
What It Is: An inspection method in which “experts” check UI design against a list of pre-established criteria. Heuristic evaluation should be used in addition to actual playtesting-based experiments, but it is demonstrably useful for preventing common mistakes, and identifying those which arise during development.
Useful for: Keeping in mind “best practices” while designing interfaces. Be sure, however, to bring in outside evaluators, preferably experts, at logical break points in order to avoid bias. Consider also developing your own lists of heuristics if you find yourself in a situation where they may be useful to your project or company as a whole.
- Heuristic Evaluations and Expert Reviews - Usability.gov
- How to Do a Heuristic Evaluation with Scores - Ben Judy
- Heuristic Evaluation - a Step By Step Guide Article - Nicky Danino
Paper Prototyping
What It Is: Designing and testing user interfaces before you build them. Prototypes can range in complexity from literal paper mock-ups of UI elements to “Wizard of Oz” experiments, in which the interface being tested is actually being controlled by another, unseen, person.
Useful for: Testing new ideas to see whether your users are behaving the way you want/expect them to with the interface before spending time and energy building anything new or unproven.
- POP (iPhone and Android) - Paper-prototype creation app
- Justinmind Prototyper - Responsive design prototyping
- Dottedpaper - Literally a PDF for dotted paper
- More applications and software
Card Sorting
What It Is: Test participants are given a set of cards with ideas, statements, or specific terms already written on them, then asked to sort them into categories. In “open” card sorts, participants name these categories and/or explain why the cards were placed where they were. In “closed” card sorts (such as the Q-sort), subjects are asked to organize the cards into a pre-existing structure.
Useful for: Creating classification systems/organization (e.g. designing menus)
- Actual index cards
- Optimal Workshop - Online card sorting
- ConceptCodify - Online card sorting
- More card-sorting resources
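If you run a card sort with index cards rather than an online tool, one simple way to summarize the results is to count how often each pair of cards ended up in the same pile. The sketch below assumes you have typed each participant's piles into a list by hand; the card names and data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pair_counts(sorts):
    """Count how many participants placed each pair of cards in the same pile."""
    counts = Counter()
    for piles in sorts:                  # one participant's complete sort
        for pile in piles:               # one pile of cards
            # Sort the names so each pair is always stored in the same order.
            for pair in combinations(sorted(pile), 2):
                counts[pair] += 1
    return counts

# Hypothetical results from three participants sorting menu items.
sorts = [
    [["Audio", "Video", "Controls"], ["Save", "Load"]],
    [["Audio", "Video"], ["Controls", "Save", "Load"]],
    [["Audio", "Video"], ["Controls"], ["Save", "Load"]],
]

counts = pair_counts(sorts)
for (a, b), n in counts.most_common():
    print(f"{a} + {b}: grouped together by {n} of {len(sorts)} participants")
```

Pairs that nearly everyone groups together are good candidates for sharing a menu, while cards that land somewhere different for each participant (like "Controls" here) probably need a clearer label or a home of their own.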
Surveys
What They Are: A method of collecting quantifiable data about a wide variety of topics. Many online survey tools include tools for data visualization, which facilitates the process of identifying trends or critical pieces of information. Remember to ALWAYS allow for additional input, questions, doodles, etc…
Useful for: Gathering large amounts of data, determining users’ “average” opinions, and for establishing “baselines” against which game elements can be objectively compared. (An example of this final element would be a question which is asked in exactly the same manner at multiple stages of development.)
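As a rough illustration of the baseline idea, the sketch below compares answers to one identically worded 1-to-5 rating question across three builds. The stage names and ratings are invented for the example; a real survey tool would export something similar.

```python
from statistics import mean

# Hypothetical 1-5 ratings for the same question, asked the same way each time.
ratings_by_stage = {
    "prototype": [2, 3, 2, 4, 3],
    "alpha": [3, 4, 3, 4, 4],
    "beta": [4, 4, 5, 4, 5],
}

def stage_averages(ratings):
    """Average rating per development stage."""
    return {stage: mean(scores) for stage, scores in ratings.items()}

averages = stage_averages(ratings_by_stage)
baseline = averages["prototype"]  # the earliest stage serves as the baseline
for stage, avg in averages.items():
    print(f"{stage}: mean {avg:.2f} ({avg - baseline:+.2f} vs. baseline)")
```

Because the question never changes, any movement in the average reflects changes in the game or in the people answering, not in the wording. With groups this small, treat the differences as hints rather than proof.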
Direct Observation
What It Is: Watching users interact with your product. There are a number of ways to go about doing this, including:
- Naturalistic observation - In which subjects interact with the product without any manipulation, and are not asked to do anything other than, in this case, play the game.
- Verbal reports - In which players are asked to narrate what they are doing while playing (“Think aloud”).
- Post-play interviews - Either as a stand-alone interview, or while reviewing a video of their playtesting session.
Useful for: Learning how users actually interact with the things you've created and how their behaviors differ from your expectations. It is the single most effective method of determining what works and what doesn't.
- Recording user behavior alone: Camera, audio equipment
- Screen recording: Jing, XSplit (PC, $15), FRAPS (PC, $37), QuickTime (Mac only)
- Screen recording and user behavior: Usability Studio (PC, $70), Silverback (Mac, $70, Free trial)
- Individual Interviews - Usability.gov
- Contextual Interview - Usability.gov
- Tracking Psychophysiology on the Cheap - For tracking biometric data
A Note on Group Feedback
Focus groups are not your friend. They rarely produce clear answers, force people to make snap judgments, and create a situation in which people are less likely to say what they really think than if they were speaking to you one-on-one (as the group might start trending towards “groupthink” or just saying what they think the developer wants to hear).
If you find yourself in a situation where you absolutely have to use a focus group for some reason or another, you may be able to mitigate this issue by having participants provide their answers “privately” by writing them down to turn in at the end of the session, or by playing a modified version of “Heads Up 7 Up” (to hide identities). For more information, see “Further Reading” below.
Extracting and using the data acquired through user testing is a five-step process.
1. Collect and organize the data.
While many of the resources listed above come with bundled data-visualization tools, this is not always an option. Furthermore, third-party data analysis tools such as RStudio, Tableau, and Splunk are often much better-equipped to handle large amounts of data, and allow for far more flexibility in the data-analysis process. (See also: GameAnalytics)
2. Identify trends and critical information.
“Critical information” isn't simply a matter of what’s most “popular”; it can be something as simple as an issue identified by a single test-user. The value of the feedback received may be entirely subjective, and it’s your job to make that call.
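The first two steps can be sketched in a few lines, assuming you have reduced your session notes to one record per reported issue. The severity scale, thresholds, and sample data below are all hypothetical; the point is only that a filter for "critical" should let through a severe issue even when a single participant reported it.

```python
from collections import defaultdict

# Hypothetical notes: (participant, issue, severity 1 = minor .. 3 = blocks progress)
observations = [
    ("P1", "could not find inventory", 2),
    ("P2", "could not find inventory", 2),
    ("P3", "could not find inventory", 1),
    ("P2", "tutorial text too small", 1),
    ("P4", "game froze after boss fight", 3),
]

def summarize(observations):
    """Return {issue: (number of participants, highest severity seen)}."""
    participants = defaultdict(set)
    worst = defaultdict(int)
    for participant, issue, severity in observations:
        participants[issue].add(participant)
        worst[issue] = max(worst[issue], severity)
    return {issue: (len(people), worst[issue]) for issue, people in participants.items()}

def needs_attention(summary, min_reports=2, min_severity=3):
    """Flag issues that are widely reported OR severe, even if reported only once."""
    return sorted(
        issue
        for issue, (reports, severity) in summary.items()
        if reports >= min_reports or severity >= min_severity
    )

summary = summarize(observations)
print(needs_attention(summary))
```

Here the freeze is flagged despite having only one report, while the small tutorial text is not. Where to set the thresholds is exactly the subjective call described above.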
3. Dig down to identify the “real” issue(s).
People talk about their experience, not what’s actually happening, and it may be necessary to make a conscious effort to distinguish between the two. What’s more, test participants will often be unable to suggest solutions to the problems they are encountering. It’s up to the development team to find them.
4. Make an attempt to address the issue(s) you discover.
An example of the situation outlined above took place during the development of Gearbox’s “Borderlands”. After receiving feedback that one of the game’s areas was “boring” because users encountered “too many” enemies while travelling through the region, the team responded by tripling the number of enemies, thus changing the map from a “travel area that had too many enemies getting in the way” to a “combat area”. Not only did users find the area to be more fun after the change, but their expectations about the in-game situation in which they found themselves had been altered entirely.
5. Test again. Then repeat the process.
User testing is an iterative process, and the only way to determine whether or not your solution(s) have worked is by actually going through the process a second time.
Even if your game is “done” once you've published it (i.e. no plans to release DLC, game-changing updates, etc.) it is still advisable to continue collecting as much data as you can from your users, at least passively. This can be accomplished by keeping in touch with previous test participants (who might have insight for patches or future games, based on their experiences) as well as by opening the lines of communication between your consumers and your team. This can be as simple as placing a feedback form on your company website or including a similar system in the game itself.
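A feedback system does not need to be elaborate. The sketch below shows one possible shape for it: each submission is stored as a single line of JSON, which is easy to append to and easy to load into an analysis tool later. The function names and fields are assumptions made for this example, and an in-memory stream stands in for whatever file or server you would actually write to.

```python
import io
import json
from datetime import datetime, timezone

def record_feedback(stream, build, message, contact=None):
    """Append one feedback entry to a text stream as a single JSON line."""
    entry = {
        "received_at": datetime.now(timezone.utc).isoformat(),
        "build": build,              # which version of the game the player was on
        "message": message.strip(),
        "contact": contact,          # optional, so players can stay anonymous
    }
    stream.write(json.dumps(entry) + "\n")
    return entry

def load_feedback(stream):
    """Read back every non-empty line as one feedback entry."""
    return [json.loads(line) for line in stream if line.strip()]

# An in-memory stream stands in for a real log file here.
log = io.StringIO()
record_feedback(log, "1.0.2", "  The map screen is hard to read.  ")
record_feedback(log, "1.0.2", "Loved the boss fight!", contact="player@example.com")
log.seek(0)
entries = load_feedback(log)
print(len(entries), "entries;", entries[0]["message"])
```

Recording the build alongside each message lets you tell whether a complaint predates a patch, and keeping the contact field optional lowers the barrier for players who just want to leave a quick note.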
User testing is critical, especially when you don’t have resources to spare. The long-term benefits far outweigh the short-term costs, and if you do it right, you'll make a better game without breaking your budget.
Further Reading
On User Testing
- How Gearbox's 'Truth Team' outwitted Borderlands feedback - Griffin McElroy
- Infographic: User Testing on Any Budget - UserTesting.com
- User Research for Indie Games: Playtesting on Morphopolis - Seb Long
- “The Xbox Science Machine” - Scott Butterworth, Official Xbox Magazine May 2014 (Print only)
- Deriving A Problem Discovery Sample Size - Jeff Sauro
- Opinion: Some Hows And Whys Of Usability Testing - Emmeline Dobson
- Usability Breakthroughs: Four Techniques To Improve Your Game - Shawn Stafford, Eric Preisz, and Adams Greenwood-Ericksen
- How Many Test Users in a Usability Study? - Jakob Nielsen
- Campbell, Donald T., Stanley, Julian C. Experimental and Quasi-Experimental Designs for Research. Chicago, IL: Rand McNally (Print)