Evaluation, UX Testing and Wizard of Oz

Summary

Evaluation matters because designers are not designing for themselves. Evaluation supports iterative development, helps teams fail early, and provides evidence for design decisions.

There are three broad types of evaluation: usability testing or UX evaluation with users; field evaluation in natural settings; and analytical evaluation by experts without users. Usability is often described through five attributes: learnability, efficiency, memorability, errors, and satisfaction.

Greenberg and Buxton warn that traditional usability testing can be harmful when applied too early in design. Because it focuses on measurable errors, speed, and negative findings, it can prematurely kill concepts that are promising but still incomplete. Early UX evaluation should therefore also examine emotions, enjoyment, aesthetics, expectations, context, and non-instrumental experience.

Anticipated UX refers to the experiences, feelings, needs, and wishes users expect before the final product exists. Prototype fidelity need not be uniform: a single prototype can mix levels of fidelity across appearance, functionality, interactivity, data, and spatial or physical structure. Wizard of Oz evaluation lets users interact with what appears to be a working system while a hidden human operator simulates part of its functionality.
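As a rough illustration of the Wizard of Oz setup, the Python sketch below separates the participant-facing front end from a hidden operator that actually produces the replies, and logs every exchange for later analysis. The names `WizardOfOzSystem` and `scripted_wizard` are hypothetical, and the keyword-matching wizard is only a scripted stand-in for the human operator who would respond in a real study.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical operator interface: takes the participant's input, returns a reply.
# In a real study a hidden human writes the reply; here any function can stand in.
Operator = Callable[[str], str]


@dataclass
class WizardOfOzSystem:
    """Front end the participant sees; the hidden operator produces the replies."""
    operator: Operator
    log: List[Tuple[str, str]] = field(default_factory=list)

    def respond(self, user_input: str) -> str:
        # The participant only sees the reply, not who (or what) produced it.
        reply = self.operator(user_input)
        # Record each exchange so the session can be documented and analyzed.
        self.log.append((user_input, reply))
        return reply


def scripted_wizard(user_input: str) -> str:
    # Assumed stand-in for a human wizard: canned answers keyed on simple keywords.
    text = user_input.lower()
    if "weather" in text:
        return "It looks sunny this afternoon."
    if "timer" in text:
        return "Timer set."
    return "Sorry, I didn't catch that."


if __name__ == "__main__":
    system = WizardOfOzSystem(operator=scripted_wizard)
    print(system.respond("What's the weather like?"))
    print(system.respond("Set a timer for 5 minutes"))
    print(len(system.log), "exchanges logged")
```

To put a real person behind the curtain, the scripted function could be swapped for something like `lambda text: input(f"[wizard] participant said {text!r}> ")`, run on a separate console the participant cannot see. Keeping the operator behind a single function boundary is what lets the same front end move from simulated to real functionality later.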

A typical evaluation plan covers informed consent, an introduction, background questions, an icebreaker, tasks, a post-task questionnaire, a semi-structured interview, documentation, and participant debriefing. Roles can include facilitator/interviewer, observer, documenter, and operator (for example, the hidden wizard in a Wizard of Oz study).

Key Terminology

Usability testing: controlled evaluation of how easy a system is to use.
UX evaluation: evaluation of user experience, including emotion, aesthetics, meaning, and context.
Field evaluation: evaluation in natural use settings.
Analytical evaluation: expert evaluation without users.
Learnability: ease of first-time task completion.
Efficiency: speed and effort of task completion once the system has been learned.
Memorability: ease of returning after time away.
Errors: number, severity, and recoverability of mistakes.
Satisfaction: pleasantness of use.
Anticipated UX: expected experience before actual use.
Mixed-fidelity prototype: prototype in which different aspects (e.g., appearance, functionality, interactivity, data) have different levels of fidelity.
Wizard of Oz: technique in which a hidden human operator simulates system intelligence or automation.
Think aloud: participant verbalizes thoughts while performing tasks.
Pilot study: trial run used to refine procedure, tasks, and questions.