Usability testing of a voice interface

As part of a client-driven academic project, my team and I conducted usability tests of a voice user interface. We presented our findings and redesign recommendations to our client stakeholders at an esteemed insurance company based in Boston, Massachusetts.

When conventional testing methods proved ineffective for a voice interface, we developed a new testing methodology called Voice Visualization and Retrospective Walkthrough.

My team and I presented our new methodology at UXPA Boston 2018.


Location: Bentley University, Waltham, Massachusetts

Duration: 3 months


My Testing and Assessment course team (Anderson Blanton, Amanda Holmes, Cameron Cross and I) served as consultants to an esteemed insurance company that had recently established a presence on the Alexa Skill platform, accessible through Amazon Echo devices.

We were assigned to conduct individual expert reviews and usability tests of the First Notice of Loss (FNOL) process via the Alexa Skill to determine the application's current usability. My team worked closely with the heads of the UX Design and Development teams at the insurance company to test the efficacy of their newly designed voice interaction and gather first impressions of it.



A few meetings with the clients helped us understand their business goals and expectations for the review. We aligned these with general usability goals and voice interface industry best practices to design a set of heuristics to test the product against.


Each of us performed a heuristic review of the interaction. We then analyzed each other’s findings to minimize evaluator bias and collected them in a cohesive report. We used this report to triangulate results from the usability study and present recommendations to our client.

I designed the “verbal screenshot” visual to present our findings in a compelling way to the stakeholders.


The Challenge:

  • Testing a voice user interface proved challenging with the conventional methods used for graphical user interfaces, such as Think Aloud and Cognitive Walkthrough.
  • Participants couldn’t talk over Alexa, and we couldn’t pause the interaction to ask for feedback because that would break their flow.
  • We also didn’t want to rely solely on participants’ memory of the interaction for feedback.

Our Solution:

To get accurate and insightful feedback, we came up with the Voice Visualization technique. After participants completed their task, we asked them to draw out their interaction with Alexa based on their experience with the task.

This gave participants:

  • The time to decompress after completing what was a potentially stressful task.
  • The opportunity to transfer their emotions to a physical artifact, a drawing, that the moderator could then use to launch discussion.
  • A way to narrate their emotional experience with the interaction without increasing their cognitive load by having to accurately express their emotions solely from memory.


We recruited 8 participants with a wide range of experience, both in filing insurance claims and in using voice user interfaces.


We found strong relationships between the Voice Visualization drawings, task accuracy and efficiency, participants’ feedback, and the survey results. We triangulated these findings to provide our client with design recommendations along with a proof-of-concept report, which was one of their business goals.


Participants’ ratings of Alexa’s friendliness were consistent with their drawing type:

Users who considered Alexa simply a device rated her as less friendly, which correlated with the relatively higher number of errors they encountered. Users who drew a person rated her as more friendly and experienced a fairly smooth interaction with the fewest errors.

Based on the qualitative and quantitative data we collected from the usability test sessions, we assigned a severity rating to each heuristic violated (as determined during the heuristic review) and then multiplied that rating by the frequency of errors to determine the “score”.

This score helped us organize our findings into four categories, which we presented to the client in order of priority.
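
The scoring step can be illustrated with a minimal Python sketch. It is not the tooling we used on the project: the heuristic names, the 1 to 4 severity scale, the category labels, and the threshold values are all hypothetical and chosen only to show how severity multiplied by error frequency gives a score that sorts findings into four priority groups.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    heuristic: str   # heuristic violated (names below are made up)
    severity: int    # severity rating; a 1-4 scale is assumed here
    frequency: int   # number of errors observed across sessions

    @property
    def score(self) -> int:
        # Score = severity rating multiplied by error frequency.
        return self.severity * self.frequency


def prioritize(findings):
    """Return findings ordered from highest to lowest score."""
    return sorted(findings, key=lambda f: f.score, reverse=True)


def category(score, thresholds=(20, 10, 5)):
    """Map a score to one of four priority categories.

    The labels and thresholds are hypothetical placeholders.
    """
    labels = ("critical", "high", "medium", "low")
    for label, minimum in zip(labels, thresholds):
        if score >= minimum:
            return label
    return labels[-1]


if __name__ == "__main__":
    findings = [
        Finding("Error recovery", severity=4, frequency=6),
        Finding("Prompt length", severity=2, frequency=5),
        Finding("Confirmation wording", severity=1, frequency=3),
    ]
    for f in prioritize(findings):
        print(f.heuristic, f.score, category(f.score))
```

Sorting by score before grouping keeps the most severe and most frequent problems at the top of each category.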