Problem Statement: Application of Sequential Design and Testing Methods and Adaptive Sampling Methods to the Design and Evaluation of Usability Tests
BLS behavioral scientists often carry out usability tests intended to identify and correct problems with, e.g., questionnaires and other instruments used for data collection, and websites used for data dissemination. These instruments or websites are generically called "interfaces." The standard approach is to ask several potential users to attempt to use the interface and then identify specific problems they encountered during the attempted use. Through sequential identification and correction of these problems, the researcher intends to produce an improved interface.
For some general background on usability testing, see, e.g., Blair and Conrad (2005), Conrad and Blair (2004), Nielsen and Landauer (1993), Nielsen (1994) and references cited therein. For the current discussion, three important questions in this literature are:
Issue: To what extent can methods of sequential and adaptive experimental design, adaptive sampling, capture-recapture sampling methods, or response-surface experimental design shed some light on questions (A)-(C) above?
Within the statistical literature, there is a substantial body of work on sequential and adaptive design of experiments, and on the interim and final analyses of such experiments. This literature has arisen primarily in biostatistics, especially in the sequential design of clinical trials; see, e.g., Müller and Schäfer (2001); Rosenberger (1996, 2002); Wei, Su and Lachin (1990); Yao and Wei (1996) and references cited therein. Much of this literature focuses on, e.g., comparison of two specific medical treatments or other comparative work that is qualitatively different from the usability testing framework encountered at the BLS. However, much of the underlying mathematical structure developed in the sequential and adaptive literature may be applicable to usability testing.
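As a rough illustration of how a sequential stopping rule might carry over to usability testing, the sketch below applies Wald's sequential probability ratio test to a stream of pass/fail task outcomes. This is a textbook technique chosen for illustration, not a method taken from the references above. The function name, the framing of each test session as a Bernoulli trial, and the rates p0 and p1 are all assumptions made for this example.

```python
import math
from typing import Iterable, Tuple


def sprt_bernoulli(
    outcomes: Iterable[bool],
    p0: float,
    p1: float,
    alpha: float = 0.05,
    beta: float = 0.05,
) -> Tuple[str, int]:
    """Wald's SPRT for H0: failure rate = p0 versus H1: failure rate = p1.

    Hypothetical framing: each element of `outcomes` is True when a test
    user failed the task and False otherwise. Returns the decision and the
    number of users observed when the decision was reached.
    """
    if not 0.0 < p0 < p1 < 1.0:
        raise ValueError("require 0 < p0 < p1 < 1")

    # Wald's approximate stopping boundaries on the log-likelihood ratio.
    upper = math.log((1.0 - beta) / alpha)
    lower = math.log(beta / (1.0 - alpha))

    llr = 0.0
    n = 0
    for n, failed in enumerate(outcomes, start=1):
        # Add this user's contribution to the log-likelihood ratio.
        if failed:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1.0 - p1) / (1.0 - p0))
        if llr >= upper:
            return "accept_h1", n
        if llr <= lower:
            return "accept_h0", n
    # Data ran out before either boundary was crossed.
    return "continue", n


if __name__ == "__main__":
    # Illustrative outcomes only; not data from any actual usability test.
    sessions = [False, False, True, False, False, False, False]
    print(sprt_bernoulli(sessions, p0=0.1, p1=0.3))
```

The boundaries are Wald's approximations, so the realized error rates only approximately match alpha and beta. The sketch also treats users as independent and the interface as fixed, whereas in practice the interface is typically revised between rounds of testing, which is one reason the clinical-trial framework does not transfer directly.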
In addition, there is some related work specifically in the literature on software testing, e.g., Dalal and Mallows (1988).
There is also a substantial statistical literature on adaptive sampling. See, e.g., Christman and Feng (2001); Seber and Thompson (1996); Schwarz and Seber (1999) and references cited therein. This literature covers several complex topics, but a simple motivating example is the estimation of the total number of members of a species in a given area when the members of the species tend to move in herds of unequal sizes. Stated in slightly more abstract form, much of the literature considers estimation of population totals and identification of relationships (or links) among members or groups of members in the presence of population clustering and linking.
Some usability testing sub-topics share features with capture-recapture methodology (e.g., Alho, 1994; Alho et al., 1993; Bunge and Fitzpatrick, 1993; Ding and Fienberg, 1994; Pollock et al., 1994; Wolter, 1990) and with response surface methodology (e.g., Myers and Montgomery, 2002; Khuri, 1996; and references cited therein).
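To make the capture-recapture analogy concrete, the sketch below treats the problems found by two independent groups of test users as two "capture occasions" and applies Chapman's bias-corrected form of the Lincoln-Petersen estimator to estimate the total number of problems in the interface. This is a standard two-sample estimator used here only for illustration. The function name and the problem identifiers are hypothetical, and the estimator assumes every problem is equally likely to be detected, which is unlikely to hold exactly for usability problems.

```python
from typing import Hashable, Set


def chapman_estimate(found_a: Set[Hashable], found_b: Set[Hashable]) -> float:
    """Chapman's estimator of the total number of distinct problems.

    found_a, found_b: identifiers of the problems detected by two
    independent groups of test users (hypothetical framing).
    """
    n1 = len(found_a)            # problems seen by the first group
    n2 = len(found_b)            # problems seen by the second group
    m = len(found_a & found_b)   # problems seen by both groups
    # The +1 terms keep the estimate finite when the overlap is zero.
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


if __name__ == "__main__":
    # Illustrative identifiers only; not data from any actual usability test.
    group_a = set(range(0, 12))   # 12 problems found
    group_b = set(range(6, 16))   # 10 problems found, 6 shared with group_a
    total_hat = chapman_estimate(group_a, group_b)
    seen = len(group_a | group_b)
    print(f"estimated total problems: {total_hat:.2f}")
    print(f"estimated problems not yet found: {total_hat - seen:.2f}")
```

A small overlap between the two groups pushes the estimate up, suggesting many problems remain undiscovered; a large overlap suggests that testing is approaching saturation. Because some problems are easier to detect than others, an estimate of this kind would tend to understate the true total, which is the sort of heterogeneity the cited capture-recapture literature addresses.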
The author thanks John Dixon for helpful comments on an earlier draft of this topic statement. The views expressed here are those of the author and do not necessarily represent the policies of the Bureau of Labor Statistics.
Last Modified Date: January 06, 2006
