Michael Good, Thomas M. Spine, John Whiteside, and Peter George
Digital Equipment Corporation
Nashua, New Hampshire, USA
Originally published in Proceedings of CHI ’86 Human Factors in Computing Systems (Boston, April 13-17, 1986), ACM, New York, pp. 241-246. Included here with permission. Copyright © 1986 by ACM, Inc.
Abstract
A unified approach to improved usability can be identified in the works of Gilb (1981, 1984), Shackel (1984), Bennett (1984), Carroll and Rosson (1985), and Butler (1985). We term this approach “usability engineering,” and seek to contribute to it by showing, via a product development case study, how user-derived estimates of the impact of design activities on engineering goals may be made.1
1. Background
This paper introduces user-derived impact analysis as a tool for usability engineering. User-derived impact analysis arose and is discussed here in the context of applying usability engineering to a specific product. Thus, the paper is both a case study of the usability engineering approach and a contribution to the approach.
Usability engineering is a process, grounded in classical engineering, which amounts to specifying, quantitatively and in advance, what characteristics and in what amounts the final product to be engineered is to have. This process is followed by actually building the product, and demonstrating that it does indeed have the planned-for characteristics.
Engineering is not the process of building a perfect system with infinite resources. Rather, engineering is the process of economically building a working system that fulfills a need. Without measurable usability specifications, there is no way to determine the usability needs of a product, or to measure whether or not the finished product fulfills those needs. If we cannot measure usability, we cannot have usability engineering.
Usability engineering has the following steps:
- define usability through metrics,
- set planned levels of usability,
- analyze the impact of design solutions,
- incorporate user-derived feedback, and
- iterate until the planned usability levels are achieved.
This formulation of usability engineering relies heavily on the works of Gilb (1981, 1984), Shackel (1984), Bennett (1984), Carroll and Rosson (1985), and Butler (1985). Gilb has provided the major categories above. Shackel has outlined an approach to the measurement of usability. Bennett has discussed the setting of planned levels and managing to meet planned levels. Carroll and Rosson have described the use of usability specifications in iterative development. Butler has written a case study involving the setting of planned levels. We seek to contribute to this valuable work.
2. A Case Study of Impact Analysis
Usability engineering was applied to the MicroVMS Workstation Software (VWS) provided with the VAXstation I workstation.2 This software product provides users access to a windowing environment on a high-resolution, bit-mapped, single user workstation.
2.1. Define Usability Through Metrics and Set Planned Levels
In an effort to improve the initial usability of the VWS user interface from Version 1 (V1) to Version 2 (V2), measurable goals were set for user performance on a benchmark task, and for the evaluation of the system by these users after completing the benchmark task. These goals, which were worked out jointly between human factors specialists and the VWS developers, are shown in Table 1. The format of this table is taken from Gilb (1981) and Bennett (1984). It shows the various attributes that define usability for the product. Each attribute is defined in terms of a measuring technique and a metric (operational definition). Further, worst case, planned level, and best case levels are identified.
Table 1: Usability Attribute Specification for MicroVMS Workstation Software
Attribute | Measuring Technique | Metric | Worst Case | Planned Level | Best Case | Now Level |
---|---|---|---|---|---|---|
Initial performance | Windowing benchmark task | Work speed | Same as V1 | 20% > V1 | 3 times V1 | Same as V1 |
Initial evaluation | Attitude questionnaire | Semantic differential evaluation | 0 | 0.25 | 1 | -0.5 to 0.5 |
Note: Now levels for evaluation are based on existing systems such as those reported by Whiteside et al. (1985).
This particular table shows that the planned level for initial performance was a 20% improvement over the V1 interface, as measured by a work-speed metric (Whiteside et al., 1985) on a benchmark task of our devising. The benchmark task involves creating windows, attaching the keyboard to windows, printing the contents of windows, and moving, pushing, and popping windows. Work speed is a measure of the percentage of the benchmark task completed per unit of time.
The planned level for initial evaluation was 0.25 on a scale of -3 to +3, using a semantic differential attitude questionnaire that we devised. The questionnaire was administered after users performed the benchmark task.
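The questionnaire itself is not reproduced here. As a minimal sketch of how such a semantic differential score can be computed, the following assumes a set of bipolar adjective ratings, each on the -3 to +3 scale, that are simply averaged; the adjective pairs in the example are hypothetical, not the actual VWS items.

```python
# Minimal sketch: scoring a semantic differential questionnaire.
# The items below are hypothetical; the actual VWS questionnaire items are not listed here.
def evaluation_score(ratings):
    """Average a user's bipolar ratings, each from -3 (negative) to +3 (positive)."""
    if not all(-3 <= r <= 3 for r in ratings):
        raise ValueError("each rating must lie in [-3, +3]")
    return sum(ratings) / len(ratings)

# Example: a mildly positive user.
ratings = [2, 1, 0, 3, 1, 2, -1, 2]   # e.g. difficult/easy, frustrating/pleasing, ...
print(round(evaluation_score(ratings), 2))   # 1.25, on the same scale as the Table 2 ratings
```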
The performance (work speed) goal is expressed in terms of improvements over V1, whereas the evaluation goal is expressed in terms of an absolute score on the evaluation questionnaire.
2.2. Analyze the Impact of Design Solutions
After defining usability and setting planned levels, there were six weeks left in which to make changes to the V2 software. In order to meet the planned levels for V2, the software development group needed timely and useful information about the usability of the existing V1 software.
Impact analysis (Gilb, 1984) is a method of estimating the probability that a set of proposed design solutions (activities) will result in successfully meeting the engineering goals of a project. It estimates which solutions will be most effective for meeting the planned levels of the various attributes, as well as how likely it is that those solutions will be sufficient to meet them. It is an aid for deciding how to allocate scarce engineering resources.
Here we show how an impact analysis may be performed, based not on subjective estimates of the effectiveness of certain solutions, but on data derived from user behavior. User-derived impact analysis provides a method for analyzing user behavior and presenting the results in a form that is useful for engineering groups. User-derived impact analysis involves:
- measuring current usability levels,
- analyzing the sources of user difficulty,
- predicting possible usability improvements, and
- rank-ordering the difficulties.
2.2.1. Measure Current Usability Levels
Since the user performance goal for VWS Version 2 was expressed in terms of performance on VWS Version 1, the first step was to measure V1 user performance. The initial usability of the V1 interface was measured by testing six experienced VAX/VMS users, who had never used mouse-oriented windowing software, on a simple windowing benchmark task. Table 2 shows their performance and evaluation scores.
Table 2: VWS V1 Initial Usability
Subject | Time taken | % of task completed | Work speed | Evaluation rating |
---|---|---|---|---|
S1 | 21:02 | 100 | 14.3 | 2.0 |
S2 | 54:34 | 87 | 4.8 | 1.2 |
S3 | 25:20 | 94 | 11.1 | 1.6 |
S4 | 54:38 | 47 | 2.6 | 1.6 |
S5 | 40:26 | 94 | 7.0 | 2.0 |
S6 | 14:40 | 100 | 20.5 | 1.8 |
Mean | 35:06 | 87 | 10.0 | 1.7 |
SD | 17:19 | 20 | 6.6 | 0.3 |
Performance was measured using the work-speed performance score introduced by Whiteside et al. (1985, p. 187). In this case, the performance is shown in terms of percentage of the task completed per 3-minute period (3 minutes is the time it takes a well-practiced expert to complete the task). It is computed by multiplying the percentage of the task completed by the 3-minute constant and then dividing by the time taken on the task. High scores represent fast performance.
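For concreteness, a minimal sketch of this computation (in Python, using raw figures from Table 2):

```python
# Work speed = (% of task completed) x (expert time) / (time taken),
# i.e. the percentage of the benchmark task completed per 3-minute period.
EXPERT_MINUTES = 3.0   # time for a well-practiced expert to complete the task

def work_speed(pct_completed, minutes, seconds):
    time_taken = minutes + seconds / 60.0
    return pct_completed * EXPERT_MINUTES / time_taken

# Recomputing two rows of Table 2:
print(round(work_speed(100, 21, 2), 1))    # S1: 14.3
print(round(work_speed(47, 54, 38), 1))    # S4: 2.6
```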
On the average, these users were able to complete 10% of the task per 3-minute period. Thus for V2, an average rate of 12% of the task per 3-minute period would represent the specified 20% improvement in initial performance.
The mean evaluation score of 1.7 was more favorable than the planned level of 0.25 for V2, indicating that no further work needed to be done in this area. Usability engineering requires that if a planned level of an attribute is demonstrably met, then further work on that attribute is not only unnecessary, but undesirable, because such work would involve taking resources away from product attributes whose planned levels were not yet met.
2.2.2. Analyze the Sources of User Difficulty
By reviewing videotapes of the users’ sessions, a list of 13 user/computer interaction problems was compiled. These are shown as the left-hand column of Table 3.
Table 3: Time Spent Due to Problems, in Minutes and Seconds
Problem | S1 | S2 | S3 | S4 | S5 | S6 |
---|---|---|---|---|---|---|
Window positioning | 0:38 | — | 0:21 | 6:51 | 0:10 | 1:42 |
Menu choice off by 1 | 0:17 | — | 0:14 | — | — | |
Confused between 2 menus | 0:19 | 0:37 | 0:58 | |||
Print origin off | 0:39 | 0:23 | 0:22 | |||
Moving window before login | 0:46 | — | 0:11 | |||
Attaching keyboard | 6:41 | 4:44 | ||||
Obscured help window | 0:30 | 3:02 | ||||
Pressing near border | 7:39 | 0:40 | ||||
Inside/outside window | — | 0:23 | ||||
Click/press confusion | 1:30 | |||||
CTRL/S does not light LED | 3:46 | |||||
Deleting windows | 6:53 | |||||
Get menu when moving window | 0:55 | |||||
Total problem time | 2:09 | 14:50 | 3:44 | 18:46 | 8:58 | 2:44 |
Total task time | 21:02 | 54:34 | 25:20 | 54:38 | 40:26 | 14:40 |
After identifying the problems, the videotapes were viewed again to estimate how much time was spent due to each problem. Table 3 also shows these estimates for each subject. Dashes indicate that the subject encountered the problem, but that the amount of time spent due to the problem was negligible.
2.2.3. Predict Possible User Performance Improvements
Table 4 shows the estimated amount of increase in the work-speed performance score for each subject if all of the problems were solved. The estimate is made by subtracting the total problem time from the total task time and computing a new estimated work speed based on the reduced time. This assumes that there would be no interactions among the individual problems and their solutions. That is, it is assumed that solving one problem, say window positioning, would not interact with another problem, say moving window before login. Gilb recommends this simplifying assumption in order to make the impact analysis calculations more practical.
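A minimal sketch of this estimate, using S1's figures from Tables 2 and 3:

```python
# Estimated work speed if all observed problems were solved: subtract the
# total problem time from the total task time and recompute the score
# over the reduced time (assuming the percentage completed is unchanged).
EXPERT_MINUTES = 3.0

def estimated_work_speed(pct_completed, task_minutes, problem_minutes):
    reduced_time = task_minutes - problem_minutes
    return pct_completed * EXPERT_MINUTES / reduced_time

# S1: 100% completed, 21:02 total task time, 2:09 total problem time.
new_speed = estimated_work_speed(100, 21 + 2 / 60, 2 + 9 / 60)
print(round(new_speed, 1))           # 15.9, matching Table 4
print(round(new_speed - 14.3, 1))    # estimated increment of about 1.6
```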
Table 4: Effect of Solving Problems on Work Speed
Subject | Measured work speed | Est. increment if all problems solved | New (estimated) work speed |
---|---|---|---|
S1 | 14.3 | 1.6 | 15.9 |
S2 | 4.8 | 1.7 | 6.5 |
S3 | 11.1 | 1.9 | 13.0 |
S4 | 2.6 | 1.3 | 3.9 |
S5 | 7.0 | 1.9 | 8.9 |
S6 | 20.5 | 4.6 | 25.1 |
Mean | 10.0 | 2.2 | 12.2 |
Overall, these calculations predict a 22% increase in VWS usability as measured by initial work speed. This 22% predicted improvement corresponds closely to the 20% planned level of improvement shown in Table 1.
2.2.4. Rank-Order the Difficulties
Table 5 shows the thirteen windowing problems ranked by their impact on the improved work-speed scores when totaled across all six subjects. The window positioning and pressing near border problems have the largest effects on initial usability, and together they account for more than 50% of the total impact.
Table 5: Windowing Problems Ranked by Impact
Problem | Relative impact on initial use | V2 software change implemented? |
---|---|---|
Window positioning | 32 % | yes |
Pressing near border | 23 % | yes |
Attaching keyboard | 7 % | yes |
Print origin off | 7 % | no |
Deleting windows | 6 % | yes |
Click/press confusion | 6 % | no |
Confused between 2 menus | 5 % | yes |
Get menu when moving window | 5 % | no |
Menu choice off by 1 | 3 % | yes |
CTRL/S does not light LED | 2 % | no |
Inside/outside window | 2 % | no |
Obscured help window | 1 % | yes |
Moving window before login | 0 % | yes |
This rank-ordered list is a user-derived estimate of the percentage impact of various design solutions on the goal of improved initial work speed. It corresponds to a single row in a Gilb impact analysis table.
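The exact formula behind these percentages is not given here. One plausible reading, sketched below under the same no-interaction assumption, is to estimate for each problem the work-speed increment each affected subject would gain if only that problem's time were removed, total those increments across subjects, and normalize the totals to percentages; the numbers such a sketch produces need not reproduce Table 5 exactly.

```python
# Sketch of one way to rank problems by estimated impact on initial work speed.
# Inputs: per-subject task data and per-problem time losses (all in minutes).
EXPERT_MINUTES = 3.0

def work_speed(pct_completed, time_taken):
    return pct_completed * EXPERT_MINUTES / time_taken

def rank_problems(subjects, problem_times):
    """subjects: {subject_id: (pct_completed, task_time_minutes)}
    problem_times: {problem_name: {subject_id: minutes_lost_to_problem}}
    Returns (problem, relative_impact_percent) pairs, largest impact first."""
    impact = {}
    for problem, losses in problem_times.items():
        total = 0.0
        for sid, lost in losses.items():
            pct, task_time = subjects[sid]
            # Increment in work speed if only this problem's time were removed.
            total += work_speed(pct, task_time - lost) - work_speed(pct, task_time)
        impact[problem] = total
    grand_total = sum(impact.values())
    return sorted(((p, 100.0 * v / grand_total) for p, v in impact.items()),
                  key=lambda pair: pair[1], reverse=True)
```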
This completed the user-derived impact analysis portion of the project.
2.3. Incorporate User-Derived Feedback
Once the evaluation of the V1 software was completed and the ranked list of V1 windowing problems was produced, the information was presented to the VWS engineering group. The engineering group then evaluated the merits of various technical solutions to the specified windowing problems, based on both the cost and potential impact of implementing the solution. Once this analysis was completed, those solutions with the highest relative impact and the smallest demand on engineering resources were incorporated into the V2 software. The right-hand column in Table 5 shows those problems for which specific software solutions were made. Notice that while not all of the recommended changes were made, four of the top five were.
2.4. Iterate Until Planned Usability Levels Are Achieved
Usability engineering requires demonstration that the planned levels of usability attributes have indeed been achieved. We tested an early version of the revised V2 software to see if the software met the planned levels of usability. Six subjects were run on VWS V2 using the same procedure as for the V1 subjects. Again, all subjects were experienced users of the VAX/VMS operating system, but none had used the VAXstation I or other mouse-oriented systems before.
Table 6 shows the work-speed and attitude scores for the VWS V2 subjects, and the means and standard deviations for both the V1 and V2 subjects. The mean performance score for the V2 subjects is 37% better than the mean for the V1 subjects: 10.0 for V1, 13.7 for V2. All of the VWS subjects, both for V1 and V2, gave the system a positive evaluation, with mean evaluation scores of 1.7 for V1 and 1.3 for V2.
Table 6: VWS V2 Initial Usability
Subject | Time taken | % of task completed | Work speed | Evaluation rating |
---|---|---|---|---|
S7 | 18:06 | 100 | 16.6 | 0.8 |
S8 | 25:22 | 93 | 11.0 | 1.0 |
S9 | 15:48 | 94 | 17.8 | 1.4 |
S10 | 21:00 | 93 | 13.3 | 1.8 |
S11 | 40:20 | 100 | 7.4 | 2.0 |
S12 | 18:26 | 100 | 16.3 | 1.0 |
V2 Mean | 23:10 | 97 | 13.7 | 1.3 |
V2 SD | 9:01 | 4 | 4.0 | 0.5 |
V1 Mean | 35:06 | 87 | 10.0 | 1.7 |
V1 SD | 17:19 | 20 | 6.6 | 0.3 |
The 37% improvement in initial work speed from V1 to V2 is almost twice the amount of improvement planned. While evaluation scores declined slightly from V1 to V2, the initial evaluation of 1.3 is still much higher than the planned level of 0.25. Both initial usability goals for the software were met before field test.
Software changes were made in response to eight of the thirteen interface problems reported in the V1 software. Since not all suggested changes were made, we would have estimated that these changes would lead to a 17% improvement in initial work speed. This is less than half of the 37% improvement actually observed. The discrepancy is not surprising given that this was our first attempt to quantitatively estimate the effect of software changes on usability. One possible explanation is that we estimated the change in time needed to complete the task, but did not estimate changes in the percentage of the task completed.
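One way to arrive at the 17% figure, consistent with Tables 4 and 5 although the derivation is not spelled out above, is to sum the relative impacts of the implemented changes and scale the 22% all-problems prediction by that fraction:

```python
# Assumed derivation of the 17% estimate: the implemented changes account for
# about 77% of the total relative impact in Table 5, and 77% of the predicted
# 22% improvement is roughly 17%.
implemented_impacts = [32, 23, 7, 6, 5, 3, 1, 0]        # "yes" rows of Table 5, in percent
fraction_addressed = sum(implemented_impacts) / 100.0   # 0.77
print(round(fraction_addressed * 22))                   # 17
```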
Most of the problems that had bothered V1 users, and had been targets of change in the interface, did not recur in the V2 testing.
3. Summary
This paper has described a method for evaluating the predicted effectiveness of a set of solutions on a set of goals. The method is based on the impact analysis of Gilb, but seeks to ground the estimates in actual user-performance data.
The VWS development organization responded very favorably to the user-derived impact analysis. It presents proposed changes in a rational way, defusing many of the issues of personal taste that often cloud user-interface design efforts. Initial goal-setting allows a shared definition of success with respect to usability, and impact analysis indicates what steps appear to be necessary to achieve success. The developers traded off usability against ease of implementation: not all the changes were made. However, this was done with full awareness of the likely impact on the usability goals. Using usability engineering with user-derived impact analysis, the developers met these goals for user performance and user satisfaction, on schedule.
Acknowledgments
Alana Brassard recruited the subjects and administered the experimental sessions. Stan Amway participated in setting the usability goals.
Footnotes
1. The views expressed in this paper are those of the authors and do not necessarily reflect the views of Digital Equipment Corporation.
2. VAX, VAXstation, and VMS are trademarks of Digital Equipment Corporation.
References
Bennett, J. L. Managing to meet usability requirements: establishing and meeting software development goals. In Visual Display Terminals, J. Bennett, D. Case, J. Sandelin, and M. Smith, Eds., Prentice-Hall, Englewood Cliffs, NJ, 1984, pp. 161-184.
Butler, K. A. Connecting theory and practice: a case study of achieving usability goals. In Proc. CHI ’85 Human Factors in Computing Systems (San Francisco, April 14-18, 1985), ACM, New York, pp. 85-88.
Carroll, J. M. and Rosson, M. B. Usability specifications as a tool in iterative development. In Advances in Human-Computer Interaction, Vol. 1, H. R. Hartson, Ed., Ablex, Norwood, NJ, 1985, pp. 1-28.
Gilb, T. Design by objectives. Unpublished manuscript, 1981. Available from the author at Box 102, N-1411 Kolbotn, Norway.
Gilb, T. The “impact analysis table” applied to human factors design. In Proc. Interact ’84, First IFIP Conference on Human-Computer Interaction (London, September 4-7, 1984), Vol. 2, pp. 97-101.
Shackel, B. The concept of usability. In Visual Display Terminals, J. Bennett, D. Case, J. Sandelin, and M. Smith, Eds., Prentice-Hall, Englewood Cliffs, NJ, 1984, pp. 45-87.
Whiteside, J., Jones, S., Levy, P. S. and Wixon, D. User performance with command, menu, and iconic interfaces. In Proc. CHI ’85 Human Factors in Computing Systems (San Francisco, April 14-18, 1985), ACM, New York, pp. 185-191.