Software Usability Engineering

Michael Good
Digital Equipment Corporation

This article is reproduced with permission from Digital Technical Journal, No. 6, February 1988, 125-133. Copyright © 1988 Hewlett-Packard Company.

Usability is an increasingly important competitive issue in the software industry. Software usability engineering is a structured approach to building software systems that meet the needs of users in various environments with varying levels of computer experience. This approach emphasizes observation of people using software systems to learn what they want and need from those systems. The three principal activities of software usability engineering are on-site observations of and interviews with system users, usability specification development, and evolutionary delivery of the system. These activities are parallel steps in the development cycle.

Computer system designers have not always adopted a user-centered perspective on software design. Instead, many designers resolved design questions about the human-computer interface by using introspective criteria such as personal preference or conceptual appeal.

This introspective approach to user-interface design might produce a usable system when software engineers represent actual users. However, computer systems today are being built for a wide range of people whose needs often have little in common with the needs of system designers.

In response to market demand for systems that satisfy a growing and varied user community, usability is becoming an increasingly important competitive issue. Designers are striving to create computer systems that people can use easily, quickly, and enjoyably. Indicative of this trend is increased membership since 1982 in professional groups such as the Association for Computing Machinery’s Special Interest Group on Computer-Human Interaction (ACM SIGCHI) and the Computer Systems Group of the Human Factors Society.

Digital’s Software Usability Engineering Group believes that engineers must learn about the needs and preferences of actual users and should build systems to accommodate them. With an understanding of customer environments, an awareness of technological possibilities, and imagination, we have produced many ideas for products that meet users’ needs.

The Software Usability Engineering Process

The role of engineering is to apply scientific knowledge to produce working systems that are economically devised and fulfill specific needs. Our software usability group has adapted engineering techniques to the design of user interfaces. To understand user needs, engineers must observe people while they are actually using computer systems and collect data from them on system usability. Observation and data collection can be approached in the following ways:

  • Visiting people while they use computers in the workplace
  • Inviting people to test prototypes or participate in usability evaluations at the engineering site
  • Soliciting feedback on early versions of systems under development
  • Providing users with instrumented systems that record usage statistics

Our group uses these methods to gather information directly from users, not through secondhand reports. We use these methods to study the usability of current versions of our products, competitive systems, prototypes of new systems, and manual paper-based systems.

Our software usability engineering process evolves as we use it in product development. As of 1987, the process consists of three principal activities:

  • Visiting customers to understand their needs. By understanding a customer’s current experience with a system, we gain insight into our opportunities to engineer new and better systems. We collect data on users’ experiences primarily through contextual interviews, that is, interviews conducted while users perform their work.
  • Developing an operational usability specification for the system. We base the system specification on our understanding of users’ needs, competitive analysis, and the resources needed to produce the system. This specification is a measurable definition of usability that is shared by all members of the project team.
  • Adopting an evolutionary delivery approach to system development. Developers start by building a small subset of the system and then “growing” the system throughout the development process. We continue to study users as the system evolves. Evolutionary delivery is an effective method for coping with changing requirements — a fundamental aspect of the development process.

These three development activities are parallel, not sequential. We do not view user-interface design as a separate and initial part of the development process but as an ongoing process in system development.

These usability engineering techniques apply to most software development environments and are most effective in improving software usability when applied together. However, designers who use any single technique can improve a system’s usability. Our group has used this process in the development of several of Digital’s software products, including the EVE text editor and VAXTPU (Text Processing Utility) software, VAX NOTES software, MicroVMS workstation, VAX Software Project Manager, VAX COBOL Generator software, VAX Language-Sensitive Editor, and VAX DEC/CMS (Code Management System) software.

Visiting Customers to Understand Their Needs

Data collected at the user’s workplace provides insight into what users need in both new and modified systems. During interviews of users actually working with their systems, we ask about their work, about the details of their system interfaces, and about their perception of various aspects of the system. The user and the engineer work together to reveal how the user experiences the system as it is being used. These visits with users are the best way for engineers to learn about users’ experiences with the system.

Ideally, the number of interviews conducted per product depends on how much data is being generated in each succeeding interview. The interview process stops when new interviews no longer reveal much new usability data. In practice, resource and time limitations may stop the interview process before this point. In any event, our approach is to start with a small number of interviews (four or fewer) with people in various jobs. We use these interviews to determine how many and what type of users will be most useful for uncovering new usability data.

Information Gained in Field Studies

Contextual interviews reveal users’ ongoing experience of a system. Other types of interviews, which are not conducted while the user works, reveal users’ summary experience, that is, experience as perceived after the fact. Data on ongoing experience provides a richer source of ideas for interface design than data on summary experience.

For example, data collected from field studies has revealed the importance of interface transparency to users. A transparent interface allows the user to focus on the task rather than on the use of the interface. Our understanding of transparency as a fundamental usability concept comes from an analysis of data on ongoing experience.

Some interface techniques can help keep the user in the flow of work, thus increasing interface transparency. One example can be drawn from a workstation application for desktop publishing. Pop-up menus that appear at the current pointer location create a flow of interaction that reduces mouse movement and minimizes disruption to the user’s task. Users do not have to move their eyes and hands to a static menu area to issue commands, making this an effective interface feature for experienced users.

We will consider using pop-up menus in new workstation software applications when we believe their use will keep the user in the flow of work.

We have developed our understanding of transparency by observing people using a variety of applications in different jobs. Transparency is an aspect of usability that we find across many different contexts. In developing new products, it is also important to consider the diversity of environments in which people will use the system. Different users in different contexts have different usability needs. Some important aspects of a user's context are

  • Type of work being performed
  • Physical workplace environment
  • Interaction with other software systems
  • Social situation
  • Organizational culture

All these aspects influence the usability of a system for each individual. As with other products, software systems are used in the field in ways not anticipated by the designers.

Because the context in which a system is used is so important, we interview a variety of users who use particular products to perform different tasks. We look for common elements of usability for groups of people, as well as the distinctive elements of usability for individual users.

Conducting Contextual Interviews

Interviewers bring a focus, or background,1 to their visits with users. The focus determines what is revealed and what remains hidden during a visit. The engineer needs to enter an interview with a focus appropriate to the goals of the visit. For example, in some visits an engineer may need to look for new product ideas; in others, the engineer may need ideas to improve an existing product.

Contextual interviews rapidly generate large amounts of data. The data derives from an understanding of a user's experience of a system, as shared by the user and the interviewer. To generate such data, interviewers need to concentrate on their relationships with users and on understanding what users do during the session, rather than trying to analyze the data extensively as it is collected. To avoid losing data, we use two-person teams in which one member concentrates on the interview and the second records the data.

Whenever possible, we videotape interviews. If users are unwilling to have their work videotaped, we audiotape the session while the second team member takes detailed notes to supplement the taped information. The two team members meet after the interview to reconstruct an accurate record of events.

Even without any taping or note-taking, engineers can learn a great deal from user visits. Although the detail from the interview may not be remembered, the understanding gained during the interview is still a valuable source of insight.

Developing an Operational Usability Specification

Studying users provides a rich, holistic understanding of how people experience software systems. However, each person will have his or her own interpretation of user experience as it relates to usability. Similarly, a team of people working on a project will find that each member has a different understanding of what “usability” means for that product. Keeping these understandings private and unarticulated can have two undesirable results. First, team members work toward different and sometimes mutually exclusive goals. Second, the team does not have a shared criterion for what it means to succeed or fail in meeting users’ needs.2

Our group constructs shared, measurable definitions of usability in the form of operational usability specifications. These specifications are an extension of Deming’s idea of operational definitions.3 We based our usability specifications on the system attribute specifications described by Gilb4 and Bennett.5 A usability specification, described in the following section, includes a list of usability attributes crucial for product success. Each attribute is associated with a measuring method and a range of values that indicates success and failure.

Constructing a Usability Specification

The development of the VAX NOTES conferencing system provides an example of a usability specification.6 Table 1 is a summary of the usability specification for the first version of the VAX NOTES system. Five items are defined for each attribute: the measuring technique, the metric, the worst-case level, the planned level, and the best-case level.

Table 1: Summary Usability Specification for VAX NOTES Version 1.0

Usability Attribute | Measuring Technique        | Metric                                           | Worst-Case Level | Planned Level | Best-Case Level
Initial use         | NOTES benchmark task       | Number of successful interactions in 30 minutes | 1-2              | 3-4           | 8-10
Initial evaluation  | Attitude questionnaire     | Evaluation score (0 to 100)                      | 50               | 67            | 83
Error recovery      | Critical-incident analysis | Percent incidents “covered”                      | 10%              | 50%           | 100%

The measuring technique defines the method used to measure the attribute. Details of the measuring technique (not shown in Table 1) accompany the brief description in the summary table. There are many different techniques for measuring usability attributes. We have usually measured usability attributes by asking users to perform a standardized task in a laboratory setting. We can then use this task as a benchmark for comparing usability attribute levels of different systems.

In the VAX NOTES case, we chose to measure initial use with a 14-item benchmark task that an expert VAX NOTES user could finish in three minutes. Initial users were Digital employees who had experience with the VMS operating system and the Digital Command Language but not with conferencing systems. The users completed their initial evaluations using 10-item Likert-style questionnaires after they finished the benchmark task. Error recovery was measured by a critical-incident analysis. In the analysis, we used questionnaires and interviews to collect information about costly errors (critical incidents) made by users of the prototype versions of the VAX NOTES software.

The metric specifies how an attribute is expressed as a measurable quantity. Table 1 shows the definitions of the metrics in the VAX NOTES specification. For the initial-use attribute, the metric was the number of successful interactions in the first 30 minutes of the benchmark task. For the initial-evaluation attribute, we scored the questionnaire on a scale ranging from 0 (strongly negative) to 100 (strongly positive), with 50 representing a neutral evaluation.
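For concreteness, here is a minimal sketch (in Python) of one plausible scoring scheme: 5-point Likert items rescaled linearly so that a neutral response maps to 50. The item scale and the rescaling are assumptions for illustration only; the scheme actually used for the VAX NOTES questionnaire is not described here.

    def questionnaire_score(responses, scale_min=1, scale_max=5):
        """Rescale Likert-item responses onto a 0-to-100 evaluation score.

        Assumes each item is answered on a numeric scale (here 1-5, with the
        midpoint representing a neutral answer); the mean response is mapped
        linearly so that the midpoint scores 50. Illustrative scheme only.
        """
        if not responses:
            raise ValueError("no responses to score")
        mean = sum(responses) / len(responses)
        return 100 * (mean - scale_min) / (scale_max - scale_min)

    # Example: mildly positive answers on a 10-item questionnaire
    print(questionnaire_score([4, 4, 3, 4, 5, 3, 4, 4, 3, 4]))  # 70.0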

For error recovery, the metric was the percentage of incidents reported with the prototype systems that would be “covered” (i.e., eliminated) by changes made in version 1.0 of the VAX NOTES system.

The worst-case and planned levels define a range from failure to meet minimum acceptable requirements to meeting the specification in full. This range is an extension of Deming’s single criterion value, which determines success or failure. It is easier to specify a range of values than a single value for success and failure. Providing a range of values for several attributes also makes it easier to manage trade-offs in levels of quality of different attributes.

The best-case level provides useful management information by estimating the state-of-the-art level for an attribute. The best case is an estimate of the best that could be achieved with this attribute, given enough resources.

For the initial use of VAX NOTES software, we defined the planned level as experiencing 3 or 4 successful interactions in the first half hour of use. We considered 1 or 2 successful interactions to be the minimum acceptable level, and 8 to 10 successful interactions to be the best that could be expected. In practice, the actual level was 13 successful interactions, suggesting that we set the levels for this attribute too conservatively.

The planned level for initial evaluation (67) was fairly positive. Users’ neutral feelings were acceptable but negative feelings were not, so we set the worst case at 50. We set the best case at 83, which represented the highest scores we had seen so far when using this questionnaire with other products. The actual tested value was 67, matching the planned level.

We planned an error-recovery level that could cover 50 percent of the reported critical incidents. The worst-case level was set at a fairly low 10 percent, whereas the best case would be to cover all of the reported critical incidents. In practice, 72 percent of the critical incidents were covered, exceeding the planned level.
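To make the form of the specification concrete, the following sketch (in Python, our own illustrative rendering rather than a tool used on the project) encodes the three VAX NOTES attributes from Table 1, taking the low end of each range for the initial-use levels, and classifies the measured results reported above against them.

    from dataclasses import dataclass

    @dataclass
    class UsabilityAttribute:
        """One row of an operational usability specification."""
        name: str
        technique: str      # how the attribute is measured
        metric: str         # how the attribute is expressed as a quantity
        worst: float        # minimum acceptable level
        planned: float      # level that meets the specification in full
        best: float         # estimated state-of-the-art level

        def classify(self, actual: float) -> str:
            """Place a measured value within the specified range of levels."""
            if actual < self.worst:
                return "below worst case"
            if actual < self.planned:
                return "between worst case and planned"
            if actual < self.best:
                return "meets planned level"
            return "at or above best case"

    # Summary specification for VAX NOTES version 1.0 (from Table 1),
    # using the low end of the ranges for the initial-use attribute.
    spec = [
        UsabilityAttribute("Initial use", "NOTES benchmark task",
                           "Successful interactions in 30 minutes", 1, 3, 8),
        UsabilityAttribute("Initial evaluation", "Attitude questionnaire",
                           "Evaluation score (0 to 100)", 50, 67, 83),
        UsabilityAttribute("Error recovery", "Critical-incident analysis",
                           "Percent incidents covered", 10, 50, 100),
    ]

    # Measured results reported in the text: 13 interactions, a score of 67,
    # and 72 percent of critical incidents covered.
    for attribute, actual in zip(spec, [13, 67, 72]):
        print(f"{attribute.name}: {actual} -> {attribute.classify(actual)}")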

Many usability specifications provide further detail by including “now” levels and references. Now levels represent current levels for an attribute, either for the current version of the product or for competitive products. References can be used to add more detail, such as describing how the levels were chosen, and to document the usability specification.

User needs and expectations are shaped in part by the marketplace; therefore competitive analyses can provide important data for usability specifications. We have constructed usability specifications that compare the system under development to either the current market leader, the product with the most highly acclaimed user interface in the market, or both. We can also compare the systems by measuring usability on appropriate benchmark tasks.

Limitations of Usability Specifications

Constructing a usability specification helps build a shared understanding of usability among the diverse people working on a development project. However, to achieve a shared understanding, tradeoffs have to be made. Usability specifications represent a constricted and incomplete definition of usability. The analytic definition of usability is necessarily less complete than an individual’s holistic understanding based on observing people use systems.7 Nonetheless, we deliberately trade off the holistic understanding for the analytic definition because the latter economically focuses our efforts on essential elements of product usability.

If engineers do not understand the needs of users before creating a specification, they risk developing a specification that does not reflect users’ needs. As a result, the product that meets its specification might still be unusable or commercially unsuccessful. Development teams must continually evaluate usability specifications during the development process and make the changes necessary to reflect current information on users’ needs. This approach is part of evolutionary delivery, described next.

Adopting Evolutionary Delivery

Changing requirements pose a challenge in user-interface design as they do elsewhere in software development. Brooks refers to changeability as one of the essential difficulties of software engineering — a problem that is part of the nature of software engineering and that will not go away.8

Evolutionary delivery exploits, rather than ignores, the changeable nature of software requirements.4 This technique has been referred to as incremental development8 and as iterative design.9 We believe that “iterative design” is usually a redundant term in software design. Unless otherwise mandated by external sources, most software design is already an iterative process.10 The waterfall model and similar models of software design are useful for managing project deliverables, but they do not describe what happens in software design and development. Evolutionary delivery takes for granted the iterative nature of the design process, rather than treating iteration as an aberration from textbook methods. Evolutionary delivery is the process of delivering software in small, incremental stages. An initial prototype subset of the software is built and tested. New features are added and existing features refined with successive versions of the system. The prototype evolves into the finished product.

Evolutionary delivery helps to build the project team’s shared understanding of the system’s user-interface design. Contemporary direct-manipulation user interfaces are too rich, dynamic, and complex to be understood from paper specifications. Even simpler terminal-based interfaces are too involved to be understood completely without being seen in action. Early delivery of subset systems helps everyone on the development team understand the system being designed, making it easier to build a shared vision of the final system.

Early, incremental deliveries also demonstrate project progress in a concrete form. Demonstrating improvements to the system at the user-interface level can be an important factor in maintaining managerial support for a project and continuing availability of resources.

The techniques used to improve system usability during the stages of evolutionary delivery include the following:

  • Building and testing early prototypes
  • Collecting user feedback during early field test
  • Instrumenting a system to collect usage data
  • Analyzing the impact of design solutions

These general-purpose techniques can be used independently of an overall usability engineering process. They are described in the following sections, some with examples from the evolutionary delivery of the EVE text editor.9, 11

Building and Testing Prototypes

The first step in an evolutionary delivery process is building and testing prototypes. These prototypes effectively test for ease of learning12 and can provide the germinal product. Prototyping also helps identify potential interface problems early in the development cycle.

From the point of view of usability engineering, the first prototype subset produced should facilitate usability testing. This typically means that the system

  • Includes only simple versions of the most important and most frequently used features of the product
  • Is able to complete a simple benchmark task that the designer will use for a preliminary evaluation of the system’s usability attributes
  • Is useful only for limited testing, not for normal work

If the first prototype is actually useful for normal work, it is probably a larger portion of the project than needs to be delivered at this stage.

The first prototype of the EVE text editor was available three weeks after development began. This prototype tested only the keypad interface. At that point, we had neither implemented nor fully designed the command-line features. To test ease of learning, seven new computer users used EVE in informal laboratory sessions. They performed a standard text-editing task. The tests showed that the keypad interface was basically sound; only minor changes to the basic EVE keypad commands were required. This prototype was the first of 15 versions of EVE that users tested over 21 months.

Because prototypes are not suitable for daily use, they must be tested in controlled conditions. For example, the test might involve asking users to complete a standardized task, where that task is the only one that can be completed using the prototype system. Special equipment can make it easier to conduct these tests and to collect more complete data, but is not necessary. For example, videotaped records can help in later analyses, but as with user visits, we can learn much without them.

For many years we tested prototypes in spare offices, developers’ offices, or users’ offices. Our group now tests most prototypes in our usability engineering laboratory, which is equipped with computer hardware and software, a one-way mirror, and videotaping equipment. The laboratory resources provide greater opportunity for routine testing and elaborate data collection.

Collecting User Feedback during Early Field Test

The earlier a system can be delivered to a group of users for field test, the sooner valuable information will be available to designers. User data collected in the field is usually a richer source of information than laboratory data collected under controlled conditions. Field data takes into account the context in which the system is used.

We use “field test” to describe any version of software distributed to a group of people for use in their work. This definition includes the distribution of early subset versions as well as the later versions commonly referred to as field-test software. Early field testing often begins by giving a usable subset system to users who understand the status of the product and agree to use and evaluate it.

User visits, described previously, are a good way to collect field-test data. Another way to collect user feedback is by electronic communication. Digital’s developers frequently use this effective method by making early field-test versions available on Digital’s private world-wide DECnet network and by encouraging user feedback through electronic mail or a VAX NOTES conference.

Designers of the EVE text editor and VAXTPU software relied on user feedback by means of electronic communication throughout the development cycle. Preliminary versions of the EVE editor were available for daily work six months before external field test began. Overall, we received 362 suggestions from 75 different users. We implemented 212 (or 59 percent) of these suggestions for the version of EVE shipped with the VAX/VMS operating system version 4.2. We received 225 (or 62 percent) of the suggestions before field test began. More of these suggestions were implemented than suggestions received later: 65 percent of the suggestions received during internal field test were implemented compared to 48 percent of the suggestions received during external field test.

Although contextual interviews provide more information than users’ reports of summary experience, the summary experience data is still valuable. The two methods complement each other. The on-site interviews provide details of users’ ongoing experiences in the context of system use; on the other hand, electronic mail, conferencing, and problem reports provide summary experience data from a wider range of users than engineers could interview.

Early field testing is especially important for collecting data on experienced users. Experienced users, as well as new or infrequent users, must find systems easy to use. Early field testing is an excellent way to develop a test population of experienced users before a product is released. By the time later field test versions are available, these experienced users will be a valuable source of data on longer-term usability issues.

Instrumenting the System to Collect Usage Data

Knowing how frequently and in what order people use a system’s functions helps engineers with low-level design decisions. For example, engineers can use usage data to order functions on menus, putting less frequently used commands on less accessible menus. Our group has collected and analyzed usage data for text editors and operating systems, and compared this with data collected by other groups.13, 14

We collect usage data by asking people to use an instrumented version of a functioning system, either an existing product or a field-test version. We collect the most complete data by recording and time-stamping each individual user action. Keeping frequency counts of user actions also provides useful usage data, but does not include data on transitions between actions or time spent with different functions.
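A minimal sketch of this kind of instrumentation appears below (in Python); the class and method names are illustrative and do not correspond to any particular Digital logging tool. Each user action is time-stamped and appended to an in-memory log, which can later be summarized into frequency counts (for example, to decide which commands belong on the most accessible menus) and transition counts between successive actions.

    import time
    from collections import Counter

    class UsageLog:
        """Record time-stamped user actions and summarize them afterward.

        Illustrative sketch only; a real instrumented system would write to
        a file and handle many users, sessions, and privacy requirements.
        """

        def __init__(self):
            self.events = []  # list of (timestamp, action) pairs

        def record(self, action: str) -> None:
            self.events.append((time.time(), action))

        def frequencies(self) -> Counter:
            """How often each action was used (e.g., to order menu functions)."""
            return Counter(action for _, action in self.events)

        def transitions(self) -> Counter:
            """How often each action directly followed another."""
            actions = [action for _, action in self.events]
            return Counter(zip(actions, actions[1:]))

    # Example session fragment
    log = UsageLog()
    for key in ["down-arrow", "down-arrow", "left-arrow", "delete", "down-arrow"]:
        log.record(key)
    print(log.frequencies().most_common())
    print(log.transitions().most_common())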

For the EVE editor, we used command frequency data from five different text editors to guide the initial design of the keypad interface and the command set. During internal field test, we collected command frequency data from a small set of EVE users to refine the command set. We also used command transition data as the basis for the arrangement of the arrow keys on the LK201 keyboard into an inverted-T shape. Usage data from an experimental text editor showed that the transition from the down-arrow key to the left-arrow key occurred more than twice as often as any other transition between arrow keys.11, 13 The inverted-T arrangement also allows three fingers of the user’s hand to rest on the three most frequently used arrow keys, with an easy reach up to the up-arrow key.

Collectors of usage data must be concerned about user privacy and system performance. Users should know about the nature of the data collection and be informed when data is being collected. They should also have the option of using a system that has not been instrumented and does not collect usage data.

To inform users that data is being collected, designers can modify the instrumented version of the system so that a notification message is displayed each time this version is invoked. Users are thus reminded that all actions are being recorded. To minimize performance problems on instrumented versions, engineers can design the logging system so that any necessary delays occur at the start and finish of an application, not at random intervals while the application is being used.
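The following sketch illustrates both points under the same assumptions as the logging sketch above: the instrumented program announces data collection at startup, buffers events in memory while the user works, and writes the log only when the application exits. The file name and message wording are invented for illustration.

    import atexit
    import json
    import time

    LOG_PATH = "usage-log.json"   # illustrative location for the usage log
    _events = []

    def start_instrumented_session() -> None:
        """Tell the user that usage data is being collected for this session."""
        print("Note: this is an instrumented version; your commands will be "
              "recorded for usability analysis.")
        # Flush the in-memory buffer once, when the application exits.
        atexit.register(_write_log)

    def record(action: str) -> None:
        """Buffer the action in memory; no file I/O during interactive use."""
        _events.append({"time": time.time(), "action": action})

    def _write_log() -> None:
        with open(LOG_PATH, "w") as f:
            json.dump(_events, f)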

Analyzing the Impact of Design Solutions

Designers perform an impact analysis on user data collected during evolutionary delivery to estimate the effectiveness of design techniques in meeting product goals.15 In usability engineering, design techniques are usually ideas developed after watching people use computer systems. Estimating the effectiveness of a set of design techniques for meeting a set of usability attributes helps to economically focus engineering effort on key issues.

Impact analysis tables list product attributes and proposed design techniques in a matrix. Each entry estimates the percentage that a given technique will contribute toward meeting the planned level of a given attribute.
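As a sketch, such a table can be represented as a small matrix keyed by technique and attribute; the techniques and percentages below are invented for illustration and are not estimates from any Digital project. Summing a column shows how much of an attribute's planned level the proposed techniques are expected to cover.

    # Illustrative impact analysis matrix: rows are proposed design techniques,
    # columns are usability attributes, and each cell is the estimated
    # percentage contribution toward that attribute's planned level.
    impacts = {
        "Pop-up menus at pointer": {"Initial use": 20, "Initial evaluation": 10},
        "Improved error messages": {"Error recovery": 40, "Initial evaluation": 15},
        "Online keypad diagram":   {"Initial use": 30, "Initial evaluation": 20},
    }

    attributes = ["Initial use", "Initial evaluation", "Error recovery"]

    for attribute in attributes:
        total = sum(row.get(attribute, 0) for row in impacts.values())
        print(f"{attribute}: {total}% of planned level covered by proposed techniques")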

Our software usability group creates impact analysis estimates in several ways, such as analyzing the videotapes made during user visits. With laboratory tests, we have derived estimates from the time actually spent as a result of interface problems encountered on a benchmark task.16 Impact analysis data can also be presented graphically using Pareto charts.17
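As an illustration of the graphical presentation, the following sketch draws a Pareto chart with matplotlib from invented critical-incident data: categories are sorted by frequency and overlaid with a cumulative-percentage line. The category names and counts are made up for the example.

    import matplotlib.pyplot as plt

    # Invented critical-incident categories and counts, for illustration only.
    categories = {"Lost text": 34, "Wrong buffer": 21, "Bad key binding": 13,
                  "Search misses": 8, "Other": 5}

    items = sorted(categories.items(), key=lambda kv: kv[1], reverse=True)
    labels = [name for name, _ in items]
    counts = [count for _, count in items]
    total = sum(counts)
    cumulative = [sum(counts[:i + 1]) / total * 100 for i in range(len(counts))]

    fig, ax = plt.subplots()
    ax.bar(labels, counts)                    # frequency of each category
    ax2 = ax.twinx()
    ax2.plot(labels, cumulative, marker="o")  # cumulative percentage line
    ax2.set_ylim(0, 100)
    ax.set_ylabel("Incidents")
    ax2.set_ylabel("Cumulative percent")
    plt.show()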

Conclusion

Our group applies usability engineering in the development of many new software products within Digital. Software usability engineering techniques can be used by any group of engineers that designs interactive software. No special equipment or prior experience is necessary to start applying these techniques, although equipment and experience can improve the results.

As we have gained experience with usability engineering, we have moved from laboratory tests to field visits as the main source of usability data. We find that field data provides a richer source of ideas for user-interface design. Laboratory testing is still valuable, however, especially for testing early prototypes. We are now bringing some contextual interview techniques to our laboratory tests, interviewing users as they perform a task rather than observing them as they work on their own. For more advanced prototypes, we may ask users to use the system with their own work, which they bring with them to the laboratory. Controlled laboratory experimentation techniques are still useful for deciding some important design issues, such as choosing screen fonts for an application.

A user-oriented approach to software design requires a commitment to understanding and meeting users’ needs through observation of people using systems. Software usability engineering techniques, applied in whole or in part, can produce computer systems that enrich human experience.

Acknowledgments

The techniques described here were developed in a group effort by present and past members of the Software Usability Engineering Group, including Mark Bramhall, Alana Brassard, Jim Burrows, Elisa del Galdo, Charles Frean, Kenneth Gaylin, Karen Holtzblatt, Sandy Jones, Thomas Spine, Eliot Tarlin, John Whiteside, Chauncey Wilson, Dennis Wixon, and Bill Zimmer. Dorey Olmer helped edit this paper.

References

  1. T. Winograd and F. Flores. Understanding Computers and Cognition: A New Foundation for Design (Norwood: Ablex, 1986).
  2. J. Whiteside, “Usability Engineering,” Unix Review, vol. 4, no. 6 (June 1986): 22-37.
  3. W. Deming, Quality, Productivity, and Competitive Position (Cambridge: MIT Center for Advanced Engineering Study, 1982).
  4. T. Gilb, “Design By Objectives,” Unpublished manuscript available from the author at Box 102, N-1411 Kolbotn, Norway (1981).
  5. J. Bennett, “Managing to Meet Usability Requirements: Establishing and Meeting Software Development Goals,” Visual Display Terminals, eds. J. Bennett, D. Case, J. Sandelin, and M. Smith (Englewood Cliffs: Prentice-Hall, 1984): 161-184.
  6. P. Gilbert, “Development of the VAX NOTES System,” Digital Technical Journal (February 1988, this issue): 117-124.
  7. H. Dreyfus and S. Dreyfus, Mind over Machine (New York: The Free Press, 1986).
  8. F. Brooks, Jr., “No Silver Bullet: Essence and Accidents of Software Engineering,” IEEE Computer, 20, no. 4 (April 1987): 10-19.
  9. M. Good, “The Iterative Design of a New Text Editor,” Proceedings of the Human Factors Society 29th Annual Meeting, vol. 1 (1985): 571-574.
  10. B. Curtis, et al., “On Building Software Process Models Under the Lamppost,” Proceedings of the IEEE 9th International Conference on Software Engineering (1987): 96-103.
  11. M. Good, “The Use of Logging Data in the Design of a New Text Editor,” Proceedings of the CHI ’85 Human Factors in Computing Systems (1985): 93-97.
  12. M. Good, J. Whiteside, D. Wixon and S. Jones, “Building a User-Derived Interface,” Communications of the ACM, 27 (October 1984): 1032-1043.
  13. J. Whiteside, et al., “How Do People Really Use Text Editors?” SIGOA Newsletter, 3 (June 1982): 29-40.
  14. D. Wixon and M. Bramhall, “How Operating Systems Are Used: A Comparison of VMS and UNIX,” Proceedings of the Human Factors Society 29th Annual Meeting, vol. 1 (1985): 245-249.
  15. T. Gilb, “The ‘Impact Analysis Table’ Applied to Human Factors Design,” Human-Computer Interaction–INTERACT ’84, ed. B. Shackel (Amsterdam: North-Holland, 1985): 655-659.
  16. M. Good, et al., “User-Derived Impact Analysis as a Tool for Usability Engineering,” Proceedings of the CHI ’86 Human Factors in Computing Systems (1986): 241-246.
  17. K. Ishikawa, Guide to Quality Control, second revised ed. (Tokyo: Asian Productivity Organization, 1982).

Biography

Michael D. Good As a principal software engineer in the Software Usability Engineering Group, Michael Good is developing software usability engineering methodologies and contributing to the user-interface design of several products. He has conducted usability research since joining Digital in 1981 and has published a number of papers on usability engineering and text editing. He designed and implemented the EVE text editor for the VAX/VMS operating system version 4.2. Michael received a B.S. (1979) and an M.S. (1981) in computer science from the Massachusetts Institute of Technology.