An Ease of Use Evaluation of an Integrated Document Processing System

Michael Good

Laboratory for Computer Science
Massachusetts Institute of Technology
Cambridge, Massachusetts, USA

Originally published in Proceedings of Human Factors in Computer Systems (Gaithersburg, MD, March 15-17, 1982), ACM, New York, pp. 142-147. Included here with permission. Copyright © 1982 by ACM, Inc.

1. Introduction

Designers of systems intended to be easy to use have many guidelines available to them in the literature. Most of these recommendations are based on the intuition and experiences of particular designers with particular systems. Very few of them have been evaluated experimentally, so one must be cautious not to attribute more authority to these guidelines than they deserve [6].

If a computer system follows these guidelines, is the resulting system easy to use? This question cannot be answered in general, but it can be asked of particular systems that have been designed from the outset with the intention of being easy to use. Experimental evaluations of such systems can contribute to an understanding of the usefulness of these guidelines and provide a way to measure the success of the system in meeting the goal of ease of use.

Nearly every new computer system claims to be easy to use, and there is nothing new about the widespread nature of these claims [1]. The obvious question to ask when this claim is made is, “What do you mean by easy to use?” An ease of use evaluation must describe the criteria that are being used and be careful to ensure that these are reasonable given the purpose of the computer system.

This paper summarizes the results of an experimental evaluation of the Etude text processing system [8]. Section 2 provides a brief overview of Etude. Section 3 describes the development of suitable ease of use criteria. Section 4 presents the experimental protocol. Section 5 discusses the results of the evaluation. A complete description of the experiment can be found in [7].

2. Etude

Etude is an interactive and integrated document processing system developed by the Office Automation Group of the MIT Laboratory for Computer Science [11]. As the user creates, edits, and formats a document, Etude displays the results on its full page, high resolution bit-map display screen. Etude is the first component of an integrated office workstation that will include functions such as business graphics, database/file management, electronic mail, and a calendar. All of these functions will be integrated into a single system with a uniform and consistent interface. The user will not have to switch back and forth among various systems to accomplish a certain task.

One of the workstation’s primary goals was that it be easy to use; in particular, Etude was designed to be both easy to learn and easy to use once learned. An early prototype was evaluated according to ease of use guidelines available in the literature. Etude measured up to most of these guidelines; changes were made where this was not the case [6].

The following guidelines were among those used in Etude’s development:

  • Commands are English-like phrases. Etude commands have a verb-modifier-object form; examples include go-to previous page, erase 3 lines, and move sentence (to) start-of next paragraph. The most common vocabulary items are provided as dedicated keys on an enlarged keyboard. Items not on the keys may be typed in full, typed in an abbreviated style, or selected from a menu.
  • All commands are reversible. An undo key reverses the effect of the immediately preceding operation.
  • Online assistance is available at any time by pressing the help key.
  • Each keystroke provides feedback to the user.
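
The verb-modifier-object form of the simpler commands can be illustrated with a minimal sketch. The vocabulary sets and the names Command and parse_command are hypothetical, drawn only from the example phrases above; this is not Etude's implementation, and compound forms such as move sentence (to) start-of next paragraph are outside its scope.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical vocabulary, taken only from the example phrases in the text.
VERBS = {"go-to", "erase", "move"}
OBJECTS = {"page", "line", "sentence", "paragraph"}


@dataclass
class Command:
    verb: str
    modifier: Optional[str]
    obj: str


def parse_command(text: str) -> Command:
    """Parse a simple 'verb [modifier] object' phrase such as 'erase 3 lines'."""
    words = text.lower().split()
    if len(words) not in (2, 3):
        raise ValueError("expected 'verb [modifier] object'")
    verb, obj = words[0], words[-1]
    modifier = words[1] if len(words) == 3 else None
    # Accept a plural object ('lines') by trimming a trailing 's'.
    if obj.endswith("s") and obj[:-1] in OBJECTS:
        obj = obj[:-1]
    if verb not in VERBS:
        raise ValueError(f"unknown verb: {verb}")
    if obj not in OBJECTS:
        raise ValueError(f"unknown object: {obj}")
    return Command(verb, modifier, obj)
```

Under these assumptions, parse_command("erase 3 lines") yields the verb erase, the modifier 3, and the object line.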

The keyboard used in this experiment is diagrammed in Figure 1. Other available objects refer to document components, such as the return address, address, greeting, closing, and notations in a letter.

Figure 1: Etude Keyboard

While Etude is intended to run on a powerful single-user computer, the version used in this experiment ran on a timesharing DECSYSTEM-20 using a Nu machine as a display device [14, 21].

3. Criteria for Ease of Use

Ease of use is a multi-faceted problem. Many guidelines are contradictory, for increasing one aspect of ease of use may in turn decrease another aspect [5, 13]. Many current systems exhibit a tradeoff between ease of learning and ease of use once the system is learned. Those that are easy to learn are often hard to use due to redundant or verbose features of the interface, while systems that have a very terse interaction style may be easy for experts to use but very difficult for novices to learn.

A system that follows ease of use guidelines may turn out to be easy to use in one aspect but not in another. The goals of each system determine what factors of ease of use are most important. In the case of Etude, the following four general criteria were chosen to represent the notion of ease of use:

  1. Ease of learning. Etude should be easy for a completely computer-naive person to learn. Such a person should be able to use Etude for useful work after a short, informal training period.
  2. Ease of use once learned. Etude should be easy for people to use. Users should be able to create and edit documents quickly, without being burdened by a clumsy or slow interaction style.
  3. Anxiety factor. Etude should not induce anxiety in its users. Common anxieties in computer users include the fear of breaking something and the fear of losing a large amount of work without notice.
  4. User attitudes. Both novices and experts should enjoy using Etude.

These general criteria for ease of use were developed through consideration of the requirements of people who will use advanced office systems such as Etude. These users may be clerical or managerial workers, but in either case they will not necessarily have any experience with using computers.

If a system is not easy to learn, it will not be used. Management will be reluctant to invest a large amount of time in the training of clerical workers, especially with the rapid turnover in this field. Managers will invest even less time in any attempts to learn to use the system themselves.

While ease of learning is the first hurdle that must be cleared for an advanced office system to win acceptance, ease of use once learned is at least as important. If a system is cumbersome to use it will either be circumvented or it will be used in its own inefficient way. Neither of these outcomes is desirable.

User satisfaction with the system is as important a goal as user performance. The anxiety factor and user attitudes towards the system provide straightforward indications of user satisfaction.

In order to evaluate Etude, these general criteria were narrowed down to the following specific measurable variables:

  1. Training time required for users to learn how to create and edit letters. This includes the time that a user spent going through a tutorial and the time spent typing and editing practice letters.
  2. Time required for novice users to type and edit a one-page business letter. These novices were users who had just completed the tutorial and practice letters.
  3. User’s state anxiety as measured by Form X-1 of the State-Trait Anxiety Inventory (STAI) [18, 19].
  4. User’s attitudes towards the system as measured by the evaluation component of a Semantic Differential (SD) [15].

The general criteria of ease of learning and ease of use once learned have been used by Roberts in her comparative evaluation of text editors [16], but the specific measurements differ due to the different purposes of the evaluations. It would have been preferable to evaluate the criterion of ease of use once learned by measuring expert users. The version of Etude used in this experiment was a prototype that was not feasible to use as a production editor; hence no expert users existed.

The last two criteria use standard psychological questionnaires for anxiety and attitude measurement. State anxiety refers to a person’s anxiety at a particular time, while trait anxiety refers to a person’s proneness to anxiety. Form X-1 measures state anxiety. It contains twenty items such as “I am tense,” “I feel calm,” and “I feel nervous.” The subject marks one of four possibilities for each item: “not at all,” “somewhat,” “moderately so,” or “very much so.” Each scale is scored from 1 to 4, with the higher score reflecting higher anxiety. The scores for each scale are added up to form the total score. In this experiment, subjects were instructed to reply according to the way that they felt during the preceding experimental task. There is a large body of literature regarding the STAI [3], including experiments involving computer systems [9, 20].
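
The scoring rule just described can be sketched as follows. This is an illustrative sketch rather than the published scoring key: the function name stai_state_score is hypothetical, and the set of items that must be reverse-scored (items such as “I feel calm,” where a strong response indicates low anxiety) is left as a caller-supplied assumption to be taken from the STAI manual [19].

```python
from typing import Iterable, Sequence

# The four response choices, in order of increasing intensity.
RESPONSES = ("not at all", "somewhat", "moderately so", "very much so")


def stai_state_score(answers: Sequence[str],
                     reversed_items: Iterable[int] = ()) -> int:
    """Sum twenty item scores of 1 to 4; a higher total means higher anxiety.

    reversed_items holds the 1-based numbers of items worded as the
    absence of anxiety. Which items these are is an assumption supplied
    by the caller, not something encoded in this sketch.
    """
    if len(answers) != 20:
        raise ValueError("Form X-1 has twenty items")
    reverse = set(reversed_items)
    total = 0
    for number, answer in enumerate(answers, start=1):
        weight = RESPONSES.index(answer) + 1  # 1 through 4
        if number in reverse:
            weight = 5 - weight  # flip so that 4 still means high anxiety
        total += weight
    return total
```

Totals computed this way fall between 20 and 80, which is consistent with the observed STAI ranges reported in Section 5.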

The SD is a type of questionnaire which has found widespread use in the area of attitude measurement. An SD is made up of a series of scales anchored by bipolar adjectives such as “good-bad,” “large-small,” and “fast-slow.” Each scale is divided into seven steps, each of which is qualified by an adverb. The subject is then asked to rate a particular word or concept on each scale. For instance, a subject might rate the term “dinosaur” as being extremely large, quite slow, and neutral with reference to being good or bad.

An SD measures three components of attitude. Scales such as “good-bad” measure the evaluation component; scales such as “large-small” measure the potency component; scales such as “fast-slow” measure the activity component. Each scale is scored from -3 to +3; the final score for a particular component is the mean of all the scales which measure that component. The evaluation component is the only part where positive attitudes can be said to be “better” than negative attitudes when measuring ease of use; the other components’ contributions to ease of use are much less clear-cut. The SD also has a large literature [10, 17]. The SD used in this study contained four scales for each component and was constructed in much the same way as was the SD used by Lucas in a study of patients’ attitudes towards medical interviews conducted by a computer [12].
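
The component scoring can be sketched as below. The function name sd_component_scores and the scale-to-component mapping are hypothetical; the mapping uses only the three example scales named above, not the twelve scales of the actual questionnaire, and the choice of which adjective in each pair counts as the positive pole is likewise an assumption.

```python
from typing import Dict, List, Mapping


def sd_component_scores(ratings: Mapping[str, int],
                        components: Mapping[str, str]) -> Dict[str, float]:
    """Average the -3..+3 ratings of all scales belonging to each component.

    ratings maps a scale name to its rating; components maps the same
    scale name to 'evaluation', 'potency', or 'activity'.
    """
    grouped: Dict[str, List[int]] = {}
    for scale, rating in ratings.items():
        if not -3 <= rating <= 3:
            raise ValueError(f"rating out of range for {scale}: {rating}")
        grouped.setdefault(components[scale], []).append(rating)
    return {name: sum(values) / len(values) for name, values in grouped.items()}


# Hypothetical mapping built from the example scales in the text,
# with the first adjective of each pair taken as the positive pole.
EXAMPLE_COMPONENTS = {
    "good-bad": "evaluation",
    "large-small": "potency",
    "fast-slow": "activity",
}
```

With this mapping, the “dinosaur” ratings of the previous paragraph (neutral on good-bad, extremely large, quite slow) would be entered as 0, +3, and -2.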

Computer-naive temporary office workers were chosen for the sample population. For the purposes of this study, a subject was considered to be computer-naive if they had never had any experience with text processing equipment. Some subjects had used simple data entry devices. Many had probably used one of the 24-hour automated tellers, complete with video screens, that are available at most major banks in the Boston area.

4. Experimental Design

Twenty-five subjects were hired from two temporary agencies in the Boston area. The temporary agencies were told that the subjects should be office workers who did not have any word processing experience. They were not to be selected because of their inclination towards technical jobs or a technical environment. Each subject was paid for four hours of work at a rate of five dollars an hour.

The number twenty-five was chosen to allow for a margin of safety if subjects did not show up, since the experiments could not be rescheduled (only one prototype was available for testing); a minimum of twenty subjects was desired. Three subjects did not show up. At the end of the experiment, one subject revealed that she had used a computerized typesetting system. This left a sample of twenty-one subjects who had indeed not had any text editing experience, with attitudes towards the technical environment ranging from enthusiastic to fearful.

An arriving subject was given a brief oral explanation of the experiment and signed a consent form. Each subject performed the experimental task of typing and editing a one-page business letter using both Etude and an IBM Selectric II correcting typewriter. Editing time was measured by giving subjects a marked-up copy of the original letter; when using the typewriter, this involved retyping the letter. Three sets of letters were used in each trial to avoid practice effects. After finishing the task on a given device, the subject completed the STAI and SD questionnaires.

Eleven subjects used the typewriter and then learned to use Etude; the other ten used the typewriter after using Etude. Subjects were told that the Etude tutorial tried to be self-explanatory, but that they should feel free to ask questions of the experimenter (who was seated elsewhere in the room) if something unexpected happened. Assignment of subjects to treatment order was made at random. Subjects were given a break after they completed the Etude tutorial and practice tasks. After the experiment, each subject was asked what they particularly liked or disliked about Etude and was given the chance to ask questions about Etude, the experiment, or word processing in general. An entire trial took about three hours.

Nonparametric statistical tests were used to analyze the data. Comparisons were made by using the Wilcoxon matched-pairs signed-rank test [22]; confidence intervals were computed for the median rather than the mean [2]. Nonparametric tests were used for several reasons. The usual parametric assumption of normality cannot be made for SD scores and is of questionable validity for the other measurements. In addition, the Wilcoxon test is almost as efficient as the corresponding parametric test, even when normality is a valid assumption, and is easier to compute.
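
A minimal sketch of the matched-pairs signed-rank computation is given below. It is not the procedure tabulated in [22] nor the software used in this study: the function name wilcoxon_signed_rank is hypothetical, and the handling of zero differences (dropped) and tied differences (average ranks) follows one common convention, stated here as an assumption. The p-value is computed from the exact null distribution, under which each rank is equally likely to carry either sign.

```python
from typing import Sequence, Tuple


def wilcoxon_signed_rank(x: Sequence[float],
                         y: Sequence[float]) -> Tuple[float, float]:
    """Return (W, two-sided exact p) for the paired samples x and y.

    W is the smaller of the positive and negative rank sums.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))

    # Work with doubled ranks so that average ranks for ties stay integers.
    ranks2 = [0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        # Sorted positions i..j share ranks i+1..j+1; twice their mean is i+j+2.
        for k in range(i, j + 1):
            ranks2[order[k]] = i + j + 2
        i = j + 1

    total2 = sum(ranks2)
    w_plus2 = sum(r for r, d in zip(ranks2, diffs) if d > 0)
    w2 = min(w_plus2, total2 - w_plus2)

    # counts[s] = number of sign assignments whose positive doubled sum is s.
    counts = [0] * (total2 + 1)
    counts[0] = 1
    for r in ranks2:
        for s in range(total2, r - 1, -1):
            counts[s] += counts[s - r]

    tail = sum(counts[: w2 + 1])
    p_value = min(1.0, 2 * tail / 2 ** n)
    return w2 / 2, p_value
```

For paired measurements such as each subject's typing time on the two devices, x and y would hold the two times per subject in the same subject order.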

5. Results and Discussion

Table 1 summarizes the results of the experiment. The main conclusions were:

  1. Ninety percent of the subjects learned to use Etude in less than 2 hours and 20 minutes, with an average time of 1 hour and 53 minutes.
  2. Subjects took significantly longer to create and edit letters with Etude than with the typewriter.
  3. There was no systematic difference between subjects’ anxiety when using Etude and when using the typewriter.
  4. Subjects had positive attitudes towards Etude. In addition, their attitudes were at least as favorable towards Etude as they were towards a typewriter.

Device        Mean       Standard     Range                  Median     95% Confidence
                         Deviation                                      Interval

Training Time
Etude         1:53:25    34:35        1:16:10 to 3:46:10     1:52:10    1:26:30 to 2:06:20

Typing Time (p = 0.00)
Etude         16:35      5:40         9:15 to 29:55          15:05      13:05 to 20:45
Typewriter    6:55       2:00         3:40 to 10:25          6:20       5:45 to 7:45

Editing Time (p = 0.04)
Etude         8:20       4:55         3:15 to 19:10          6:45       5:10 to 10:25
Typewriter    5:55       1:10         3:45 to 7:35           6:15       5:00 to 6:55

STAI Score (p = 0.96)
Etude         40.95      8.12         25 to 54               41         36 to 47
Typewriter    41.33      9.60         23 to 64               43         35 to 45

SD Evaluation Score (p = 0.08)
Etude         1.12       1.12         -1.00 to 3.00          1.25       0.25 to 2.00
Typewriter    0.69       1.00         -1.75 to 2.50          0.50       0.25 to 1.25

SD Potency Score (p = 0.57)
Etude         0.38       0.53         -0.75 to 1.25          0.25       0.00 to 0.75
Typewriter    0.27       0.55         -1.00 to 1.75          0.25       0.00 to 0.50

SD Activity Score (p = 0.00)
Etude         -0.68      1.00         -2.00 to 0.50          -0.50      -1.25 to 0.00
Typewriter    1.05       0.63         -0.25 to 2.50          1.00       0.75 to 1.50

Table 1: Statistics for Experimental Tasks

The statistics in Table 1 were computed using the Consistent System on Multics [4]. Values for p are the probabilities that the differences between Etude and the typewriter occurred by chance. The differences for typing and editing time were both large and statistically significant, while the STAI scores were nearly identical. For the SD, Etude scored higher on the evaluation component while the typewriter scored higher on the activity component; the difference in the potency component was not statistically significant. There were no significant effects due to treatment order in any of the tests.
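
A distribution-free confidence interval for a median, of the kind tabulated in [2], can be sketched from the order statistics of the sample. The function name median_confidence_interval is hypothetical, and the sketch is one standard construction based on the binomial distribution; it is not a reproduction of the Consistent System's computation.

```python
from math import comb
from typing import Sequence, Tuple


def median_confidence_interval(sample: Sequence[float],
                               level: float = 0.95) -> Tuple[float, float]:
    """Return the k-th smallest and k-th largest values of the sample.

    k is the largest rank for which the interval still covers the median
    with probability at least `level`. The miss probability for rank k is
    2 * P(B <= k - 1), where B is binomial with n trials and p = 1/2.
    """
    data = sorted(sample)
    n = len(data)
    alpha = 1.0 - level
    k = 0
    tail = 0.0  # running value of P(B <= k - 1)
    while k < n // 2:
        new_tail = tail + comb(n, k) / 2 ** n  # P(B <= k)
        if 2 * new_tail > alpha:
            break
        tail = new_tail
        k += 1
    if k == 0:
        raise ValueError("sample too small for the requested level")
    return data[k - 1], data[n - k]
```

For a sample of twenty-one observations at the 95% level, this sketch selects the 6th and 16th ordered values as the interval endpoints.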

Figures 2 and 3 elaborate on the information in Table 1. Figure 2 shows the cumulative sample distribution function for training time, indicating the number of subjects who learned to use Etude within any given amount of time. Figure 3 compares the means for each scale in the SD. The first four scales measure evaluation, the middle four measure potency, and the last four measure activity. Such a graph is called a “semantic profile” [12].

Figure 2: Cumulative Sample Distribution Function for Training Time

Figure 3: Differences in Individual SD Scales
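
The cumulative sample distribution function plotted in Figure 2 can be computed as in the following sketch. The function name cumulative_sample_distribution is hypothetical, and the training times themselves are not reproduced here, so any input data would have to come from the full report [7].

```python
from typing import List, Sequence, Tuple


def cumulative_sample_distribution(
        times: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (time, fraction of subjects finished by that time) steps.

    Equal times are merged into a single step.
    """
    data = sorted(times)
    n = len(data)
    steps: List[Tuple[float, float]] = []
    for count, t in enumerate(data, start=1):
        if steps and steps[-1][0] == t:
            steps[-1] = (t, count / n)  # same time: raise the existing step
        else:
            steps.append((t, count / n))
    return steps
```

Reading such a function at a fraction of 0.9 gives the kind of statement made in the first conclusion above, namely the time within which ninety percent of the subjects had finished training.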

Etude succeeded in meeting the criteria of ease of learning and user attitudes. This system can be used productively and enjoyably by people with no previous text processing experience after an informal training session lasting less than half of a working day. The anxiety factor criterion may have been met as well, though interpretations of null results should always be made cautiously. The slow response time of the experimental version of Etude may have been a major factor in not meeting the criterion of ease of use once learned.

In this case, ease of use guidelines have been applied to the design of a powerful document production system that is easy to learn and enjoyable to use. A multi-dimensional set of ease of use criteria has been developed based on tests that are easy to administer and evaluate. Future experiments could use similar criteria to examine the utility of individual design guidelines.

References

  1. Bennett, J. L. The user interface in interactive systems. In Annual Review of Information Science and Technology, Vol. 7, C. A. Cuadra, Ed., American Society for Information Science, Washington, 1972, pp. 159-196.
  2. Beyer, W. H. (Ed.). The CRC Handbook of Tables for Probability and Statistics. The Chemical Rubber Co., Cleveland, 1966. Confidence intervals for medians in Table VII.3 on p. 266.
  3. Buros, O. K. (Ed.). The Eighth Mental Measurements Yearbook, Vol. 1. The Gryphon Press, Highland Park, N.J., 1978. Entry No. 683, pp. 1088-1096, contains STAI bibliography.
  4. Consistent System: Elementary Statistical Analysis. First edition, Renaissance Computing, Inc., 675 Massachusetts Avenue, Cambridge, Mass. 02139, 1980.
  5. Gebhardt, F. and Stellmacher, I. Design criteria for documentation retrieval languages. J. American Society for Information Science 29 (1978), 191-199.
  6. Good, M. Etude and the folklore of user interface design. SIGPLAN Notices 16 (June 1981), 34-43.
  7. Good, M. An ease of use evaluation of an integrated editor and formatter. Tech. Rep. TR-266, MIT Lab. for Computer Science, Nov., 1981. Revised version of MIT M.S. thesis.
  8. Hammer, M. et al. Etude: an integrated document processing system. 1981 Office Automation Conference Digest, AFIPS, March, 1981, pp. 209-219.
  9. Hansen, J. B. Effects of feedback, learner control, and cognitive abilities on state anxiety and performance in a computer-assisted instruction task. J. Educational Psychology 66 (1974), 247-254.
  10. Heise, D. R. The semantic differential and attitude research. In Attitude Measurement, G. F. Summers, Ed., Rand McNally, Chicago, 1970, pp. 235-253.
  11. Ilson, R. An integrated approach to formatted document production. Tech. Rep. TR-253, MIT Lab. for Computer Science, Aug., 1980. MIT M.S. thesis.
  12. Lucas, R. W. A study of patients’ attitudes to computer interrogation. Internat. J. Man-Machine Studies 9 (1977), 69-86.
  13. Miller, R. B. Human ease of use criteria and their tradeoffs. Tech. Rep. TR 00.2185, IBM Poughkeepsie Laboratory, April 12, 1971.
  14. Niamir, B. A virtual terminal interface for text processing applications. Memo OAM-011, MIT Lab. for Computer Science, Office Automation Group, Dec., 1979.
  15. Osgood, C. E., Suci, G. J. and Tannenbaum, P. H. The Measurement of Meaning. University of Illinois Press, 1957.
  16. Roberts, T. L. Evaluation of computer text editors. Report SSL-79-9, Xerox PARC, Nov., 1979. Stanford Ph.D. dissertation.
  17. Snider, J. G. and Osgood, C. E. (Eds.). Semantic Differential Technique. Aldine Publishing Co., Chicago, 1969.
  18. Spielberger, C. D. Anxiety as an emotional state. In Anxiety: Current Trends in Theory and Research, Vol. 1, C. D. Spielberger, Ed., Academic Press, New York, 1972, pp. 23-49.
  19. Spielberger, C. D., Gorsuch, R. L. and Lushene, R. E. Manual for the State-Trait Anxiety Inventory. Consulting Psychologists Press, 577 College Ave., Palo Alto, Calif. 94306, 1970.
  20. Walther, G. H. The on-line user-computer interface: the effects of interface flexibility, experience, and terminal-type on user-satisfaction and performance. Ph.D. Th., U. Texas at Austin, Aug., 1973. NTIS No. AD-777 314.
  21. Ward, S. A. and Terman, C. J. An approach to personal computing. Digest of Papers, Compcon ’80, IEEE, Feb., 1980, pp. 460-465.
  22. Wilcoxon, F. and Wilcox, R. A. Some Rapid Approximate Statistical Procedures. American Cyanamid Co., Pearl River, N.Y., 1964.

Copyright © 1982 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept, ACM Inc., fax +1 (212) 869-0481, or permissions@acm.org.

This is a digitized copy derived from an ACM copyrighted work. ACM did not prepare this copy and does not guarantee that it is an accurate copy of the author’s original work.