| Automatic
web usability evaluation: what needs to be
done? |
 |
| By Giorgio Brajnik |
 |
| Abstract |
 |
| Website redesign and maintenance
are likely to absorb more and more resources
as web technologies and uses keep evolving
at the current pace. Usability evaluation
methods need to be run after each change in
order to ensure a decent quality level. The
means to control the complexity and cost of
website maintenance lies in tools performing
automatic usability evaluations. I present
a survey of tools that analyze websites, illustrating
what kind of automatic tests they perform
and which usability factors the tests are
most closely related to. The survey then leads
to an analysis of the remaining gaps
and of research openings. |
 |
| 1.
Introduction |
 |
| It is well known that the average
quality of websites is poor, “lack of
navigability” being the #1 cause of
user dissatisfaction [Fleming, 1998; Nielsen,
1999]. |
 |
| On the one hand, web technologies
evolve extremely fast, enabling sophisticated
tools to be deployed and complex interactions
to take place. Moreover, the life cycle of
a website is also extremely fast: maintenance
of a website is performed at a rate that is
higher than that of other software products
because of market pressure and lack of distribution
barriers. In addition, often the scope of
maintenance becomes so wide that a complete
redesign takes place. |
 |
| On the other hand, the quality
of a website is rooted in its usability, which
usually results from the adoption of user-centered
development and evaluation approaches [Newman
and Lamming, 1994; Fleming, 1998; Rosenfeld
and Morville, 1998; Nielsen, 1999]. Usability
testing is thus a necessary and repeated step
during the life-cycle of a website. |
 |
| To test usability of a website
a developer can adopt two kinds of methods:
usability inspection methods (e.g. heuristic
evaluation [Nielsen and Mack, 1994]) or user
testing [Nielsen, 2000]. Heuristic evaluation
is based on a pool of experts who inspect
and use (part of) a website and identify
usability problems that they assume will affect
end users. With user testing, a sample of
the user population of the website is selected
and asked to use (part of) the website
and report things that they think did not
work or are not appropriate. |
 |
| Even though the cost (in terms
of time and effort) of both methods is not
particularly high, and their application improves
the website quality and reduces the overall
development cost, they are not systematically
performed at detailed levels on every different
part of a website after each maintenance or
development step. |
 |
| It is clear that as change actions
on a website increase rapidly in number and
variety, more and more resources need to be
deployed to ensure that website quality does
not decrease (but hopefully increases). It
is also clear that any tool that can, at least
in part, automate the usability evaluation
and maintenance processes will help to fill
this ever widening gap. |
 |
| The goal of this paper is to
present a brief survey of what these tools
do and how they contribute to the usability
evaluation problem. From the analysis it appears
that gaps exist between what these tools achieve
and what is required to ensure usability.
While some of these gaps are inherently unsolvable,
others can probably be filled, provided
that additional research is carried out to
identify effective techniques. |
 |
| 2.
A software engineering view of a website |
 |
| A website is an interactive
software system. It interacts with at least
two different kinds of users: end users trying
to achieve some goal and developers/maintainers
striving to keep the system working and improving
it. |
 |
| End users
can be characterized in terms of: |
 |
| |
 |
goals and tasks:
e.g. information seeking, choosing where
to buy some specific product, buying
it, writing a book review, etc. |
 |
 |
context: user behavior
during information seeking processes
is strongly affected by users’
culture, language, previous knowledge
in the field, experience in using the
web. |
 |
 |
technology: end users
interact with the website through a
layer of technology that is not under
control by the web designer: browsers,
protocols, plug-ins, operating system
platforms, interaction devices (screens,
speaking devices, pens, reduced telephone
keyboards, etc.), network connections.
|
|
 |
| Information seeking through
browsing is a process that almost all websites
must support. Unfortunately, it is also a
difficult task to model and support because
it encompasses complex cognitive, social and
cultural processes [Allen, 1996], spanning
the interpretation of textual, visual and
audio messages, the selection of relevant information,
and learning. |
 |
| On the other hand we have developers
and maintainers. Amongst their activities,
a prominent role is played by actions that
include: corrective maintenance (i.e. fixing
problems with the website behavior or inserting
missing contents), adaptive maintenance (i.e.
upgrading the site with respect to new technologies,
like new browsers’ capabilities), perfective
maintenance (i.e. improving the site behavior
or content), and preventive maintenance (i.e.
fixing problems in behavior or content before
they affect users). A large fraction of these
activities is aimed at detecting system failures
(that is, departures from the required behavior),
analyzing them and identifying faults (that
is, representations within the system of
human errors that occurred during development,
commonly called bugs). |
 |
| Maintenance is meant to improve
the quality of the website. ISO9126 defines
quality as “the totality of features
and characteristics of a software product
that bear on its ability to satisfy stated
or implied needs” and it includes properties
like maintainability, robustness, reliability
and usability that are particularly important
for websites. |
 |
| Usability can be defined (ISO9241)
as “the effectiveness, efficiency and
satisfaction with which specified users achieve
specified goals in particular environments”,
where: |
 |
| |
 |
effectiveness means “the
accuracy and completeness with which
specified users can achieve specified
goals in particular environments”,
|
 |
 |
efficiency means “the resources
expended in relation to the accuracy
and completeness of goals achieved”,
and |
 |
 |
satisfaction means “the comfort
and acceptability of the work system
to its users and other people affected
by its use”. |
|
 |
| General properties like these
are not independent: for example, a robustness
failure of a website (e.g. some browser incompatibility)
will result also in a usability failure (e.g.
user inability to complete a task and dissatisfaction). |
 |
| In order to be operationalized
these properties need to be decomposed into
more detailed ones that can be assessed in
a simpler and perhaps more standard way. For
example, maintainability can be decomposed
into complexity of the DHTML code, its size,
the number of absolute URLs, etc. |
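
As a minimal sketch of how such lower-level properties might be measured, the following Python fragment computes three simple figures for a single page: its size, its number of tags (a crude stand-in for complexity) and its number of absolute URLs. The class name `SourceMetrics`, the function `measure` and the choice of figures are illustrative assumptions, not measures prescribed by this paper or by any of the surveyed tools.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class SourceMetrics(HTMLParser):
    """Collects a few simple internal attributes of one HTML page."""

    def __init__(self):
        super().__init__()
        self.tag_count = 0
        self.absolute_urls = []

    def handle_starttag(self, tag, attrs):
        self.tag_count += 1
        for name, value in attrs:
            # Treat a URL as absolute when it carries its own scheme and host.
            if name in ("href", "src") and value:
                parts = urlparse(value)
                if parts.scheme in ("http", "https") and parts.netloc:
                    self.absolute_urls.append(value)


def measure(html):
    """Return size, tag count and absolute-URL count for one page."""
    parser = SourceMetrics()
    parser.feed(html)
    parser.close()
    return {
        "size_bytes": len(html.encode("utf-8")),
        "tags": parser.tag_count,
        "absolute_urls": len(parser.absolute_urls),
    }


# Hypothetical page used only for illustration.
page = ('<a href="http://example.org/a.html">A</a>'
        '<a href="b.html">B</a><img src="/c.gif">')
print(measure(page))
```

Whether a high count of absolute URLs actually harms maintainability depends on the site; the sketch only shows that figures of this kind can be extracted mechanically from the sources.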
 |
| The same applies to usability.
It can be described in terms of usability
factors (like speed of use, error rate, ease
of error recovery, etc.) which in turn can
be reduced to other lower-level properties.
The most important properties for website
usability include those related to “navigability”
(most of them taken from [Fleming, 1998]):
|
 |
| |
 |
consistency of presentation
and controls |
 |
 |
adequate feedback |
 |
 |
natural organization of the information
(systematic labels, clear hierarchical
structure) |
 |
 |
contextual navigation (in each state
all and only the possible navigation
options are available) |
 |
 |
efficient navigation (in terms of
time and effort needed to complete a
task) |
 |
 |
clear and meaningful labels. |
|
 |
| Other
properties relevant to usability of a website
are: |
 |
| |
 |
robustness (i.e. how well
the website handles technology used
by users that has not been foreseen
by developers) |
 |
 |
flexibility (for example: availability
of graphic and textual versions, redundant
indexes and site maps, duplicated image
map links) |
 |
 |
functionality (i.e. support of users’
goals) |
|
 |
| The latter can be further decomposed
if we narrow users' goals. For e-commerce
sites, for example, other relevant attributes
can be: |
 |
| |
 |
how security is handled
and how easy it is to get information
about it |
 |
 |
similarly for privacy |
 |
 |
how easy and effective it is to find
the desired item |
 |
 |
how easy and effective it is to search
the catalog for an item not known a
priori |
 |
 |
how easy and effective it is to preview
an item |
 |
 |
what are the return policies and
how they are communicated |
|
 |
| The Web Accessibility Initiative
[W3C, 2000] is an effort by the W3C
to improve website accessibility. It publishes
a set of guidelines [WAI, 1999] where accessibility
is defined as the ability of a website to be used
by someone with disabilities. An accessible
website: |
 |
| ensures graceful transformation:
it should remain accessible despite physical,
sensory and cognitive disabilities, work constraints
and technological barriers; |
 |
| makes content understandable
and navigable: it should present its content
in a clear and simple language, and should
provide understandable mechanisms to navigate
within and between pages. |
 |
| While usability implies accessibility
(at least when an unconstrained user population
is considered), the contrary is not necessarily
true. For example, a missing link to the home
page may be a fault affecting usability, while
it does not affect accessibility. |
 |
| All these properties (either
those related to usability or those related
to accessibility) may be further decomposed
into more detailed ones that refer to specific
attributes of the website implementation.
Actually, such a decomposition has to be done
in order to support usability inspection methods
and to identify and fix faults. For example,
to determine how flexible a website is, we
need to inspect implementation (or perhaps
design specifications) to determine if there
is a textual version of the page, if there
are textual links that duplicate those embedded
in images, etc. |
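
One of these checks, whether links embedded in an image map are duplicated by textual links, can be sketched as follows. This is only an illustration under simplifying assumptions: links are compared by their literal `href` value, an anchor counts as textual if it contains any non-blank text, and the names `LinkCollector` and `unduplicated_image_links` are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Separates image-map links from anchors that contain visible text."""

    def __init__(self):
        super().__init__()
        self.area_targets = set()   # href values found in <area> elements
        self.text_targets = set()   # href values of anchors with text inside
        self._open_href = None      # href of the anchor being read, if any
        self._open_text = ""

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "area" and attrs.get("href"):
            self.area_targets.add(attrs["href"])
        elif tag == "a" and attrs.get("href"):
            self._open_href = attrs["href"]
            self._open_text = ""

    def handle_data(self, data):
        if self._open_href is not None:
            self._open_text += data

    def handle_endtag(self, tag):
        if tag == "a" and self._open_href is not None:
            if self._open_text.strip():
                self.text_targets.add(self._open_href)
            self._open_href = None


def unduplicated_image_links(html):
    """Return image-map targets that no textual link on the page repeats."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return sorted(collector.area_targets - collector.text_targets)


# Hypothetical page used only for illustration.
page = """
<map name="nav">
  <area href="home.html" shape="rect" coords="0,0,50,20">
  <area href="news.html" shape="rect" coords="50,0,100,20">
</map>
<a href="home.html">Home</a>
"""
print(unduplicated_image_links(page))  # prints ['news.html']
```

A real tool would also have to normalize URLs and handle server-side image maps, which this sketch ignores.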
 |
| Some of these lower-level properties
refer to attributes that depend only on how
the website has been designed/developed (e.g.
textual duplicates of links embedded in images):
these are internal attributes. Others
depend on both the website and its usage
(e.g. how meaningful a label is): these are external
attributes. Properties referring to the content
are always external, since assessing them requires some
sort of interpretation that assigns meaning
to symbols. |
 |
| While evaluating the usability
of a website requires both internal and external
attributes, only the former are amenable
to automatic tests. External attributes can
be evaluated only via semi-automatic means
that entail a human evaluation step. However,
tools can provide useful assistance by filtering
and ranking content that is potentially relevant
(for example, by adopting statistical techniques
developed in Information Retrieval [Belkin
and Croft, 1987]). |
 |
| 3.
Automatic tools for usability evaluation |
 |
| Tools that support the developer/maintainer
in finding usability faults and fixing them
can be classified according to: |
 |
| location: web-based vs off-line
|
 |
| |
 |
type of service: failure
identifiers (they discover potential
failures via simulation of user actions,
like filling a form; sometimes they
rank them according to severity); fault
analyzers (they find failures and highlight
their causes, i.e. faults; usually they
systematically analyze the source code
of the website, and sometimes they rank the
list of faults according to their severity);
analysis and repair tools (they assist
the developer also in fixing the faults)
|
 |
 |
information source: automatic usability
analysis can be performed on the basis
of the actual implementation of a website
(sources), or on webserver logs, or
data acquired during user testing (user
testing data); this paper deals only
with tools analyzing website sources
|
 |
 |
scope, i.e. the set of attributes
that are considered during the automatic
analysis. A classification based on
scope is: |
 |
 |
HTML validators and cleaners (they
assist in removing non standard usage
of the language) |
 |
 |
HTML/graphic optimizers (they improve
downloading and rendering performance
by recoding certain parts of HTML or
graphic documents) |
 |
 |
link checkers (they probe all the
links leaving a page to determine if
their targets exist) |
 |
 |
usability tools (they detect and
sometimes help to fix usability faults).
|
|
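
To make the simplest of these scopes concrete, here is a minimal sketch of a link checker working on the source of one page. To keep it self-contained, the probing step is passed in as a function (`target_exists`, a hypothetical name); a real checker would issue network requests there. The sketch does not describe how any of the tools surveyed below is implemented.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class HrefCollector(HTMLParser):
    """Collects the href value of every anchor on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def broken_links(page_url, html, target_exists):
    """Return the resolved targets for which target_exists(url) is false."""
    collector = HrefCollector()
    collector.feed(html)
    collector.close()
    broken = []
    for href in collector.hrefs:
        # Resolve relative links against the page URL and drop any fragment.
        target, _fragment = urldefrag(urljoin(page_url, href))
        if not target_exists(target):
            broken.append(target)
    return broken


# Hypothetical site contents standing in for real network probes.
known_pages = {
    "http://example.org/index.html",
    "http://example.org/docs/a.html",
}
page = '<a href="docs/a.html">A</a> <a href="missing.html">M</a>'
print(broken_links("http://example.org/index.html", page,
                   known_pages.__contains__))
```

Passing the probe as a parameter also makes the checker easy to exercise against a test site whose contents are known in advance.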
 |
| In the rest of the paper
I will discuss only tools whose scope is the
last of these (usability tools), since it is
the most general one. At the moment
the following tools have been developed and
are available (or will soon be available)
from the web: |
 |
| A-Prompt: developed
by the University of Toronto [ATRC, 1999];
off-line, with ranking; does fault analysis
and repair |
 |
| Bobby: available
from CAST [CAST, 1999]; web-based and off-line,
with ranking; fault analyzer |
 |
| Doctor
HTML: available from Imagiware [Imagiware,
1997]; web-based and off-line; fault analyzer
|
 |
| LIFT:
available from UsableNet.com [Usablenet, 2000];
web-based and off-line, with ranking; fault
analyzer and repair tool |
 |
| 4.
The test effectiveness problem |
 |
| While these tools offer a test
suite that is reasonably wide and open, at
the moment there is no standard way to assess
usability of the tools themselves. This is
particularly true for their effectiveness,
that is, how accurate the tests that they
run are. Determining the means to measure and
evaluate test effectiveness is an important
requirement, from both research and pragmatic
viewpoints. In fact, a standard tool evaluation
methodology: |
 |
| |
 |
could be used to assess
validity of each test and consequently
each tool; |
 |
 |
could be used to compare effectiveness
of different tools; |
 |
 |
could be used to define standard levels
of effectiveness, that might then automatically
reflect on standard usability levels
of websites that have been passed through
certified tests; |
 |
 |
could provide insights for a proper
interpretation of the results produced
by tests (what can be the consequences
of the problems identified and fixed
by tools). |
|
 |
| The research on web usability
and accessibility guidelines [WAI, 1999; Scapin
et al., 2000] is a first step towards such
a methodology, but more is needed to define
it properly. |
 |
| Given the fast pace at which
web technologies and uses evolve, an evaluation
methodology can probably be based only on experiments
comparing test results with results obtained
through other usability evaluation methods,
namely usability inspection methods and user
testing. |
 |
| It should specify a set of tests
(by identifying possible usability failures
and related faults), how test effectiveness
is to be measured and how the experiment should
be performed (what kind of user testing, what
kind of questionnaires or data acquisition
methods should be adopted, etc.) in order
to be valid. The Goal-Question-Metrics approach
[Fenton and Lawrence Pfleeger, 1997] could
be followed as a framework to define such
a methodology. |
 |
| Notice that even though many
tests are likely to yield false positives,
the major consequence of this is reduced
productivity of the maintainer (who has to
cope with incorrect information). In my view,
it is more important to define effectiveness
in terms of the number of false negatives,
that is cases where the automatic tool was
not able to identify a fault that was instead
uncovered by other means. |
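
The distinction between the two kinds of error can be sketched in a few lines of Python. The sketch assumes that faults can be named by labels and that a reference set of faults is available from other evaluation methods; the function name `effectiveness`, the fault labels and the choice of a false negative rate as the summary figure are all illustrative assumptions rather than a proposed standard.

```python
def effectiveness(reported, reference):
    """Compare faults reported by a tool with faults found by other means."""
    reported, reference = set(reported), set(reference)
    false_positives = reported - reference   # reported, but not real faults
    false_negatives = reference - reported   # real faults the tool missed
    rate = len(false_negatives) / len(reference) if reference else 0.0
    return {
        "false_positives": sorted(false_positives),
        "false_negatives": sorted(false_negatives),
        "false_negative_rate": rate,
    }


# Hypothetical fault labels: the reference set stands for faults uncovered
# by usability inspection or user testing.
reference = {"missing-alt", "no-home-link", "broken-link", "tiny-font"}
reported = {"missing-alt", "broken-link", "deprecated-tag"}
print(effectiveness(reported, reference))
```

In this made-up example the tool misses two of the four reference faults, so its false negative rate is 0.5, while its single false positive only costs the maintainer some time.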
 |
| Test sites could be set up where
specific faults are injected with the purpose
of exercising certain tests. Tools then could
be evaluated on the basis of the number of
faults that they uncover. |
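
A toy version of such an experiment is sketched below. Everything in it is hypothetical: the clean page, the two fault injectors, and the two stand-in "tools" (`tool_a` and `tool_b`), which are simple pattern checks and not models of any tool surveyed above. The point is only to show how seeded faults give a known reference against which tools can be scored.

```python
import re

# A small page assumed to be free of the faults under study.
CLEAN_PAGE = ('<title>Shop</title><img src="logo.gif" alt="Logo">'
              '<a href="index.html">Home</a>')

# Each injector returns a copy of the page carrying exactly one seeded fault.
INJECTORS = {
    "missing-alt": lambda html: html.replace(' alt="Logo"', ""),
    "missing-title": lambda html: re.sub(r"<title>.*?</title>", "", html),
}

# Matches an <img> tag that has no alt attribute.
IMG_WITHOUT_ALT = re.compile(r"<img(?![^>]*\balt=)[^>]*>")


def tool_a(html):
    """Stand-in tool that tests for both kinds of fault."""
    found = set()
    if IMG_WITHOUT_ALT.search(html):
        found.add("missing-alt")
    if "<title>" not in html:
        found.add("missing-title")
    return found


def tool_b(html):
    """Stand-in tool that only tests for missing alt text."""
    return {"missing-alt"} if IMG_WITHOUT_ALT.search(html) else set()


def score(tool):
    """Return (seeded faults uncovered, seeded faults injected) for a tool."""
    uncovered = [name for name, inject in INJECTORS.items()
                 if name in tool(inject(CLEAN_PAGE))]
    return len(uncovered), len(INJECTORS)


print("tool_a:", score(tool_a))
print("tool_b:", score(tool_b))
```

Each seeded fault that a tool fails to uncover is a false negative by construction, which fits the view of effectiveness argued for above.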
 |
| 5.
Conclusions |
 |
| In this paper a brief survey
of automatic usability evaluation tools for
websites has been presented. These tools consider
a large set of properties depending on attributes
of websites only (and not on the context in
which websites are used, thus not considering
their contents). Especially those supporting
repair actions (in addition to identification
of usability faults) have the potential to
dramatically reduce the time and effort needed
to perform maintenance activities. |
 |
| Several tests are still not covered,
even though they seem viable
with currently available technology. In other
cases, advancing the
state of the art in automatic usability evaluation
requires that the test effectiveness problem be
formulated and solved. This is the problem
of defining a standard methodology for evaluating
the effectiveness of these tools. This in
turn requires that appropriate models of
usability be defined. |
| |