
Prepared by: Tamra Hall, Ph.D., The MITRE Corporation, July 2000
There are generally two types of system evaluations. The first category is the evaluation of the system from the user’s perspective. The second evaluation category addresses the technical performance of the system, including the impact of the system on the network on which it resides and other systems to which it is connected. This document addresses evaluations from the user perspective. However, system performance issues will be discussed in terms of the ability of users to effectively interact with a system.
This document begins with a discussion on why metrics are so difficult
to define and collect. Specific guidelines are provided for the collection
of metrics in terms of what types of data to collect, why each type of
metric is important, who benefits from the metrics, and, finally, how and
when to collect metrics. While this document focuses on assessing collaboration
systems as applied to the IC, the measurement concepts presented can be
applied to the assessment of a wide range of systems.

The first wicked problem that confronts metrics programs is determining how exact the metrics need to be and what the appropriate level of effort is to devote to the collection and analysis of metrics. The key to an effective metrics program is finding the right balance between effort and results. Figure 1 depicts a continuum of evaluation rigor. The left end of the continuum represents programs that have no information on the effectiveness of the deployed collaboration system. The next step along the continuum is when programs have very little or no data to support their claims but have a general impression of how well the collaboration system is working. Programs operating at this level often argue that common sense dictates that collaboration is a good thing. Therefore, providing more collaboration tools and connectivity to users is good.
The first point on the continuum that provides an acceptable degree of evaluation rigor to support the evaluation of collaboration systems is the collection of anecdotal evidence. Anecdotal evidence is not based on a systematic data collection effort that is planned to investigate specific effects. Rather, anecdotal evidence is an account of events when the end result is of particular note, most often characterized by success stories or significant program failures.
Structured baseline comparisons provide the recommended level of rigor for the evaluation of collaboration systems. Structured baseline comparisons involve predetermining the collaboration goals and defining specific data to be collected both before and after the deployment of the collaboration system to assess whether the system has satisfied those goals. At the far end of the continuum is controlled statistical evidence. Rarely do evaluators have the freedom to strictly control conditions within an operational environment to support these types of evaluations. Of equal concern is the fact that laboratory evaluations rarely replicate operational conditions and, therefore, provide only a limited view of the system. Therefore, controlled statistical evidence is not typically available for deployed collaboration systems. Section 5.0 further discusses the varying levels of metric investments and their relative merits related to the capture of anecdotal evidence and structured baseline comparisons.

Metrics programs for collaboration systems often struggle through a second wicked problem: how to quantify the intangible notion of quality. Within the IC, the desired outcome of increased collaboration is frequently increased quality of analytic products. Unlike manufacturing processes, where the quality of widgets produced can be easily quantified in terms of fault testing, the quality of intelligence reporting is mostly subjective in nature. Section 2.2.1 provides guidance on how to address quality metrics.
Another issue that frequently emerges when defining metrics for collaboration systems is how to isolate the effects of collaboration tools from the effects of collaboration business processes. Within this document, a collaboration system is defined as both the process and the technology supporting collaborative activities. Typically when a collaboration tool is introduced into an organization or team of organizations, the process of how these individuals communicate and collaborate also changes. The inevitable question that arises is whether enhancements in business operations were achieved due to the collaboration tool or the changes in business processes. Evaluating the relative value-added of process versus technology will be explored further in Section 2.3.2.
Metrics programs for collaboration systems also need to determine a reasonable timeframe to expect results. Full adoption of collaboration systems may be a very slow process. How quickly organizations will embrace collaboration is often difficult to predict. Frequently there are significant cultural barriers and legacy business processes that take time to overcome. Use of collaboration systems also is likely to evolve over time as business needs change and users discover new applications for the technology. Therefore, an appropriate timeframe for an evaluation may be difficult to predetermine. In fact, it may be more appropriate to consider the evaluation of a collaboration system as an on-going process rather than a one-time event.
Finally, metrics programs must consider the burden of data collection
on users. A goal of any data collection effort within an operational
environment should be to minimize the burden on users and the disruption
of operations. Metrics programs should carefully balance the imposition
on users with the collection of meaningful data.

At a minimum, the who component of this formula should include a simple count of the number of users who interact with the system on a daily basis. A more comprehensive assessment of system usage requires additional information on the salient characteristics of the active and inactive users. Tracking usage by individual users is highly recommended, since it allows evaluators to follow up with specific users to understand why they choose to use or not use the system. This means that each user must be assigned a unique identification. Basic information on user characteristics should include organizational affiliation, team or community of interest membership, and how long each user has had access to the system. Additional demographic data may include the role that each user serves within the collaborative environment (e.g., analyst, collector, customer), areas of expertise, and relevant experience levels (e.g., experience with the system, experience in their assigned role, expertise on the collaboration topic). It may be infeasible and of limited value to track individual usage of collaborative systems that have very large user populations. However, tracking usage for these larger populations by key characteristics such as organizational affiliation and, perhaps, role is critical. When collecting and reporting on usage statistics of collaboration systems with relatively small user populations, it is important to separate out use by system administrators, evaluators, and other support functions from the "real" users of the system. When user populations reach several hundreds, system support personnel represent such a small percentage of the total usage that their data is unlikely to distort representative usage patterns.
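The who component can be illustrated with a minimal Python sketch. It assumes logon records that carry a unique user identification, an organizational affiliation, a role, and a date; the record layout, the sample data, and the set of support roles are all hypothetical and are not drawn from any particular collaboration tool.

```python
from collections import defaultdict
from datetime import date

# Hypothetical logon records: (user_id, organization, role, logon_date).
LOGONS = [
    ("u1", "Org A", "analyst", date(2000, 3, 1)),
    ("u1", "Org A", "analyst", date(2000, 3, 1)),  # second logon, same day
    ("u2", "Org B", "collector", date(2000, 3, 1)),
    ("adm", "Org A", "administrator", date(2000, 3, 1)),
    ("u2", "Org B", "collector", date(2000, 3, 2)),
]

# Roles treated as support functions rather than "real" users (an assumption).
SUPPORT_ROLES = {"administrator", "evaluator", "support"}


def daily_unique_users(logons, exclude_roles=SUPPORT_ROLES):
    """Count distinct users per (day, organization), skipping support roles."""
    seen = defaultdict(set)
    for user_id, org, role, day in logons:
        if role in exclude_roles:
            continue
        seen[(day, org)].add(user_id)
    return {key: len(users) for key, users in seen.items()}


counts = daily_unique_users(LOGONS)
```

Because each record keeps the user identification, the same data can later support follow-up with specific users, while the grouping key can be reduced to organization and role for very large populations.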

The second element in the who, what, how frequently, and in what context formula is what types of activity are occurring within the system and what types of data are being exchanged. Types of activity within a collaborative system are generally categorized in terms of the functions supported by the collaborative tools, such as e-mail, on-line meetings, database postings, and shared files. Usage statistics should capture which of these functions users are employing. Types of data exchanged such as word files, images, and audio are also important to capture. Understanding the what of collaboration provides insight into the process of collaboration within the system and the extent to which users are applying the capabilities provided by the tools.
Characterizing the frequency of use is the core element of usage statistics. Frequency statistics capture the number of times that events occur within the system, including the number of user logons and the number of times each function is used. A metric closely related to frequency is the length of time that users are on the system. Do users logon to the system for short periods of time several times during the day or do they logon and stay logged-on for the majority of their work shift? The length of time that a user is on the system, however, can be a tricky metric. Users can be logged on to the system but not actively engaged in collaboration, allowing the application to run in the background. On the other hand, a user may be logged-on to the system for a shorter period of time but engaged in collaborative activities for the entire period. In these cases, length of time logged-on to the system would not accurately reflect the degree of collaboration activity.
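The gap between time logged-on and time actively engaged can be shown with a short sketch. It assumes the system records a timestamp for each user action within a session, and it treats any gap longer than a chosen idle cutoff as inactivity; the session layout and the 15-minute cutoff are illustrative assumptions, not recommended values.

```python
from datetime import datetime, timedelta

# Hypothetical session: logon, logoff, and timestamps of user actions.
SESSION = {
    "logon": datetime(2000, 3, 1, 8, 0),
    "logoff": datetime(2000, 3, 1, 16, 0),
    "events": [
        datetime(2000, 3, 1, 8, 5),
        datetime(2000, 3, 1, 8, 10),
        datetime(2000, 3, 1, 15, 50),
    ],
}

IDLE_CUTOFF = timedelta(minutes=15)  # assumed threshold for inactivity


def logged_on_time(session):
    """Total time between logon and logoff."""
    return session["logoff"] - session["logon"]


def active_time(session, idle_cutoff=IDLE_CUTOFF):
    """Sum the gaps between consecutive actions that do not exceed the cutoff."""
    events = sorted(session["events"])
    total = timedelta()
    for earlier, later in zip(events, events[1:]):
        gap = later - earlier
        if gap <= idle_cutoff:
            total += gap
    return total
```

For this hypothetical session the user is logged-on for eight hours, yet the estimated active time is only five minutes, which is the kind of discrepancy the paragraph above warns about.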
The final component of usage statistics is a description of the context in which usage occurred. Context descriptions should include the number of users who have accounts or access to the system at any given time. The status of the system should also be continually logged. System status metrics include tracking when the system is down, when the network is degraded, and when a new version of the software has been deployed. Usage statistics should also be presented in the context of events that may affect system use. For example, usage may be unusually low during holiday or summer vacation periods and high during crisis operations or exercises.
Usage statistics should be gathered automatically by the system with no imposition on the users. As an alternative, users can be asked to manually log their system use. However, because many users find manual tracking to be very tedious, they may not comply with the data collection request. Asking users to estimate how frequently they used the system at the end of the evaluation period or on a periodic basis, such as weekly or monthly, is another data collection option for usage statistics. However, these types of user estimates have not proven to be particularly accurate.
Figures 3 through 5 provide examples of system usage statistics. Figure 3 displays hypothetical data representing the number of unique user logons per day over a six-month period. The graph shows the number of daily users from each of five organizations. The yellow line represents the total number of users who had active user accounts. The graph also overlays significant events on the usage data. Figure 4 provides a more sophisticated example of usage statistics. This figure was drawn from a report by NSA’s Community-Wide Enterprise Facility (CWEF) describing the Signal Intelligence Community’s use of the Collaborative Virtual Workspace (CVW) tool. The graph shows the length of user sessions on the tool based on user experience with the tool. From this data, the CWEF was able to conclude that users who had greater experience with the CVW tool were more likely to integrate tool use into their daily operations as compared to inexperienced users who would logon for relatively short periods of time. Figure 5 is also drawn from the CWEF. This chart depicts the types of data that users exchanged through the use of CVW.


Another challenge related to the collection of metrics on quality is
the potential resistance of the workforce to subject work to inspection.
Quality measures how well the analysts and collectors are doing their job.
The more scrutiny a worker is under, the more resistant that worker may be
to participating in metrics collection. Due to the sensitivity of assessing
quality of work, metrics efforts in this area should be especially well
thought out.
While timelines provide an objective quantification of time, subjective assessments of the timeliness of intelligence products may also be important. Regardless of the actual time to produce a product, the product may or may not be perceived as timely. Therefore, subjective assessments of timeliness provide a good complement to objective measures of time.
Another metric used in the evaluation of collaboration systems is the
time that workers spend on non-value added activities. These non-value
added activities include tasks such as scheduling meetings, traveling to meetings,
and distributing hardcopy documents. Virtual collaboration technologies
are frequently targeted to reduce the time associated with these tasks,
allowing workers to spend their time on substantive tasks directly related
to their mission, such as research, analysis, and reporting.
Similar to production timelines, these measures can be assessed either
objectively, by directly measuring the time spent on specific activities,
or subjectively, by asking workers if the time spent on these non-value
added activities has decreased. The meaningfulness of these measures
is dependent on the clear definition of the tasks that are considered to
be non-value added. Table 5 provides sample time metrics related
to collaboration.

One note of caution is warranted related to cost metrics. Budgetary processes may serve as a disincentive to cost savings. Government organizations may perceive that if they reduce operating costs, their budgets in the following years will be reduced. An organization’s motivation to protect its resources may derail metrics programs attempting to quantify any cost savings associated with collaboration. Table 6 provides a list of sample cost metrics for collaboration systems.

Many evaluation programs, in their quest for quantitative data (often referred to as "hard data"), will overachieve in the area of quantity metrics, collecting data on anything and everything that can be counted. However, just because something can be counted does not mean that it is a meaningful metric. Section 2.3.4 discusses the importance of tying metrics to specific collaboration goals to ensure meaningful metrics are collected.
Increases in quantity may not always be a desirable outcome. If the goal of a collaboration system between analysts and customers is to allow analysts to be more proactive in the provision of information to the customer, then a decrease in the customer’s requests for information may be a positive outcome. A decrease in requests for information may indicate that analysts were able to feed customers the information they needed without a specific request. When defining an evaluation plan it is important to think through what an increase or decrease in quantity means from a business process and outcome perspective. This task occurs naturally when metrics are tied directly to collaboration goals.

One of the pitfalls of quantity metrics is that they are often event
driven. During a crisis period, activity tends to be high.
During summer vacation or holiday periods when no significant organizational
or world events are occurring, collaborative activity may be naturally
low. Therefore, it is important that data collection timeframes are
of sufficient length to capture representative periods of activity.
It is especially critical that data collected for baseline purposes are
acquired during a period comparable in activity level to the period when
the impact of new collaborative processes and technologies is assessed.

Collecting baseline metrics can involve a significant level of resources and time. The traditional method for collecting baseline data is to set a period of time prior to the introduction of the new technology and business practices to collect metrics on the current state of business. To support meaningful comparisons, the baseline data collection period must be of sufficient length to represent the full range of operational conditions. In many cases, a full-scale baseline data collection effort may not be feasible. For example, in a crisis situation where collaborative technologies have been deployed quickly to a new set of users and applied to a new set of business operations, it is not possible to stop the crisis to collect baseline measurements. However, it may be very important to understand whether collaboration enhanced operations during the crisis. In other cases, resource constraints and impositions on the user population may restrict baseline data collection.
There are other options for collecting baseline information. Historical
records on information such as the quantity of products and production
timelines can be a valuable source of benchmark data. In general,
evaluators should investigate what information is currently available through
an organization’s routine record keeping practices or previous studies
prior to instituting any new data collection initiatives. A second
method for obtaining benchmark data is to study two groups simultaneously.
One group of users applies the new collaborative technology and business
practices while the other group continues to operate with its existing
tools and processes. The challenge is to find two groups that are
sufficiently similar to support meaningful comparisons. The paradox
with this approach is that assessing the degree of similarity between the
groups often requires benchmark types of data. A third data collection
option is to use subjective assessments as a means of benchmarking.
In this approach, after users have had experience with the collaboration
system, they are asked to assess whether outcomes and processes are better,
worse, or unchanged from their previous method of conducting business.
Subjective baseline assessments are obviously less rigorous than the other
options discussed, since there are no objective data to support users’
perceptions.
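Whichever source supplies the baseline, the comparison step can be sketched in a few lines of Python. The example assumes that production timelines are the metric of interest and that they have been recorded in days per product before and after deployment; the figures are hypothetical and the function name is illustrative only.

```python
from statistics import mean

# Hypothetical production times in days per product (not real data).
baseline_days = [10, 12, 11, 13]
post_deployment_days = [8, 9, 10, 9]


def compare_to_baseline(baseline, post):
    """Summarize the change in a metric relative to its baseline mean."""
    before = mean(baseline)
    after = mean(post)
    change = after - before
    return {
        "baseline_mean": before,
        "post_mean": after,
        "change": change,
        "percent_change": 100 * change / before,
    }


result = compare_to_baseline(baseline_days, post_deployment_days)
```

A summary of this kind is only meaningful when the two collection periods are comparable in activity level, and it does not by itself establish that the collaboration system caused the change.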
Most experts in collaboration contend that the real value of collaboration
is not found within the tools but rather in how people apply the tools.
This document defines collaboration systems to include the tools, the support
provided for the tools (e.g., training), the people, and the process.
The results observed in business process and outcome metrics will be attributable
to the interaction of all of these factors. However, in order to
enhance collaborative activities, it is important to understand the strengths
and weaknesses of these interacting factors. The best insight into
the relative contributions of tools, process, and people is through a combination
of investigative techniques. Comparing baseline versus new business
practices documents how the process has changed. In addition to the
collaborative activities that occur through the use of new tools, evaluations
should examine collaboration that occurs through traditional methods,
such as face-to-face, hardcopy document exchange, and phone calls.
However, feedback from users may provide the best insight on the specific
factors impacting the effectiveness of collaboration. Structured
interviews can be constructed to investigate the strengths and weaknesses
of both tools and process.
The golden rule is that if you don’t know specifically what you are going to do with a piece of data, don’t collect it. The most effective method to ensure that metrics are tied to collaboration goals is to create a traceability matrix. A traceability matrix lists the goals of the system under evaluation, the data needed to assess whether a goal has been reached, and the method in which that data will be collected (e.g., questionnaires and production records). Construction of a traceability matrix ensures that data will be collected to address each of the system goals and that no extraneous data is planned for collection. Table 9 provides an example of a traceability matrix.
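A traceability matrix can also be kept in a machine-checkable form. The following Python sketch assumes each row records a goal, a data item, and a collection method, and checks the two properties described above: every goal has supporting data, and no planned data item lacks a goal. The goals, data items, and methods shown are hypothetical examples and are not taken from Table 9.

```python
# Hypothetical traceability matrix rows: goal, data needed, collection method.
MATRIX = [
    {"goal": "Improve product timeliness",
     "data": "production time per product",
     "method": "production records"},
    {"goal": "Improve product timeliness",
     "data": "perceived timeliness",
     "method": "questionnaire"},
    {"goal": "Reduce non-value added time",
     "data": "time spent scheduling meetings",
     "method": "questionnaire"},
]

# Hypothetical system goals and the data items currently planned for collection.
GOALS = {
    "Improve product timeliness",
    "Reduce non-value added time",
    "Increase product quality",
}
PLANNED_DATA = {
    "production time per product",
    "perceived timeliness",
    "time spent scheduling meetings",
    "number of e-mail messages sent",
}


def check_traceability(matrix, goals, planned_data):
    """Report goals with no supporting data and planned data tied to no goal."""
    covered_goals = {row["goal"] for row in matrix}
    traced_data = {row["data"] for row in matrix}
    return {
        "goals_without_data": sorted(goals - covered_goals),
        "extraneous_data": sorted(planned_data - traced_data),
    }


report = check_traceability(MATRIX, GOALS, PLANNED_DATA)
```

In this example the check flags one goal that has no data planned against it and one planned data item that traces to no goal, which are the two gaps the matrix is meant to expose.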

Capturing unexpected outcomes or process changes does not mean that every conceivable metric needs to be collected, resulting in painfully long questionnaires and reams of system audit data. Rather, unexpected results can be captured through the use of brief open-ended questions. These questions should ask users to describe the impact of the system. The responses to such high-level questions probably are not going to yield the type of detail and confidence in the data to drive significant decisions. However, the responses will point investigators to consider more detailed and structured evaluation of the reported impact. The earlier an evaluation uncovers these unanticipated results, the easier the evaluation can be tailored and redirected to capture more detailed data in that area. Therefore, the evaluation plan should include a mechanism to capture early feedback on the impact of the system. This can include on-line feedback/hot wash sessions, brief questionnaires, brief one-on-one interviews, or user conferences. Since there is a need to take the pulse of the system early and on a frequent basis, the method of data collection for ad hoc results should be as unobtrusive as possible.

Ease-of-use, however, is only one aspect of system usability. A system may be very easy to use but the functionality provided by the system may be inadequate to support effective and efficient job performance. Therefore, a second element of usability is the adequacy of the functionality provided by the system to support job performance. Adequacy of system functionality includes metrics on system performance, such as speed, accuracy, and reliability. The ability of the system to process the required types and amount of data is another facet of system functionality. In addition, the system should interoperate with key software applications employed by users. Finally, the system should be compatible with the business goals and work processes of the user population.
A third component of usability is system training. A comprehensive evaluation of system usability should include an assessment of the training burden associated with a system. Training burden includes the length of time required to be adequately trained on the system, the number of trainers and support personnel, equipment, and data resources needed to conduct the training. In addition to understanding the time and resource burden associated with training, it is important to understand the effectiveness of the training. Did the training provide users with the requisite skills and knowledge to effectively use the system? Table 11 summarizes the three types of system usability metrics.

In terms of usability rating scales, users are typically asked to rate how easy it is to perform the various functions in the system (e.g., set up an on-line meeting, post a document in the virtual library, share an application). Another approach to rating scales is to ask users to rate the acceptability of the system in terms of basic usability principles (e.g., the number of steps to perform system functions, the interpretability of system terminology, and the adequacy of on-line help and system feedback).
The problem with the rating scale approach is that a comprehensive understanding of the specific aspects of the system that users find easy or difficult may require a very long set of rating scales. For example, asking users to rate the ease/difficulty of an on-line meeting capability may need to include ratings on the ease/difficulty to set-up the meeting, join/attend a meeting, use the text chat capability, use the audio capability, etc. Rating scales also make it difficult to interpret whether something was easy or difficult due to the characteristics of the collaboration tools or due to the processes associated with the use of the tool. For example, regardless of the user interface and functionality provided by a tool, on-line meetings can be difficult to manage if participants do not adhere to established meeting protocols.
The appeal of rating scales to many evaluators is that they can produce numerical data. As discussed in the section on business process and outcome metrics, obtaining meaningful quantitative data for collaboration systems can be challenging. While it may be easy to construct a battery of rating scales to address usability issues, and the data may produce impressive charts that provide at least the illusion of quantitative results, the practical informational value of rating scales may be limited. Rating scales can provide a general indication of what aspects of the system users consider easy or difficult. However, it may be impractical to use rating scales to understand why the system is difficult and how the system could be modified to enhance usability. One way to use rating scales is to obtain an initial indication of system usability and then follow up with other data collection techniques in those areas where usability appears to be an issue. However, it may be easier to just ask users if they had difficulty performing any particular function of the system and why they found that function to be difficult. Similarly, users can be asked whether the functionality of the system is adequate to support their job. If not, what additional functionality is needed? Adequacy of training can also be assessed by asking users whether the training they received was effective and how they would recommend improving the training. This leads to the second type of survey techniques: open-ended questions.
Open-ended questions can be administered as paper-and-pencil or on-line data collection instruments where the users are left to their own devices to interpret the questions and respond as they desire. Alternatively, the evaluator can directly ask the questions and interact with the users while they are responding. The evaluator’s interaction with users can occur in a face-to-face setting, over the phone, or on-line through the use of collaboration tools provided by the system under evaluation. In addition, the evaluator’s interactive administration of open-ended questions can occur one-on-one or in a group.
When possible, it is always more desirable to administer open-ended questions in an interactive session. Interactive sessions allow the users to ask for clarification of the questions and allow the evaluator to follow up with more detailed questions when problem areas are identified. When open-ended questions are administered in a hands-off mode, the evaluator may not receive the level of detail required for a comprehensive evaluation.
The concern frequently held with open-ended questions is that written, narrative responses require significantly more thought and effort on the part of the user than circling a number on a rating scale. However, presenting users with a small number of well-constructed, open-ended questions may be less imposing than the monotony of responding to page after page of rating scales. Evaluators need to carefully balance minimizing the imposition on users with obtaining meaningful data that can support smart decisions and drive future actions. Surveys that are created solely on the basis of being non-obtrusive, but provide data of limited value, are a waste of everyone’s time.
One readily available source of usability data that is frequently overlooked is help desk requests. Systematic logging and review of problems addressed by the help desk staff can provide a wealth of information. If several users request help in the same area, it is likely there is a problem with the system design or the training users received. If users call into a help desk and ask how they can perform a specific task, but the system does not support the requested capability, the help desk staff will likely respond "sorry, the system doesn’t do that" and the call will be logged as a non-issue (if calls are being logged at all). However, the user’s request for additional capabilities should be considered a data point for evaluating the adequacy of system functionality. All user requests and comments that come into a help desk should be captured for the record and examined as an integral component of any system evaluation.
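The systematic review described above can be approximated with a simple tally. This Python sketch assumes each logged call records the user, the area of the request, and whether the system supports the requested capability; the log format, the sample entries, and the threshold for "several users" are hypothetical.

```python
from collections import defaultdict

# Hypothetical help desk log entries: (user_id, area, capability_supported).
CALLS = [
    ("u1", "on-line meeting setup", True),
    ("u2", "on-line meeting setup", True),
    ("u3", "on-line meeting setup", True),
    ("u1", "on-line meeting setup", True),  # repeat call from the same user
    ("u4", "video sharing", False),         # capability the system lacks
]

MIN_USERS = 3  # assumed threshold for "several users"


def review_help_desk(calls, min_users=MIN_USERS):
    """Return areas raised by several distinct users and unsupported requests."""
    users_by_area = defaultdict(set)
    requested_capabilities = set()
    for user_id, area, supported in calls:
        users_by_area[area].add(user_id)
        if not supported:
            # Keep the request as a data point instead of logging a non-issue.
            requested_capabilities.add(area)
    problem_areas = sorted(
        area for area, users in users_by_area.items() if len(users) >= min_users
    )
    return problem_areas, sorted(requested_capabilities)


problem_areas, requested = review_help_desk(CALLS)
```

Counting distinct users rather than calls keeps one persistent caller from making an area look like a widespread problem, and retaining unsupported requests preserves evidence about adequacy of functionality.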
Help desk requests should not be used as the only source of usability information. Not all issues are likely to be surfaced with the help desk staff. Users are often reluctant to call into a help desk. These users may continue to struggle through the system, unable to take advantage of the full range of capabilities. Users who are having problems may also choose not to use the system at all, especially if they are able to find workarounds. Frequently, this means reverting to their old, but familiar, work tools and methods.
Direct observation of users interacting with the system, often referred to as "shadowing," is another method for collecting usability data. An evaluator observes users in the context of their daily work, watching how they interact with the system and where they have problems. Many evaluators believe that direct observation in an operational environment is the best way to understand the strengths and weaknesses of both the system and business process. However, direct observation is very resource intensive and likely to be logistically infeasible when evaluating collaboration technologies involving a wide range of geographically dispersed users. Even if face-to-face observation is not feasible, it is highly recommended that evaluators periodically observe activity within the virtual collaboration environment.
A final method for collecting usability information is through the use of expert evaluations. These evaluations employ the expertise of individuals trained in the principles of effective user interface design. These experts review the system in terms of its compliance with accepted design principles. The advantage of this method is that it does not require a time commitment from users. However, the expert reviews are very limited in scope. These types of evaluations only address ease-of-use and do not provide any information on the adequacy of the functionality to meet user work requirements or the effectiveness of training programs.
Table 12 summarizes the various data collection methods for usability metrics. No one single data collection method is likely to support a comprehensive evaluation of usability issues. The best approach is to employ a combination of techniques. At a minimum, the use of interactive open-ended questions, review of help desk requests, and at least some observation of the users within the virtual collaboration environment is recommended.

Usage statistics should be captured immediately following system deployment. Automatic collection of usage statistics does require some development and integration time, although there are an increasing number of commercial products available to capture usage data. Often, programs that are under pressure to deliver a system will delay incorporation of automated usage statistics for a later release of the system’s software, arguing that it will take time for the users to fully incorporate the system into their daily operations. However, delaying the collection of usage statistics can miss data that is vital to understanding the strengths and weaknesses of a system. For example, when users first receive their accounts, they may logon to the system for a short period of time while training is fresh in their minds. If users do not perceive value in the system, their use is likely to drop off dramatically in the near-term. Lack of participation by one or more key groups of users within a collaboration system can also affect system use soon after deployment. If users logon expecting to be able to collaborate with organizations and individuals who are not on the system, they may not logon again in the future. Therefore, capturing usage patterns during the first weeks or months after deployment is critical to gaining a full understanding of the system. Usage patterns are also likely to change over time. For this reason, usage statistics should be captured on a continuing basis.

Collection of business process and outcome metrics should occur both prior to and after system deployment. Once the targeted user population and goals for the system have been defined, baseline data on business process and outcome metrics can be gathered. Collection of baseline data should be completed before the system is deployed. Following system deployment, data should be collected on the impact of the system on process and outcomes. Data supporting planned process and outcome metrics should be collected at predetermined intervals following system deployment. In this case, it does make sense to wait until users have had an opportunity to apply the system to a product or process. The correct timeframe for collecting planned process and outcome metrics is determined by the specific goals of the system. If the system is deployed to support activities that will be fully engaged or completed within a short timeframe, then the initial data collection on process and outcome may occur soon after deployment. If the system is targeted to support collaboration on a longer-term project, then taking the initial pulse on how the system is affecting operations may occur several months after deployment. It is also recommended that process and outcome metrics be captured at several points in time. Users may evolve the way they apply the system, or business requirements may change over time. A single data collection point provides a very limited view. It is especially important to recapture process and outcome metrics following a significant upgrade to the system’s capabilities. In terms of business changes that have not been anticipated, a mechanism for capturing anecdotal evidence supporting ad hoc process and outcome metrics should be in place soon after system deployment.
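The comparison of post-deployment data against a baseline at several collection points can be sketched as follows. This is only an illustration under stated assumptions: the metric (average days to complete a product), the collection-point labels, and all values are hypothetical and are not drawn from any actual program.

```python
def percent_change_from_baseline(baseline, observations):
    """Return {label: percent change relative to baseline} per collection point."""
    if baseline == 0:
        raise ValueError("baseline must be nonzero to compute a percent change")
    return {
        label: round(100.0 * (value - baseline) / baseline, 1)
        for label, value in observations.items()
    }


# Hypothetical outcome metric: average days to complete a coordinated product.
baseline_days = 20.0  # gathered before deployment
post_deployment_days = {
    "3 months": 18.0,
    "6 months": 15.0,
    "after upgrade": 14.0,
}
changes = percent_change_from_baseline(baseline_days, post_deployment_days)
print(changes)
```

Keeping every collection point, rather than a single snapshot, shows how the effect develops as users evolve the way they apply the system.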
The timing of data collection efforts to support usability assessments is different for the various aspects of usability. Information on system ease-of-use and adequacy-of-functionality should be gathered several times throughout the system’s lifecycle. Once the user population and business goals have been defined, an initial assessment should be made to ensure ease-of-use for that user population as well as to confirm adequacy-of-functionality to support defined goals. Ease-of-use and adequacy-of-functionality should be reassessed once the users have had an opportunity to work with the system within their operational environment. Ease-of-use and adequacy-of-functionality should also be reassessed following any significant upgrades to the system.
Users’ feedback on the effectiveness of training is traditionally collected
immediately following training. It is also important to have users
revisit their opinions on training after they have been using the system
as a part of their daily operations. Users may not realize that training
failed to adequately prepare them until they have tried to use the system
on their own. Finally, help desk requests can provide valuable
information related to ease-of-use, adequacy-of-functionality, and training
effectiveness. A mechanism should be in place immediately following
system deployment to continuously capture and monitor help desk requests.
The value of metrics depends on the role one plays with regard to the system under evaluation. For the purposes of this discussion, three types of stakeholders that should have a vested interest in metrics are defined: (1) funders, (2) program offices, and (3) process owners. Funders are those individuals or organizations who provide the funds for the development, deployment, and support of the system. Funders are generally interested in metrics to justify financial investments in collaboration systems. Program offices are the organizations responsible for executing the development, deployment, and support associated with collaboration tools. The program offices’ interest in metrics should be to collect information on how to improve the tools and associated training. Process owners are the actual system users or managers of those users. Process owners need metrics to determine the impact of the system on their bottom line, i.e., does the system enhance operations? The value of metrics for each of the three stakeholder categories is summarized in Figure 7.

Figure 8 further breaks down the value of metrics in terms of the specific types of data that are meaningful to each category of stakeholder. For usage statistics, funders are probably most interested in the number of individuals and organizations who are participating in the collaboration system. The number and breadth of users that benefit from systems is often a key justification for program costs. Since the main concern of program offices is how to improve tools and system support, they are most interested in usage statistics that describe the degree to which specific system capabilities are being used. Process owners need to know how frequently specific individuals or organizations are using the system and for what purpose.
In terms of business process and outcome metrics, funders want information related to return on investment. What is the cost/benefit ratio associated with the system? Program offices are interested in process and outcome metrics to determine additional tool capabilities needed by the user population to achieve desired business goals. Process owners, on the other hand, need process and outcome metrics to understand how their operations have been impacted by the system and whether the system is enabling defined business objectives.
Although process owners have a vested interest in whether a system is easy to use and whether the functionality provided by the system satisfies their operational needs, the real customers of usability metrics are program offices. Usability metrics identify areas in which the program needs to enhance the ease-of-use, functionality, or training associated with the collaboration tools.

Responsibility for the collection, analysis, and reporting of metrics has traditionally fallen to the program offices that develop, deploy, and support collaboration systems. Since usage statistics should be collected automatically, it is appropriate that the responsibility for these metrics remain with the program offices. Similarly, since the program offices are the primary customers for usability metrics, it is reasonable for the program offices to also have responsibility for collecting this data. The burden of collecting business process and outcome metrics, however, is more appropriately placed with the process owners. Process and outcome metrics are likely to be the most burdensome on users to collect and the most sensitive in terms of judging the performance of individuals, organizations, and teams. Successful collection of process and outcome metrics requires cooperation and buy-in from the users and their management. In fact, only the process owners can define the goals of collaboration systems and, thus, appropriate process and outcome metrics. It may not be reasonable to expect process owners to inherently possess the expertise to construct data collection instruments and analyze evaluation data. Evaluation experts should be assigned to work directly with the process owners on these metrics rather than placing the entire metrics burden on program offices. Recommendations for who should be responsible for collecting metrics are summarized in Figure 9.

The bottom level of the hierarchy of understanding depicted in Figure 10 prescribes the recommended minimal level of investment in metrics. At a minimum, usage statistics should capture the number of users who log on to the system on a daily basis, as well as the number of user accounts that are active each day. Business process and outcome metrics should, at least, consist of general feedback from the user population on the value of the collaboration system. Similarly, usability metrics should include general user feedback on how easy the system is to use, whether the system provides adequate functionality, and whether training was effective. Usability metrics at this level also include a review of help desk requests. This basic level of metrics provides an indication of the percentage of users who are logging on to the system, general usage trends over time, high-level indications of whether users perceive the system to be of value, and general impressions of the system’s usability. What this minimal level of metrics investment does not provide is any specific information on the value-added of the system. In addition, very limited information is provided in this approach on how to improve either the collaboration tools or business processes to further enhance collaboration.
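The minimal usage statistics described above can be sketched in a few lines. The sketch assumes, purely for illustration, that each logon is recorded as a (user, date) pair, that "logons" means logon events, and that "active accounts" means distinct accounts with at least one logon that day; the names and sample values are hypothetical.

```python
from collections import defaultdict
from datetime import date


def daily_usage(logons, total_accounts):
    """Return {day: (logon_events, active_accounts, percent_of_accounts_active)}.

    logons: iterable of (user_id, date) pairs (hypothetical record layout).
    total_accounts: number of accounts issued, used for the percentage.
    """
    logon_events = defaultdict(int)
    active_users = defaultdict(set)
    for user_id, day in logons:
        logon_events[day] += 1
        active_users[day].add(user_id)
    return {
        day: (
            logon_events[day],
            len(active_users[day]),
            round(100.0 * len(active_users[day]) / total_accounts, 1),
        )
        for day in sorted(logon_events)
    }


# Hypothetical sample: four accounts issued, two days of logon records.
day_one, day_two = date(2000, 7, 3), date(2000, 7, 4)
sample = [("a", day_one), ("a", day_one), ("b", day_one), ("c", day_two)]
usage = daily_usage(sample, total_accounts=4)
print(usage)
```

Tracked over time, the third value in each tuple gives the percentage of users logging on and the general usage trend that this level of investment is meant to provide.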
The intermediate level in the hierarchy of understanding builds on the information collected in the bottom level of the pyramid. Usage statistics at this level include basic demographic information on the users, such as organizational affiliation and role, so that usage can be tracked by key user characteristics. In addition to capturing the basic number of user logons, usage data at this level also includes how frequently specific functions within the system are used (e.g., number of e-mail messages sent, number of on-line meetings held, number of whiteboard sessions). Information on business process and outcome metrics is obtained from anecdotal evidence such as success stories or sample case studies. Usability information is gathered through formal user surveys and interviews. The level of understanding gained with this intermediate investment in metrics is information on how the system has affected sample business applications or a subset of users. More structured data from the usability surveys or interviews also provide information on which to base improvement strategies for the system.
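Tracking function use by user characteristics, as described for the intermediate level, might look like the following sketch. The event layout, organization names, and function labels are hypothetical; they stand in for whatever demographic and usage data a given system records.

```python
from collections import Counter


def usage_by_group(events, user_org):
    """Count function use per (organization, function) pair.

    events: iterable of (user_id, function_name) pairs (hypothetical layout).
    user_org: mapping of user_id to organizational affiliation; users missing
    from the mapping are counted under "unknown".
    """
    counts = Counter()
    for user_id, function in events:
        org = user_org.get(user_id, "unknown")
        counts[(org, function)] += 1
    return counts


# Hypothetical sample data.
affiliations = {"a": "Org1", "b": "Org2"}
sample_events = [
    ("a", "email"),
    ("a", "email"),
    ("b", "whiteboard"),
    ("a", "online_meeting"),
    ("c", "email"),
]
function_counts = usage_by_group(sample_events, affiliations)
print(function_counts)
```

The same tally could be keyed on role instead of organization by swapping the lookup table.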
The highest level of the hierarchy of understanding represents a comprehensive metrics program. More detailed information is collected on user demographics as well as specific function use (e.g., specific types of data exchanged, who is using what system functions to perform what tasks). Business process and outcome metrics are expanded to include structured baseline comparisons. Usability metrics include both structured user surveys and direct observation of users interacting with the system. The advantage of this level of detail is the ability to characterize the return on investment for the system in terms of the value afforded to specific business operations and user groups. In addition, this level of metrics provides program offices and process owners with specific information on how to improve tools and processes to enhance collaboration.

