Aiming at goals
Measurements quantify attributes of a system. Measurements allow us to calibrate and communicate these attributes so that the system itself is better understood. To maintain this understanding over time, the measures must be designed properly, and collecting and communicating them must be cost-effective. To design and use measurements cost-effectively, we must adopt rules about how we assign values to our observations, and put them in a context so that such attributes can be compared and appropriately applied.
The following definitions are important to understanding several of the underlying concepts of measurement, and applying these concepts in design and operations:
- Attribute - an aspect, facet, or characteristic of an object or system
- Metric - a measurement attached to a measurement unit
- Measurement units - the basis of observations that is used for reporting purposes, which is established by their:
  - Level of analysis - the span of observations that will be aggregated when reporting
  - Level of measurement - the intended precision of collected measures
- Indicator - something that signifies the presence or absence of an attribute
Both the value of measurements and an indication of how precise and accurate these values are (i.e. their variability with respect to reference standards) are crucial to successfully applying measurements over time.
Several key types of measurements deserve special attention, since they are inter-related:
- Quantity - the amount of acceptable work that has been completed.
- Capacity - the maximum quantity of work that can be produced at a sustainable pace (velocity).
- Efficiency - the quantity of inputs that a system requires to produce a prescribed quantity of outputs
- Quality - how frequently results meet a defined set of user's requirements or needs; stated differently, achieving targets within accepted variance limits
- Productivity - A representation of both quality and efficiency measurements in one metric
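To show how these five measures relate, here is a minimal Python sketch. It is an illustration under stated assumptions, not a prescribed formula: the `PeriodObservation` record and its field names are hypothetical, effort is assumed to be measured in hours, and productivity is assumed to be quality divided by efficiency, which works out to accepted units per input hour. Other ways of combining quality and efficiency into one metric are equally possible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeriodObservation:
    """Raw counts for one reporting period (all field names are hypothetical)."""
    units_completed: int        # every work item finished in the period (assumed > 0)
    units_accepted: int         # finished items that met the defined requirements
    input_hours: float          # effort consumed to produce them (assumed > 0)
    sustainable_max_units: int  # most units producible at a sustainable pace


def quantity(obs: PeriodObservation) -> int:
    """Amount of acceptable work completed."""
    return obs.units_accepted


def capacity(obs: PeriodObservation) -> int:
    """Maximum quantity of work at a sustainable pace."""
    return obs.sustainable_max_units


def efficiency(obs: PeriodObservation) -> float:
    """Inputs required per unit of output (here hours per unit; lower is better)."""
    return obs.input_hours / obs.units_completed


def quality(obs: PeriodObservation) -> float:
    """Fraction of completed results that met requirements."""
    return obs.units_accepted / obs.units_completed


def productivity(obs: PeriodObservation) -> float:
    """Assumed combination: quality / efficiency = accepted units per input hour."""
    return quality(obs) / efficiency(obs)


# Illustrative numbers only, not data from any real project.
week = PeriodObservation(units_completed=50, units_accepted=45,
                         input_hours=400.0, sustainable_max_units=60)
print(quantity(week), capacity(week), efficiency(week),
      quality(week), productivity(week))
```

Under this assumed combination, a team can raise its productivity either by needing fewer hours per unit or by having more of its units accepted, which is why the two measures need to be read together.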
The predictability of each of these measures depends upon controlling the variability of the factors which arise in their production. These factors are often 'proxies' for real controls, selected for convenience rather than for effectiveness in control. Measurements are usually developed for one of several purposes. Performance metrics attempt to measure what you are doing; diagnostic metrics seek to uncover why things are performing the way they are; and product metrics report on attributes of the outputs being produced. The reporting of this information should never attempt to tell the whole story, or be used blindly for decision-making, but rather should establish a basis for subsequent discussions. As a part of that discussion, benchmarking can often be used effectively to provide helpful landmarks of comparable performance in similar situations. As such, benchmarks can be particularly useful in providing the basis for realistic, long-term goal setting.
Here is a recommended approach to designing measurements that will be effective within the above contexts:
- Establish the reason for collecting metrics. Examples:
  - Understanding a situation
  - Tracking progress in performing an improvement activity
  - Reporting on the status of some activity
- Determine what specific product and service is going to be measured
- Identify the scope, completion criteria, and level of assessment for the measurement
- Determine which dimensions of these objects are critical to competitiveness, supporting business objectives, or process management. Examples:
  - Cycle time
  - Throughput
  - Costs of delays
- Identify realistic behavioral changes which will help achieve each of these target outcomes
- Assign an 'owner' of measurement collection and reporting who will continue to refine the collection and application of the metric
- Determine the target outcomes for each of these dimensions
- Determine the level of measurement required to achieve these outcomes
- Reach agreement on a controlled, consistent, and reliable means for collecting and interpreting the results.
- Design reporting formats and cadences which will be used to track progress towards that target
- Deploy the metric, publish the results, and begin using the results to support the reasons established for metrics collection
- Reflect and iterate
The following guidance is relevant to applying these steps:
- Measurements and metrics must be well-defined, or their collection and reporting will not be uniform, and they will not be useful in decision-making. For example, let's say a measure of quality is required. There needs to be clarity with respect to what constitutes a defect. Defects might be interpreted as anything which results in changing the source code in a software-intensive system, whereas at the system level, a defect might only be counted if a defined test procedure failed; obviously, these separate meanings would need to be reconciled and associated within the corresponding change tracking systems. Similarly, rules and tools for counting the quantity of things should be carefully designed, so that the total measurements can 'roll up' into meaningful assessments (In software-based systems, do blanks count? Comments? Headers?).
- Proposed measurements should be developed and reviewed with the community that is expected to collect and report on them, so that expectations for data collection are clear and the intended usage of the data is well understood. This allows approaches to be tailored to local practices.
- Metrics must be collected against prescribed, controlled baselines, or uncontrolled changes will render their understanding problematic. As a result, the context for collection must always be well defined, and reported along with the metrics themselves.
- Metrics must be collected and reviewed in an environment of respect, trust, and commitment, and with a focus on managing by facts and data; otherwise, it has been repeatedly demonstrated that people will game the system.
- Metrics collections should also be synchronized against other development triggers. What should be measured thus shifts with time. For example, measurements of individual components should be aligned with developmental 'builds', whereas integration element measures should be aligned with interface baselines. Such triggers should be factored into reporting baselines, so that the lessons learned during one baseline (whether a development baseline or integration baseline) can be compared and evaluated over time. Learning and improvement should be expected, but such learnings will always be incremental and only developed over time. If you've been burning down 20 test procedures per week, you won't go from 20 to 40 in one week, but with considerable effort, may be able to achieve a 3-5% improvement rate per month. Such improvements will only be evident once you've made appropriate investments in the right infrastructure to capture and institutionalize such learning. Of course, if a common cause is essentially blocking nearly all progress, addressing that blocking issue might enable your rate to jump more quickly, and then achieve smaller, incremental improvements from there.
- Ideally, reporting should be performed within a standard period. While weekly is desirable, collecting data faster than you can make or incorporate decisions may not be worthwhile, so it may be appropriate to reduce this frequency to monthly in some circumstances (such as when working across companies). However, the less frequently you report, the less accurate the predictions of future performance will be, since there will be fewer data points to draw predictions from. Thus, there should be a balance between the frequency and effort of reporting and the desired precision and accuracy of measurements.
- Everyone must commit to evolving the definitions of the measures which are used to manage the work over time, and their interpretation and reporting, to ensure that they are useful and not burdensome; if you ever reach the point that you are collecting data that is not useful, or collecting metrics for the sake of collecting metrics, you must realign everyone's actions against what your business goals are, and make appropriate adjustments.
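The arithmetic behind the incremental-improvement point above (20 test procedures per week, improving 3-5% per month) can be worked through with a short Python sketch. It assumes the improvement compounds monthly at a constant rate, which the text does not state explicitly; the function name `months_to_reach` is hypothetical.

```python
import math


def months_to_reach(current_rate: float, target_rate: float,
                    monthly_improvement: float) -> int:
    """Whole months needed to grow current_rate to target_rate.

    Assumes a constant improvement rate that compounds once per month.
    """
    if current_rate <= 0 or target_rate <= 0:
        raise ValueError("rates must be positive")
    if monthly_improvement <= 0:
        raise ValueError("monthly_improvement must be positive")
    if target_rate <= current_rate:
        return 0
    # Solve current * (1 + r) ** n >= target for the smallest integer n.
    growth_needed = math.log(target_rate / current_rate)
    growth_per_month = math.log(1 + monthly_improvement)
    return math.ceil(growth_needed / growth_per_month)


# Doubling from 20 to 40 test procedures per week:
print(months_to_reach(20, 40, 0.03))  # at 3% per month
print(months_to_reach(20, 40, 0.05))  # at 5% per month
```

Under these assumptions, doubling the weekly rate takes roughly 15 months at 5% per month and 24 months at 3% per month, which is why targets that expect a jump from 20 to 40 in a single period are unrealistic unless a single blocking cause is removed.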
While a typical response to rolling out a metrics program is 'do you want the work done, or do you want the metrics', the answer to this question should always be that producing metrics is how we measure progress on the work and accurately determine when we can expect it to be done, and so both activities need to be supported. If that date in fact does not matter, then such measurements may not be necessary. But if it does, something other than a SWAG should be used in decision-making.
Never forget, however, that measurement for the sake of measurement will likely cost you a lot, and may buy you very little. To avoid such waste, consider available guidance on metrics, KPIs, scorecards, and dashboards, and use them to design a program that targets a limited number of meaningful objectives, deploys those incrementally, and demonstrates their value in showing progress on designated leverage points over time. This is not about challenging people to prove that they will do a good job; it is about synthesizing an overall assessment from individual measurements collected across many different sampling points in the development effort, and assuring that the best possible information is made available for decision-making.

One of the most effective communities of practice for using measurements to provide accountability of results has been the Lean efforts within medicine. Thomas Nolan, Donald Berwick's process leader, emphasizes the importance of determining the interrelationships between key outcomes and the primary and secondary drivers which affect those outcomes. This is consistent with the Goal-Question-Metric (GQM) measurement methodology used widely in process improvement. For example, one of the primary contributors to medical costs is how appropriately resources are utilized in the last 6 months of life. An analysis of the factors involved in this is depicted in the graphic on the right.
Once such an analysis has been performed, specific 'leverage points' can then be identified which can improve the target outcome over time. Until these leverage points are determined, it can be counter-productive to introduce reforms, since you will not be able to determine if the reform led to the outcome or not. Finding these leverage points itself will usually require data collection, research, and experimentation, but once such measures of effectiveness are designed and implemented, and the core disciplines necessary for assuring they are reliable are put in place, they can be used quite effectively.
One more thing that's fundamentally different about Berwick's approach is in how he uses goals - he actually aims his measures at strategic leverage points for improvements, as opposed to providing vague goals for improvement. My favorite quote from Don is this: 'Some is not a number, and Soon is not a time.'
While Berwick's approach works well to guide and focus resources for work done by others, measuring yourself is an entirely different ballgame. People are always concerned that measures will be used to apply pressure on them, a tactic that nearly always fails, since they will find ways to game the system. For this reason, even well-designed metrics programs can struggle to gain traction. For example, at Microsoft, teams try hard to use data to guide their projects; yet too often, their approaches (like most others') have been based upon dogma and guesswork, rather than evidence. One of their key learnings has been to measure results (what), rather than methods (how), since results are less subject to biases. Another is that measurements should not be used to mindlessly make decisions, but rather to trigger the need for a review that will determine any new courses of action required. Important in such considerations are the baseline under which measurements were collected, the target measurement definition and assumptions, and how performance trends over time. Generally, it is changes in these trends (or their absence), rather than the absolute values of measurements themselves, that offer the greatest opportunity for signalling the need for action.
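One way to let changes in a trend, rather than absolute values, trigger a review is a simplified control-chart check. The sketch below is only an illustration: the function name `review_triggers`, the three-sigma limit, and the six-point run rule are assumptions chosen for the example, not rules taken from the text, and it uses the sample standard deviation of the baseline rather than the moving-range limits of a formal control chart. It returns reasons to hold a review and makes no decision itself.

```python
from statistics import mean, stdev


def review_triggers(baseline, recent, sigma_limit=3.0, run_length=6):
    """Return reasons why recent measurements warrant a review.

    baseline: measurements collected under the agreed baseline conditions
    recent:   measurements collected since, in time order
    """
    if len(baseline) < 2:
        raise ValueError("need at least two baseline measurements")
    center = mean(baseline)
    spread = stdev(baseline)
    reasons = []

    # Rule 1 (assumed): any single point far outside the baseline variation.
    if any(abs(x - center) > sigma_limit * spread for x in recent):
        reasons.append("point outside control limits")

    # Rule 2 (assumed): a sustained run of points on one side of the center.
    run, side = 0, 0
    for x in recent:
        s = (x > center) - (x < center)  # +1 above, -1 below, 0 on the center
        if s != 0 and s == side:
            run += 1
        else:
            run = 1 if s != 0 else 0
            side = s
        if run >= run_length:
            reasons.append("sustained shift from baseline")
            break
    return reasons


# Illustrative weekly counts only, not real project data.
baseline = [20, 21, 19, 20, 22, 18, 20, 21, 19, 20]
print(review_triggers(baseline, [20, 19, 21, 20]))          # ordinary variation
print(review_triggers(baseline, [21, 21, 22, 21, 22, 21]))  # steady upward shift
print(review_triggers(baseline, [20, 26]))                  # one unusual point
```

Because the limits are derived from the baseline itself, the check only makes sense when the baseline conditions and the metric definition are recorded alongside the data, as described above.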
