The overhead of switching contexts
Let's consider the implications of distributing attention across too many competing demands. Inefficiencies exist as people or teams switch from performing one task to another. Let's call the monitoring and control function which performs such context switches our 'Work Operating System'. This Work Operating System is offered to provide a metaphor for what occurs during context switches, and relate that to the actions which an operating system must perform as it manages the resources on a computer. In each case, whether the computer or person switches contexts, there is overhead (in both time and energy) that is involved in making such transitions.
Psychologists describe these context switches as disrupting an individual's psychological flow. Such changes significantly reduce the overall throughput of work teams, whether they are small or large. Ideally, such context switches should only occur when one set of tasks completes, or when the Work Operating System must respond to infrequent requests for switching context, i.e. unplanned attention on particular material.
Such emergent work may or may not deserve expedited attention. When an operating system services an interrupt request before that work has been scheduled, the system must have sufficient resources and operating performance to process this emergent work. Once this task is accomplished, the computer's operating system must return to processing both its normal-priority jobs (time-critical in the short term) and its lower-priority jobs (batch runs that are not immediately time-critical, but are still important over the longer term). The system must also have sufficient throughput capacity to 'catch up' with the demand over time. Individuals and work groups have these same challenges.
The false promise of multitasking is that when processing elements are directed to treat all jobs as high-priority, the attention of those elements is split among these jobs, and no one benefits. This is illustrated by the two options on the right, which depict a situation in which three jobs compete for attention from a single resource. In option 1, attention is allocated in fixed time-slices, and scheduling occurs in a round-robin style. Contrast this with option 2, which depicts a 'focus and finish' style of execution. Note that for Jobs A and B, option 2 is always better than option 1, and for Job C, the outcome is no different than would have occurred under option 1. Yet option 2 is only really possible when the constraints that prevent efficient processing are removed.
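The comparison between the two options can be reproduced with a few lines of Python. This is only a minimal sketch under stated assumptions: the three jobs are given equal, made-up durations of three time units each, the time slice is one unit, and the cost of each context switch is ignored (which flatters option 1). The function names `round_robin` and `focus_and_finish` are illustrative, not taken from any scheduler.

```python
from collections import deque


def round_robin(jobs, time_slice=1):
    """Option 1: share one resource among all jobs in fixed time slices.

    jobs maps a job name to its duration; returns {name: completion_time}.
    Context-switch overhead is assumed to be zero.
    """
    remaining = dict(jobs)
    queue = deque(remaining)
    clock = 0
    finished = {}
    while queue:
        name = queue.popleft()
        run = min(time_slice, remaining[name])
        clock += run
        remaining[name] -= run
        if remaining[name] == 0:
            finished[name] = clock
        else:
            queue.append(name)  # not done yet, go to the back of the line
    return finished


def focus_and_finish(jobs):
    """Option 2: run each job to completion before starting the next."""
    clock = 0
    finished = {}
    for name, duration in jobs.items():
        clock += duration
        finished[name] = clock
    return finished


jobs = {"A": 3, "B": 3, "C": 3}  # assumed durations, in arbitrary time units
print(round_robin(jobs))       # {'A': 7, 'B': 8, 'C': 9}
print(focus_and_finish(jobs))  # {'A': 3, 'B': 6, 'C': 9}
```

Under these assumptions Jobs A and B finish earlier with 'focus and finish', and Job C finishes at the same time either way. Adding a per-switch cost would only widen the gap.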
In resource assignments that involve people rather than computer jobs, work can arrive at rates that are inconsistent with the capacity of the people to process that work. Once processing begins, it may be discovered that the work is in various stages of 'readiness' for execution. There may also need to be separate input job queues for different types of work (for example, for different customers, priorities, service commitments, or durations). Each of these job queues can change length quite quickly, and the optimum algorithm to service these queues depends upon whether responsiveness for certain types of jobs or overall throughput is more important. The jobs in these queues may also need to be regularly re-prioritized or shuffled, to respond to external conditions or competing demands. To further complicate things, these job queues may be 'serviced' by multiple, concurrent processing agents that have different capacities, and which may compete for resources and attention.
Planning to achieve an aggregate throughput at a desired rate in such a system is only possible when the underlying system itself is operating deterministically. Yet for knowledge work, there are often no precedents for doing the work, so it is not possible to be deterministic. What is predictable in this situation is how most performance management systems behave (as described by David Anderson):
When knowledge work begins to accumulate, the human beings performing that work are placed under pressure. Pressure can be internal and psychological, but more often than not it is external. With a backlog, there is pressure to show that everything in the backlog is being worked on. The result is everything takes longer - a basic effect of multitasking that results when project teams attempt to work on everything.
Over time, once the system (and its resulting performance and variation) has been calibrated and baseline measurements have been benchmarked for a known set of inputs, various improvements may be incorporated to gradually increase that rate. Such improvements may depend upon the
- characteristics of the demand on the system (queue size, etc.),
- capabilities of the processing elements,
- bottlenecks and constraints of the system,
- management and controls available to shape the inputs and processing of work.
Achieving determinism of the underlying system is far more difficult than it sounds. According to the theory of constraints, improvement involves four primary steps: identifying constraints, exploiting them, subordinating all other activities to them, and then elevating the constraints. However, this model presumes a deterministic system, and becomes much more challenging when constraints are not predictable due to system dynamics, changes, and performance variability.
Until order is established through a predictable governance system, decisions and performance commitments involve considerable risk. Attempting to operate a non-deterministic system at a higher rate than it is capable of sustaining for long durations may lead to crashes and loss of critical information. Recovery from resulting outages can easily exceed planned reserves.
When individuals and work groups multitask, they must adopt similar policies when they switch from one task to another. Let's call these policies the WOS scheduling algorithm. This rule set comes into play in many different circumstances, such as when a roadblock is encountered in performing one task (waiting on resources, information, or decisions) or when overall priorities need to be re-evaluated so that focus can be redirected. The need to apply the group's WOS scheduling algorithm can arise while in the middle of processing another task, when a task completes, or when a critical event (time, or external stimuli) triggers the reallocation of attention and resources. Effort is required to analyze and characterize the options. Just determining which task to run next takes time, and when under pressure, such decisions are often flawed. A scheduling algorithm can make such decisions if it indeed has 'control', but if it's not even clear which components are responsible for scheduling and decision-making, the throughput of the system will be unpredictable.
In order to determine what to do next, it is usually also useful to have some idea of how long things in the job queue will take before they begin execution, and it is desirable to have them complete execution once they have begun. Preference is usually given at some level to completing things which are already started, or restarting suspended jobs which are close to completion, to keep the list of things that have to be regularly re-evaluated shorter. But sometimes, it may make more sense to start something else, at the expense of completing this work in process.
Tracking these decisions and their consequences to overall performance is important to improving throughput. This allows adjustments to be implemented over time, as patterns develop (an example might be correcting for chronic underestimation of how long things will take). Selection of the right thing to improve, in the absence of such data, can be frustrating, because you may end up working on the wrong things, pouring a lot of energy into it, and not see the results that you desire.
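As one small illustration of the underestimation example, tracked estimates and actuals can be turned into a correction factor for future estimates. This is a hedged sketch, not a prescribed method: the function names and the sample history are made up, and a single ratio of total actual time to total estimated time is only one of many ways to calibrate.

```python
def calibration_factor(history):
    """Ratio of actual to estimated effort over completed tasks.

    history is a list of (estimated_hours, actual_hours) pairs.
    A result above 1.0 indicates chronic underestimation.
    """
    total_estimated = sum(est for est, _ in history)
    total_actual = sum(act for _, act in history)
    if total_estimated == 0:
        raise ValueError("history needs at least one non-zero estimate")
    return total_actual / total_estimated


def adjusted_estimate(raw_estimate, history):
    """Scale a new raw estimate by the observed calibration factor."""
    return raw_estimate * calibration_factor(history)


# Hypothetical record of three completed tasks: (estimated, actual) hours.
history = [(4, 6), (8, 12), (2, 3)]
print(calibration_factor(history))     # 1.5
print(adjusted_estimate(10, history))  # 15.0
```

With this made-up history, work has been taking half again as long as estimated, so a new 10-hour estimate would be planned as 15 hours.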
The scheduling algorithm and resources available to this Work Operating System should be optimized for the system's maximum sustainable throughput. When resources are constrained but predictably available, this is conceptually simple: the most highly weighted scheduling heuristic should be to give attention to time-critical priority tasks, and limit execution of lower-priority tasks until higher-priority tasks complete. If no high-priority tasks exist, then a second heuristic should come into play: work should be performed according to the priorities of tasks which are not time-critical. When priorities are equal, attention should be assigned based upon the amount of work remaining on those tasks, so that things that are close to completion are completed, and there are thus fewer context switches over time. Such a policy favors working on tasks for which the minimum requirements have not yet been achieved, and completion is within striking distance. However, when the availability of resources is unpredictable, the task becomes much more complicated.
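The three heuristics above can be written down as a single ordering rule. The sketch below is one possible reading, not the definitive WOS scheduling algorithm: the `Task` fields, the convention that a lower priority number means more important, and the sample tasks are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    priority: int           # assumed convention: lower number = more important
    time_critical: bool
    remaining_hours: float


def pick_next(tasks):
    """Choose the next task to give attention to.

    Ordering: time-critical tasks first, then by priority, and among
    equal priorities the task with the least remaining work.
    """
    if not tasks:
        return None
    return min(
        tasks,
        key=lambda t: (not t.time_critical, t.priority, t.remaining_hours),
    )


# Hypothetical job queue.
queue = [
    Task("quarterly report", priority=2, time_critical=False, remaining_hours=5),
    Task("customer outage", priority=1, time_critical=True, remaining_hours=3),
    Task("design review", priority=2, time_critical=False, remaining_hours=1),
]
print(pick_next(queue).name)      # customer outage
print(pick_next(queue[::2]).name) # design review
```

The second call drops the time-critical task, so the tie between the two equal-priority tasks is broken in favor of the one closest to completion.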
A walkthrough of a group's scheduling algorithm (from documented processes, and with all involved factors and decision-makers) under various scenarios is an effective way of verifying the fitness of the algorithm to these scenarios. In practice, though, using a tool such as a kanban board can also be very effective, as it allows a visual indicator of overall system health to emerge, and enables the group to dynamically tune its scheduling algorithm and dashboard over time, until an effective approach 'settles in'.
Tracking the time overhead associated with these activities and decision-making (the evaluation of what to change to, and the process of reconfiguration for the change) can be very useful when overall system performance is critical. The measures of effectiveness for system performance may include factors such as the responsiveness to arrivals of new work, the efficiency or throughput for benchmarks, etc. Accounting for (and reducing) this overhead is a key factor in assuring that the overall system behavior will be predictable, especially as the system approaches its capacity limits.
If you schedule a computer at around 90% of capacity or more, it will often encounter a condition called thrashing, in which overall throughput drops significantly, due to resource constraints and constant context switches. This can happen with people or work teams, too, if they are not able to focus attention on individual tasks, but are constantly interrupted. Meetings, email, phone calls, and other disrupters can all become a source of these interruptions. Of course, if all jobs complete, a computer's operating system essentially 'spins' in an idle loop, waiting for more work. Spending much time in this state is also not an efficient use of resources. So if overall utilization is to be maximized while avoiding thrashing, there should be a way of minimizing interrupts, feeding in the right mix of medium-priority tasks, and keeping low-priority tasks on hand to work on at those times when there is nothing else to do.
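A rough way to see how interruptions eat into a working day is to add up the time each one costs. The sketch below is a deliberately simple model with invented numbers: it assumes every interruption costs a fixed handling time plus a fixed time to regain focus, which real interruptions do not. The function name and parameters are hypothetical.

```python
def productive_fraction(work_hours, interruptions, handling_minutes, refocus_minutes):
    """Fraction of the day left for focused work after interruptions.

    Assumes each interruption costs handling time plus a fixed refocus
    time; both values are illustrative, not measured.
    """
    lost_hours = interruptions * (handling_minutes + refocus_minutes) / 60
    return max(0.0, (work_hours - lost_hours) / work_hours)


# An 8-hour day with 12 interruptions, each taking 5 minutes to handle
# and an assumed 15 minutes to get back into the original task.
print(productive_fraction(8, 12, handling_minutes=5, refocus_minutes=15))  # 0.5
```

In this made-up case, half the day goes to interruptions and recovery, even though the interruptions themselves account for only one hour.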
To stretch our analogy a bit further, consider the perspective of the computer center's leadership team (in WOS terms, the parent organization, or the leader of a work team). They may not want to buy more computers, even when there are backlogs of tasks to do, until they get insight into how effectively the existing computers are being used. Of course, this decision should be based upon risk-based assessments of the cost of delaying jobs versus the cost of acquiring new computers, and the time it takes for them to be brought on-line. While there is a common belief that adding more computers will provide corresponding linear increases in capacity, it should be recognized that while such changes are being made, and for a period of transition after, the change itself takes time from other resources, at least until the aggregate is fully operational; even then, there will be losses incurred. Simply put, eight parallel computers generally do not deliver eight times the throughput of one computer, due to communications overhead, latency, and the complexity of scheduling.
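One common way to put a number on this is Amdahl's law, which is brought in here as an outside simplification rather than something the analogy itself specifies. It treats a fixed fraction of the work as something that cannot be spread across machines, standing in for coordination and scheduling overhead. The 90% parallel fraction below is an assumed value chosen only for illustration.

```python
def amdahl_speedup(workers, parallel_fraction):
    """Theoretical speedup from Amdahl's law.

    parallel_fraction is the share of the work that can be divided
    among workers; the rest is assumed to run serially.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Assume 90% of the workload can be spread across machines.
for n in (1, 2, 4, 8):
    print(n, round(amdahl_speedup(n, 0.9), 2))
# 1 1.0
# 2 1.82
# 4 3.08
# 8 4.71
```

Under that assumption, eight machines yield less than five times the throughput of one, and the same reasoning applies when people are added to a team whose work requires coordination.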
One of the values of the analogy of a WOS is that it helps to emphasize the value of conducting periodic reviews on data collected against jobs run in a given period, and making decisions to tune the job queue, scheduling algorithms, and associated priorities accordingly. Such is the focus of computer operators for major data centers around the world, and for operations managers of production centers. Such operating system tweaks are not easy, as they can be disruptive to overall performance, until they are just right. And when the platform itself changes, the tuning must begin again.
This WOS analogy is just a mental model - but it points out that one way of evaluating performance is to review parameters about job status and throughput, through consideration of issues such as the following:
- What is the overall utilization (relative to capacity), and what might be done to improve that?
- How long are benchmark jobs taking to complete, and how does that compare to the last time they ran?
- Which resources are most frequently the constraint on processing?
- How frequently have high priority tasks been injected into the queue, how much time is spent working them, and how disruptive is that?
- What is the estimated time remaining to complete tasks in the job queues, at current performance levels?
- Which jobs have had to be (or may need to be) re-started because of interdependencies with other jobs that have themselves had to be rescheduled?
- Are we servicing the most important tasks at the level of responsiveness that they require?
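A couple of the questions above reduce to simple arithmetic once the underlying data is being collected. The sketch below shows two of them, utilization and estimated time remaining, with hypothetical function names and invented sample figures; a real review would draw these numbers from whatever tracking the group already has.

```python
def utilization(busy_hours, capacity_hours):
    """Share of available capacity that was actually spent on work."""
    if capacity_hours <= 0:
        raise ValueError("capacity must be positive")
    return busy_hours / capacity_hours


def estimated_days_remaining(remaining_hours_per_job, hours_completed_per_day):
    """Days needed to clear the queue at the current rate of completion."""
    if hours_completed_per_day <= 0:
        raise ValueError("throughput must be positive")
    return sum(remaining_hours_per_job) / hours_completed_per_day


# Invented figures: 30 hours of work logged in a 40-hour week, and three
# queued jobs with 8, 4, and 12 hours left, cleared at 6 hours per day.
print(utilization(30, 40))                       # 0.75
print(estimated_days_remaining([8, 4, 12], 6))   # 4.0
```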
You can see how designing a system that allows questions like these to be answered is important to improving the performance of an operating system, whether that operating system runs a computer, or is the internalized rule-set used to decide what people work on from hour to hour. You can also see why it's important to conduct the discussion in terms of 'machine time' as well as 'calendar time', because some jobs may take a long period to complete, but not take much effort; this might be due to the workload of other tasks that are executing concurrently.
However, unlike computers, people are not robots. They have strengths and weaknesses on different days or in different situations. They have emotional needs and desires, and those influence the outcomes that are achieved. They are the key in determining the extent to which the system can be made deterministic, and how quickly it can be evolved or tuned.
- Bryan Pflug's blog
