Why Medical Device Error Design Kills


Dennis Lenard

Mar 2026

Five documented medical device failures reveal a single mechanism: error states treated as afterthoughts. The Error Afterthought Model explains why error messages can exist and still fail to protect patients.

This article draws on Creative Navy's project work in medtech UX, spanning practice management software, surgical equipment, ventilators, blood pumps, infusion systems, and patient monitoring devices, including Class II and Class III regulated products. Our work in this sector covers clinical environments including the ICU and operating theatre, designing for surgeons, nurses, and biomedical engineers. Dennis Lenard, who leads this work at Creative Navy, is the author of the practitioner reference User Interface Design For Medical Devices And Software. Our approach integrates IEC 62366 usability engineering requirements and FDA Human Factors guidance as structural inputs to the design process, not post-hoc compliance activities.

A nurse on a late shift programs a syringe driver. The device looks identical to the one she used yesterday. Same manufacturer, same panel layout, same button in the same position. She sets the rate and starts the infusion. The device is running. The patient receives medication.

What she did not see was the difference that mattered. The device in her hands delivers medication at one twenty-fourth the rate of the one she expected. The only distinguishing features between the two models were a colour difference and a small printed label, in the smallest typeface on the device, indicating the delivery rate.

No warning interrupted her. No error state fired. The interface gave her nothing to contest the assumption she had already formed.

The design is what failed.

Across five of the most documented medical device safety incidents of the past four decades, a single mechanism runs beneath the case evidence: the interface was designed for the happy path, and the error states were added afterwards. When errors arrived, they arrived into a cognitive context the design had not accounted for. The result, in each case, was a clinical response that was slower, less accurate, or absent entirely.

This article examines those five cases, diagnoses the common mechanism, and presents the design principles required to close the gap. The primary audience is device experience integrators, safety and consequence bearers, and regulatory leads who carry accountability when interface decisions become adverse events.

Figure: original and redesigned syringe driver interfaces.


Key Statistics

  • At least 3 deaths attributed directly to radiation overdose from the Therac-25, between 1985 and 1987 (as of 1993)
  • 23 incidents including 4 fatalities recorded in Scotland between 1989 and 1994, associated with small-volume syringe pump model confusion
  • 40,000 Graseby syringe drivers deployed across the NHS at peak use, approximately one quarter of all such devices used worldwide
  • 22 types of medication error risk facilitated by a single widely deployed CPOE system; three quarters of house staff reported each occurring weekly or more (as of March 2005)
  • 566 alarm-related deaths reported to the FDA MAUDE database, January 2005 through June 2010
  • 85 to 95% of clinical device alarms require no clinical intervention (as of 2012)
  • 771 alerts per bed per day recorded in one intensive care unit study

What Five Cases Reveal

The Graseby MS16A and MS26 syringe drivers were functionally incompatible devices built on nearly identical physical interfaces. The MS16A was calibrated in millimetres of syringe travel per hour; the MS26 in millimetres per 24 hours. The same number keyed into the wrong model therefore delivers medication 24 times too fast or too slow. The only visible distinction between them was colour and a small printed label, rendered in the lowest font weight on the device, indicating the rate unit. For a comparison of what adequate differentiation looks like in current syringe pump interface design, see our benchmarking of syringe pump UX design patterns.

In Scotland between 1989 and 1994, this visual ambiguity contributed to 23 reported incidents, including 4 fatalities. The UK Department of Health issued a hazard warning in 1994. The NHS National Patient Safety Agency issued a Rapid Response Report in December 2010 requiring organisations to transition away from these devices within five years. NHS Improvement correspondence as late as June 2018 indicates units remained in service after that transition window had closed.
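The dose-rate consequence of that calibration difference is simple arithmetic. The sketch below is illustrative only; the function and values are ours, not anything in Graseby firmware. It shows how the same number keyed into an hourly-rate device and a daily-rate device yields delivery rates 24 times apart.

```python
# Illustrative arithmetic only: the consequence of confusing an hourly-rate
# syringe driver (millimetres per hour) with a daily-rate one (millimetres
# per 24 hours). The function and values are a sketch, not device firmware.

def travel_mm_per_hour(keyed_rate: float, unit_period_hours: float) -> float:
    """Actual syringe travel per hour for a rate keyed on the front panel."""
    return keyed_rate / unit_period_hours

keyed = 2.0  # the clinician keys "2", assuming millimetres per hour
hourly_model = travel_mm_per_hour(keyed, unit_period_hours=1.0)
daily_model = travel_mm_per_hour(keyed, unit_period_hours=24.0)

factor = hourly_model / daily_model
print(f"{factor:.0f}x")  # 24x: same keyed number, same-looking panel
```

Nothing on the panel distinguishes the two interpretations; the entire 24x factor rides on which model the clinician believes she is holding.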

The second case is better known. Between June 1985 and January 1987, the Therac-25 radiation therapy machine was involved in at least six accidents in which patients received massive radiation overdoses. At least three deaths were directly attributed to radiation overdose, as of 1993. The machine had removed the hardware interlocks present in its predecessors, the Therac-6 and Therac-20, in favour of software-based controls. When a race condition caused the turntable to fail to reach its correct position, the machine did not halt. It displayed a message: "Malfunction 54."

The message appeared nowhere in the operator manual. The manufacturer subsequently confirmed that the device itself could not distinguish between radiation underdose and overdose. The error message told the operator that a malfunction had occurred. It gave no indication of what had happened, what state the machine was in, or whether it was safe to continue. Operators cleared it and proceeded.
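To make the contrast concrete, here is a sketch of the information "Malfunction 54" withheld: the condition, the machine's current state, the required action, and whether continuing is safe. The structure and field names are hypothetical, not any real device's API.

```python
# Sketch of the information a safety-critical error state should carry,
# contrasted with an opaque code. Fields and wording are illustrative,
# not drawn from any shipping device.
from dataclasses import dataclass
from enum import Enum

class Severity(Enum):
    INFO = 1
    CAUTION = 2
    HALT = 3  # treatment must not continue until resolved

@dataclass(frozen=True)
class ErrorState:
    condition: str        # what happened, in operator terms
    machine_state: str    # what state the machine is now in
    required_action: str  # what the operator must do next
    severity: Severity
    resumable: bool       # may the operator clear and continue?

opaque = "Malfunction 54"  # nothing here the operator can act on

explicit = ErrorState(
    condition="Beam positioning did not reach its commanded state",
    machine_state="Beam disabled; dose delivered this session unknown",
    required_action="Do not resume. Call service and verify dose records.",
    severity=Severity.HALT,
    resumable=False,
)
```

The decisive field is `resumable`: the Therac-25 allowed operators to clear the code and proceed, which is exactly the affordance a HALT-severity state must withhold.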

The third case is Computerised Physician Order Entry. A landmark study published in JAMA in March 2005 examined a deployed CPOE system and identified 22 types of medication error risk the system actively facilitated. Three quarters of house staff reported observing each error risk on a weekly basis or more frequently. Among the most consequential: clinicians in some implementations had to navigate up to 20 different screens to view a patient's full medication history. The structural problem this exemplifies, navigation depth that forces users to hold context in working memory rather than displaying it, is examined comparatively in our medical device information architecture UX patterns benchmarking.

A related failure, the label ambiguity of an option to prescribe for "tomorrow" that caused orders placed after midnight to skip an entire day's dose, is a precise example of failure to account for temporal context. The interface presented an option whose meaning depended on clock state the user could not reliably track under load.

The fourth case is alarm fatigue. The Joint Commission's Sentinel Event Alert Issue 50, published in April 2013, reported 566 alarm-related deaths logged in the FDA MAUDE database from January 2005 through June 2010. The Joint Commission's own sentinel event records documented 80 alarm-related deaths between January 2009 and June 2012. The ECRI Institute listed alarm hazards as the number one health technology hazard in both 2012 and 2013.

The mechanism is well understood. An estimated 85 to 95% of clinical device alarms require no clinical intervention, as of 2012. In one study at a major academic medical centre, more than 59,000 alarms fired over a 12-day period. In an intensive care unit, the average was 771 alerts per bed per day. Staff respond to this volume not through increased vigilance but through desensitisation: volumes are reduced, thresholds are adjusted, and alarms are silenced. When a genuine alarm fires, it fires into a clinical environment already primed to ignore it.
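Back-of-envelope arithmetic on the figures above makes the scale of the problem plain. This is a straightforward calculation from the numbers already cited, not new data.

```python
# Back-of-envelope arithmetic on the figures above: how many of 771 daily
# alerts per ICU bed actually need a clinician, if 85 to 95% of clinical
# device alarms are non-actionable.

alerts_per_bed_per_day = 771

for non_actionable in (0.85, 0.95):
    actionable = alerts_per_bed_per_day * (1 - non_actionable)
    print(f"{non_actionable:.0%} non-actionable -> "
          f"~{actionable:.0f} actionable alerts/bed/day")

# Roughly 39 to 116 true signals buried in hundreds of false ones, every
# day, for every bed. Desensitisation is the statistically sane response.
```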

The fifth case is ventilator interface inconsistency. Clinicians working across institutions over a career may operate five or six different ventilator systems, each with its own terminology, layout, and interaction model. Critical parameters such as PEEP and FiO₂ are frequently buried several menu levels deep. The same ventilation mode is described by different names across manufacturers. ISO 19223 defines standardised terminology for ventilation modes; its adoption remains incomplete. The structural consequence is that clinicians carry mental overhead that belongs in the interface.

Across all five cases, the same questions recur. Why was the error state not designed to interrupt an already-formed assumption? Why was the warning not contextually intelligible? Why did the alarm fire without hierarchy that established its urgency? The case details do not answer these questions. Understanding how these interfaces were built does.

The Error Afterthought Model

The Error Afterthought Model names the design mechanism common to all five cases. It is not a failure of intent. Teams knew error states needed to exist. The failure is structural: design teams treat successful task completion as the end of the design problem, then add error states as disconnected appendices once the happy path is solved.

The model has two diagnostic arms.

The first is Disconnected Error. An error state designed without reference to the task state the user occupied immediately beforehand arrives as a non sequitur. The user has cognitive momentum toward a goal. They hold an assumption about the state of the system. The error state interrupts that assumption but does not give them the information required to revise it. "Malfunction 54" is a Disconnected Error. It tells the operator something went wrong without anchoring that information to the task the operator was performing, the state the machine is in, or the action required.

Figure: a Computerised Physician Order Entry (CPOE) system interface.

The second is Gestalt Collapse. Multiple error signals are present simultaneously but were designed independently, and they arrive without hierarchy or relationship. The alarm-saturated intensive care unit is a clinical instantiation of Gestalt Collapse. No individual alarm was designed badly. Each was designed to capture attention for its specific condition. Together they produce a signal environment that cannot be parsed under load.

The Error Afterthought Model describes a design process failure, not a technology failure. Both of its diagnostic arms are preventable if error state design is treated as a first-class use scenario from the start of a project.

The Therac-25 and the alarm-fatigued ICU are not two different problems. They are the same design process failure operating at different scales.

Error Messages Do Not Make Devices Safer

The common assumption held across product teams is that any error feedback is better than none. The presence of a warning, the belief goes, fulfils the designer's duty to the user. The product has communicated. The rest is on the clinician.

The case record does not support this.

"Malfunction 54" communicated nothing usable. It was present on the screen. The system had communicated, in the narrowest technical sense. Operators cleared it and continued. At least three patients died as a direct consequence of radiation overdose.

An alarm environment producing 771 alerts per bed per day is not a silent system. It communicates relentlessly. The consequence of that relentlessness is clinical silence. Staff develop coping behaviours specifically to reduce the volume of incoming signal.

A poorly designed error state does not make a device safer. An error that arrives disconnected from the user's cognitive context, or that floods the display without hierarchy, degrades the quality of the clinical response compared to a well-designed intervention. Error state design is a primary safety problem. The standards framework is direct on this point: IEC 62366-1 requires that use scenarios include foreseeable use errors and that summative usability evaluation tests user performance under those conditions. The standard treats the error path as a use scenario requiring the same design and validation rigour as the primary workflow.

The question for any design team is not whether error feedback exists. It is whether the error state, as designed, supports a safe clinical response under the cognitive conditions in which it will actually be encountered.

Designing for Real Cognitive Context

Six principles follow from the case analysis. They are design requirements, not suggestions.

Principle 1: Design error states as part of the task sequence, not after it.

The error state a clinician encounters does not arrive in isolation. It arrives after a sequence of actions, in the middle of a clinical workflow, with a goal already formed and attention already allocated. The design of the error state must account for where in the task the user is, what assumption they are likely holding, and what information they need to revise that assumption. "Malfunction 54" failed this requirement entirely. A state that names the machine's condition in terms the operator can act on does not.
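One way to implement this principle is to bind every error message to the task step it interrupts. The sketch below is hypothetical; the step names, templates, and wording are ours, not drawn from any shipping device.

```python
# Sketch: binding an error to the task step the user was in, so the message
# revises the assumption the user already holds. Step names, templates, and
# wording are hypothetical, not drawn from any shipping device.

TASK_SEQUENCE = ["select_drug", "set_rate", "confirm", "infusing"]

def contextual_error(step: str, condition: str) -> str:
    """Compose an error message anchored to the interrupted task step."""
    assert step in TASK_SEQUENCE, f"unknown task step: {step}"
    templates = {
        "set_rate": ("Rate not accepted: {c}. The pump is NOT infusing. "
                     "Re-enter the rate to continue."),
        "infusing": ("Infusion stopped: {c}. No drug is being delivered. "
                     "Check the line, then press RESUME."),
    }
    return templates.get(step, "Error during {s}: {c}.").format(
        c=condition, s=step)

print(contextual_error("infusing", "occlusion detected upstream"))
```

The payoff is that the message answers the question the clinician is actually holding ("is the patient receiving drug right now?") rather than reporting the system's internal condition.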

Principle 2: Visual differentiation must be unambiguous between functionally distinct devices.

The Graseby case is a design failure at the physical interface level. Two devices with incompatible delivery rates were differentiated only by colour and a small-font label. Colour alone is insufficient: it is missed under low light, during rapid selection, and by clinicians with colour vision deficiency. Label font weight communicates priority. A label rendered in the lowest font weight on a device communicates that it is supplementary information. When it carries the critical functional distinction, it will be treated as supplementary by users under load.

Figure: ventilator user interface.

Principle 3: Design error states for recognition, not decoding.

When a surgical device shows a critical status in an operating theatre, under gloved hands, under the cognitive load of an active procedure, the clinician cannot stop and read. The interface must produce recognition. The state, the severity, and the required action must be decodable within the window of attention that is actually available. This principle was the central finding from work on a surgical ultrasonic cutter used in orthopaedic and trauma surgery: warnings required reading rather than recognition until the interface was redesigned with procedural relevance and glance-readability as primary constraints, across 13 surgeon sessions and 12 human factors studies.

Medical device error state design requires that critical status is always visible and that every error state is recognisable at a glance, without requiring the user to stop and read.

Six design principles reduce patient harm from poor medical device error state design. Error states must be designed as part of the task sequence, not after it. Visual differentiation between functionally distinct devices must be unambiguous. Error states must support recognition rather than decoding. Alarm hierarchies must remain parseable under clinical load. Interface label ambiguity must be treated as an error mechanism. And error state design requires observational research in live clinical environments to understand the cognitive context in which errors will actually arrive.

Principle 4: Alarm systems require a hierarchy the clinical environment can parse.

An alarm that fires at the same urgency level as 770 others in the same day is not a safety signal. It is noise. Effective alarm design is not a matter of ensuring each alarm is technically correct. It is a matter of ensuring the aggregate signal environment remains parseable under the cognitive load of clinical work. Hierarchical alerts with distinct colour, tone, and modality by urgency level are a structural requirement. Threshold personalisation by patient-specific data reduces the false-alarm load that produces desensitisation in the first place.
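The two requirements, hierarchy and personalisation, can be sketched together. The urgency levels, presentation values, and thresholds below are illustrative choices of ours, not figures from any standard's normative tables.

```python
# Sketch of a hierarchical alarm policy: urgency determines colour, tone,
# and whether the alarm interrupts, and thresholds are personalised to the
# patient's own baseline. Levels and values are illustrative, not normative.
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class AlarmPresentation:
    colour: str
    tone: str         # a distinct audio pattern per urgency level
    interrupts: bool  # demands immediate acknowledgement?

HIERARCHY = {
    "high":   AlarmPresentation("red",   "three-pulse repeating", True),
    "medium": AlarmPresentation("amber", "two-pulse",             False),
    "low":    AlarmPresentation("cyan",  "single chime",          False),
}

def classify_heart_rate(hr: float, baseline: float) -> Optional[str]:
    """Alarm relative to the patient's own baseline rather than a fixed
    population-wide limit, so routine variation stays silent."""
    deviation = abs(hr - baseline) / baseline
    if deviation > 0.40:
        return "high"
    if deviation > 0.25:
        return "medium"
    if deviation > 0.15:
        return "low"
    return None  # within personal range: no alarm, no added noise

print(classify_heart_rate(hr=72, baseline=70))   # None
print(classify_heart_rate(hr=110, baseline=70))  # high
```

A fixed population-wide cutoff would alarm constantly on an athlete with a resting rate of 50 and stay silent on genuine deterioration in a patient whose baseline sits near the limit; the baseline-relative rule avoids both failure modes.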

Principle 5: Interface ambiguity is a dose-error mechanism.

The CPOE label ambiguity case is easy to dismiss as a training problem. Clinicians should know that "tomorrow" means the next calendar day, not the next clinical day. The error was resolved in some implementations by defining "tomorrow" as any time after 6am. This is a design correction. It treats the ambiguity as a property of the interface, not a gap in user knowledge. Every label whose meaning depends on context the user cannot reliably access under load is a latent error mechanism. Designing it out is a design obligation.
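The 6am correction described above can be stated in a few lines of code. The boundary hour and function name are illustrative, assumed from the description of the fix rather than taken from any particular CPOE implementation.

```python
# Sketch of the 6am design correction: "tomorrow" resolves against a fixed
# clinical-day boundary, so an order placed just after midnight does not
# skip a day's dose. The cutoff hour and names are illustrative assumptions.
from datetime import datetime, date, timedelta

DAY_BOUNDARY_HOUR = 6  # 00:00-05:59 still counts as the previous clinical day

def resolve_tomorrow(now: datetime) -> date:
    """Map the UI label 'tomorrow' onto a concrete calendar date."""
    clinical_today = now.date()
    if now.hour < DAY_BOUNDARY_HOUR:
        clinical_today -= timedelta(days=1)
    return clinical_today + timedelta(days=1)

# Order placed at 00:30 on 2 March: naive calendar logic gives 3 March,
# skipping the 2 March dose entirely. The boundary rule gives 2 March.
naive = datetime(2026, 3, 2, 0, 30).date() + timedelta(days=1)
corrected = resolve_tomorrow(datetime(2026, 3, 2, 0, 30))
print(naive, corrected)  # 2026-03-03 2026-03-02
```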

Principle 6: Error state design requires observational research in live clinical environments.

Designing an error state for the cognitive context of clinical use requires knowing what that cognitive context actually is. Self-report surveys and lab sessions, where cognitive load is artificial and interruptions are controlled, do not produce this knowledge. Contextual inquiry and observational research in live clinical environments capture what users actually attend to, what they miss, and how their attention is distributed across competing demands at the moment an error arrives. This is the methodological requirement the case evidence supports, not an enhancement to standard practice.

IEC 62366 Compliance and Error States

IEC 62366-1 requires usability engineering across the full use-related risk spectrum. Summative usability evaluations that test only the happy path and primary use scenarios leave a demonstrable gap in use scenario coverage. If an adverse event is subsequently connected to how an error message was presented during a use error, the validation record will show that error-state design was not treated as a use scenario requiring testing. This gap cannot be addressed after the evaluation is complete. It must be built into the test protocol from the start.
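One way to make that coverage requirement checkable early is a simple lint over the summative test plan. The scenario records and the `use_error` flag below are our own illustrative convention; IEC 62366-1 specifies the requirement, not this tooling.

```python
# Sketch of a protocol-level check: flag a summative usability test plan
# that contains no use-error scenarios. The record shape and "use_error"
# flag are our own illustrative convention, not anything the standard defines.

def use_error_coverage_gap(plan: list) -> bool:
    """True when the plan exercises only the happy path."""
    return not any(scenario["use_error"] for scenario in plan)

happy_path_only = [
    {"id": "S1", "name": "programme standard infusion", "use_error": False},
    {"id": "S2", "name": "confirm and start infusion",  "use_error": False},
]
with_use_errors = happy_path_only + [
    {"id": "S3", "name": "recover from occlusion alarm mid-infusion",
     "use_error": True},
]

print(use_error_coverage_gap(happy_path_only))   # True: coverage gap
print(use_error_coverage_gap(with_use_errors))   # False: gap closed
```

Running a check like this when the protocol is drafted, rather than discovering the gap in a post-event audit, is the practical meaning of building error-state coverage in from the start.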

The risk register implication is equally direct. Error-state design decisions that were treated as UI-layer choices below the threshold of hazard analysis are precisely the decisions most likely to surface in an adverse event investigation. Classifying error state design as below the risk threshold is itself a risk decision, and one the case record has repeatedly shown to be wrong.

Across projects involving clinical interface design, the pattern that consistently surprises product teams is this: the error states causing the most harm in practice are not the ones nobody thought of. They are the ones the team believed were handled because a message exists. The compromise is rarely "we decided not to show an error." It is "we showed an error, so we assumed the problem was solved." In live use, that assumption fails at the exact moment the interface is under the most cognitive load, in the most consequential clinical situation, with the least available attention to parse a poorly designed state. The resistance to going back and redesigning error states, once the happy path is approved, is almost always framed as a schedule problem. In our experience, it is a risk assessment failure presented as a project management constraint.

Limits and Gaps

The five cases examined here are drawn from the most extensively documented medical device safety events in the public record. They represent a specific category of failure: interfaces that existed, were used at scale, and whose error states contributed to documented harm over extended periods. They are not a representative sample of the full population of medical device interface failures, the majority of which do not generate public investigation records.

The Error Afterthought Model describes a design process failure. It does not apply to every category of medical device error. Errors arising from hardware failure, from clinical protocols unrelated to interface design, or from system integration failures outside the interface layer are not within its explanatory scope.

The evidence on alarm fatigue is robust in describing the phenomenon and the volumetric conditions that produce it. The evidence on design interventions that sustainably reduce alarm fatigue at the system level, without creating new categories of missed critical events, is considerably thinner. Hierarchical alert design and threshold personalisation are supported by the available evidence, but the conditions under which these interventions fail have not been studied with the same rigour as the problem itself.

The Koppel et al. CPOE study examined a single system at a single tertiary teaching hospital over the period 1997 to 2004. The 22 error types it documented are directionally consistent with subsequent CPOE research, but the specific interface failures may not generalise to all current implementations.

A harder question sits beneath the framework. What combination of design quality, clinical protocol, and IEC 62366-1 validation scope is sufficient to prevent the next iteration of these failures? The case evidence shows what went wrong. It does not supply a fully specified answer to how much better is enough. That question remains genuinely open, and any practitioner claiming otherwise has not read the field carefully enough.

Conclusion

A clinician encounters an error state in a high-stakes procedure. In the second it takes her to read it, to decode it, to determine whether it requires action and what that action is, the interface has already failed the person it was designed to serve.

The five cases in this article share a common cause. Error states were added to interfaces after the successful workflow was designed and validated. They were designed for the system's state, not the user's state. They arrived disconnected from the cognitive context of clinical work, or they arrived in such volume that the clinical environment had already adapted to ignore them.

The Error Afterthought Model names this mechanism. Disconnected Error and Gestalt Collapse are its two diagnostic arms. Both are preventable: by treating the error path as a first-class use scenario from the beginning of the design process, by subjecting error state design to the same observational rigour applied to the primary workflow, and by validating error-state performance under the IEC 62366-1 use scenario framework rather than confining summative evaluation to the happy path.

Error state design is not a secondary task. It is a primary safety problem. The case record is unambiguous on this point.

If your team is currently treating error states as a post-happy-path task, or if your IEC 62366-1 evaluation protocol does not include use errors as test scenarios, the gap between your current position and the safety standard the evidence requires is exactly the gap documented in these five cases. Addressing it before deployment is substantially less costly than addressing it after an adverse event.

Frequently Asked Questions

What is the Error Afterthought Model in medical device UX design?

The Error Afterthought Model describes the design process failure in which product teams treat completion of the primary workflow as the end of the design problem, then add error states as disconnected supplements. It has two diagnostic arms: Disconnected Error, where an error state arrives without reference to the task the user was performing; and Gestalt Collapse, where multiple error signals designed independently arrive without hierarchy or coherence. The model explains why error messages can exist and still fail to support a safe clinical response.

Does IEC 62366 require usability testing of error and alarm states?

IEC 62366-1 requires that use scenarios include foreseeable use errors and that the summative usability evaluation covers user performance under those conditions. Evaluations that test only the happy path leave a demonstrable coverage gap. If an adverse event is subsequently connected to how an error state was presented during a use error, the validation record will show that error-state design was not treated as a use scenario requiring test coverage.

What made the Graseby MS16A and MS26 confusion fatal?

The Graseby MS16A and MS26 were functionally incompatible devices, the MS16A calibrated in millimetres per hour and the MS26 in millimetres per 24 hours, with nearly identical physical interfaces. The only distinguishing features were colour and a small printed label in the lowest font weight on the device. Neither cue is sufficient for reliable discrimination under clinical load. The UK Department of Health issued a hazard warning as early as 1994. In Scotland alone, 23 incidents including 4 fatalities were recorded between 1989 and 1994 before the NPSA issued its Rapid Response Report in December 2010.

How does alarm fatigue relate to interface design rather than clinical workflow?

Alarm fatigue is a clinical phenomenon with a design origin. When 85 to 95% of alarms require no clinical intervention, as documented in the Joint Commission Sentinel Event Alert of April 2013, the alarm system has been designed to maximise sensitivity at the cost of specificity. The result is a signal environment that clinical staff adapt to by reducing volume, adjusting thresholds, or silencing alarms. These coping behaviours are a rational response to a poorly designed system. The correction is hierarchical urgency differentiation and threshold personalisation, not staff training.

Why do CPOE systems create new error types while eliminating existing ones?

CPOE systems replace handwriting ambiguity with a different error category set shaped by their interface structure. The 2005 Koppel et al. JAMA study identified 22 error types generated by a deployed CPOE system, including fragmented displays preventing coherent medication views and navigation requiring up to 20 screens to view a patient's full medication history. Each failure reflects a design decision, not an inherent property of digitisation. Designing against them requires the same use scenario rigour as any other safety-critical interface.

How does medical device error state design interact with IEC 62366 risk assessment?

Error-state design decisions treated as UI-layer choices below the threshold of hazard analysis are precisely the decisions most likely to surface in an adverse event investigation. IEC 62366-1 hazard analysis must include the consequences of foreseeable use errors in the context in which they occur, including the cognitive state of the user at the point of error encounter. Classifying error state design as below the hazard analysis threshold is itself a risk decision, and one the case record has repeatedly shown to be wrong.

References

Leveson, N. G., & Turner, C. S. (1993). An investigation of the Therac-25 accidents. IEEE Computer, 26(7), 18-41. https://ieeexplore.ieee.org/document/274940

Koppel, R., Metlay, J. P., Cohen, A., Abaluck, B., Localio, A. R., Kimmel, S. E., & Strom, B. L. (2005). Role of computerized physician order entry systems in facilitating medication errors. JAMA, 293(10), 1197-1203. https://pubmed.ncbi.nlm.nih.gov/15755942/

Dickman, A., & Schneider, J. (2002). Guidelines for use of the MS26 daily rate syringe driver in the community. Palliative Medicine, 16(6), 533-534. https://pubmed.ncbi.nlm.nih.gov/12411857/

The Joint Commission. (2013). Sentinel Event Alert, Issue 50: Medical device alarm safety in hospitals (April 8, 2013). https://www.kff.org/wp-content/uploads/sites/2/2013/04/sea_50_alarms_4_5_13_final1.pdf

Cvach, M. (2012). Monitor alarm fatigue: An integrative review. Biomedical Instrumentation and Technology, 46(4), 268-277. https://array.aami.org/doi/full/10.2345/0899-8205-46.4.268

National Patient Safety Agency. (2010). Rapid Response Report NPSA/2010/RRR019: Safer ambulatory syringe drivers (December 2010). Referenced in NHS Improvement correspondence, June 2018. https://nationalcareassociation.org.uk/news-events/news/old-style-graseby-syringe-drivers-e-g-ms16-ms16a-ms26/p20

NCBI Bookshelf. (2020). Making Healthcare Safer III: Alarm fatigue. Agency for Healthcare Research and Quality. https://www.ncbi.nlm.nih.gov/books/NBK555522/

