This article draws on Creative Navy's project work in medtech UX, spanning practice management software, surgical equipment, ventilators, blood pumps, infusion systems, and patient monitoring devices, including Class II and Class III regulated products. Our work in this sector covers clinical environments including the ICU and operating theatre, designing for surgeons, nurses, and biomedical engineers. Dennis Lenard, who leads this work at Creative Navy, is the author of User Interface Design For Medical Devices And Software, a practitioner reference for the field. Our approach integrates IEC 62366 usability engineering requirements and FDA Human Factors guidance as structural inputs to the design process, not post-hoc compliance activities.
UI software errors were responsible for 5.44% of all medical device recalls analysed across a three-year FDA dataset and for 46.33% of all recalls attributable to software failure (Zhang et al., 2019, Biomedical Instrumentation and Technology, 53(3), 182-194). For a product director managing a device family through a post-market surveillance cycle, that figure is not a patient safety abstraction. It is per-incident liability arising from errors that the in-device help system, in most documented cases, was architecturally unable to prevent.
This benchmarking review examines in-device help systems across five widely referenced medical device platforms: the Dräger Evita V800 ventilator, Siemens Healthineers AI-Rad Companion, Stryker SONOPET iQ ultrasonic aspirator, Hologic Brevera breast biopsy system, and the Welch Allyn Connex Spot Monitor. The review evaluates each against six criteria drawn from the evidence base on use error aetiology and post-market performance patterns. It is written for product directors and senior product managers who need to understand what the current help design landscape in their device category is costing their organisation, and where the competitive gap is being left unclaimed.
Key statistics
- UI software errors: 5.44% of all medical device recalls; 46.33% of all software-related recalls (Zhang et al., 2019, as of September 2015 dataset close)
- 70 to 90% of ICU device alarms are false or clinically non-actionable (meta-synthesis, PMC11992819, April 2025)
- Medical device and clinical software interfaces: mean SUS score of 45.9/100, placing them in the bottom 9% of all products assessed across more than 1,300 industry studies (Mayo Clinic/AMA study, cited in Mayo Clinic Proceedings, 2024)
- Each 1-point SUS improvement is associated with a 3% reduction in odds of physician burnout (Mayo Clinic/AMA study, 2024); see the worked example after this list
- Physician burnout costs the US healthcare system approximately $5.6 billion annually (Mayo Clinic Proceedings, 2024)
- Healthcare IT leaders ranked usability their top technology concern in 2024, above AI adoption and telehealth (Juno Health survey, 2024)
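To put the SUS and burnout figures on one scale: the sketch below treats the reported 3% per-point association as a multiplicative odds ratio of 0.97 per SUS point, the usual logistic-regression reading, though the study summary above does not state the functional form. The 68-point target is the commonly cited cross-industry SUS average, not a figure from the study.

```python
# Illustrative arithmetic only. Assumes the 3% per-point association compounds
# multiplicatively (odds ratio 0.97 per SUS point); the study summary above
# does not state this functional form.
baseline_sus = 45.9   # mean SUS for medical/clinical software (Mayo Clinic/AMA, 2024)
target_sus = 68.0     # commonly cited cross-industry SUS average (assumption)

per_point_odds_ratio = 0.97
cumulative_odds_ratio = per_point_odds_ratio ** (target_sus - baseline_sus)

print(f"SUS gap to close: {target_sus - baseline_sus:.1f} points")
print(f"Cumulative burnout odds ratio: {cumulative_odds_ratio:.2f}")  # ~0.51
```

Under that compounding assumption, lifting clinical software interfaces from 45.9 to the cross-industry average would roughly halve the odds of physician burnout.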
What a help system actually signals
A medical device's help system is, at its most diagnostic, an artefact of primary interface decisions made under time and resource pressure. Where the primary interface placed users in states of uncertainty it could not resolve, a help layer was built to compensate. Where the design team exhausted its review bandwidth before clarifying function labels, tooltips were added to explain what the labels had failed to say. This describes the ordinary development trajectory of most embedded help systems, not an exception.
The distinction that matters commercially is between compensatory help and structural help. Compensatory help exists because the primary interface generated confusion the team could not resolve at the design stage. Structural help exists because the clinical task contains genuine judgment demands that no interface simplification could eliminate. Most deployed platforms contain both, but the proportion reveals something the post-market performance data will eventually confirm.
In a 2024 scoping review of EHR interfaces (PMC12206486), covering studies from 2007 to 2024, deep navigation hierarchies and nonintuitive menu labels doubled the clicks required to complete infusion pump documentation tasks, and wrong-field data entry occurred in 17% of observed tasks. The help systems for these platforms were not absent. They were structurally unable to interrupt errors that occurred within the primary interface flow before a user would think to consult them. The same mechanism applies to ICU device interfaces. As the evidence on healthcare UX and patient safety shows, poor clinical interface design is a patient safety problem before it is a support volume problem.
Across our work on medical device interfaces, the question that surfaces most consistently when we review embedded help systems is not whether the content is accurate, but why the user needs it at this particular point. More often than not, the answer traces back to an earlier interface decision that placed the user in a state of uncertainty the design itself created. The help system is not resolving complexity. It is documenting it.
How this benchmark was conducted
Six criteria were applied to each platform reviewed. Each is drawn from published evidence on use error aetiology and post-market performance patterns:
| Criterion | What it tests |
|---|---|
| Contextual relevance | Whether help triggers at the correct task moment |
| Integration depth | Whether help is structurally embedded or retrofit |
| Error recovery support | Whether help addresses the device's documented failure modes |
| Infrequent-use currency | Whether the system supports users after skill gaps |
| Alarm architecture alignment | Whether help is accessible during alarm states |
| Post-market updatability | Whether help content can be revised as field data arrives |
The evaluation draws on publicly available product documentation, the benchmarking literature, practitioner accounts from peer-reviewed qualitative studies, and evidence gathered in early 2026. Where device interfaces could not be directly accessed, assessments are marked as evidence-limited.
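For teams reproducing the method, the rubric reduces to a small data shape. A minimal sketch, with names of our own rather than any published instrument, showing how per-criterion ratings and the evidence-limited flag could be recorded:

```python
from dataclasses import dataclass, field
from enum import Enum

class Rating(Enum):
    HIGH = "high"
    MODERATE = "moderate"
    LOW = "low"
    NOT_ASSESSED = "not assessed"

@dataclass
class CriterionAssessment:
    criterion: str             # one of the six criteria in the table above
    rating: Rating
    evidence_limited: bool     # True where the device could not be directly accessed
    sources: list[str] = field(default_factory=list)

@dataclass
class PlatformAssessment:
    platform: str
    criteria: list[CriterionAssessment] = field(default_factory=list)

    def evidence_limited_share(self) -> float:
        """Fraction of criteria rated from secondary sources only."""
        if not self.criteria:
            return 0.0
        return sum(c.evidence_limited for c in self.criteria) / len(self.criteria)
```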
Dräger Evita V800: the structural baseline
The Evita V800 represents the highest integration depth in this benchmark. Its embedded help system operates as a dedicated section within the device menu, with breadcrumb navigation, scrollable display sections consistent with the device's primary UI, and a searchable knowledge base. The architecture was designed alongside the primary interface. This is the minimum standard against which the other platforms are measured.
The device received Authority to Operate certification under the US Department of Defense Risk Management Framework in February 2025 (Dräger press release, PR Newswire). This is a cybersecurity certification, not a UX change, and the help system appears structurally unchanged from prior documentation.
What the Evita V800 does not resolve is the alarm architecture alignment problem. A 2025 meta-synthesis of qualitative ICU studies (PMC11992819, with searches up to May 2024) found nurses silencing alarms without investigation, adjusting thresholds to inappropriate values, and turning off alarms entirely. Not because they lacked access to help content, but because the alarm interface produced undifferentiated signal volume that help content cannot resolve. The Evita V800's well-structured help system is architecturally disconnected from its alarm presentation layer. This is where its post-market performance data will show the gap.
Siemens AI-Rad Companion: sense decay
The AI-Rad Companion is the most instructive case in this benchmark for a product director managing an AI-powered device family. The tooltip and "i" icon interaction model identified in earlier benchmarking work remains the persistent design approach. The content behind those interaction points has expanded materially. The most recently documented release (VA40) adds white matter hyperintensity segmentation and quantification for the Brain MR module, extending an algorithm family that now covers multiple modalities and body regions.
The tension is specific. The tooltip architecture was designed for a smaller, more bounded set of AI-driven decisions. Each new algorithm release expands the decision space that tooltips must explain, while the interaction model for delivering that explanation has not been revised. This is sense decay in its recognisable form: the gap between what the interface was built to explain and what the product now requires clinicians to understand widens with each release cycle.
There is a deeper problem the evidence cannot yet settle. Static, authored help content may simply be the wrong medium for AI-generated clinical decisions. The AI-Rad Companion's tooltips describe what an algorithm does and how to interpret its output. As algorithms become more capable, the decisions they support become less reducible to what a fixed piece of help text can explain. Whether the tooltip model is extensible to this level of complexity, or whether a different help architecture is structurally required, is an open question. No device in the current landscape has answered it.
The competitive context has shifted since earlier benchmarking. Viz.ai launched Viz Assist, a suite of autonomous AI agents for care coordination, in October 2025. GE HealthCare released Venue family updates with revised contextual guidance patterns the same month. These entrants are designing help architecture for AI-driven workflows from first principles, without the constraint of a tooltip framework built for a smaller capability set.
Stryker SONOPET iQ: task-specific delivery
The SONOPET iQ's two-button contextual help design, each button opening help specific to its respective function, is one of the more architecturally deliberate patterns in this benchmark. The help maps to the task the user is currently performing. FCC filing data from November 2024 confirms continued manufacturing activity, with no documented interface changes.
The limitation is coverage rather than architecture. The SONOPET iQ's marketed differentiators include RFID-enabled intelligence and Pulse Control waveform-based resection. Whether the two-button help architecture has been extended to cover these capabilities, or whether users encounter a gap between the device's marketed function and the help system's explanatory reach, cannot be confirmed from publicly available sources.
What the design does demonstrate is that task-specific help delivery is achievable within a constrained interface. For procurement evaluation this matters: a device offering two contextual help paths for its two primary functions has made a different claim about its design maturity than a device offering a generic help library reached through a menu.
Hologic Brevera: platform discontinuity
The Hologic 3D breast biopsy portfolio has progressively consolidated around the Brevera system, which integrates tissue acquisition with real-time CorLumina imaging verification. The interface described in earlier benchmarking documentation, featuring contextual highlighting with a yellow outline for attention-requiring functions, corresponded to a generation that appears to have been superseded.
This platform transition illustrates a specific risk in multi-generation device families: help architecture investment does not transfer automatically across hardware generations. When a platform transition occurs, the positioning logic, trigger conditions, and content mapping of the prior help system must be rebuilt for the new primary interface, or the new system inherits the gaps of the old one at greater cost.
The recall context within the Hologic breast health portfolio is relevant background. In 2024, Hologic initiated a Class I FDA recall of its BioZorb radiographic markers following 71 reported patient injuries (Radiology Business, May 2024). This involved a different product category, but it illustrates the regulatory scrutiny that documentation and labelling decisions across the Hologic portfolio currently receive. For the Brevera platform, help system design quality is part of the evidence package that post-market surveillance will examine.
IEC 62366 practice in surgical device redesign demonstrates that when platform transitions are managed through formative usability work, the new interface's help architecture can be validated against the specific use scenarios the previous generation failed. When they are managed through cosmetic redesign that leaves the existing navigation model unchanged, the same failure modes carry forward.
Welch Allyn Spot Monitor: help frozen
The Connex Spot Monitor received a sector alert from the US Health Sector Cybersecurity Coordination Center in June 2024 for vulnerabilities in the device and the Welch Allyn Configuration Tool, both rated at CVSS v4 scores above 9 (Industrial Cyber, June 6, 2024). A patch was released for one vulnerability; a second update was not expected until Q3 2024. No interface or help system changes were identified alongside these security interventions.
This case illustrates a structural tension that affects help design directly. Security patching cycles operate under urgency that UX update cycles do not share. A device can receive a critical security intervention while its help system remains frozen because updating it would trigger a new validation requirement. The result is a help architecture that reflects the interface at a specific point in its design history, not its current operational reality.
The contextual highlighting approach noted in earlier benchmarking (red error outline matching the banner colour for the specific function requiring action) has a defensible rationale. Whether it has been reviewed against current post-market performance data, including whether the colour coding holds for colour-blind users under clinical lighting, cannot be confirmed. Validation freeze limits what can be known from the outside.
Patterns across the benchmark
Three patterns hold across all five platforms.
The first is the structural disconnect between help content and alarm architecture. All five devices generate alarms as a primary feedback mechanism. None of the help systems reviewed are documented as having any integration with the alarm presentation layer: no mechanism by which alarm conditions trigger contextual help, and no documented connection between alarm threshold configuration and in-device guidance. The meta-synthesis evidence on ICU alarm fatigue (PMC11992819) describes nurses adjusting thresholds, silencing alerts, and developing personal workaround protocols as a direct consequence. No help content library compensates for this.
The second pattern is capability expansion without help architecture revision. The AI-Rad Companion is the clearest case, but the pattern is consistent across the benchmark: as device capability grows, help content is added to the existing architecture rather than the architecture being reassessed for the new capability set. This is how sense decay operates in product systems. The competitive position for a device that reassesses its help architecture at each major capability increment shows up in post-market support volume and clinical training costs, not in marketing materials.
The third is the treatment of help design as a post-design activity. The Error Afterthought Model describes how error states added after the happy path is complete arrive in a cognitive context the design had not accounted for. The same mechanism applies to help content. Help designed before the primary interface is complete can shape the interaction model. Help designed after it can only compensate for what the model left unresolved.
| Platform | Integration depth | Alarm alignment | Updated 2025-2026 | Primary gap |
|---|---|---|---|---|
| Dräger Evita V800 | High | Not integrated | Security cert only | Alarm presentation layer |
| Siemens AI-Rad Companion | Moderate | Not integrated | Algorithm expansion | Static architecture for expanding AI |
| Stryker SONOPET iQ | High (task-specific) | Not assessed | None documented | Coverage of expanded capabilities |
| Hologic Brevera | Unknown (platform change) | Not assessed | Platform transition | Help investment across hardware generations |
| Welch Allyn Connex Spot | Moderate | Not integrated | Security patch only | Validation freeze limiting UX updates |
Better help content is the wrong investment
The default response when help systems fail to prevent errors is to add content: richer tooltips, expanded FAQs, longer interactive tutorials. This is the wrong response when the primary interface is generating the confusion that the help system is being asked to resolve.
The WHO Surgical Safety Checklist evidence is precise on the underlying mechanism. A Dutch hospital study found that full checklist completion with genuine cognitive engagement produced a 77% reduction in the odds of patient death; partial completion produced no benefit; non-completion was associated with higher mortality than before the checklist existed. The confirmation design problem the checklist exposes is the same one that drives help system failure in medical devices: an interface can create a ritual that is performed without generating the cognitive engagement the ritual was meant to ensure.
Help content presented to a clinician in a state of alarm, time pressure, or task interruption does not reliably produce cognitive engagement. A respiratory therapist in ICU conditions encountering an alarm state needs an alarm presentation layer that distinguishes actionable from non-actionable conditions before help is required. Adding FAQ content about alarm thresholds does not address this.
The counterargument deserves a direct answer. Genuine clinical complexity will always require contextual guidance the primary interface cannot provide. A device administering medications through multiple lines, a radiology AI platform evaluating novel pathology, a robotic surgical system mid-procedure: these present decision demands that no amount of primary interface simplification eliminates. Help content that delivers judgment support at the right moment is not compensation. It is design. The distinction is whether the help system exists because the interface generated a question it could not answer, or because the clinical task itself contains uncertainty that belongs in the help layer.
The diagnostic test is simple. If the device's most common support call topics correspond to the areas the help system covers most extensively, the help is compensatory. If the help content addresses clinical decisions that experienced users would still benefit from at the point of action, the help is structural. The second category is where the competitive advantage lies.
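The test lends itself to a rough, automatable form. A minimal sketch, assuming a team can export its top support-call topics and count help articles by topic; the topics, counts, and threshold below are hypothetical illustrations, not values drawn from the benchmark:

```python
def classify_help_system(support_topics: list[str],
                         help_coverage: dict[str, int],
                         overlap_threshold: float = 0.5) -> str:
    """Compare the top support-call topics against the topics the help
    system covers most extensively (here, by article count). Heavy overlap
    suggests the help layer is documenting interface confusion rather than
    supporting clinical judgment."""
    top_n = len(support_topics)
    most_covered = sorted(help_coverage, key=help_coverage.get, reverse=True)[:top_n]
    overlap = len(set(support_topics) & set(most_covered)) / top_n
    return "compensatory" if overlap >= overlap_threshold else "structural"

# Hypothetical device: navigation and label questions dominate both lists.
calls = ["menu navigation", "function labels", "confirmation steps"]
articles = {"menu navigation": 14, "function labels": 9,
            "confirmation steps": 7, "dose titration judgment": 2}
print(classify_help_system(calls, articles))  # -> "compensatory"
```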
The Compensatory Help Model
The Compensatory Help Model is a diagnostic framework for evaluating medical device help systems. It distinguishes between help designed to explain interface decisions that confused users and help designed to support genuine clinical judgment calls the interface cannot resolve. A device's help system is compensatory when its most prominent content addresses navigation, function labels, or confirmation steps rather than clinical decisions. Compensatory help systems do not reduce error risk. They document the interface confusion that produced it.
Applied to the benchmark above, the model predicts specific post-market patterns. The AI-Rad Companion's tooltip architecture was designed for a bounded feature set: as algorithm capability expands, tooltip content for newer modules will be longer, less discoverable, and more frequently bypassed. The Welch Allyn Connex Spot Monitor's red-on-red error highlighting was noted as a partial solution for colour-blind users; the model predicts the device has not verified this against post-market observation, because validation freeze prevents it. The Evita V800 scores highest on integration and breadcrumb navigation; the model predicts its weakest post-market performance data will correspond to alarm states, where the help system has no structural role.
What structural help design requires
Three principles apply in practice, drawn from the evidence above.
Help architecture must be scoped before the primary interface is finalised. The evidence on error state design and confirmation steps is consistent: design decisions made after the primary interface is complete will address the confusion that primary design created, not the clinical judgment demands the device presents. This is what IEC 62366 usability engineering requirements formalise: help content that functions as a mitigation for a known use error is a user-interface element with its own use scenario and its own risk profile, and its adequacy must be demonstrated in summative testing.
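In documentation terms, the requirement implies a traceability linkage from identified use error to help element to summative evidence. A minimal sketch of one such record; the field names are ours, not taken from the standard:

```python
from dataclasses import dataclass

@dataclass
class HelpMitigationRecord:
    """One row of a usability-engineering traceability file, sketched to show
    the linkage IEC 62366-1 implies when help content mitigates a use error."""
    use_error_id: str           # identified use error from the risk analysis
    hazard_related: bool        # maps to a hazardous situation?
    help_element: str           # the help content acting as the risk control
    use_scenario_id: str        # the help element's own use scenario
    summative_task_id: str      # summative test task demonstrating adequacy
    residual_risk_accepted: bool

record = HelpMitigationRecord(
    use_error_id="UE-017",      # hypothetical: wrong alarm threshold entered
    hazard_related=True,
    help_element="Contextual threshold guidance panel",
    use_scenario_id="US-042",
    summative_task_id="ST-042",
    residual_risk_accepted=False,  # pending summative evidence
)
```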
Alarm architecture and help architecture must share a design process. In every device reviewed, they are separate. The 70 to 90% false-alarm figure for ICU devices is the most documented consequence of that separation. A help system that cannot be triggered by an alarm condition, and that has no structural connection to alarm threshold configuration, will not reduce alarm fatigue, workaround behaviour, or the use errors that follow from both.
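Mechanically, a shared design process means the alarm layer and the help layer are built against the same condition taxonomy, so an alarm event can resolve directly to contextual guidance. The event shape and resolver below are hypothetical; no device reviewed here is documented as exposing such an interface:

```python
from dataclasses import dataclass

@dataclass
class AlarmEvent:
    condition_id: str   # taxonomy key shared by the alarm and help layers
    priority: str       # e.g. "high", "medium", "low"
    actionable: bool    # resolved at the presentation layer, not by the nurse

# Help entries keyed by the same condition taxonomy the alarm layer emits.
HELP_INDEX = {
    "SPO2_LOW": "help/alarms/spo2-low",
    "CIRCUIT_DISCONNECT": "help/alarms/circuit-disconnect",
}

def on_alarm(event: AlarmEvent) -> str | None:
    """Resolve an alarm condition to contextual help without user navigation.
    Non-actionable conditions surface no help prompt at all."""
    if not event.actionable:
        return None
    return HELP_INDEX.get(event.condition_id)
```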
Help architecture must be reassessed at each major capability increment. The clinical observational research required to validate the revised architecture against actual clinician decision-making is not optional at the margin of device certification. It is the mechanism by which the gap between help content and clinical reality is closed. Devices that reassess help architecture at each capability release, rather than appending content to a static architecture, will produce smaller training cost exposure and more defensible post-market documentation than their peers.
Limits of this analysis
This review is constrained by the absence of direct device access for four of the five platforms reviewed. Assessment of integration depth, alarm alignment, and post-market updatability relies on publicly available product documentation, practitioner accounts from peer-reviewed literature, and secondary industry sources. The Stryker OrthoMap platform, referenced in earlier benchmarking work, was excluded because no credible update evidence was available.
The Compensatory Help Model is a diagnostic lens, not an empirically validated framework. Its central claim, that help content type and volume correspond to primary interface quality, has not been tested against a controlled device sample. It is a structured inference from the available evidence, not a measurement instrument.
Direct practitioner commentary about the specific help systems reviewed here was not found within the search horizon. Practitioners rarely attribute device-specific usability complaints to help system design in attributable public formats. The frustration evidence in this review draws from adjacent domains with overlapping clinical populations and directly parallel interface failure mechanisms: alarm fatigue literature, EHR scoping reviews, and respiratory therapist trade publications. The degree to which these patterns hold for embedded help systems specifically has not been empirically isolated.
Conclusion
The benchmark reveals a consistent structural pattern: help systems designed to compensate for primary interface decisions, without integration into the failure modes that drive post-market support costs and recall exposure.
UI software errors account for 5.44% of all medical device recalls and 46.33% of all software-related recalls (Zhang et al., 2019). These errors do not occur in the help system. They occur in the primary interface, in alarm states and confirmation steps, under conditions of time pressure and cognitive load where help content is structurally unavailable, before a user would ever navigate to it. Investing in richer help content without revising the primary interface architecture that generates the confusion is not a patient safety intervention. It is a documentation exercise. Prevention requires primary interface redesign at the formative stage, not supplementary content added after clearance.
The competitive position is available to any product director willing to treat help architecture as part of the primary design process. Devices whose help systems are integrated at the formative stage, whose alarm and help architecture share a design process, and whose help content is reassessed at each capability increment will produce lower post-market support volumes, smaller training cost exposure, and more defensible regulatory documentation than their peers. No device in the current benchmark has fully claimed this position.
FAQ
What does the Compensatory Help Model identify?
The Compensatory Help Model identifies whether a medical device's help system exists because the primary interface generated confusion or because the clinical task contains genuine judgment demands the interface cannot resolve. If a device's most common support queries match its most prominent help content, the help is compensatory and the interface design is incomplete. If help content addresses clinical decisions that experienced users would still benefit from at the moment of action, the help is structural and represents genuine added capability.
Can help systems prevent medical device recalls?
Rarely, in their current form. UI software errors account for 5.44% of all medical device recalls and 46.33% of all software-related recalls (Zhang et al., 2019). These errors typically occur within the primary interface flow, in alarm states and confirmation steps, before a user would navigate to help content. Help content adds explanatory reach; it does not change the cognitive conditions in which primary interface errors occur. Prevention requires formative interface work, not supplementary documentation.
Why does alarm fatigue persist in ICU ventilators despite improved help systems?
Because alarm architecture and help architecture are structurally disconnected in every platform reviewed. During alarm states, ICU nurses are documented as adjusting thresholds inappropriately, silencing alerts without investigation, and turning off alarms entirely (meta-synthesis, PMC11992819, 2025). These are workaround behaviours produced by undifferentiated alarm signal volume. No FAQ or tooltip addresses this because the problem is not a knowledge gap: it is an alarm presentation layer that cannot distinguish actionable from non-actionable conditions before help is ever reached.
How does AI capability expansion affect help design in radiology platforms?
For platforms like the Siemens AI-Rad Companion, each algorithm release expands the clinical decision space the device presents to users. A tooltip architecture built for a bounded feature set does not scale to a larger one without structural revision. Content behind the existing interaction model grows longer and less discoverable; clinician engagement with it decreases. Whether static, authored help content is the correct medium for AI-generated clinical decisions at all remains an open question the current generation of AI radiology platforms has not resolved.
What does IEC 62366 require regarding in-device help systems?
IEC 62366 requires manufacturers to document the intended use scenarios for all user-interface elements, including guidance and help features. Where help content is the primary mitigation for an identified use error, the adequacy of that mitigation must be demonstrated in summative usability testing. Treating help as outside the IEC 62366 scope is a regulatory documentation risk: a help system that addresses a known failure mode is a safety-relevant interface element under the standard, regardless of how it is classified internally during development.
What separates structural from compensatory help design in practice?
Structural help is designed before the primary interface is finalised and addresses clinical judgment calls the interface cannot resolve. Compensatory help is added after the primary interface is complete and explains the confusion the design created. The practical test: if removing the help layer would leave trained clinicians unable to interpret or navigate the device, the help is compensatory. If removing it would leave expert users without judgment support for complex clinical decisions the interface cannot simplify, the help is structural. Most deployed systems contain both; the proportion is what post-market performance data eventually reveals.
References
Zhang, J., Walji, M. F., Johnson, C. W., and Lowry, S. Z. (2019). User interface software errors in medical devices: Study of US recall data. Biomedical Instrumentation and Technology, 53(3), 182-194. https://doi.org/10.2345/0899-8205-53.3.182
Meta-synthesis on ICU alarm fatigue and nurse behaviour. (2025, April). PMC11992819. https://pmc.ncbi.nlm.nih.gov/articles/PMC11992819
Scoping review: EHR usability and clinical error. (2024). PMC12206486. https://pmc.ncbi.nlm.nih.gov/articles/PMC12206486
Mayo Clinic Proceedings. (2024). EHR usability, physician burnout, and healthcare system cost. PII: S0025-6196(24)00037-5. https://www.mayoclinicproceedings.org
Juno Health. (2024). Healthcare IT leader priorities survey. Cited in: CreateApe industry analysis. https://createape.com/insight/why-bad-ux-in-healthcare-comes-at-a-high-cost
Dräger. (2025, February 11). Evita V800/V600 receives Authority to Operate certification under DoD Risk Management Framework. PR Newswire. https://www.prnewswire.com
Industrial Cyber. (2024, June 6). HC3 sector alert: Baxter Welch Allyn vulnerabilities. https://industrialcyber.co
International Electrotechnical Commission. (2015, amended 2020). IEC 62366-1. Medical devices: Application of usability engineering to medical devices. https://www.iec.ch/homepage
Joint Commission. (2013). Sentinel event alert: Medical device alarm safety in hospitals. https://www.jointcommission.org/resources/sentinel-event/sentinel-event-alerts/sea-issue-50-medical-device-alarm-safety-in-hospitals/