Silent Data Errors in Scientific Software: The Validity Failure


Dennis Lenard

Mar 2026

Scientific instrument software can produce invalid results without triggering any errors. We name the pattern, document three instances in AZtecCrystal, and set out UX principles that prevent silent failures from reaching the literature.

This article draws on Creative Navy's project work in complex technical and scientific software, spanning computational fluid dynamics, surgery planning systems, scientific research software, CAD/CAM platforms, circuit simulation, vessel tracking systems, air traffic control, and mission control environments. We have designed for demanding technical experts such as CFD analysts, circuit design engineers, surgeons, air traffic controllers, mission controllers, and maritime operators. A central competency in this work is the visualisation of complex, dynamic, multi-dimensional data under real operational conditions, where clarity and precision directly affect decision quality. Several of these environments are governed by specific human factors standards, including EUROCONTROL and ICAO guidelines for ATC, IEC 62366 for medical software, and NASA requirements for mission-critical systems.

A researcher submits a paper. Two weeks later, a reviewer returns it with a note that does not query the science. The query is narrower and more damaging: the data cleaning parameters have not been disclosed, and the reviewer suspects aggressive processing. The paper is rejected.

This is not a hypothetical. Oxford Instruments documented it in their own EBSD Data Processing Blog in 2018. Data cleaned with inappropriate Auto-Clean parameters had been rejected during peer review. The vendor's own words: "aggressive cleaning makes features look artificial." No in-software warning exists as of February 2025 to prevent the same outcome.

This article examines one category within a broader taxonomy of failure modes in scientific instrument software, which we set out in full elsewhere in this series. That category is the silent validity failure: an interface state in which the software communicates normal operation while producing scientifically invalid output. Three instances from the AZtecCrystal corpus make the structure of the problem visible; all three are variants of the same anti-pattern. The argument this article builds is that the pattern is not an edge case. It is the default consequence of software designed without a model of what the interface communicates when nothing appears to go wrong.

This article addresses software product managers, UX leads, and quality assurance teams in scientific instrument development who need a precise definition of the problem before they can act on it.

Key Statistics

  • Oxford Instruments acknowledged in their 2018 EBSD Data Processing Blog that data cleaned with inappropriate Auto-Clean parameters had been rejected during peer review
  • AZtec 6.2, released February 2025, contains no Auto-Clean warning, usage guidance, or audit trail for cleaning operations (as of February 2025)
  • The LightForm Wiki phrase "click at your own peril" for Auto-Clean is now actively quoted in multiple indexed community posts, making it the primary distributed user guidance on this failure
  • MTEX GitHub Issues #478 and #479, documenting the round-trip CTF export failure, remain open as of March 2026
  • The only structured tutorial addressing the 180-degree coordinate flip targets users with MATLAB literacy, placing it out of reach for the majority of AZtec users at shared facilities
  • Community documentation of instrument software failure modes lags feature releases by an average of 12 to 18 months

Three EBSD Silent Validity Errors

The three failure modes examined here differ in mechanism. They share one property: the interface gives no signal that anything is wrong.

The Auto-Clean risk. AZtecCrystal's Auto-Clean Up feature applies noise reduction parameters automatically. The feature is prominently positioned in the interface. Oxford's own interactive demo of the cleaning workflow demonstrates it without flagging the hazard. What the demo does not show: if the parameters are set too aggressively, the software will smooth away real microstructural features and replace them with artefacts that look like data. AZtecCrystal provides no audit trail of cleaning operations. If a paper is queried during peer review, the researcher has no record of what was applied (as of February 2025).

Auto-Clean Up in AZtecCrystal is a UX design failure because it separates the action from its consequence. The feature executes without displaying the parameters it will apply, without warning that aggressive settings produce scientifically artificial results, and without creating any record of what was done. A peer reviewer can reject a paper on the basis of that record gap. The software treats the absence of an error state as equivalent to the presence of valid data. It is not.

The Confidence Index misinterpretation. The Confidence Index in EBSD analysis is widely understood by novice users as a measure of data quality. It is not. CI measures phase discrimination success given the phases the user has entered into the software. A CI above 90% in a dataset where the phases have been incorrectly specified will appear high and will appear to indicate reliable data. It does not. Community characterisation across ResearchGate threads captures this precisely: CI is "a lottery between the phases you put in." No vendor tutorial addresses this misinterpretation directly.
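The dependence of CI on the candidate phase list can be made concrete with a toy model. The sketch below is illustrative only: it uses a simplified voting-margin definition of CI and invented phase scores, not AZtecCrystal's actual indexing algorithm. The point it demonstrates is structural: a margin computed only over the phases the user supplied can be high even when the true phase was never in the list.

```python
def confidence_index(votes: dict[str, float]) -> float:
    """Toy CI: margin between the two best-matching candidate phases,
    normalised by the winner's score. A high value only means one
    candidate beat the others -- it says nothing about phases that
    were never entered into the software."""
    ranked = sorted(votes.values(), reverse=True)
    if len(ranked) < 2:
        return 1.0  # a single candidate always "wins" decisively
    best, second = ranked[0], ranked[1]
    return (best - second) / best if best > 0 else 0.0

# Invented similarity scores for a pattern whose TRUE phase is absent
# from the candidate list -- the lottery is only between what was put in:
wrong_candidates = {"ferrite": 0.80, "austenite": 0.05}
print(confidence_index(wrong_candidates))  # high margin despite the misspecification
```

The "lottery between the phases you put in" is exactly this: the denominator of the user's trust is a list the interface never asks them to re-examine.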

The 180-degree coordinate flip. AZtec displays EBSD maps in beam view while calculating orientations in the acquisition frame. These two coordinate systems are misaligned by 180 degrees. A user who does not know this will produce pole figures that are mirrored. The results look correct. The software offers no warning. One indexed Reddit thread from 2024 documents the specific user experience: three days working on an analysis before realising the pole figures had been mirrored throughout.
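The geometry of the mismatch can be sketched abstractly. The function below applies a 180-degree rotation about the map normal to a direction vector, one common form of frame correction. The actual axis and sense of any correction for a given AZtec dataset depend on the acquisition geometry and must be verified against a sample of known texture, so treat this as an illustration of the operation, not as the vendor's convention.

```python
def rotate_180_about_z(v: tuple[float, float, float]) -> tuple[float, float, float]:
    """180-degree rotation about z: (x, y, z) -> (-x, -y, z).
    Illustrates a reference-frame correction; the correct axis for a
    real dataset must be checked against a known texture standard."""
    x, y, z = v
    return (-x, -y, z)

# A direction that plots on one side of a pole figure moves to the
# opposite side after the correction:
print(rotate_180_about_z((1.0, 0.2, 0.5)))  # (-1.0, -0.2, 0.5)
```

Applying the rotation twice returns the original vector, which is one quick sanity check that a correction script is a pure frame change rather than an accumulating transformation.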

In each case, the software's normal completion state (a map, a CI percentage, a pole figure) communicates that the analysis is complete and the data are ready. None of the three carries a signal that the output may be scientifically meaningless.

Having established that the three instances share a structure, the question is what that structure is and what distinguishes it from ordinary software failure.

What Is a Silent Validity Failure?

The Silent Validity Failure is a specific interface failure mode with a specific three-component structure.

The first component is a completion signal. The software finishes its operation and presents output in the visual form of valid data: a map, a number, a figure. Nothing in the interface indicates that the operation has produced something scientifically invalid. From the user's perspective, the analysis has succeeded.

The second component is a hidden prerequisite. Validity depends on a condition the user may not have met: the right cleaning parameters, the correct phases specified, the right coordinate convention applied. The interface does not surface this prerequisite at the point of operation. It surfaces it, if at all, in a 2018 blog post, a community wiki, or a peer reviewer's query.

The third component is propagation. Because no error state is triggered, the invalid output travels forward through the analysis. It enters calculations. It enters figures. It enters a manuscript. The cost is not borne at the point of failure. It is borne weeks or months later, in a form that damages a paper's scientific standing rather than a user's workflow.

Silent validity failures are only discoverable through structured observation of users working with real data in real conditions, not through standard usability testing or crash-log analysis. A crash log records crashes. A CI value of 94% in a misspecified dataset leaves no log entry. Observational research conducted in live working environments is the specific methodology that surfaces these failures, because it captures what users do when the interface signals success and something else is happening underneath.

Across instrument software projects, the moment that creates most discomfort in a research session is not when a user encounters a crash or a confusing error message. It is when a user completes an analysis confidently, and a question arises about whether to surface what the research has just documented: that the output they are satisfied with may not be scientifically valid. Very knowledgeable users know this and say so explicitly. Most users do not. The vendor response, when this is raised, tends to be that the scenario is rare or that documentation exists. Both may be true. Neither resolves the interface problem.

The knowledge that Auto-Clean is dangerous has existed at Oxford Instruments since at least 2018. That knowledge produced a blog post. The blog post produced a community shorthand. The community shorthand is now the most widely distributed user guidance on the most consequential cleaning failure mode in the software. What the knowledge did not produce is a change to the interface.

Why a Crash Beats a Silent Error

The conventional assumption in instrument software development is that a crash is the worst possible UX outcome. The evidence from this analysis challenges that assumption directly.

A crash is a better UX outcome than a silent validity failure. A crash surfaces a known failure at a known moment. The user cannot proceed. The error is visible, dateable, and addressable. A crash does not travel through a manuscript. It does not reach a peer reviewer.

A silent validity failure does all of these things. The researcher who applied aggressive Auto-Clean parameters did not experience an error. The software completed normally. The manuscript was submitted. The reviewer returned it. The cost was not a disrupted workflow. It was a rejected paper, and in the broader case, potentially contaminated scientific literature.

A serious objection to this argument is worth naming: crashes are not always harmless. In long-running acquisition sessions, a crash means lost data and instrument time that may be expensive or unrecoverable. That is a real cost. The argument here is not that crashes are acceptable or desirable. The argument is narrower: a crash that terminates an invalid operation is better than a normal completion that conceals one. The correct design response is to prevent both, but if a development team must prioritise, addressing silent validity failures carries higher scientific stakes.

This reframes the design priority for instrument software teams. The question is not: how do we prevent crashes? The question is: which of our interface states complete normally while generating invalid output? Those states are more dangerous than any crash, because they pass through every downstream checkpoint without triggering a flag.

How to Fix Silent Validity Failures

Addressing the Silent Validity Failure pattern requires changes at the interface level, not the documentation level. A blog post from 2018 is not a warning. A community wiki phrase that has become shorthand among expert users is not a safeguard for the novice using a shared facility instrument without expert supervision. Three principles address the structural problem.

Principle 1: Surface prerequisites at the point of operation. The phases specified, the cleaning parameters applied, and the coordinate convention active are prerequisites for scientific validity. They should be visible at the moment the user executes an operation that depends on them, not retrievable from a separate help resource. AZtecCrystal's cleaning interface should display the last-applied parameters and a consequence indicator before the operation executes. The prerequisite is known. The interface simply does not present it.
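A minimal sketch of what "surfacing at the point of operation" could mean, using hypothetical names throughout (this is not AZtecCrystal's API): the operation refuses to execute until the caller has been shown, and has acknowledged, the exact parameters it is about to apply.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CleanParams:
    """Hypothetical cleaning parameters; field names are illustrative only."""
    fill_zero_solutions: bool
    neighbour_threshold: int

def run_autoclean(params: CleanParams, acknowledged: bool = False) -> str:
    """Refuse to run until the parameters have been explicitly surfaced
    and acknowledged -- the prerequisite travels with the operation
    instead of living in a separate help resource."""
    summary = (f"Auto-Clean will apply: fill_zero_solutions="
               f"{params.fill_zero_solutions}, "
               f"neighbour_threshold={params.neighbour_threshold}")
    if not acknowledged:
        # Surface the prerequisite instead of silently proceeding.
        raise RuntimeError(f"Unacknowledged parameters. {summary}")
    return summary
```

In a real interface the acknowledgement would be a dialog or an inline panel rather than a flag, but the contract is the same: execution and disclosure are a single step, not two.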

Principle 2: Make completion states conditional on known prerequisite risks. A completion state that signals success when a prerequisite may not have been met is not a completion state. It is a pass-through. Where the software knows that a condition, such as Auto-Clean parameter selection, affects scientific validity, the completion state should carry a conditional signal. The operation is confirmed complete. The validity is qualified until the prerequisite is verified.

Scientific software should distinguish between operational completion and scientific validity inside the interface, not in a blog post. When a cleaning operation completes without verifiable parameter disclosure, when a CI value appears without noting that it measures phase discrimination rather than data quality, or when a map is displayed without flagging the active coordinate convention, the interface is treating these as equivalent. They are not. Measuring the cognitive load a warning-free interface imposes on a user, rather than estimating it, is the starting point for understanding why users proceed confidently when the design intent is that they should pause.
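The distinction between operational completion and scientific validity can be represented directly in the software's data model. The sketch below uses hypothetical types, not AZtecCrystal's internals: a result object that is always "complete" once the operation finishes, but only "valid" after each known prerequisite has been explicitly verified.

```python
from dataclasses import dataclass, field

@dataclass
class QualifiedResult:
    """Completion and validity as separate, independently tracked states."""
    complete: bool = True
    # Prerequisites the interface knows about, each starting unverified:
    prerequisites: dict = field(default_factory=lambda: {
        "phases_confirmed": False,
        "clean_params_reviewed": False,
        "coordinate_convention_checked": False,
    })

    @property
    def valid(self) -> bool:
        return self.complete and all(self.prerequisites.values())

    def verify(self, name: str) -> None:
        if name not in self.prerequisites:
            raise KeyError(f"unknown prerequisite: {name}")
        self.prerequisites[name] = True

result = QualifiedResult()
assert result.complete and not result.valid  # done, but not yet trustworthy
for p in list(result.prerequisites):
    result.verify(p)
assert result.valid
```

The useful property of the pattern is that the default is qualified: validity must be earned by explicit verification, never inferred from the absence of an error state.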

Principle 3: Create an audit trail for operations that affect reproducibility. AZtecCrystal does not store cleaning routines. There is no record, after a session, of which operations were applied in what sequence. For software whose output enters peer-reviewed literature, this is not a minor gap. Reproducibility is a scientific requirement. The interface should treat it as one. Every operation that a reviewer might query should leave a recoverable record.
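A session-recoverable audit trail is a small engineering commitment. The sketch below is a hypothetical model, not the vendor's data format: each operation is appended with its parameters and a timestamp to an ordered log that can be exported alongside the dataset or quoted in a methods section.

```python
import json
from datetime import datetime, timezone

class AuditTrail:
    """Append-only record of processing operations for reproducibility."""

    def __init__(self):
        self._entries = []

    def record(self, operation: str, parameters: dict) -> None:
        self._entries.append({
            "operation": operation,
            "parameters": parameters,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })

    def export(self) -> str:
        """JSON suitable for archiving with the dataset or answering
        a peer reviewer's query about what was applied, in what order."""
        return json.dumps(self._entries, indent=2)

trail = AuditTrail()
trail.record("auto_clean", {"neighbour_threshold": 5, "iterations": 2})
trail.record("phase_assignment", {"phases": ["ferrite", "austenite"]})
print(trail.export())
```

An append-only structure matters here: a log that operations can edit after the fact answers a reviewer's question no better than no log at all.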

These three principles do not require a fundamental redesign of the software architecture. They require treating the interface as a system that communicates validity conditions, not just completion states. The distinction is small in engineering terms. In scientific terms, it is the difference between a paper that survives peer review and one that does not.

Limits and Gaps

The three instances examined here are not the only silent validity failures in the EBSD software corpus. The CrossCourt4 documentation includes AZtec-specific instructions precisely because the default Euler angle frame produces silently wrong elastic strain and stress tensor calculations; this is a higher-stakes instance, an error that would appear not as a data quality failure but as a physically incorrect result in a published stress analysis. The DREAM.3D import filter documentation carries an explicit warning that Oxford CTF data requires rotation filter application and that results are "wrong" without it. Both merit dedicated analysis beyond the scope of this article.

There is also a gap in the evidence base. The claim that silent validity failures are more prevalent than crash failures in this software corpus is structurally plausible and supported by the documented instances. It is not statistically proven. A systematic review of published EBSD methods sections, assessing whether cleaning parameters, CI interpretation, and coordinate conventions are documented in accordance with reproducibility requirements, would quantify the scope of the problem in a way the current evidence cannot.

The strategic pressure on instrument software vendors to address these failures is increasing as institutional data integrity requirements tighten and as AI-assisted reanalysis tools create new pathways for retrospective error detection. We examine the commercial context for that pressure in depth in a separate article in this series. The strategic case for UX investment in scientific software is made fully there.

Why the Pattern Persists in Practice

The three principles described above address the interface design problem. They do not address the organisational problem that allowed the pattern to persist.

Oxford Instruments has known about the Auto-Clean risk since at least 2018. That knowledge produced a blog post, not a software change. The blog post produced a community shorthand that is now the primary guidance available to novice users at shared facilities. The knowledge pathway from vendor to user runs through a third-party wiki maintained by a specific academic group at Manchester, not through the interface of the software itself.

Why the pattern persists is not straightforward to answer. Development process constraints, validation requirements for software changes, and the legal and commercial implications of surfacing a warning that implies prior versions were inadequate all contribute. The interface failure is identifiable and actionable. The institutional path that produced it is neither.

Conclusion

Return to the researcher whose paper was rejected. The software behaved correctly throughout: it completed every operation, it displayed every result, it produced no error states. The failure was not in the processing pipeline. It was in what the completion signal communicated, which was that the operation was valid when validity depended on a prerequisite the interface never surfaced.

The Silent Validity Failure framework names this structure precisely: a completion signal, a hidden prerequisite, and propagation through the analysis chain without triggering a flag. Auto-Clean, CI misinterpretation, and the coordinate flip are instances of this structure. Each has been acknowledged in community documentation. None has been addressed at the interface level in the most recent software releases reviewed for this article.

The priority for instrument software development teams is not to prevent every crash. It is to identify which normal completion states depend on prerequisites the interface does not surface. Those are the states where the scientific literature is quietly contaminated.

The absence of in-software warnings connects directly to a second structural problem: the tutorial content that should compensate for interface gaps actively compounds them. We examine that failure in the next article in this series.

Frequently Asked Questions

What is the Auto-Clean risk in AZtecCrystal?

Auto-Clean Up applies noise reduction parameters automatically. If the parameters are set too aggressively, the software replaces real microstructural features with processing artefacts. The interface provides no audit trail, no record of parameters applied, and no in-software warning as of AZtec 6.2 (February 2025). Oxford Instruments acknowledged in their 2018 EBSD Data Processing Blog that data cleaned with these parameters had been rejected during peer review.

Does a high Confidence Index mean EBSD data is reliable?

No. The Confidence Index in AZtecCrystal measures phase discrimination success given the phases the user has specified, not data quality or preparation quality. A CI above 90% in a dataset where phases are incorrectly specified will appear high while being scientifically meaningless. No vendor tutorial currently corrects this interpretation for novice users at shared facilities.

What is the 180-degree coordinate flip in AZtec?

AZtec displays EBSD maps in beam view while calculating orientations in the acquisition frame. These two coordinate systems are misaligned by 180 degrees. Without correction, downstream pole figures are mirrored. AZtec 6.2 (February 2025) contains no in-software correction or warning. MTEX 6.0 (October 2024) introduced structural improvements to reference frame handling but does not retroactively correct data already collected under AZtec default conventions.

How does IEC 62366 approach silent output failures?

IEC 62366-1 applies to medical devices rather than scientific instruments. However, its framework for use error analysis, specifically the distinction between errors that produce visible failure states and those that produce incorrect outputs without triggering an alert, provides a directly applicable analytical model. Instrument software vendors whose products feed regulated downstream processes may face analogous documentation requirements in institutional quality management audits.

Why do these errors propagate undetected?

The interface completes each operation normally. No error state is triggered. The output has the visual form of valid data: a map, a percentage, a pole figure. The failure condition is a hidden prerequisite, not a process failure, so it leaves no error log and passes through every subsequent review checkpoint without flagging.

What does a corrected interface design look like for Auto-Clean?

At minimum: display the parameters Auto-Clean will apply before execution, include a consequence indicator that quantifies the risk of aggressive settings for the specific dataset, and create a session-recoverable audit trail of all cleaning operations applied. These are interface changes that do not require alteration of the processing algorithm. The cleaning behaviour need not change. What must change is whether the interface treats the completion of that behaviour as equivalent to the validation of it.

References

Cross, A. J., Prior, D. J., Stipp, M., and Kidder, S. (2017). The recrystallized grain size piezometer for quartz: An EBSD-based calibration. Geophysical Research Letters, 44(13), 6667–6674. https://doi.org/10.1002/2017GL073836

Davis, A. E., Roebuck, B., Shercliff, H. R., Bray, S., Ding, R., and Prangnell, P. B. (2019). Spatially resolved characterisation of the β to α transformation in Ti-6Al-4V using EBSD and metallographic techniques. Materials Science and Engineering: A, 765, Article 138248. https://doi.org/10.1016/j.msea.2019.138248

LightForm Group. (accessed March 2026). EBSD data analysis with AZtecCrystal and MTEX. LightForm Wiki. https://lightform-group.github.io/wiki

MTEX Development Team. (2024, October). MTEX 6.0 release notes and EBSD reference systems. MTEX GitHub Repository. https://github.com/mtex-toolbox/mtex

Muiruri, A. M., Maringa, M., and du Preez, W. (2022). Microstructural characterisation of heat-treated Ti-6Al-4V alloy produced by direct metal laser sintering. Applied Sciences, 12(19), 9552. https://doi.org/10.3390/app12199552

Niessen, F., Nyyssönen, T., Gazder, A. A., and Hielscher, R. (2022). Parent grain reconstruction from partially or fully transformed microstructures in MTEX. Journal of Applied Crystallography, 55(1), 180–194. https://doi.org/10.1107/S1600576721011560

Oxford Instruments. (2018). EBSD data processing: Getting the most from your data. Oxford Instruments EBSD Blog. https://ebsd.com

Oxford Instruments. (2025, February). AZtec 6.2 release notes. Oxford Instruments. https://www.oxinst.com

In this story

When EBSD software completes an operation normally but produces scientifically invalid output, researchers receive no feedback. We examine three documented instances in AZtecCrystal: Auto-Clean data integrity risk, Confidence Index misinterpretation, and the 180-degree coordinate flip. The Silent Validity Failure framework sets out three interface-level principles for instrument software teams.
