AI Is Shifting Software from Measurement to Quality

For decades, software engineering has been built on a simple assumption: the same input should produce the same output.

That assumption shaped everything. It shaped how we wrote code, how we designed systems, and especially how we tested them. If a function returned something different for the same input, we called it a bug. Determinism became the foundation of trust.

And it made sense. Traditional software deals almost entirely in quantitative data. Numbers, booleans, exact strings. Things that can be measured, compared, and asserted.

Then AI showed up.

AI does not operate the same way. It does not produce purely quantitative outputs. It produces language, images, and ideas. It produces things that must be interpreted. And interpretation introduces variability.

You can give the same prompt to an AI system multiple times and receive different responses. Not random responses, but different expressions of something that may still be correct. The output is no longer a fixed object. It is a qualitative result.

We Are Trying to Force AI Into the Old Model

The industry’s first instinct has been to resist this. We try to force AI back into the old model. We want it to behave like traditional software. We tweak prompts, lower temperature settings, and build layers of control in an attempt to make the output consistent. But this misses the opportunity.
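To make that instinct concrete, here is a minimal sketch of the control layer, assuming an OpenAI-style chat API (the model name and prompt are illustrative). Notably, even with the temperature at zero and a fixed seed, providers only promise best-effort reproducibility. The variability never fully goes away.

```python
# A sketch of the "force determinism" instinct, assuming an OpenAI-style
# chat API. Model name and prompt are illustrative, not a recommendation.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize this bug report."}],
    temperature=0,  # suppress sampling randomness
    seed=42,        # request reproducible output (best effort only)
)
print(response.choices[0].message.content)
```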

AI is not breaking software. It is exposing a limitation in how we have been thinking about it. We have spent years optimizing for consistency because consistency is easy to measure. But consistency is not the same thing as quality. It is just a proxy. And like all proxies, it eventually fails.

That failure is now visible.

Our Testing Model Was Built for Sameness

What makes this shift uncomfortable is not just that the outputs are different. It is that our entire testing philosophy was built around sameness. Traditional systems allowed us to build automated tests that assert exact results. If X goes in, Y must come out. That was the definition of correctness.
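That definition is easy to see in code. A classic unit test, with a hypothetical slugify function standing in for any pure function:

```python
# A classic deterministic test: exact input, exact expected output.
# slugify is a hypothetical stand-in for any pure function.
def slugify(title: str) -> str:
    return title.strip().lower().replace(" ", "-")

def test_slugify_is_exact():
    # If X goes in, Y must come out. Anything else fails the build.
    assert slugify("Hello World") == "hello-world"
```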

AI breaks that definition at the root.

Two outputs can be different and still be good. They can even be better in different ways. That means correctness is no longer something we can fully capture with assertions.

The Return of Human Judgment

In the short term, this forces a return to human judgment.

We are seeing this directly in how we operate. Our UX team is now far more involved in QA than ever before. They are not just validating that something works. They are judging the quality of what is being produced. They are asking whether the output is useful, clear, and aligned with intent.

That is not something an automated test can fully assert. It requires a person.

This is uncomfortable for an industry that has spent decades trying to remove humans from the testing process. But it reveals something important. We were not actually testing quality before. We were testing consistency.

Can AI Test AI?

At the same time, we are exploring how to bring AI into the testing process itself. If the system produces qualitative output, then the testing layer may also need to operate qualitatively. That means using AI to evaluate AI.
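Here is a minimal sketch of what that can look like, again assuming an OpenAI-style API. The rubric, model name, and passing threshold are illustrative, not a settled design:

```python
# A sketch of AI evaluating AI ("LLM as judge"), assuming an OpenAI-style
# API. The rubric, model name, and passing threshold are illustrative.
import json
from openai import OpenAI

client = OpenAI()

RUBRIC = (
    "Score the answer from 1 to 5 on usefulness, clarity, and alignment "
    "with the user's intent. Reply as JSON: "
    '{"usefulness": n, "clarity": n, "alignment": n, "reason": "..."}'
)

def judge(prompt: str, answer: str) -> dict:
    """Ask a model to grade another model's output against the rubric."""
    result = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": f"Prompt:\n{prompt}\n\nAnswer:\n{answer}"},
        ],
        response_format={"type": "json_object"},
        temperature=0,  # keep the judge itself as stable as possible
    )
    return json.loads(result.choices[0].message.content)

scores = judge(
    "Explain our refund policy to a frustrated customer.",
    "You can request a refund within 30 days of purchase...",
)
assert min(scores["usefulness"], scores["clarity"], scores["alignment"]) >= 4
```

Note that the assertion at the end is still deterministic. The judgment itself is not. That is the tradeoff in miniature.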

But this introduces a real constraint. Automated testing depends on speed. AI-based evaluation is slower. It requires context, processing, and judgment. We are still working through how to balance that tradeoff.

What is emerging is a layered approach. Fast, deterministic tests still exist where they make sense. But above them, we are introducing qualitative validation, both from humans and eventually from AI systems that can judge quality at scale.
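In code, that layering might look like the sketch below, reusing the hypothetical judge() from the previous example. The deterministic gates are cheap and run on every commit; the qualitative layer is slower and can run more selectively.

```python
# A sketch of a layered validation pipeline. The deterministic gates are
# ordinary assertions; judge() is the hypothetical evaluator sketched above.
def validate(prompt: str, answer: str) -> bool:
    # Layer 1: fast, deterministic checks -- the old model, where it still fits.
    if not answer.strip():
        return False                  # output must not be empty
    if len(answer) > 2000:
        return False                  # output must respect the length budget
    if "as an ai language model" in answer.lower():
        return False                  # known failure phrase

    # Layer 2: slow, qualitative judgment -- human today, AI at scale tomorrow.
    scores = judge(prompt, answer)
    return min(scores["usefulness"], scores["clarity"], scores["alignment"]) >= 4
```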

It is less precise, but it is more aligned with reality.

This Is What Striving for Quality Was Pointing To

In Striving for Quality, I argue that quality cannot be measured directly. It can only be judged. Metrics can point toward quality, but they can never fully capture it. When we rely too heavily on what can be measured, we begin optimizing for the metric instead of the thing the metric was supposed to represent.

That is exactly what happened in software.

We optimized for determinism because it was measurable. We built test suites that could demonstrate correctness through exact, repeatable checks. But in doing so, we quietly reduced our ability to recognize actual quality.

AI is forcing that back into the system.

From Measurement Back to Quality

AI is reintroducing the need for judgment. It is forcing us to evaluate outputs based on usefulness, clarity, and alignment instead of exact matches. It is moving us away from the output as a fixed object and back toward quality itself.

This is not a step backward.

It is a correction.

AI did not create this problem. It exposed it.
