The Classification Gap: The Bug That Passes Every Test

There's a class of domain modeling error that no test catches. No structural analysis flags it. No linter complains. Your gap report comes back clean. Your aggregate has the right number of invariants, your bounded context has healthy ratios, your event flows trace end to end.

The model is wrong.

Not wrong like a missing field or a misspelled event name. Wrong like a reactive behavior modeled as a static constraint. Wrong like a Policy collapsed into an Invariant — structurally valid, semantically backwards, invisible to every check you run against it.

I call this the Classification Gap. And if you're using AI to assist with domain modeling — even informally, even just bouncing ideas off ChatGPT — you will hit it. You probably already have. You just couldn't name it.


Here's the example that made it concrete for me.

An architect describes a business rule: "When an order is placed, check inventory and reserve stock." That's a reactive behavior. Something happens — an order is placed. In response, the system does something — checks inventory, reserves stock. In DDD vocabulary, this is a Policy. A stateless reaction to a domain event that produces a downstream command. The triggering event and the resulting action are the whole point.

Now imagine the AI classifies it as an Invariant instead. An Invariant is a static constraint on an aggregate — a rule that must always be true. "An order cannot exceed the customer's credit limit." "A shipment must have at least one line item." These are structural truths about the domain. They don't react to events. They don't trigger commands. They just... hold.
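
To make the split concrete, here's a minimal sketch in TypeScript of both building blocks done correctly. The type and function names (OrderPlaced, ReserveStock, withinCreditLimit, and so on) are hypothetical, not drawn from any particular framework; only the shape matters. A Policy consumes an event and produces a command. An Invariant is a predicate the aggregate enforces on its own state.

```typescript
// --- Policy: a stateless reaction to a domain event ---
// (names are illustrative; the shape is what matters)

interface OrderPlaced {
  type: "OrderPlaced";
  orderId: string;
  lines: { sku: string; quantity: number }[];
}

interface ReserveStock {
  type: "ReserveStock";
  orderId: string;
  lines: { sku: string; quantity: number }[];
}

// "When an order is placed, check inventory and reserve stock."
// Event in, command out: the temporal chain is explicit in the signature.
const reserveStockOnOrderPlaced = (event: OrderPlaced): ReserveStock => ({
  type: "ReserveStock",
  orderId: event.orderId,
  lines: event.lines,
});

// --- Invariant: a static constraint on an aggregate ---

interface Order {
  customerId: string;
  total: number;
  creditLimit: number;
}

// "An order cannot exceed the customer's credit limit."
// No event, no command: just a condition that must always hold.
const withinCreditLimit = (order: Order): boolean =>
  order.total <= order.creditLimit;
```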

The misclassification looks fine. The Invariant has a name, a description, an owning aggregate. The aggregate now has one more invariant — which actually improves its invariant-to-command ratio. If you're running structural completeness checks, every box is ticked. If you're evaluating heuristic thresholds, the numbers look healthier than before.

The gap report says nothing.

But you've just collapsed a temporal behavior into a spatial constraint. You've taken something that fires when an event crosses a boundary and turned it into something that sits inside an aggregate doing nothing. The reactive path — the entire causal chain from "order placed" to "stock reserved" — is gone from the model. Not broken. Not misconfigured. Just absent, because the concept that would have anchored it was placed in the wrong category.
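
Continuing the hypothetical sketch, this is roughly what the collapsed version looks like. Nothing about it is structurally invalid; it simply never runs in response to anything.

```typescript
// The same rule, collapsed into an Invariant on the Order aggregate.
// Structurally valid: it has a name, a predicate, an owning aggregate.
interface OrderWithStock {
  lines: { sku: string; quantity: number; reserved: boolean }[];
}

// "All stock for this order is reserved" reads like a constraint,
// but nothing in the model reacts to OrderPlaced anymore, so nothing
// ever reserves the stock this predicate is waiting to observe.
const allStockReserved = (order: OrderWithStock): boolean =>
  order.lines.every((line) => line.reserved);

// The reactive path from OrderPlaced to ReserveStock exists nowhere.
```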


In traditional DDD, this mistake is self-correcting. A human modeler who classifies a reactive behavior as a static constraint will feel the friction at implementation time. They'll try to write the code and something won't fit. The Invariant doesn't express what they meant. The code becomes awkward. The feedback loop is the implementation itself — the discomfort of writing code that fights the model is the signal that the model is wrong.

Evans never had to name this problem because the modeling medium — human conversation, whiteboard sessions, iterative refinement through code — naturally surfaces it. The human modeler carries the semantic intent in their head. When the model diverges from that intent, they feel it. They may not articulate it as "I misclassified a Policy as an Invariant," but they'll say "this doesn't feel right" and restructure.

AI doesn't feel anything.

When an AI classifies a concept into a building block type, it commits with confidence. The classification is linguistically coherent. The output is well-structured. And the moment it enters your domain model, the original semantic intent — the reactive behavior, the temporal causality, the "when X happens, do Y" — is gone. What remains is a structurally valid Invariant with no trace of what it was supposed to be.

The feedback loop is broken. Not degraded, not delayed — broken. Because every downstream check evaluates the classified output, not the original intent. Structural analysis sees a valid Invariant. Heuristic evaluation sees healthy ratios. The gap report confirms convergence. The system has produced an internally consistent model built on a wrong foundation, and nothing in the verification pipeline can see it.
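
A rough sketch of why the pipeline is blind here, assuming a deliberately simplified model representation (the field names and the check itself are invented for illustration): the check inspects structure, and the structure is fine.

```typescript
// A deliberately simplified structural completeness check.
// Field names (name, description, owningAggregate) are illustrative,
// not taken from any real tool.
interface InvariantSpec {
  name: string;
  description: string;
  owningAggregate: string;
}

const isStructurallyComplete = (inv: InvariantSpec): boolean =>
  inv.name.length > 0 &&
  inv.description.length > 0 &&
  inv.owningAggregate.length > 0;

// The misclassified rule sails through.
const misclassified: InvariantSpec = {
  name: "AllStockReserved",
  description: "When an order is placed, check inventory and reserve stock.",
  owningAggregate: "Order",
};

console.log(isStructurallyComplete(misclassified)); // true
// The description still says "when X happens, do Y", but the check
// only reads structure. The intent is invisible to it.
```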


This is what makes the Classification Gap different from every other modeling error. Missing elements get caught by structural completeness checks. Threshold violations get caught by heuristic analysis. Naming inconsistencies get caught by convention rules. Every other category of error leaves a visible trace — a gap in the report, a metric outside bounds, a rule that fires.

The Classification Gap leaves no trace because the model is complete. The error isn't in what's missing. It's in what's present but miscategorized.

And the most susceptible boundary — Policy versus Invariant — is also the most architecturally consequential. It's the boundary between time and space in your domain model. Between something that reacts to events and something that constrains state. Get it wrong and you don't just have a cosmetic error in your specification. You have a model that will produce an architecture without the reactive paths your domain requires. The event flows won't be designed because the concepts that anchor them were never modeled as reactive. The sagas won't be triggered because the policies that initiate them don't exist.

The system will work. It will pass tests. And it will be fundamentally wrong about how the domain behaves over time.


This isn't just a Policy-versus-Invariant problem, though that's the sharpest edge. Every classification boundary in DDD carries some version of this risk.

A Command misclassified as a Domain Event inverts the causal direction — intentions become facts, and the model's sense of what requests action versus what records completion flips. An Aggregate misclassified as a Domain Service loses its state boundary — the invariant enforcement surface disappears, and the consistency guarantee with it. A Policy over-promoted to a Saga gains compensation logic it doesn't need, adding architectural complexity for behavior that should be fire-and-forget.
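
To illustrate the first of those boundaries, here's a minimal sketch of what "intentions become facts" means; the type names are again hypothetical.

```typescript
// A Command is a request: imperative, addressed to the model, refusable.
interface PlaceOrder {
  type: "PlaceOrder"; // imperative mood: something is being asked for
  customerId: string;
  lines: { sku: string; quantity: number }[];
}

// A Domain Event is a fact: past tense, already happened, not refusable.
interface OrderPlaced {
  type: "OrderPlaced"; // past tense: something is being reported
  orderId: string;
  occurredAt: Date;
}

// Classify PlaceOrder as an event and the model now treats a request
// as something that has already happened: no validation point, no
// opportunity to reject, the causal arrow pointing the wrong way.
```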

Each of these is structurally valid. Each passes every check. Each produces a different architecture than the domain actually requires.

The common thread is that building block type is a semantic decision, not a structural one. It encodes what a concept does in the domain — how it relates to events, state, time, and causality. Structural analysis can verify that the pieces fit together. It cannot verify that the pieces are the right kind.


If you're using AI for domain modeling today — even as a thinking partner, even just for brainstorming bounded contexts or sketching event flows — the Classification Gap is already in your process. The question isn't whether it will happen. The question is whether you'll catch it before the architecture hardens around it.

Traditional DDD never needed a name for this because the detection mechanism was embedded in the human modeler's discomfort. AI-mediated DDD needs the name because the discomfort doesn't exist. The model looks clean. The verification passes. The architecture proceeds.

And somewhere in your specification, a behavior that should react to events is sitting quietly as a constraint, waiting for someone to notice that the system doesn't do what the domain requires.

That's the Classification Gap. The bug that passes every test.


Next: The architecture that emerged from building detection for problems like this was deeper than anything a team would have produced. Not because AI is smarter — but because it doesn't let you skip the hard parts.

This is Post 7 of a 17-post series on Signal-Driven Development — a solo-practitioner DDD methodology built with AI. Previous: Post 6, The Candidate Lifecycle.