In an earlier piece (here) I explained a principle I work by: when mixing local and frontier AI in consulting work, the data stays local, but the abstractions can travel. Raw client information is processed by models you control. Patterns and architectures are discussed with frontier models. The boundary is enforced by tooling, not by habits. That’s the principle. This piece focuses on what makes it actually work.

There’s a problem I encountered within the initial working principle. If frontier models can’t see your raw data, you can’t ask them to “check” anything. You can’t paste the extracted findings into Claude and ask “does this look right?” – because the extracted findings are derived from the data you can’t share. The single most useful thing a frontier model could do for you – sanity-check the local model’s work – is exactly the thing the architecture forbids.

You could shrug and accept lower quality to maintain compliance. Or you change the role the frontier model plays: it stops being a reviewer and starts being an architect.

“The single most useful thing a frontier model could do for you – sanity-check the local model’s work – is exactly the thing the architecture forbids.”

Architect and Analyst

Think of the relationship as a senior-and-junior pairing in any consulting team. The senior partner writes the brief, defines what good looks like, and reviews the methodology. The junior analyst gathers and processes the data, runs the tests, and produces the findings. The senior never personally touches the raw inputs. They don’t need to. Their value is in the framework, not in crunching the numbers.

The Pairing

In a local-plus-frontier setup, the frontier model is the senior. It defines success criteria, generates validation schemas, and reasons about edge cases – all in the abstract. The local model is the analyst. It does the actual work on the actual data and applies the senior’s framework. The findings live entirely on the local side with the methodology shaped by the cloud side.

This approach introduces a four-step loop that, in practice, produces materially more trustworthy AI-extracted findings than either model could manage alone.

The Four-Step Loop

01

Abstract the Ruleset

Before pointing any model at client data, describe the type of work to the frontier model and ask it to anticipate failure. The frontier model never sees the raw data. It uses its general knowledge of domain structures, conventions, and known LLM failure patterns to produce a validation framework.

“I’m using a local LLM to extract X from a facility management contract. What are the five most common logical errors an AI makes in this specific extraction task? Generate a JSON schema for the expected output structure, and list the semantic checks I should run on the extracted data – for instance, plausibility ranges, internal consistency, common terminology confusions.”

Typical output: a structured JSON schema, a list of seven or eight known confusion patterns – “fixed monthly fees mistaken for reactive maintenance caps,” “annual figures double-counted when contracts span multiple years,” “PPM line items pulled into reactive totals” – and a set of plausibility ranges. This output is now your specification. It’s portable, reusable across engagements, and improves over time.
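As a concrete sketch, the step-one artefact can be captured as plain data. Everything below – field names, the third error pattern’s wording, the plausibility range – is illustrative, not the article’s actual specification:

```python
# Illustrative step-one artefact: the frontier model's output captured as a
# portable Python/JSON specification. Field names, enum values, and ranges
# are hypothetical examples.
import json

SPEC = {
    "schema": {
        "type": "object",
        "required": ["line_item", "category", "annual_value_gbp"],
        "properties": {
            "line_item": {"type": "string"},
            "category": {"type": "string", "enum": ["ppm", "reactive", "fixed_fee"]},
            "annual_value_gbp": {"type": "number"},
        },
    },
    "known_errors": [
        "fixed monthly fees mistaken for reactive maintenance caps",
        "annual figures double-counted when contracts span multiple years",
        "PPM line items pulled into reactive totals",
    ],
    "plausibility": {"annual_value_gbp": {"min": 0, "max": 5_000_000}},
}

# The artefact is plain data, so it serialises cleanly and can be versioned
# alongside the engagement's other methodology documents.
print(json.dumps(SPEC, indent=2)[:80])
```

Because the artefact contains no client data, it can live in version control and travel between engagements freely.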

02

Execute the Task

The local model – Qwen, Llama, DeepSeek, whatever’s strongest on your hardware – does the actual extraction on the actual contracts. The frontier model’s schema is passed in as part of the local prompt, constraining the output structure. Real client data, real findings, all on your infrastructure.

The output isn’t a free-form summary. It’s a structured artefact – usually JSON or a CSV – that conforms to the schema defined in step one. This matters because the next step depends on it.
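The step-two call can be sketched as follows. `call_local_model` is a stand-in for whatever local inference API you run (Ollama, llama.cpp server, vLLM); here it is stubbed with a canned response so only the shape of the loop is shown:

```python
# Sketch of the step-two extraction call. The frontier-designed schema
# travels into the local prompt; the document text never leaves local disk.
import json

SCHEMA = {"type": "object", "required": ["line_item", "annual_value_gbp"]}

def build_extraction_prompt(schema: dict, document_text: str) -> str:
    # The schema constrains the output structure the analyst model produces.
    return (
        "Extract the contract line items from the document below.\n"
        "Respond with JSON that conforms exactly to this schema:\n"
        f"{json.dumps(schema)}\n\nDOCUMENT:\n{document_text}"
    )

def call_local_model(prompt: str) -> str:
    # Hypothetical stub: replace with a real call to your local endpoint.
    return '{"line_item": "Quarterly deep clean", "annual_value_gbp": 12000}'

prompt = build_extraction_prompt(SCHEMA, "contract text, held on local disk")
finding = json.loads(call_local_model(prompt))  # structured artefact, not prose
```

The `json.loads` call is the point: if the local model drifts into prose, parsing fails loudly instead of a malformed finding slipping through.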

2026 Enhancement

Use constraint-based decoding via libraries like Outlines or Guidance, or the structured output features built into modern inference engines. These tools enforce the schema at the token level – the local model literally cannot generate output that violates the structure the frontier model designed. The architectural separation becomes mechanical, not aspirational.
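Many OpenAI-compatible local servers accept a JSON Schema directly in the request body. The payload below follows the OpenAI “structured outputs” convention; support and exact field spelling vary by engine, so treat it as illustrative and check your server’s documentation:

```python
# Illustrative request payload for schema-enforced decoding against an
# OpenAI-compatible local endpoint. Model name and schema are hypothetical.
payload = {
    "model": "local-extraction-model",
    "messages": [{"role": "user", "content": "Extract line items as JSON."}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "contract_extraction",
            "schema": {
                "type": "object",
                "required": ["line_item", "annual_value_gbp"],
                "properties": {
                    "line_item": {"type": "string"},
                    "annual_value_gbp": {"type": "number"},
                },
            },
        },
    },
}
# e.g. requests.post("http://localhost:8000/v1/chat/completions", json=payload)
```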

03

The Local Critique

This is the step most teams skip, and it’s the one that does the heavy lifting. You run a second pass over the extracted findings, using the local model in a different role. Instead of “extract this,” the prompt is “critique this.”

“Here is an extraction produced by an earlier analysis pass. Using the following list of common errors – [the seven failure patterns from the frontier model] – review each entry and flag anything that looks suspicious. For each flag, explain which error pattern you suspect and what evidence in the source text supports your concern.”

This is essentially using “LLM-as-a-judge” applied locally. The model is now optimising for finding mistakes rather than for producing answers, which empirically catches a meaningful proportion of extraction errors. It happens entirely on your hardware, so the raw data and the critique both stay private.

Critically, both the analyst pass and the judge pass can use the same model with different prompts. Or you can use a smaller, faster model as the analyst and a larger, smarter one as the judge. The point is the separation of roles.
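The role separation is mostly prompt construction. A minimal sketch of the judge prompt builder, with the failure patterns fed in from the step-one specification (wording here is illustrative):

```python
# Sketch of the step-three critique pass: same local model, different role.
# The prompt asks for flags and evidence, not answers.
KNOWN_ERRORS = [
    "fixed monthly fees mistaken for reactive maintenance caps",
    "annual figures double-counted when contracts span multiple years",
]

def build_judge_prompt(extraction_json: str, known_errors: list[str]) -> str:
    error_list = "\n".join(f"- {e}" for e in known_errors)
    return (
        "You are reviewing an extraction produced by an earlier analysis pass.\n"
        f"Known error patterns to check for:\n{error_list}\n\n"
        "For each entry, flag anything suspicious, name the error pattern you "
        "suspect, and cite the supporting evidence from the source text.\n\n"
        f"EXTRACTION:\n{extraction_json}"
    )

judge_prompt = build_judge_prompt('{"line_item": "Deep clean", "total": 48}',
                                  KNOWN_ERRORS)
# The same local endpoint runs this prompt; only the role has changed.
```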

04

The Anonymised Escalation

Sometimes the local critique finds something genuinely puzzling – a contradiction the model can flag but can’t resolve, the kind of finding you’d take to a senior colleague.

You can still do this, while preserving the boundary. You bring the conceptual conflict to the frontier model, not the data:

“In a soft FM cleaning contract, the specification states ‘quarterly deep cleans’ for a defined set of areas. The recorded delivery count over the year is materially higher than four times that area count. What are the plausible explanations for this kind of discrepancy in practice – periodic re-cleans, mid-year scope additions, mis-categorisation of routine work as deep cleans, double-invoicing – and what evidence in the contract documentation would distinguish between them?”

No client name, no supplier name, no actual figures, no contract identifiers – only the shape of the discrepancy. The answer comes back as logic and methodology, which you then apply locally to the real findings.
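A lightweight scrub pass can back up the human judgement here, replacing figures and known party names with placeholders before a question leaves the local environment. The name list and patterns below are illustrative; real redaction needs a maintained deny-list and review, not two regexes:

```python
# Minimal pre-escalation scrub, as a last line of defence behind human review.
import re

PARTY_NAMES = ["Acme Facilities Ltd", "Northwind Estates"]  # hypothetical

def scrub(text: str) -> str:
    for name in PARTY_NAMES:
        text = text.replace(name, "[PARTY]")
    text = re.sub(r"£\s?[\d,]+(?:\.\d+)?", "[AMOUNT]", text)  # currency figures
    text = re.sub(r"\b\d{4,}\b", "[NUMBER]", text)            # long identifiers
    return text

question = scrub("Acme Facilities Ltd invoiced £48,000 under contract 700123.")
# → "[PARTY] invoiced [AMOUNT] under contract [NUMBER]."
```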

“The judging framework is designed in the cloud. The actual judgement happens locally.”

The Discipline This Produces

Practised consistently, this loop changes the character of AI-assisted consulting work. Three effects stand out.

  • The output structure becomes auditable. Because the schema was defined upfront in step one, every extracted finding has a known shape. You can write deterministic code that validates the structure before any human looks at it – schema validation in code is absolute, free, and runs in milliseconds. Hallucinated fields, missing required values, type mismatches: all caught automatically.
  • The failure modes become explicit. The frontier model’s anticipated error list does more than guide the critique pass; it becomes auditable documentation. When explaining your methodology to a client, you can show them the specific failure patterns you screened for and the evidence that each was addressed. This turns “we used AI” from a worrying admission into a credibility-builder.
  • The senior judgement is captured and reused. The frontier model’s specification from step one is a portable artefact. Refine it once for “FM contract extraction,” and it serves every subsequent FM engagement. Build a small library of these across your practice areas – document analysis, data analytics and reporting, contract review – and you’re effectively encoding your analytical standards in a form that any local model can apply consistently.
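The first of those effects is worth making concrete: structural validation needs no model at all. A pure-Python sketch, with illustrative field names rather than the real schema:

```python
# Deterministic structural validation of an extracted finding: no model
# involved, runs in milliseconds, catches hallucinated fields, missing
# values, and type mismatches before any human looks at the output.
REQUIRED = {"line_item": str, "category": str, "annual_value_gbp": (int, float)}

def structural_errors(finding: dict) -> list[str]:
    errors = []
    for field, expected_type in REQUIRED.items():
        if field not in finding:
            errors.append(f"missing required field: {field}")
        elif not isinstance(finding[field], expected_type):
            errors.append(f"wrong type for {field}: {type(finding[field]).__name__}")
    for field in finding:
        if field not in REQUIRED:
            errors.append(f"unexpected (possibly hallucinated) field: {field}")
    return errors

bad = {"line_item": "Deep clean", "annual_value_gbp": "12000", "vendor": "?"}
print(structural_errors(bad))
# flags the missing category, the string where a number was expected,
# and the unexpected "vendor" field
```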

A Worked Summary

| Step | Where It Happens | Touches Client Data? | Purpose |
| --- | --- | --- | --- |
| 1. Abstract the Ruleset | Frontier / Cloud | Nothing | Generate validation schema and known-error list |
| 2. Execute the Task | Local | Raw documents | Extract structured findings |
| 3. Local Critique | Local | Findings + raw data | Catch contradictions and known failure patterns |
| 4. Anonymised Escalation | Frontier / Cloud | Abstracted conflict only | Resolve conceptually difficult findings |

Privacy is preserved throughout. Quality is materially improved over a single-pass extraction. Methodology becomes explicit and defensible. The principle of data stays local, abstractions travel stops being a slogan and becomes your operating standard.

The Honest Limitations

There is a catch, though. The loop adds time and tokens to every analysis. Step one is a one-off cost per engagement type, amortised over many runs, but steps two and three roughly double your local inference time. For straightforward extractions on well-structured data, the loop can be overkill.

Where It’s Worth It

For the consulting work where AI assistance is most useful – messy, sensitive, high-stakes documents – the additional rigour is worth the cost many times over. The frontier model’s anticipated error list is also only as good as its understanding of your domain. For unusual specialisations, the first schema will miss things that domain experience would catch. That’s a feature of the iterative practice: each engagement teaches you what to add for the next.

Summary

The loop splits AI assistance into two distinct roles – generating answers and checking answers – and assigns each role to the tool best suited to it, given the privacy constraints.

In human teams, this structure is almost automatic. The senior writes the framework, the junior does the work, the senior reviews the methodology, the junior applies the corrections. The structure is so familiar that nobody bothered to name it. When AI tools entered the workflow, most practices collapsed those distinct roles into a single ChatGPT tab, lost the separation, and accepted whatever came out.

“The semantic validation loop is, in a sense, just the rediscovery of how good consulting teams already worked – translated into the new medium.”

The principle of data stays local, abstractions travel makes the privacy boundary explicit. The validation loop makes the quality discipline explicit. Together they describe a way of working that’s neither AI-naive nor AI-credulous – and which is, I think, where serious consulting practice is heading regardless of which models happen to lead the field this quarter.

Nick M — Technology Consultant Nick is a UK-based technology consultant specialising in data engineering, business intelligence, and applied AI for SMB and enterprise clients. This piece is a companion to The Data Stays Local, But the Abstractions Can Travel. — digirelevance.com
