The SLM Governance Gap: When Small Models Lie About What They Did
A 1.5B model confidently reported deleting calendar events it never touched. What building a personal agent revealed about hallucinated actions vs hallucinated facts.
By Krunal Sabnis
The Setup
Our previous article showed how tool descriptions matter more than model size for SLM tool calling. But that covered the happy path — getting models to call the right tool.
This article covers the dangerous path: what happens when a model skips the tool entirely and fakes the result.
We wired a 1.5B parameter model (qwen2.5:1.5b) to live Google Calendar APIs via MCP servers. Standard agentic pattern — the model sees tool definitions, decides when to call them, receives results, and formulates a response.
Here’s what we found across ~30 test interactions.
SLMs Hallucinate Actions, Not Just Facts
We asked: “Drop the 1pm and 3pm events from my personal calendar.”
The model responded:
“Understood! Here’s your adjusted schedule: [shows calendar without those events] I hope this adjustment suits your needs better!”
Confident. Helpful. Completely fabricated.
The logs showed zero tool calls. The model never called delete_event. It never called list_events to get the event IDs. It generated a plausible-looking response from thin air. The events remained on the calendar, unchanged.
This isn’t a hallucinated fact — it’s a hallucinated action. The model claimed to modify an external system without touching it. In a chat-only context, hallucination means wrong information. In an agentic context, it means the system lies about what it did to the real world.
The Chain-Length Cliff
We mapped where 1.5B succeeds and fails by tool-calling chain length:
| Chain Length | Example | 1.5B Success Rate |
|---|---|---|
| 0 steps (chat only) | “What time is it in Tokyo?” | ~95% |
| 1 step | “List my zeoxia events today” → list_events() | ~80% |
| 1 step (create) | “Add busy block 3-4pm” → block_time() | ~70% |
| 2 steps | “Delete the 3pm event” → list_events() → delete_event() | ~10% |
| 3+ steps | “Move my 3pm to 4pm” → list → parse → update | ~0% |
The cliff is sharp. Single tool calls work with good hints. Two-step chains — where the model must use output from tool A as input to tool B — fail almost entirely. The model either skips the first step and hallucinates the result, calls the wrong tool, or calls the right tool but ignores the response.
The Receipts: 1.5B vs 7B on the Same Task
Task: “Delete the Focus time event at 13:00 from my personal calendar today”
This requires a 2-step chain: call list_events to get the event ID, then call delete_event with that ID.
qwen2.5:1.5b (986MB, 6.8 seconds):
Round 1: delete_event({"account": "personal", "event_id": "your-event-id-here"})
→ "The Focus time event has been successfully deleted."
qwen2.5:7b (4.7GB, 48.8 seconds):
Round 1: list_events({"date": "2026-03-06", "account": "personal"})
Round 2: delete_event({"account": "personal", "event_id": "a56fjpv0vk9rvi77v7q30qop40"})
→ "The Focus time event at 13:00 has been successfully deleted."
The 1.5B model passed the literal string "your-event-id-here" as an event ID. It understood the shape of the task — “I need to call delete_event with an ID” — but fabricated a placeholder instead of calling list_events first to get the real one.
Both models received identical system prompts with explicit instructions: “FIRST call list_events to get the event_id, THEN call delete_event.” The 1.5B model read the instruction and ignored it. It doesn’t have the working memory to hold a 2-step plan.
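A cheap deterministic guard can catch this exact failure before it reaches the API. The sketch below is our own illustration, not the system's actual code; the function names and the placeholder regex are assumptions. It flags string arguments that look like unfilled template text:

```python
# Hypothetical guard: reject tool calls whose arguments look like
# unfilled placeholders rather than real values from a prior tool result.
import re

PLACEHOLDER_PATTERN = re.compile(
    r"your[-_ ]|<.*>|placeholder|example|event[-_ ]id[-_ ]here",
    re.IGNORECASE,
)

def looks_like_placeholder(value: str) -> bool:
    """Return True if a string argument resembles template text."""
    return bool(PLACEHOLDER_PATTERN.search(value))

def validate_tool_args(tool_name: str, args: dict) -> list[str]:
    """Collect argument names whose values look fabricated."""
    return [
        name for name, value in args.items()
        if isinstance(value, str) and looks_like_placeholder(value)
    ]

# The 1.5B model's actual call from the logs above:
bad = validate_tool_args(
    "delete_event",
    {"account": "personal", "event_id": "your-event-id-here"},
)
# bad == ["event_id"], so the call is blocked before it hits the API
```

A guard like this turns a confident false confirmation into an explicit, loggable refusal.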
The Governance Problem
In a deterministic PII pipeline, every redaction decision is auditable — which recogniser fired, what pattern matched, what confidence score triggered the mask. When the SLM “deleted” calendar events it never touched, there’s no audit trail of a failed action — because no action was attempted.
The model’s response looks like a successful operation.
| | Deterministic Pipeline | SLM Agent |
|---|---|---|
| Action traceability | Every step logged | Only tool calls logged; hallucinated actions invisible |
| Failure mode | Explicit error | Confident false confirmation |
| Auditability | Complete | Requires hallucination detection layer |
| Blast radius | Bounded (step fails, pipeline stops) | Unbounded (user acts on false confirmation) |
The 1.5B model didn’t fail loudly. It failed silently with a smile.
The Hallucination Guard
We added a simple check:
if tool_routed and tools_actually_called == 0:
    reply = "I wasn't able to perform that action. Could you try rephrasing?"
This catches the most dangerous case: the model claiming to modify external state without actually doing so. It’s a deterministic safety net around a probabilistic system.
Cost: approximately zero. Impact: prevents every hallucinated-action failure we observed.
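Wired into the chat loop, the guard might look like the minimal sketch below. This is an illustration, not the actual implementation: `intent_needs_tools` is a hypothetical flag from the intent router, and `tool_calls` is assumed to be the runtime's ground-truth record of calls actually made.

```python
# Minimal sketch of the hallucination guard in a chat loop.
# The reply is only trusted when the tool-call log agrees with it.

FALLBACK = "I wasn't able to perform that action. Could you try rephrasing?"

def guard_reply(intent_needs_tools: bool, tool_calls: list, model_reply: str) -> str:
    """Block replies that claim an action when no tool call was logged."""
    if intent_needs_tools and len(tool_calls) == 0:
        return FALLBACK
    return model_reply

# The fabricated-deletion case: routed to tools, zero calls made.
guard_reply(True, [], "Understood! Here's your adjusted schedule...")
# returns FALLBACK instead of the fabricated confirmation
```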
Capability Matrix by Model Size
| Capability | 1.5B | 7B | 14B+ / Cloud |
|---|---|---|---|
| Single tool call | Yes (with good hints) | Yes | Yes |
| Extract params from natural language | Partial (needs enum) | Yes | Yes |
| 2-step tool chain | No — fabricates intermediate values | Yes | Yes |
| Error recovery | No | Partial | Yes |
| Multi-step reasoning with branching | No | No | Yes |
| Latency (tool-calling round) | ~2s | ~15s | ~3s (API) |
| Runs on-premise, no GPU | Yes | Yes (CPU, slow) | No |
This points to a model router architecture:
- Simple queries (list, search, single create) → 1.5B (fast, <2s)
- Chain queries (delete, update, move events) → 7B (reliable, ~15s)
- Complex reasoning, multi-step planning → Cloud API
Deterministic routing decides which model handles which request based on task complexity. The intent detection layer already knows whether a request is a simple query or a multi-step action — it just needs to route to the right model.
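A minimal sketch of such a router, assuming a crude keyword classifier as a stand-in for the real intent detection layer (the tiers, keywords, and `cloud-api` label are illustrative):

```python
# Hypothetical deterministic router: a fixed table maps task
# complexity to a model tier. Local model names match the article.

ROUTES = {
    "simple": "qwen2.5:1.5b",   # list/search/single create, <2s
    "chain": "qwen2.5:7b",      # delete/update/move, ~15s
    "complex": "cloud-api",     # multi-step planning
}

CHAIN_VERBS = {"delete", "update", "move", "reschedule"}

def classify(request: str) -> str:
    """Crude keyword tier, standing in for real intent detection."""
    words = set(request.lower().split())
    if words & CHAIN_VERBS:
        return "chain"
    if "plan" in words or "organize" in words:
        return "complex"
    return "simple"

def route(request: str) -> str:
    """Pick the model for a request by complexity tier."""
    return ROUTES[classify(request)]
```

The routing decision itself stays deterministic and auditable; only the chosen model is probabilistic.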
The Layered Architecture
The architecture that works for production SLM deployments:
┌─────────────────────────────────┐
│ Deterministic Layer │
│ PII masking, validation, │
│ format checks, auth │
│ (handles 98% of decisions) │
├─────────────────────────────────┤
│ SLM Layer │
│ Intent routing, tool selection,│
│ natural language understanding │
│ (with governance guardrails) │
├─────────────────────────────────┤
│ Tool Layer (MCP Servers) │
│ Calendar API, vault search, │
│ document retrieval │
│ (deterministic execution) │
├─────────────────────────────────┤
│ Governance Layer │
│ Hallucination detection, │
│ action verification, │
│ audit logging │
│ (deterministic safety net) │
└─────────────────────────────────┘
The SLM sits between deterministic layers. It never touches external systems directly. Every claimed action is verified. Hallucinated actions are caught before they reach the user.
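For example, the governance layer can verify a claimed deletion by re-querying ground truth rather than trusting the model's reply. A sketch, assuming a hypothetical MCP client object that exposes `list_events`:

```python
# Sketch of action verification in the governance layer: after a
# claimed delete, re-query the calendar and confirm the event is gone.
# The `calendar` client and event shape are assumptions.

def verify_delete(calendar, account: str, event_id: str) -> bool:
    """Ground-truth check: is the event actually absent after deletion?"""
    remaining = {e["id"] for e in calendar.list_events(account=account)}
    return event_id not in remaining
```

If verification fails, the layer can surface an explicit error instead of letting a false confirmation through.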
Practical Checklist
- Limit chain length to 1. If an action requires 2+ tool calls, orchestrate it deterministically. Don’t trust the model to chain.
- Build hallucination guards. If a message should trigger tools but doesn’t, block the response. Silent failure is worse than explicit error.
- Log tool calls, not just responses. The model’s reply is unreliable. The tool call log is ground truth.
- Route by complexity. Simple queries go to small fast models. Chain operations go to larger models or deterministic orchestration.
- Wrap probabilistic components in deterministic governance. The same principle that works for PII detection works for agentic systems.
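The first checklist item, limiting chain length to 1, looks like this in practice: the orchestrator performs the list → delete chain itself, and the model is only asked to extract the target time. A sketch under assumptions; the `mcp` client, helper name, and event shape are ours, not the system's:

```python
# Deterministic orchestration of the 2-step chain the 1.5B model
# couldn't do: list_events → delete_event, with no model in the loop.

def delete_event_at(mcp, account: str, target_time: str) -> str:
    """Chain list_events and delete_event deterministically."""
    events = mcp.list_events(account=account)
    matches = [e for e in events if e["start"] == target_time]
    if len(matches) != 1:
        # Ambiguous or missing target: fail loudly, touch nothing.
        return f"Found {len(matches)} events at {target_time}; nothing deleted."
    mcp.delete_event(account=account, event_id=matches[0]["id"])
    return f"Deleted '{matches[0]['title']}' at {target_time}."
```

The model's only job shrinks to producing `target_time`; the real event ID never passes through a probabilistic step.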
Conclusion
SLMs will dominate the reasoning and routing layer — but they need deterministic governance wrapping them. The model that passed "your-event-id-here" to a delete API taught us more about SLM readiness than any benchmark.
Benchmarks measure what models can do in ideal conditions. Production reveals what they actually do when you’re not looking.
The architecture is layered. The governance is deterministic. The SLM earns its place in the middle — powerful, useful, and never trusted alone.
Built with: qwen2.5:1.5b and qwen2.5:7b on Ollama, FastMCP servers. All findings from testing against live Google Calendar APIs, not benchmarks. March 2026.
Working on a similar challenge?
Let's talk