From the Lab

Building a Governed AI Routing Pipeline for Regulated Industries

A layered routing architecture that handles PII detection, prompt classification, and model selection across healthcare, finance, telecom, legal, and defence — with deterministic governance at every decision point.

By Krunal Sabnis

The Problem

Enterprise AI workflows in regulated industries need to answer three questions before a prompt reaches a model:

  1. Does this contain sensitive data? If yes, detect and mask it before it leaves the perimeter.
  2. How complex is this request? Simple FAQ → small local model. Clinical diagnosis → specialist cloud endpoint.
  3. Can we prove every decision? Every routing choice, every PII detection, every model selection — auditable.

We built a pipeline that handles all three, tested across 200 prompts in 5 regulated verticals.

The Architecture

┌─────────────────────────────────────────┐
│  Stage 1: Deterministic Rules           │
│  PII regex, domain keywords, policy     │
│  Handles 92% of routing decisions       │
│  Latency: <1ms                          │
├─────────────────────────────────────────┤
│  Stage 2: SLM Classification            │
│  1.5B model for ambiguous cases only    │
│  Called on 8% of prompts                │
│  Latency: ~800ms per call               │
├─────────────────────────────────────────┤
│  Stage 3: Domain Policy Engine          │
│  Per-vertical config: keyword sets,     │
│  PII entity lists, sensitivity rules    │
│  Adding a vertical = adding a YAML file │
├─────────────────────────────────────────┤
│  Stage 4: PII Detection & Redaction     │
│  Presidio + spaCy NER + 5 custom        │
│  recognisers. 95% recall, CPU-only      │
├─────────────────────────────────────────┤
│  Stage 5: Governed Route Execution      │
│  Hallucination detection, action        │
│  verification, full audit trail         │
└─────────────────────────────────────────┘

Each stage is deterministic except Stage 2 — and Stage 2 is bounded. It classifies ambiguity. It doesn’t set policy. Its decisions are verified before execution.
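The staged flow can be sketched as a single dispatch function. This is a toy, dependency-free sketch; the keyword set, the SSN-style regex, and all function names are illustrative, not the production pipeline's API, and the SLM call is stubbed out.

```python
import re

# Illustrative rule tables; the real policy packs are per-domain YAML.
KEYWORDS = {"diagnosis": "clinical_cloud", "faq": "local_slm"}
PII_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # SSN-style pattern

def route(prompt: str) -> dict:
    audit = []
    # Stage 1: deterministic rules (keywords, PII regex, length heuristics)
    route_name = next((r for k, r in KEYWORDS.items() if k in prompt.lower()), None)
    if route_name:
        audit.append({"stage": 1, "rule": "keyword"})
    else:
        # Stage 2: would escalate ambiguous prompts to the 1.5B SLM;
        # stubbed here as a default route
        route_name = "local_slm"
        audit.append({"stage": 2, "rule": "slm_fallback"})
    # Stage 4: mask PII before the prompt leaves the perimeter
    masked = PII_RE.sub("<SSN>", prompt)
    if masked != prompt:
        audit.append({"stage": 4, "pii": ["SSN"]})
    return {"route": route_name, "prompt": masked, "audit": audit}

print(route("Patient needs a diagnosis, SSN 123-45-6789"))
```

The audit list accumulating alongside the decision is the point: every stage appends what it did, so Stage 5 has a complete record to verify against.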

What It Handles

Tested across 200 prompts in 5 regulated domains:

| Domain | Regulation Context | What the Pipeline Manages |
|---|---|---|
| Healthcare | HIPAA | Patient names, DOB, medical licence numbers, clinical complexity routing |
| Finance | PCI DSS, Basel IV | Credit card numbers, SSNs, IBANs, risk analysis routing |
| Telecom | EU AI Act | Customer PII, network architecture queries, autonomous system classification |
| Legal | Attorney-client privilege | Names in discovery documents, contract analysis routing |
| Defence | Classified handling | Personnel clearance data, operational planning routing |

Adding legal and defence to the pipeline required two new policy YAML files — keyword sets, PII entity lists, sensitivity rules. No model retraining. No prompt engineering. No GPU. Configuration, not computation.

Key Design Decisions

PII detection without LLMs

The PII layer uses Microsoft Presidio with spaCy NER and 5 custom pattern recognisers we built for domain-specific entities (date of birth, medical licence numbers, broad phone formats). 95% recall at sub-25ms latency on CPU.
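In Presidio, a custom recogniser plugs a regex and an entity label into the analyzer registry. Here is a dependency-free sketch of that pattern-recogniser idea; the regexes, entity names, and sample text are illustrative, and the real recognisers use Presidio's own `PatternRecognizer` class rather than this hand-rolled one.

```python
import re
from dataclasses import dataclass

@dataclass
class Recognition:
    entity: str
    start: int
    end: int
    text: str

class PatternRecogniser:
    """Minimal stand-in for a Presidio-style pattern recogniser."""
    def __init__(self, entity: str, regex: str):
        self.entity = entity
        self.regex = re.compile(regex)

    def analyse(self, text: str) -> list[Recognition]:
        return [Recognition(self.entity, m.start(), m.end(), m.group())
                for m in self.regex.finditer(text)]

# Illustrative domain entities: medical licence numbers and dates of birth
recognisers = [
    PatternRecogniser("MEDICAL_LICENCE", r"\bML-\d{6}\b"),
    PatternRecogniser("DATE_OF_BIRTH", r"\b\d{2}/\d{2}/\d{4}\b"),
]

text = "Dr Shah (licence ML-204981) treated a patient born 04/11/1987."
hits = [h for r in recognisers for h in r.analyse(text)]
# Redact right-to-left so earlier spans' offsets stay valid
for h in sorted(hits, key=lambda h: h.start, reverse=True):
    text = text[:h.start] + f"<{h.entity}>" + text[h.end:]
print(text)
```

Because detection is pure pattern matching, latency is dominated by regex scans, which is how the layer stays in the sub-25ms, CPU-only envelope.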

We deliberately kept the LLM out of PII detection. PII classification is a policy question — “is a credit card number sensitive?” isn’t ambiguous to a human. It shouldn’t be delegated to a model that optimises for helpfulness over compliance.

Deterministic routing for 92% of decisions

Domain keywords, PII pattern matching, and length heuristics resolve 92% of prompts without touching a model. The remaining 8% — prompts where complexity is genuinely ambiguous — escalate to the SLM layer.

This ratio held consistently across all five verticals (90-92.5% per domain), even though legal and defence have fundamentally different vocabulary from the original three domains. Policy-driven routing scales with configuration.

Hallucination guards for tool execution

When we connected small models to production APIs via MCP, we found that 1.5B models fabricate action confirmations on multi-step workflows. The model reports “record deleted” without making a single API call.

The governance layer catches this: if a request routes to tools but zero tool calls are logged, the response is blocked. Tool call logs are ground truth, not model responses.
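The guard reduces to one comparison: the tool-call log versus what the response claims. A sketch under assumed names (the action-verb list and return shape are illustrative):

```python
def verify_tool_response(route: str, response: str, tool_log: list[dict]) -> dict:
    """Block responses that claim an action no logged tool call backs up."""
    # Illustrative action verbs; a real verifier would be more robust
    claims_action = any(w in response.lower()
                        for w in ("deleted", "updated", "created", "sent"))
    if route == "tools" and claims_action and not tool_log:
        return {"status": "blocked",
                "reason": "action claimed but no tool calls logged"}
    return {"status": "allowed", "tool_calls": len(tool_log)}

# A 1.5B model fabricating a confirmation: no API call was ever made
print(verify_tool_response("tools", "Record deleted successfully.", []))
```

Note the asymmetry: the log can block a response, but a response can never vouch for itself.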

Per-domain policy packs

Each domain has different PII entities that matter. Medical licence numbers are irrelevant in telecom. IBANs are irrelevant in US healthcare. The pipeline loads domain-specific policy at runtime — which entities to detect, which keywords trigger premium routing, what sensitivity overrides apply.

This is the enterprise differentiator: a new regulated vertical is a configuration change, not an engineering project.
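What a parsed policy pack might look like at load time. The field names and values here are invented for illustration, and an inline dict stands in for the parsed YAML file to keep the sketch dependency-free:

```python
# Stands in for yaml.safe_load("policies/legal.yaml"); schema is illustrative.
LEGAL_POLICY = {
    "domain": "legal",
    "pii_entities": ["PERSON", "ORG", "CASE_NUMBER"],
    "premium_keywords": ["contract analysis", "discovery", "privilege"],
    "sensitivity_overrides": {"privilege": "always_local"},
}

def load_policy(parsed_yaml: dict) -> dict:
    # Validate required keys up front so a malformed pack fails at load
    # time, not mid-request
    required = {"domain", "pii_entities", "premium_keywords"}
    missing = required - parsed_yaml.keys()
    if missing:
        raise ValueError(f"policy pack missing keys: {missing}")
    return parsed_yaml

policy = load_policy(LEGAL_POLICY)
print(policy["domain"], len(policy["pii_entities"]))
```

Failing fast on a bad pack matters here: a silently-ignored missing entity list would mean undetected PII, not a crash.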

What This Looks Like in Production

Every routing decision produces an audit entry:

  • Which rule fired and at what confidence
  • Whether the SLM was called and what it returned
  • Which PII entities were detected and redacted
  • Final route decision and latency
  • Tool calls made (if any) vs tool calls claimed

The audit trail is what your compliance team reviews. It’s what your security team queries during an incident. Without it, you have routing. With it, you have governance.
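One audit entry per request, one JSON object per line. The field names below are illustrative (the post doesn't specify the real schema), but the shape follows the bullet list above:

```python
import json
import time

def audit_entry(rule: str, confidence: float, slm_called: bool,
                pii: list[str], route: str, latency_ms: float,
                tool_calls_made: int, tool_calls_claimed: int) -> str:
    """Serialise one routing decision as a JSONL line; schema is illustrative."""
    entry = {
        "ts": time.time(),
        "rule": rule,
        "confidence": confidence,
        "slm_called": slm_called,
        "pii_redacted": pii,
        "route": route,
        "latency_ms": latency_ms,
        # Made vs claimed is what the hallucination guard compares
        "tool_calls": {"made": tool_calls_made, "claimed": tool_calls_claimed},
    }
    return json.dumps(entry)  # append one line per request to a .jsonl file

line = audit_entry("keyword:diagnosis", 0.97, False,
                   ["PERSON", "DATE_OF_BIRTH"], "clinical_cloud", 0.6, 0, 0)
print(json.loads(line)["route"])
```

JSONL keeps the trail append-only and trivially queryable: `grep`, `jq`, or a log pipeline can answer "which requests redacted PII last Tuesday" without a database.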

The Stack

  • Routing: Python, deterministic rules engine + Ollama (qwen2.5:1.5b)
  • PII: Microsoft Presidio, spaCy NER (en_core_web_lg), 5 custom recognisers
  • Policy: YAML per domain, loaded at runtime
  • Audit: JSONL per request, structured for querying
  • Infrastructure: CPU-only, on-premise capable, no GPU required

Working on AI governance, PII compliance, or regulated AI deployment? Let’s talk →

Architecture · Governance · SLM · PII · Production · Enterprise AI
