Policy-Governed LLM Routing with Intent Matching for Instrument Laboratories

Emmanuel A. Olowe — University of Edinburgh

Routiium + EduRouter

The Setting

A remote lab failure is both a learning problem and a support problem.

Sarah in a Remote Lab

"My oscilloscope shows a flat line." The hardware is real, the deadline is tonight, and no instructor is online.

She is not debugging text only
She may need waveform interpretation, instrument state, and SCPI troubleshooting
If help arrives too late, the experiment stalls

Core question: How do we give fast help that is useful enough for the lab, but not more expensive or more revealing than it needs to be?

Why This Gets Hard at Scale

1

Remote labs generate late-night troubleshooting demand, not just factual questions

2

One instructor may support hundreds of students across many repeated interactions

3

Some requests need premium reasoning, but many do not

4

So the real task is not “pick the best model”, but “pick the right level of help”

The Dilemma

A single-model strategy fails in both directions.

If everything goes to the cheapest tier

"Phase shift occurs in circuits. Check your connections."

Cheap to run
But weak on the cases students are actually blocked by

If everything goes to the premium tier

"Given your measured phase of −45° at 10 kHz, τ = RC ≈ 15.9 µs..."

Usually strong enough
But unnecessarily expensive for clarifications and routine checks

Why Routing Exists

A student query is not just a request for an answer. It is also a request for a particular kind of help.

1

Which model route is appropriate for this turn

2

What kind of help the student should receive

3

Whether an overlay, guardrail, or escalation rule should apply

Key Insight:

Determine the right kind of help, then deliver it through the right tier and overlay.

Security & Control

What is an LLM Proxy?

Dr. Chen's Nightmare

"Give students OpenAI API keys? One leaked key = $10,000 bill"

Need per-student access control
Need rate limits
Need cost tracking
Need audit logs

Solution: Routiium sits between students and AI providers — like a corporate firewall for AI

What You Get

1

Per-student rate limits

2

Token-by-token tracking

3

Provider keys stay server-side

4

Multi-provider routing

Operational Layer

Routiium Changes What the Model Sees Before the Model Responds.

Three Server-Side Levers

1

Inject hidden system prompts to set tone, pedagogy, and institutional policy

2

Inject tools for retrieval, lab manuals, SCPI helpers, or agentic RAG workflows

3

Swap providers and capabilities without changing the student client

Why This Matters for Teaching

The same user query can behave like a tutor, assessor, or troubleshooting assistant
Tool access can be added only when the policy allows it
Prompts, tools, and credentials stay hidden and auditable on the server

Key point: Routiium does more than forward traffic. It shapes the request so the downstream model behaves appropriately before EduRouter even selects the final route.

Educational Foundation

Scaffolding & the Zone of Proximal Development

Sarah's Learning Moment

Sarah's struggling with phase shifts in her circuit.

Too easy: Direct answer — she learns nothing
Too hard: Overwhelming explanation — she gives up
Sweet spot (ZPD): Guided questions with hints

How EduRouter Implements ZPD

Student State	Model	Overlay
Outside ZPD (basic)	Local model	Direct answer
Within ZPD	Policy-selected	Hint + follow-up
ZPD ceiling	Premium model	Socratic questioning

Decision happens in milliseconds — student never sees the machinery

Intent Matching

What Are Semantic Embeddings?

Why Embeddings Beat Keywords

Keywords miss intent: "help me" could mean anything
Embeddings capture meaning, not just words
Works across paraphrases, typos, domain jargon

Canonical Task Matching

Pre-computed embeddings for 89 representative tasks

1

Student prompt arrives

2

Embed using fastembed (ONNX)

3

Dot-product similarity <1ms

4

Match to preferred model tier

τ threshold: Similarity ≥ 0.82 → trust canonical match. Below → fall back to policy scoring.

Routing Brain

EduRouter Decides Tier, Overlay, and Escalation.

What EduRouter Looks At

semantic similarity to canonical tasks
cost, latency, health, and context-fit weights
budget caps, escalation triggers, and policy rules

What EduRouter Returns

1

model tier recommendation

2

pedagogical overlay and escalation decision

3

cacheable route plan with transparency metadata

Key point: EduRouter is the decision engine. It turns semantic and policy signals into an executable tutoring plan.

Architecture

Routiium Enforces Control. EduRouter Chooses the Route.

Routiium Features

Opaque tokens per student
Multi-backend routing
Transparent analytics
Hot-reloadable config

EduRouter Features

Policy scoring engine
Embedding intent router
Plan cache (15s TTL)
Governance controls

Evaluated Replay Setup

Local: gpt-oss-20b (local, $0)
Premium: GPT-5 Mini
75% routed to the local model in the replay
25% routed to the premium model
66% cost reduction

Decision Engine

Policy-Driven Scoring: How Models Are Chosen

Scoring Formula

            score = 0.40 × (1 − costnorm)

            + 0.25 × (1 − latencynorm)

            + 0.20 × healthscore

            + 0.15 × contextfit

            + tierbonus

All weights configurable in policy.json — hot-reloaded without restart

Weight Rationale

Cost 40% — primary goal is saving money

Latency 25% — students notice slowness

Health 20% — auto-deprioritize failing models

Evaluated Replay Backends

Route	Model	Cost	Used For
Local	gpt-oss-20b	$0.00	Simple Q&A
Premium	GPT-5 Mini	$0.25/$2	Escalated reasoning

Worked Examples

"What is a resistor?"

→ local model wins (short, no images, simple)

"12,000-token circuit analysis + image + SCPI error"

→ premium model wins (large context, high reasoning)

Equity & Safety

Governance: Escalation, Budget Caps & Privacy

Per-request routing under a governed policy

"Please give the full solution"L3 · complete solution

L3 requested?Complete solution

YES→

Approval required Complete-solution requests are explicitly gated Budget cap applies before escalation → premium tier only after approval

NO ↓

"Show me a worked example"L2 · worked example

L2 requested?Worked example fragment

YES→

Worked example with guardrails Throttle repeated requests, keep scaffolding on Guided overlay injected first → local or premium by policy score

NO ↓

"SCPI error −222"instrument fault

SCPI error detected?scpi_retries > 0 in telemetry

YES→

Troubleshoot-first overlay Triage checklist appears before explanation SCPI context and retries logged → premium escalation when needed

NO ↓

"What is voltage?"L0 · basic query

Canonical match?τ = 0.82 · 89-entry library

→

Embedding-scored routing Default to the lowest-cost tier that still fits the task 75% routed to the local model in the benchmark replay bge-small-en-v1.5 embeddings · τ = 0.82

Three Guardrails

Guardrail	Purpose
Budget caps	Stop a small number of students from consuming premium capacity
Escalation rules	Promote hard troubleshooting and high-effort tasks at the right moment
Human override	Allow instructors to approve or block high-cost help explicitly

Pedagogical Overlays

L1: guided troubleshooting and validation
L2: worked-example fragments with scaffolding
L3: complete-solution requests require approval
Students see help that matches the teaching policy, not raw model behavior

Privacy & Audit

privacy_mode = features_only keeps routing decisions on metadata when needed
Student content can stay on the Routiium side of the stack
Analytics and overlay fingerprints are stored for local audit
Routing headers expose why a decision was made without exposing provider keys

Sarah's Request Journey

Request Lifecycle: How One Query Moves Through the Stack

Sarah · 11:47 PM · rc_step lab — "My oscilloscope shows a flat line" → SCPI error −222 detected → GPT-5 Mini + socratic_troubleshoot → triage checklist in 21.7 s

Student Client

Routiium :8088

EduRouter :9099

LLM Backend

1

POST /v1/chat/completions model="rc_step_tutor" · stream=true

← receives request

2

Auth + rate limit + system prompt token verify <1 µs · rate check <1 ms

3

POST /route/plan →
RouteRequest (privacy=features_only, turns=1)

Alias resolve rc_step_tutor → candidates

4

Cache MISS · Stickiness MISS New session — no plan_token present

5

Escalation detect SCPI error −222 in context → GPT-5 Mini path escalation_regex match

6

Embed query (4.7 ms avg) + score τ=0.82 · canonical match · socratic_troubleshoot overlay selected

7

← RoutePlan
model_id, overlay, plan_token, TTL

Returns plan →
route_id, x-policy-rev

8

Inject socratic_troubleshoot overlay Prepend to system prompt → forward to GPT-5 Mini

GPT-5 Mini streams TTFT 222 ms

9

Triage checklist ← 21.7 s x-route-id · x-policy-rev · x-resolved-model

Log analytics tokens · cost · route_id · scpi_retries · duration_ms

Plan Latency

Path	Plan latency	What the student notices
Cache hit	<10 ms P95	Usually negligible
Cache miss	~82 ms CPU	Usually hidden by provider latency

End-to-End Latency

Routed (EduRouter) — 21.7 s avg

GPT-5 Mini direct — 23.2 s avg

gpt-oss-20b direct — 26.1 s avg

EduRouter steers 75% to local tier — fastest end-to-end despite local model's higher raw latency.

Sarah's outcome:

SCPI escalation path → GPT-5 Mini + socratic_troubleshoot. Sarah receives a triage checklist, confirms probe settings, corrects her measurement, and submits on time.

Evaluation

Evaluation Setup and Challenge Alignment Index

0.90

Policy-only (baseline)

0.98

Policy + Embeddings

+8 percentage points improvement with semantic understanding

Evaluation Setup

100-query replay: 60 RC-step + 40 LED I-V prompts
Replay paths: direct GPT-5 Mini, direct gpt-oss-20b, and governed routing through Routiium + EduRouter
CAI combines tier compliance, overlay correctness, escalation accuracy, and fallback validity
Scores calibrated against expert hand-labels

Where Embeddings Help Most

Prompt	Keyword route	Embedding route
"Help me work through this"	Too generic	Matches proof-style help
"I'm lost with phase measurement"	Weak escalation signal	Promotes troubleshooting path
"Quick check — is this normal?"	Over-escalates	Stays on cheaper tier

Results

Results: Lower Cost with Minimal Added Latency

Benchmark Cost per Request

$0.26

All premium

→

$0.087

EduRouter

66%

Cost Reduction

Replay Routing Distribution

Local model — 75% · simple chat, basic Q&A

Premium model — 25% · escalated or higher-demand cases

Latency Performance

<10ms

P95 cache hit

0.95

Cache hit rate

~82ms

Cache miss (CPU)

Operational Takeaway

The routed system does not flatten everything to the cheapest model.

Most replayed turns stay on the local model
Harder or escalated cases move to GPT-5 Mini
That is how spend drops by 66% without sacrificing challenge alignment

Discussion

Limitations & Future Work

Current Limitations

Synthetic workload: Simulator traces, not live classroom data
CAI ≠ learning: We measure routing correctness, not learning outcomes
Language: English-only embedding model (bge-small-en-v1.5)
Canonical bank: 89 entries — production may need expansion

Honest scope: System performs well in simulation — next step is classroom pilot with RCT design to validate educational outcomes.

Near-Term Extensions

Classroom pilot with pre/post assessment
Feedback loop: /route/feedback auto-adjusts weights
Multi-provider benchmarks (Claude Haiku vs gpt-oss-20b)
Dynamic canonical reload without restart

Research Opportunities

Adaptive overlay selection: ZPD-aware mid-session switches
LLM-assisted canonical generation
Multilingual embedding models
Learning analytics integration

Summary

Contributions & Impact

Contribution 1: Routiium

Self-hosted, Apache-2.0 LLM reverse proxy
Per-student opaque tokens — provider keys stay server-side
Multi-backend: OpenAI, Anthropic, Bedrock, vLLM, Ollama
Transparent prompt injection, rate limiting, cost analytics

Contribution 2: EduRouter

Policy-governed, Apache-2.0-licensed routing engine
Selects model tier, overlay, and escalation path together
Embedding-based intent matching (canonical tasks)
ZPD-aligned pedagogical overlay injection
Budget caps, escalation, equity controls

Key Results

66%

Cost reduction

0.98

CAI on replay workload

<10ms

P95 latency

Sarah at 2 AM — The Happy Ending

"My oscilloscope shows a flat line"

System recognizes: troubleshooting intent
Routed to the premium model with troubleshoot-first overlay
Gets triage checklist, confirms understanding
Lab submitted on time. Actually learned something.

Open Source · Available Now

Thank You

Infrastructure-level routing intelligence — grounded in scaffolding theory, embedding-based intent detection, and declarative governance — makes AI tutoring simultaneously cheaper, more pedagogically appropriate, and more equitable.

labiium/routiium labiium/edurouter

Apache-2.0 Apache-2.0

Emmanuel A. Olowe · University of Edinburgh
e.a.olowe@sms.ed.ac.uk
IEEE EDUCON 2026