Policy-Governed LLM Routing with Intent Matching for Instrument Laboratories

Emmanuel A. Olowe — University of Edinburgh

Routiium + EduRouter

The Setting

A remote lab failure is both a learning problem and a support problem.

Sarah in a Remote Lab

"My oscilloscope shows a flat line." The hardware is real, the deadline is tonight, and no instructor is online.

  • She is not debugging text only
  • She may need waveform interpretation, instrument state, and SCPI troubleshooting
  • If help arrives too late, the experiment stalls
Core question: How do we give fast help that is useful enough for the lab, but not more expensive or more revealing than it needs to be?

Why This Gets Hard at Scale

1
Remote labs generate late-night troubleshooting demand, not just factual questions
2
One instructor may support hundreds of students across many repeated interactions
3
Some requests need premium reasoning, but many do not
4
So the real task is not “pick the best model”, but “pick the right level of help”

The Dilemma

A single-model strategy fails in both directions.

If everything goes to the cheapest tier

"Phase shift occurs in circuits. Check your connections."

  • Cheap to run
  • But weak on the cases students are actually blocked by
If everything goes to the premium tier

"Given your measured phase of −45° at 10 kHz, τ = RC ≈ 15.9 µs..."

  • Usually strong enough
  • But unnecessarily expensive for clarifications and routine checks

Why Routing Exists

A student query is not just a request for an answer. It is also a request for a particular kind of help.
1
Which model route is appropriate for this turn
2
What kind of help the student should receive
3
Whether an overlay, guardrail, or escalation rule should apply

Key Insight:

Determine the right kind of help, then deliver it through the right tier and overlay.

Security & Control

What is an LLM Proxy?

Dr. Chen's Nightmare

"Give students OpenAI API keys? One leaked key = $10,000 bill"

  • Need per-student access control
  • Need rate limits
  • Need cost tracking
  • Need audit logs
Solution: Routiium sits between students and AI providers — like a corporate firewall for AI
Student Student Student Routiium Proxy Layer AI Providers OpenAI Anthropic Bedrock Opaque tokens Route Real API keys 🔒

What You Get

1
Per-student rate limits
2
Token-by-token tracking
3
Provider keys stay server-side
4
Multi-provider routing

Operational Layer

Routiium Changes What the Model Sees Before the Model Responds.

Student request "My oscilloscope shows a flat line" one prompt from the user raw request Routiium server-side request shaping system prompt injection tool injection provider + auth policy hidden provider call Provider request hidden payload system socratic tutor tools retrieval, scpi_help provider keys hidden Same user query, different model behavior Routiium can prepend tutor instructions, attach tools for agentic RAG, and keep that wiring server-side without changing student-facing code.

Three Server-Side Levers

1
Inject hidden system prompts to set tone, pedagogy, and institutional policy
2
Inject tools for retrieval, lab manuals, SCPI helpers, or agentic RAG workflows
3
Swap providers and capabilities without changing the student client

Why This Matters for Teaching

  • The same user query can behave like a tutor, assessor, or troubleshooting assistant
  • Tool access can be added only when the policy allows it
  • Prompts, tools, and credentials stay hidden and auditable on the server
Key point: Routiium does more than forward traffic. It shapes the request so the downstream model behaves appropriately before EduRouter even selects the final route.

Educational Foundation

Scaffolding & the Zone of Proximal Development

Vygotsky's ZPD (1978) What the student CAN do ALONE Basic questions, simple clarifications Local ZONE OF PROXIMAL DEVELOPMENT Learning happens HERE — with guided support Scaffolding: hints, questions, worked examples Premium What the student CANNOT do (yet) Too advanced — would cause frustration
Sarah's Learning Moment

Sarah's struggling with phase shifts in her circuit.

  • Too easy: Direct answer — she learns nothing
  • Too hard: Overwhelming explanation — she gives up
  • Sweet spot (ZPD): Guided questions with hints

How EduRouter Implements ZPD

Student StateModelOverlay
Outside ZPD (basic)Local modelDirect answer
Within ZPDPolicy-selectedHint + follow-up
ZPD ceilingPremium modelSocratic questioning

Decision happens in milliseconds — student never sees the machinery

Intent Matching

What Are Semantic Embeddings?

Text → Vector → Similarity "Prove this algebra" identity [0.82, -0.14, 0.67, ...] 384 dimensions "Walk me through" this proof [0.79, -0.11, 0.71, ...] 384 dimensions 93% similar Same routing decision Canonical: math_proof → Route to the matching tier "Tell me a joke" Different vector space → Route to the local model Similar meaning Same cluster

Why Embeddings Beat Keywords

  • Keywords miss intent: "help me" could mean anything
  • Embeddings capture meaning, not just words
  • Works across paraphrases, typos, domain jargon

Canonical Task Matching

Pre-computed embeddings for 89 representative tasks

1
Student prompt arrives
2
Embed using fastembed (ONNX)
3
Dot-product similarity <1ms
4
Match to preferred model tier
τ threshold: Similarity ≥ 0.82 → trust canonical match. Below → fall back to policy scoring.

Routing Brain

EduRouter Decides Tier, Overlay, and Escalation.

RouteRequest from Routiium alias / role privacy mode capability hints turn context EduRouter policy-governed route planning embedding hint policy scoring governance checks RoutePlan returned to Routiium selected tier overlay id fallback chain stickiness token policy metadata EduRouter does not answer the student directly. It returns a governed plan for Routiium to execute.

What EduRouter Looks At

  • semantic similarity to canonical tasks
  • cost, latency, health, and context-fit weights
  • budget caps, escalation triggers, and policy rules

What EduRouter Returns

1
model tier recommendation
2
pedagogical overlay and escalation decision
3
cacheable route plan with transparency metadata
Key point: EduRouter is the decision engine. It turns semantic and policy signals into an executable tutoring plan.

Architecture

Routiium Enforces Control. EduRouter Chooses the Route.

Student Off-campus browser LAMB lab ingress device + SCPI remote /chat Routiium :8088 · Apache-2.0 Auth · Rate Limits Analytics · Injection /route/plan EduRouter :9099 Policy Scoring Embeddings · Cache routing plan Routiium executes the plan against the selected tier Local model gpt-oss-20b $0.00 Premium model GPT-5 Mini $0.25 / $2.00 Contribution 1 Contribution 2 Evaluated replay: 75% local · 25% premium

Routiium Features

  • Opaque tokens per student
  • Multi-backend routing
  • Transparent analytics
  • Hot-reloadable config

EduRouter Features

  • Policy scoring engine
  • Embedding intent router
  • Plan cache (15s TTL)
  • Governance controls

Evaluated Replay Setup

  • Local: gpt-oss-20b (local, $0)
  • Premium: GPT-5 Mini
  • 75% routed to the local model in the replay
  • 25% routed to the premium model
  • 66% cost reduction

Decision Engine

Policy-Driven Scoring: How Models Are Chosen

Scoring Formula

score = 0.40 × (1 − costnorm)
+ 0.25 × (1 − latencynorm)
+ 0.20 × healthscore
+ 0.15 × contextfit
+ tierbonus

All weights configurable in policy.json — hot-reloaded without restart

Weight Rationale

Cost 40% — primary goal is saving money

Latency 25% — students notice slowness

Health 20% — auto-deprioritize failing models

Evaluated Replay Backends

RouteModelCostUsed For
Local gpt-oss-20b $0.00 Simple Q&A
Premium GPT-5 Mini $0.25/$2 Escalated reasoning
Worked Examples

"What is a resistor?"

→ local model wins (short, no images, simple)

"12,000-token circuit analysis + image + SCPI error"

→ premium model wins (large context, high reasoning)

Equity & Safety

Governance: Escalation, Budget Caps & Privacy

Per-request routing under a governed policy
"Please give the full solution"L3 · complete solution
L3 requested?Complete solution
YES
Approval required Complete-solution requests are explicitly gated Budget cap applies before escalation → premium tier only after approval
NO ↓
"Show me a worked example"L2 · worked example
L2 requested?Worked example fragment
YES
Worked example with guardrails Throttle repeated requests, keep scaffolding on Guided overlay injected first → local or premium by policy score
NO ↓
"SCPI error −222"instrument fault
SCPI error detected?scpi_retries > 0 in telemetry
YES
Troubleshoot-first overlay Triage checklist appears before explanation SCPI context and retries logged → premium escalation when needed
NO ↓
"What is voltage?"L0 · basic query
Canonical match?τ = 0.82 · 89-entry library
Embedding-scored routing Default to the lowest-cost tier that still fits the task 75% routed to the local model in the benchmark replay bge-small-en-v1.5 embeddings · τ = 0.82

Three Guardrails

GuardrailPurpose
Budget caps Stop a small number of students from consuming premium capacity
Escalation rules Promote hard troubleshooting and high-effort tasks at the right moment
Human override Allow instructors to approve or block high-cost help explicitly
Pedagogical Overlays
  • L1: guided troubleshooting and validation
  • L2: worked-example fragments with scaffolding
  • L3: complete-solution requests require approval
  • Students see help that matches the teaching policy, not raw model behavior

Privacy & Audit

  • privacy_mode = features_only keeps routing decisions on metadata when needed
  • Student content can stay on the Routiium side of the stack
  • Analytics and overlay fingerprints are stored for local audit
  • Routing headers expose why a decision was made without exposing provider keys

Sarah's Request Journey

Request Lifecycle: How One Query Moves Through the Stack

Sarah · 11:47 PM · rc_step lab — "My oscilloscope shows a flat line" → SCPI error −222 detected → GPT-5 Mini + socratic_troubleshoot → triage checklist in 21.7 s
Student Client
Routiium :8088
EduRouter :9099
LLM Backend
1
POST /v1/chat/completions model="rc_step_tutor" · stream=true
← receives request
2
Auth + rate limit + system prompt token verify <1 µs · rate check <1 ms
3
POST /route/plan →
RouteRequest (privacy=features_only, turns=1)
Alias resolve rc_step_tutor → candidates
4
Cache MISS · Stickiness MISS New session — no plan_token present
5
Escalation detect SCPI error −222 in context → GPT-5 Mini path escalation_regex match
6
Embed query (4.7 ms avg) + score τ=0.82 · canonical match · socratic_troubleshoot overlay selected
7
← RoutePlan
model_id, overlay, plan_token, TTL
Returns plan →
route_id, x-policy-rev
8
Inject socratic_troubleshoot overlay Prepend to system prompt → forward to GPT-5 Mini
GPT-5 Mini streams TTFT 222 ms
9
Triage checklist ← 21.7 s x-route-id · x-policy-rev · x-resolved-model
Log analytics tokens · cost · route_id · scpi_retries · duration_ms

Plan Latency

PathPlan latencyWhat the student notices
Cache hit <10 ms P95 Usually negligible
Cache miss ~82 ms CPU Usually hidden by provider latency

End-to-End Latency

Routed (EduRouter) — 21.7 s avg

GPT-5 Mini direct — 23.2 s avg

gpt-oss-20b direct — 26.1 s avg

EduRouter steers 75% to local tier — fastest end-to-end despite local model's higher raw latency.

Sarah's outcome:

SCPI escalation path → GPT-5 Mini + socratic_troubleshoot. Sarah receives a triage checklist, confirms probe settings, corrects her measurement, and submits on time.

Evaluation

Evaluation Setup and Challenge Alignment Index

0.90
Policy-only (baseline)
0.98
Policy + Embeddings
+8 percentage points improvement with semantic understanding

Evaluation Setup

  • 100-query replay: 60 RC-step + 40 LED I-V prompts
  • Replay paths: direct GPT-5 Mini, direct gpt-oss-20b, and governed routing through Routiium + EduRouter
  • CAI combines tier compliance, overlay correctness, escalation accuracy, and fallback validity
  • Scores calibrated against expert hand-labels

Where Embeddings Help Most

PromptKeyword routeEmbedding route
"Help me work through this" Too generic Matches proof-style help
"I'm lost with phase measurement" Weak escalation signal Promotes troubleshooting path
"Quick check — is this normal?" Over-escalates Stays on cheaper tier

Results

Results: Lower Cost with Minimal Added Latency

Benchmark Cost per Request

$0.26
All premium
$0.087
EduRouter
66%

Cost Reduction

Replay Routing Distribution

Local model — 75% · simple chat, basic Q&A

Premium model — 25% · escalated or higher-demand cases

Latency Performance

<10ms
P95 cache hit
0.95
Cache hit rate
~82ms
Cache miss (CPU)
Operational Takeaway

The routed system does not flatten everything to the cheapest model.

  • Most replayed turns stay on the local model
  • Harder or escalated cases move to GPT-5 Mini
  • That is how spend drops by 66% without sacrificing challenge alignment

Discussion

Limitations & Future Work

Current Limitations

  • Synthetic workload: Simulator traces, not live classroom data
  • CAI ≠ learning: We measure routing correctness, not learning outcomes
  • Language: English-only embedding model (bge-small-en-v1.5)
  • Canonical bank: 89 entries — production may need expansion
Honest scope: System performs well in simulation — next step is classroom pilot with RCT design to validate educational outcomes.

Near-Term Extensions

  • Classroom pilot with pre/post assessment
  • Feedback loop: /route/feedback auto-adjusts weights
  • Multi-provider benchmarks (Claude Haiku vs gpt-oss-20b)
  • Dynamic canonical reload without restart

Research Opportunities

  • Adaptive overlay selection: ZPD-aware mid-session switches
  • LLM-assisted canonical generation
  • Multilingual embedding models
  • Learning analytics integration

Summary

Contributions & Impact

Contribution 1: Routiium

  • Self-hosted, Apache-2.0 LLM reverse proxy
  • Per-student opaque tokens — provider keys stay server-side
  • Multi-backend: OpenAI, Anthropic, Bedrock, vLLM, Ollama
  • Transparent prompt injection, rate limiting, cost analytics

Contribution 2: EduRouter

  • Policy-governed, Apache-2.0-licensed routing engine
  • Selects model tier, overlay, and escalation path together
  • Embedding-based intent matching (canonical tasks)
  • ZPD-aligned pedagogical overlay injection
  • Budget caps, escalation, equity controls

Key Results

66%
Cost reduction
0.98
CAI on replay workload
<10ms
P95 latency
Sarah at 2 AM — The Happy Ending

"My oscilloscope shows a flat line"

  • System recognizes: troubleshooting intent
  • Routed to the premium model with troubleshoot-first overlay
  • Gets triage checklist, confirms understanding
  • Lab submitted on time. Actually learned something.
Open Source · Available Now

Thank You

Infrastructure-level routing intelligence — grounded in scaffolding theory, embedding-based intent detection, and declarative governance — makes AI tutoring simultaneously cheaper, more pedagogically appropriate, and more equitable.

Apache-2.0 Apache-2.0

Emmanuel A. Olowe · University of Edinburgh
e.a.olowe@sms.ed.ac.uk
IEEE EDUCON 2026