volko

Speech · On-Prem LLM · proprietary

Radiology Language Action Model

Real-time radiology dictation with a from-scratch German medical text corrector — dual-Whisper transcription, structured LLM reports, fully local inference.

122K German medical terms · 52 contexts

~70% false-positive reduction · semantic gate

Real-time · dual-Whisper over WebSocket

Problem

Radiologists dictate; off-the-shelf ASR mangles medical German. A usable dictation system needs low-latency transcription, correction that understands clinical terminology, and structured report output — running locally, not in the cloud.

Architecture

An asyncio WebSocket server built from four components: an audio processor, a medical corrector, a report generator, and a voice-command processor.
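A minimal sketch of how the four components might be chained for one connection, under stated assumptions. The stage order, the function names (`transcribe`, `correct`, `apply_commands`, `build_report`, `handle_session`, `replay`), the two spoken commands, and the one-entry lexicon are all hypothetical. The transcription stage is a stand-in that decodes text from each frame, so the example runs without a speech model or a WebSocket library.

```python
import asyncio

# Hypothetical examples of spoken commands; the real set of 32 is not listed here.
SPOKEN_COMMANDS = {"komma": ",", "punkt": "."}

# Hypothetical one-entry lexicon standing in for the medical corrector.
CORRECTIONS = {"pneumotorax": "Pneumothorax"}


async def transcribe(frame: bytes) -> str:
    """Audio processor stand-in: a real one would pass PCM audio to Whisper.

    Each frame here already carries UTF-8 text so the sketch runs without a model.
    """
    await asyncio.sleep(0)  # yield to the event loop, as real inference would
    return frame.decode("utf-8")


def correct(text: str) -> str:
    """Medical corrector stand-in: exact lookup instead of fuzzy matching."""
    return " ".join(CORRECTIONS.get(word.lower(), word) for word in text.split())


def apply_commands(text: str) -> str:
    """Voice-command processor: attach spoken punctuation to the previous word."""
    words = []
    for token in text.split():
        symbol = SPOKEN_COMMANDS.get(token.lower())
        if symbol and words:
            words[-1] += symbol
        else:
            words.append(token)
    return " ".join(words)


def build_report(sentences) -> str:
    """Report generator stand-in: a real one would prompt an LLM for every section."""
    return "FINDINGS\n" + " ".join(sentences)


async def handle_session(frames) -> str:
    """Run each incoming binary frame through the stages, then build the report."""
    sentences = []
    async for frame in frames:
        text = await transcribe(frame)
        text = correct(text)
        sentences.append(apply_commands(text))
    return build_report(sentences)


async def replay(frames):
    """Helper: yields recorded frames the way a connection would deliver them."""
    for frame in frames:
        yield frame
```

Calling `asyncio.run(handle_session(replay([...])))` with a list of byte frames drives the whole chain. In a real server, the async iterator would be the WebSocket connection itself.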

  • Dual Whisper: a tiny model for live preview, large-v2 for final sentences, fed 16 kHz audio over binary WebSocket frames.
  • Medical corrector (122K terms, 52 contexts): SymSpell fuzzy matching over pre-computed deletion variants, plus Cologne phonetic codes for German sound-alikes. Candidates are ranked by a composite score that weights context frequency, global frequency, edit distance, phonetic match, and co-occurrence, gated by a noun-only medical stem list.
  • Semantic validation: EmbeddingGemma 300M scores sentence- and word-level cosine similarity between the original and corrected text. A correction is applied only when the similarity clears a threshold, which cuts false positives by roughly 70%.
  • Report generation: an LLM (via OpenRouter) structures the dictation into TECHNIQUE / FINDINGS / IMPRESSION / RECOMMENDATIONS, with 32 spoken punctuation and heading commands.
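The corrector and the semantic gate described above can be sketched as two small steps: candidate lookup through shared deletion variants (the idea behind SymSpell), followed by a cosine-similarity check. This is an illustrative sketch, not the project's code. The function names, the `max_distance=1` default, the single shared `threshold=0.8`, and the rule that every pair must pass are assumptions. `embed` is a caller-supplied function that would wrap the embedding model in practice.

```python
from itertools import combinations
from math import sqrt


def deletes(word, max_distance=1):
    """Return the word plus every variant with up to max_distance characters removed."""
    variants = {word}
    for k in range(1, min(max_distance, len(word) - 1) + 1):
        for positions in combinations(range(len(word)), k):
            variants.add("".join(c for i, c in enumerate(word) if i not in positions))
    return variants


def build_index(terms, max_distance=1):
    """Pre-compute deletion variants once: variant -> dictionary terms that produce it."""
    index = {}
    for term in terms:
        for variant in deletes(term.lower(), max_distance):
            index.setdefault(variant, set()).add(term)
    return index


def candidates(word, index, max_distance=1):
    """Dictionary terms sharing at least one deletion variant with the input.

    A full implementation would then verify the true edit distance and rank the hits.
    """
    found = set()
    for variant in deletes(word.lower(), max_distance):
        found |= index.get(variant, set())
    return found


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def gate(original_sentence, corrected_sentence, changed_pairs, embed, threshold=0.8):
    """Accept only if the sentence pair and every changed word pair clear the threshold.

    threshold=0.8 is a placeholder value, not a figure from the project.
    """
    pairs = [(original_sentence, corrected_sentence), *changed_pairs]
    return all(cosine(embed(a), embed(b)) >= threshold for a, b in pairs)
```

With an index built from `["Pneumothorax", "Pleuraerguss"]`, the misspelling `pneumotorax` shares a deletion variant only with `Pneumothorax`, so that term is the sole candidate. The gate then decides whether the proposed replacement is applied or the original text is kept.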

Results

Real-time German radiology dictation with domain-aware correction, packaged in CUDA and Apple-Silicon containers and deployed on an on-prem GPU server.

Impact

A dictation pipeline that actually speaks radiology — and never sends a patient’s voice or report off-site.