Work Speech · On-Prem LLM · proprietary
Radiology Language Action Model
Real-time radiology dictation with a from-scratch German medical text corrector — dual-Whisper transcription, structured LLM reports, fully local inference.
122K
German medical terms · 52 contexts
~70%
false-positive reduction · semantic gate
real-time
dual-Whisper over WebSocket
Problem
Radiologists dictate; off-the-shelf ASR mangles medical German. A usable dictation system needs low-latency transcription, correction that understands clinical terminology, and structured report output — running locally, not in the cloud.
Architecture
An asyncio WebSocket server with four components: audio processor, medical corrector, report generator, and a voice-command processor.
- Dual Whisper: a tiny model for live preview, large-v2 for final sentences, fed 16 kHz audio over binary WebSocket frames.
- Medical corrector (122K terms, 52 contexts): SymSpell fuzzy matching over pre-computed deletion variants, Cologne phonetic codes for German sound-alikes, and a composite score weighting context frequency, global frequency, edit distance, phonetics, and co-occurrence, gated by a noun-only medical stem list.
- Semantic validation: EmbeddingGemma 300M scores sentence- and word-level cosine similarity between original and corrected text; a correction is applied only when the similarity clears a threshold, cutting false positives by ~70%.
- Report generation: an LLM (via OpenRouter) structures the dictation into TECHNIQUE / FINDINGS / IMPRESSION / RECOMMENDATIONS, with 32 spoken punctuation and heading commands.
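The correction path described above (deletion-variant lookup, then a similarity gate) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's code: every function name, the three-term vocabulary, and the 0.8 threshold are hypothetical. The embedding is a character-trigram stand-in for EmbeddingGemma, candidates are ranked by plain Levenshtein distance instead of the composite score, and Cologne phonetics, the stem-list gate, and word-level similarity are left out.

```python
from collections import defaultdict
from itertools import combinations
import math


def deletes(word: str, max_distance: int = 1) -> set[str]:
    """Return the word plus every variant with up to max_distance characters removed."""
    variants = {word}
    for k in range(1, min(max_distance, len(word) - 1) + 1):
        for keep in combinations(range(len(word)), len(word) - k):
            variants.add("".join(word[i] for i in keep))
    return variants


def build_index(terms: list[str]) -> dict[str, set[str]]:
    """Pre-compute deletion variants for each vocabulary term (SymSpell-style)."""
    index: dict[str, set[str]] = defaultdict(set)
    for term in terms:
        for variant in deletes(term.lower()):
            index[variant].add(term)
    return index


def candidates(word: str, index: dict[str, set[str]]) -> set[str]:
    """Terms sharing at least one deletion variant with the input word."""
    found: set[str] = set()
    for variant in deletes(word.lower()):
        found |= index.get(variant, set())
    return found


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance, used to verify lookup hits."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def toy_embed(text: str) -> dict[str, int]:
    """ASSUMPTION: trigram counts as a placeholder for a real embedding model."""
    padded = f"  {text.lower()}  "
    vec: dict[str, int] = defaultdict(int)
    for i in range(len(padded) - 2):
        vec[padded[i:i + 3]] += 1
    return vec


def cosine(u: dict[str, int], v: dict[str, int]) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def correct(sentence: str, index: dict[str, set[str]], threshold: float = 0.8) -> str:
    """Replace near-miss words with vocabulary terms if the sentence stays similar enough."""
    words = sentence.split()
    for pos, word in enumerate(words):
        close = [t for t in candidates(word, index)
                 if edit_distance(word.lower(), t.lower()) <= 1]
        if not close or any(t.lower() == word.lower() for t in close):
            continue  # no candidate, or the word is already a known term
        best = min(close, key=lambda t: (edit_distance(word.lower(), t.lower()), t))
        proposal = words[:pos] + [best] + words[pos + 1:]
        # Semantic gate: accept only if the corrected sentence stays close to the current one.
        if cosine(toy_embed(" ".join(words)), toy_embed(" ".join(proposal))) >= threshold:
            words = proposal
    return " ".join(words)


if __name__ == "__main__":
    vocab_index = build_index(["Pneumothorax", "Pleuraerguss", "Atelektase"])
    print(correct("Kein Pneumotorax rechts", vocab_index))
```

With this toy vocabulary, the misspelled "Pneumotorax" is replaced by "Pneumothorax" because the two sentences stay well above the gate. Raising the threshold blocks the replacement, which is the mechanism that trades recall for fewer false positives.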
Results
Real-time German radiology dictation with domain-aware correction, packaged in CUDA and Apple-Silicon containers and deployed on an on-premises GPU server.
Impact
A dictation pipeline that actually speaks radiology — and never sends a patient’s voice or report off-site.