Work Vision-Language · HIPAA · proprietary
Florence-2 DICOM Anonymizer
A vision-language model that finds and redacts burned-in PHI from DICOM pixel data on-premise — a novel application of Florence-2 with no published prior art.
6 vendors · manufacturer-specific PHI guidance
~95% compute cut · middle-slice analysis
1–3 s per image · ~2 GB VRAM
Problem
Patient identifiers are often burned into the image pixels — not just the DICOM tags. To share studies for research or clinical trials under HIPAA, that text has to be detected and removed, on-premise, without shipping images to a cloud API. There was no published prior art for using Florence-2 to do it.
Architecture
DICOM I/O → Florence-2 OCR → multi-layer PHI classification → manufacturer guidance → pixel redaction.
- Middle-slice analysis: burned-in overlays are near-identical across a series, so only the middle slice runs through Florence-2’s <OCR_WITH_REGION> task; the detected regions become a template applied to every slice. That alone cuts inference compute by ~95%.
- Multi-layer PHI detection: a minimum-length filter, a technical-term exclusion list, cross-referencing OCR text against the study's own DICOM tags (PatientName, PatientID, PatientBirthDate, AccessionNumber), spatial header analysis of the top band, explicit keyword matching, and a manufacturer-guidance boost.
- Manufacturer guidance: a vendor/modality lookup with priority scoring focuses detection where ATL, Philips, GE, Siemens, Canon, or Mindray devices place their overlays.
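The detection layers listed above can be sketched as a small additive scorer. This is a minimal illustration, not the production classifier: the weights, the threshold, the word lists, and the names `OcrRegion`, `phi_score`, `is_phi`, and `vendor_zones` are all assumptions made for the example. `vendor_zones` stands in for whatever the vendor/modality lookup returns, expressed here as fractional boxes.

```python
import re
from dataclasses import dataclass

# Every threshold, weight, and word list here is an illustrative assumption.
MIN_LENGTH = 3
TECHNICAL_TERMS = {"GAIN", "DEPTH", "TIS", "TIB", "FPS", "MHZ", "FROZEN"}
PHI_KEYWORDS = ("NAME", "PATIENT", "DOB", "MRN", "ACC")
HEADER_BAND = 0.15   # top 15% of the image is treated as the header band
PHI_THRESHOLD = 5    # a score at or above this marks the region for redaction


@dataclass
class OcrRegion:
    text: str
    box: tuple  # (x0, y0, x1, y1) in pixels


def tokens(value):
    """Split on non-alphanumerics so 'DOE^JANE' and 'Doe, Jane' share tokens."""
    parts = re.split(r"[^A-Z0-9]+", value.upper())
    return {p for p in parts if len(p) >= MIN_LENGTH}


def phi_score(region, tag_values, image_size, vendor_zones=()):
    """Score one OCR region; tag_values are strings read from the DICOM header."""
    width, height = image_size
    words = tokens(region.text)
    if not words:                     # layer 1: minimum-length filter
        return 0
    if words <= TECHNICAL_TERMS:      # layer 2: technical-term exclusion
        return 0
    score = 0
    tag_words = set().union(*(tokens(v) for v in tag_values))
    if words & tag_words:             # layer 3: cross-reference with DICOM tags
        score += 6
    x0, y0, x1, y1 = region.box
    if y0 < HEADER_BAND * height:     # layer 4: spatial header analysis
        score += 2
    if any(k in region.text.upper() for k in PHI_KEYWORDS):  # layer 5: keywords
        score += 3
    # Layer 6: boost when the region's centre falls in a vendor overlay zone.
    cx, cy = (x0 + x1) / 2 / width, (y0 + y1) / 2 / height
    if any(a <= cx <= c and b <= cy <= d for a, b, c, d in vendor_zones):
        score += 2
    return score


def is_phi(region, tag_values, image_size, vendor_zones=()):
    return phi_score(region, tag_values, image_size, vendor_zones) >= PHI_THRESHOLD
```

In this sketch a match against the study's own tags is weighted heavily enough to trigger redaction by itself, while position and vendor hints only tip borderline text over the threshold. That weighting is a design choice of the example, not a description of the deployed system.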
Each detected region is redacted with a padded black rectangle. Compressed pixel data is decompressed for editing and recompressed on write, and a debug mode renders the image side by side for review.
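The middle-slice template and the padded rectangle can be sketched together in plain Python. This is an illustration under stated assumptions: frames are lists of pixel rows standing in for decoded DICOM pixel arrays, `detect` is a hypothetical callable wrapping the OCR and PHI classification steps, and the default padding of 4 pixels is arbitrary. The quad format (eight numbers per region) follows what Florence-2's post-processed `<OCR_WITH_REGION>` output typically looks like, which should be checked against the model version in use.

```python
import math


def middle_index(n_slices):
    """Index of the slice that is analysed on behalf of the whole series."""
    return n_slices // 2


def quad_to_rect(quad):
    """Convert [x1, y1, ..., x4, y4] to a half-open box (x0, y0, x1, y1)."""
    xs, ys = quad[0::2], quad[1::2]
    return (math.floor(min(xs)), math.floor(min(ys)),
            math.ceil(max(xs)), math.ceil(max(ys)))


def pad_and_clamp(rect, padding, width, height):
    """Grow the box by `padding` pixels and keep it inside the image."""
    x0, y0, x1, y1 = rect
    return (max(0, x0 - padding), max(0, y0 - padding),
            min(width, x1 + padding), min(height, y1 + padding))


def redact(frame, rects, padding=4):
    """Black out each padded box in place. Writing 0 assumes MONOCHROME2 or
    RGB data; under MONOCHROME1 the lowest value is displayed as white."""
    height = len(frame)
    width = len(frame[0]) if height else 0
    for rect in rects:
        x0, y0, x1, y1 = pad_and_clamp(rect, padding, width, height)
        for y in range(y0, y1):
            frame[y][x0:x1] = [0] * max(0, x1 - x0)
    return frame


def redact_series(series, detect, padding=4):
    """Run `detect` once on the middle slice, then reuse its boxes everywhere."""
    if not series:
        return []
    template = detect(series[middle_index(len(series))])
    for frame in series:
        redact(frame, template, padding)
    return template


def compute_saving(n_slices):
    """Fraction of detection calls avoided by analysing one slice of n."""
    return 1 - 1 / n_slices
```

Under this scheme the saving depends only on series length: a 20-slice series needs one detection call instead of twenty, which is about 95% fewer. The trade-off is that an overlay present on only some slices would be missed by the template, so the approach relies on the assumption that overlays are near-identical across the series.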
Results
Inference takes 1–3 seconds per image and uses ~2 GB of VRAM, with CPU, CUDA, and MPS backends supported. It is deployed as a layer of the production anonymizer stack.
Impact
On-premise, vendor-aware pixel de-identification that keeps PHI inside the hospital — turning a frontier vision-language model into a compliance tool.