volko

Vision-Language · HIPAA · proprietary

Florence-2 DICOM Anonymizer

A vision-language model that finds and redacts burned-in PHI from DICOM pixel data on-premise — a novel application of Florence-2 with no published prior art.

6 vendors · manufacturer-specific PHI guidance

~95% compute cut · middle-slice analysis

1–3s per image · ~2 GB VRAM

Problem

Patient identifiers are often burned into the image pixels — not just the DICOM tags. To share studies for research or clinical trials under HIPAA, that text has to be detected and removed, on-premise, without shipping images to a cloud API. There was no published prior art for using Florence-2 to do it.

Architecture

DICOM I/O → Florence-2 OCR → multi-layer PHI classification → manufacturer guidance → pixel redaction.

  • Middle-slice analysis: burned-in overlays are near-identical across a series, so only the middle slice runs through Florence-2’s <OCR_WITH_REGION> task; the detected regions become a template applied to every slice. Because a series of N slices then needs one OCR pass instead of N, this alone cuts inference compute by ~95%.
  • Multi-layer PHI detection: a minimum-length filter, a technical-term exclusion list, cross-referencing of OCR text against the study's own DICOM PatientName, PatientID, PatientBirthDate, and AccessionNumber values, spatial analysis of the top header band, explicit keyword matching, and a manufacturer-guidance boost.
  • Manufacturer guidance: a vendor/modality lookup with priority scoring focuses detection where ATL, Philips, GE, Siemens, Canon, or Mindray devices place their overlays.
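The detection layers above can be sketched as a small scoring function. This is a minimal illustration, not the project's code: the `Region` type, the word lists, the point weights, and the threshold are all hypothetical, and `guided_boxes` stands in for whatever rectangles the vendor/modality lookup would return for the device at hand.

```python
from dataclasses import dataclass

# All constants below are illustrative assumptions, not the project's values.
MIN_LENGTH = 3
TECHNICAL_TERMS = {"GAIN", "DEPTH", "FPS", "TIS", "MHZ"}
PHI_KEYWORDS = ("NAME", "PATIENT", "DOB", "MRN")
HEADER_BAND = 0.15  # fraction of image height treated as the header band
THRESHOLD = 5       # minimum score (in points) to flag a region as PHI


@dataclass
class Region:
    text: str
    box: tuple  # (x0, y0, x1, y1) in pixels, half-open


def normalize(value) -> str:
    """Uppercase and keep only letters/digits, so 'DOE^JOHN' matches 'Doe, John'."""
    return "".join(ch for ch in str(value).upper() if ch.isalnum())


def overlaps(a, b) -> bool:
    """True when two half-open rectangles intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def phi_score(region, tags, image_height, guided_boxes=()):
    text = region.text.strip().upper()
    norm = normalize(text)
    # Layers 1 and 2: drop very short strings and known technical terms.
    if len(norm) < MIN_LENGTH or norm in TECHNICAL_TERMS:
        return 0
    score = 0
    # Layer 3: cross-reference against the study's own DICOM tag values.
    tag_values = [normalize(v) for v in tags.values()]
    if any(v and (v in norm or norm in v) for v in tag_values):
        score += 6
    # Layer 4: text starting inside the top header band is more suspicious.
    if region.box[1] < HEADER_BAND * image_height:
        score += 2
    # Layer 5: explicit keywords such as "NAME" or "DOB".
    if any(keyword in text for keyword in PHI_KEYWORDS):
        score += 3
    # Layer 6: boost regions where the vendor is known to place overlays.
    if any(overlaps(region.box, g) for g in guided_boxes):
        score += 2
    return score


def is_phi(region, tags, image_height, guided_boxes=()):
    return phi_score(region, tags, image_height, guided_boxes) >= THRESHOLD
```

Integer points keep the threshold comparison exact. In this sketch a tag match alone is enough to flag a region, while the spatial and vendor layers only tip borderline cases; the real weighting may differ.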

Each flagged region is redacted with a padded black rectangle. Compressed DICOM pixel data is decompressed before editing and recompressed on write, and a side-by-side debug mode is available.
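As a rough sketch of how the middle-slice template and the padded rectangle fit together: the code below runs a detector once, on the middle slice, and blacks out the resulting rectangles on every slice. Several things here are assumptions for illustration: `detect` is a hypothetical stand-in for the OCR and PHI-classification stages, regions are taken to arrive as eight-number quads (`[x1, y1, ..., x4, y4]`), the padding value is arbitrary, and frames are plain lists of `bytearray` rows to stay dependency-free. A real pipeline would more likely operate on NumPy arrays read via a DICOM library.

```python
import math


def quad_to_rect(quad):
    """Axis-aligned bounding box (x0, y0, x1, y1) of a quad [x1, y1, ..., x4, y4]."""
    xs, ys = quad[0::2], quad[1::2]
    return (math.floor(min(xs)), math.floor(min(ys)),
            math.ceil(max(xs)), math.ceil(max(ys)))


def pad_rect(rect, pad, width, height):
    """Grow a rectangle by `pad` pixels on each side, clamped to the image."""
    x0, y0, x1, y1 = rect
    return (max(0, x0 - pad), max(0, y0 - pad),
            min(width, x1 + pad), min(height, y1 + pad))


def redact(frame, rect):
    """Zero out (paint black) a half-open rectangle in an 8-bit grayscale frame."""
    x0, y0, x1, y1 = rect
    for row in frame[y0:y1]:
        row[x0:x1] = bytes(x1 - x0)


def redact_series(slices, detect, pad=4):
    """Detect PHI quads on the middle slice only, then redact every slice.

    `detect` is a hypothetical callable: frame -> list of PHI quads.
    Returns the rectangle template that was applied.
    """
    if not slices:
        return []
    middle = slices[len(slices) // 2]
    height, width = len(middle), len(middle[0])
    template = [pad_rect(quad_to_rect(q), pad, width, height)
                for q in detect(middle)]
    for frame in slices:
        for rect in template:
            redact(frame, rect)
    return template
```

The detector is called once regardless of series length. For example, a 20-slice series would need 1 pass instead of 20, a 95% reduction in OCR calls.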

Results

1–3 seconds per image on ~2 GB of VRAM, with CPU / CUDA / MPS support. Deployed as a layer of the production anonymizer stack.
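Supporting CPU, CUDA, and MPS typically comes down to choosing a device string at startup. A minimal sketch, assuming a PyTorch backend; the `pick_device` helper and its preference order are hypothetical, not taken from the project:

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu", preferring an accelerator when present."""
    try:
        import torch
    except ImportError:
        # Without PyTorch installed there is nothing to accelerate.
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # The MPS backend exists only in newer PyTorch builds, so guard the lookup.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"
```

The returned string can be passed to `model.to(...)` and to the input tensors so that one code path serves all three targets.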

Impact

On-premise, vendor-aware pixel de-identification that keeps PHI inside the hospital — turning a frontier vision-language model into a compliance tool.