Vision-LLMs transform medical fax data extraction
Robert McDermott of the Fred Hutch Cancer Center has unveiled a sophisticated Proof of Concept (PoC) designed to automate the processing of thousands of medical faxes through Vision-LLMs. By converting chaotic scanned documents into structured Markdown, the system enables precise data extraction into JSON format for electronic health records. This strategic shift from manual entry to AI-driven pipelines promises to enhance clinical efficiency while maintaining strict on-premises data security.
Key points
- Robert McDermott, Principal Architect at Fred Hutch Cancer Center, presented the “AI Powered Document Processing & Data Extraction” initiative on November 20, 2025.
- PDFs in clinical settings often consist of image-only text from faxes, requiring Vision-Language Models (V-LLMs) rather than standard text extraction.
- The presentation critiques Tesseract OCR for losing document layout, favoring Vision-LLM extraction to preserve formatting via Markdown.
- McDermott developed Doc2MD, an open-source utility designed to solve the “jumbled text” problem inherent in traditional OCR.
- DeepSeek-OCR is highlighted for its efficiency, offering a 7-20x reduction in tokens by using optical context compression.
- The Internally Ordered External Results Delivery (IOERD) PoC automates the intake of thousands of eFaxes for clinic staff.
- The pipeline utilizes the Qwen 2.5-VL model to convert images to Markdown and the Granite 3.3 model to generate structured JSON data.
- The extracted JSON schema covers the fields staff need to match a result to a patient: name, date of birth, collection dates, and performing laboratory.
- IOERD features a web-based UI and an API to allow staff to search, edit, and export processed records into the Epic EHR system.
- The system is designed for on-premises hosting to ensure Fred Hutch maintains full data control without needing external BAAs.
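The two-stage flow described in the list above (page image to Markdown, then Markdown to structured JSON) can be sketched as follows. This is a minimal illustration, not the IOERD implementation: the field keys, the `LabResult` class, and the `process_fax` and `parse_record` functions are all hypothetical names, and the two model calls are passed in as plain callables so that no particular serving API is assumed.

```python
import json
from dataclasses import dataclass
from typing import Callable

# Hypothetical keys: the source names these fields but not their JSON keys.
REQUIRED_FIELDS = ("patient_name", "date_of_birth", "collection_date", "performing_lab")


@dataclass
class LabResult:
    patient_name: str
    date_of_birth: str
    collection_date: str
    performing_lab: str


def parse_record(raw_json: str) -> LabResult:
    """Validate the text model's JSON output before it reaches staff review."""
    data = json.loads(raw_json)  # raises ValueError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    # Treat absent or empty values as missing so gaps are caught early.
    missing = [name for name in REQUIRED_FIELDS if not data.get(name)]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    return LabResult(**{name: str(data[name]).strip() for name in REQUIRED_FIELDS})


def process_fax(
    page_image: bytes,
    image_to_markdown: Callable[[bytes], str],  # stand-in for the vision model
    markdown_to_json: Callable[[str], str],     # stand-in for the text model
) -> LabResult:
    """Run both stages: page image -> Markdown -> validated record."""
    markdown = image_to_markdown(page_image)
    return parse_record(markdown_to_json(markdown))
```

Keeping validation separate from the model calls means a record with a missing or empty field fails loudly instead of flowing silently toward the EHR export step, which matters when the upstream text comes from a generative model.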
Takeaway
So, it turns out that in 2025, the cutting edge of medical technology is finally figuring out how to read a fax machine, a piece of hardware most people haven’t seen since the Macarena was a hit. By throwing massive LLMs at the problem, we can finally stop paying humans to squint at blurry scans and let the robots hallucinate the patient’s name instead. It’s a brilliant plan: keep everything on-premises so the AI’s mistakes stay “in the family” and away from those pesky external auditors. Progress!