Researchers have introduced SHIELD, a new dataset comprising 1,394 clinical notes with over 10,000 identified Protected Health Information (PHI) spans. This dataset aims to address the limitations of older benchmarks by offering greater diversity in modern clinical narratives. The project also developed distilled Small Language Models (SLMs) capable of de-identifying clinical text efficiently on standard hardware, achieving high precision and recall.
IMPACT: Provides a more diverse dataset and efficient models for de-identifying clinical text, potentially enabling broader secondary use of EHR data.
RANK_REASON: The cluster contains an academic paper detailing a new dataset and distilled models for de-identification.