optical character recognition
PulseAugur coverage of optical character recognition — every cluster mentioning optical character recognition across labs, papers, and developer communities, ranked by signal.
-
AI agent built to safely summarize patient discharge data
This article details the creation of an AI agent designed to summarize patient discharge information from PDF documents. The agent focuses on extracting structured data like diagnoses, medications, and allergies, priori…
-
User seeks open-source workflow for editable text layers in images
A user on Reddit is seeking an open-source method to transform text within an image into editable layers, similar to features found in Canva or Ideogram. The desired workflow involves detecting text, reconstructing the …
-
AI automates Swiss initiative signature validation
Researchers have developed an AI-powered system to automate the analysis of handwritten signature lists used in Swiss popular initiatives. The proposed pipeline combines Optical Character Recognition (OCR) with writer r…
-
Large multimodal models show mixed results for medical image PHI detection
Researchers evaluated large multimodal models (LMMs) like GPT-4o and Gemini 2.5 Flash for detecting protected health information (PHI) in medical images. While LMMs showed improved text recognition (lower Word Error Rat…
-
Vision-Language Models enhance Italian parliamentary speech analysis
Researchers have developed a new pipeline using Vision-Language Models to improve the transcription and analysis of historical Italian parliamentary speeches. This approach leverages OCR for initial text extraction and …
-
AI automates healthcare data to improve clinical decision support
Modern healthcare faces a data liquidity problem, where a significant portion of patient information remains trapped in unstructured formats like scanned documents and free-text notes. This necessitates manual data entr…
-
AI logging gaps trigger $1.5M HIPAA fine for hospital
Healthcare organizations are incurring HIPAA violations and substantial fines because of inadequate logging of AI system activity. In a recent case, a hospital settled for $1.5 million because its AI …
-
Consensus Entropy improves VLM OCR accuracy by measuring inter-model agreement
Researchers have developed a new metric called Consensus Entropy (CE) to assess the reliability of Optical Character Recognition (OCR) outputs from Vision-Language Models (VLMs). CE measures the agreement between multip…
-
New CC-OCR V2 benchmark reveals LMMs fall short in real-world document processing
A new benchmark, CC-OCR V2, has been released to evaluate Large Multimodal Models (LMMs) on real-world document processing tasks. The benchmark includes 7,093 challenging samples across five OCR-centric tracks, addressi…
-
AI classifies historical document pages for tailored content processing
Researchers have developed an AI-powered image classification system to automatically categorize pages from historical documents. This system aims to streamline the processing of digitized archives by identifying differ…
-
New OCR benchmark reveals accuracy doesn't guarantee RAG performance
A new benchmark has been developed to evaluate the robustness of Optical Character Recognition (OCR) systems specifically for Retrieval-Augmented Generation (RAG) applications. Current OCR benchmarks using character-lev…
-
Sun Finance boosts ID verification accuracy with generative AI on AWS
Sun Finance, a Latvian fintech company, has successfully automated its identity document extraction and fraud detection processes using generative AI on Amazon Web Services (AWS). The new system, developed in partnershi…
-
Researchers release dataset of AI-generated images from GPT-Image-2's first week
Researchers have released a dataset of over 10,000 images generated by OpenAI's GPT-Image-2, collected in the first week following its April 21, 2026 release. The dataset, sourced from Twitter/X, was curated using a mul…
-
iWatchRoad system uses YOLO to detect and map potholes for smart cities
Researchers have developed iWatchRoad, an end-to-end system designed for the scalable detection and geospatial visualization of potholes. The system utilizes a fine-tuned YOLO model for real-time pothole identification …
-
New dataset and methods tackle low-light scene text recognition challenges
Researchers have introduced LSTR, a large-scale dataset for low-light scene text recognition, and ESTR, a smaller evaluation set of real nighttime street scenes. They explored two approaches: fine-tuning existing OCR mo…
-
HalalBench benchmark tackles OCR challenges for multilingual food packaging ingredient extraction
Researchers have introduced HalalBench, a new multilingual benchmark designed to evaluate Optical Character Recognition (OCR) performance specifically on food packaging ingredient labels. The benchmark addresses the uni…
-
Older, cheaper LLMs often match premium OCR accuracy at lower cost
Researchers have open-sourced a new benchmark and framework for evaluating Optical Character Recognition (OCR) performance across 18 large language models (LLMs). Their analysis, involving over 7,500 calls, re…