A study of 26,904 queries across four leading AI models (OpenAI's GPT-5.4, Anthropic's Claude Sonnet 4.6, and two Google Gemini versions) revealed significant inconsistencies in carbohydrate estimates from food images. Given identical prompts and images across repeated runs, the models produced widely varying results, with some estimates differing by hundreds of grams, a discrepancy that could lead to dangerous insulin dosing errors for diabetes patients. The study also documented cases where models misidentified foods or hallucinated ingredients, further undermining their reliability for health applications.
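To make the dosing risk concrete, here is a minimal sketch of how a carbohydrate estimation error propagates to a meal bolus under the standard insulin-to-carb ratio (ICR) formula, dose = carbs / ICR. The ICR of 10 g per unit and the 200 g error are illustrative assumptions, not values from the study (which reports errors of hundreds of grams):

```python
# Illustrative sketch (not from the study): propagating a carb
# estimation error to an insulin bolus via the standard formula
# dose = carbs / ICR. The ICR below is an assumed, typical value.

def bolus_units(carbs_g: float, icr_g_per_unit: float = 10.0) -> float:
    """Meal bolus in insulin units for a given carbohydrate count."""
    return carbs_g / icr_g_per_unit

true_carbs = 60.0        # actual meal, grams
estimated_carbs = 260.0  # a hypothetical model estimate off by 200 g

dose_error = bolus_units(estimated_carbs) - bolus_units(true_carbs)
print(dose_error)  # 20.0 extra units, a clinically dangerous overdose
```

At an ICR of 10 g/unit, every 10 g of estimation error shifts the dose by a full unit of insulin, which is why hundreds-of-grams discrepancies matter clinically.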
Summary written by gemini-2.5-flash-lite from 6 sources.
IMPACT Highlights critical reliability issues in AI models for health applications, with potential consequences for patient safety and a need for robust validation.
RANK_REASON Research paper detailing AI model inconsistencies with significant implications for health applications.