PulseAugur

New framework estimates LVLM confidence by contrasting image-based predictions

Researchers have developed a new framework called BICR (Blind-Image Contrastive Ranking) to assess the confidence of Large Vision-Language Models (LVLMs). This method helps distinguish between predictions genuinely informed by visual input and those relying solely on language priors. BICR trains a lightweight probe to contrast hidden states from the LVLM with and without the image, penalizing higher confidence when the image is obscured. Evaluated on multiple LVLMs and diverse tasks, BICR demonstrated superior calibration and discrimination with significantly fewer parameters than existing baselines.
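The core idea described above, a lightweight probe trained with a contrastive ranking penalty between image-present and image-obscured hidden states, can be sketched as follows. This is a minimal illustrative reconstruction, not the authors' code: the toy hidden states, the linear probe, and all function names are assumptions, and the loss shown is a generic margin ranking (hinge) loss standing in for whatever objective the paper actually uses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for LVLM hidden states. In the real method these would be
# read from the model; here "seeing" the image shifts every dimension by 0.5.
h_blind = rng.normal(size=(8, 16))   # hidden states with the image obscured
h_img = h_blind + 0.5                # hidden states with the image visible

w = np.zeros(16)                     # lightweight linear probe (16 parameters)

def confidence(h, w):
    """Probe's scalar confidence score for each hidden state."""
    return h @ w

def blind_contrastive_loss(s_img, s_blind, margin=1.0):
    """Hinge ranking loss: penalized whenever confidence with the image
    obscured is not at least `margin` below confidence with the image."""
    return np.maximum(0.0, margin - (s_img - s_blind)).mean()

# Plain subgradient descent on the probe (illustrative training loop only).
for _ in range(100):
    s_img, s_blind = confidence(h_img, w), confidence(h_blind, w)
    active = (1.0 - (s_img - s_blind)) > 0          # examples violating the margin
    grad = -((h_img - h_blind) * active[:, None]).mean(axis=0)
    w -= 0.1 * grad

final_loss = blind_contrastive_loss(confidence(h_img, w), confidence(h_blind, w))
```

After training, the probe assigns higher confidence to image-grounded states than to blind ones, which is the behavior the summary attributes to BICR: a prediction whose confidence barely changes when the image is removed is flagged as likely driven by language priors alone.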

Summary written by gemini-2.5-flash-lite from 1 source.

IMPACT Improves reliability of vision-language models by identifying predictions not grounded in visual input.

RANK_REASON Academic paper introducing a novel method for confidence estimation in LVLMs.

Read on arXiv cs.CL →

COVERAGE [1]

  1. arXiv cs.CL TIER_1 · Mohammad M. Ghassemi

    Grounded or Guessing? LVLM Confidence Estimation via Blind-Image Contrastive Ranking

    Large vision-language models suffer from visual ungroundedness: they can produce a fluent, confident, and even correct response driven entirely by language priors, with the image contributing nothing to the prediction. Existing confidence estimation methods cannot detect this, as…