GridProbe: Posterior-Probing for Adaptive Test-Time Compute in Long-Video VLMs
Researchers have developed GridProbe, a method for making long-video vision-language models (VLMs) more efficient at inference time. Instead of processing thousands of frames for every query, GridProbe adaptively selects the frames relevant to the question. It does this by probing frame importance in the answer space, which lets the number of frames processed scale with question difficulty without sacrificing accuracy.
AI IMPACT: Reduces the compute needed to process long videos with VLMs, potentially enabling wider adoption of VLM applications.