Gemini 1.5 Pro
PulseAugur coverage of Gemini 1.5 Pro — every cluster mentioning Gemini 1.5 Pro across labs, papers, and developer communities, ranked by signal.
-
VLMs show significant privacy deficits in physical world simulations
Researchers have developed ImmersedPrivacy, an interactive audio-visual framework using a Unity simulator to evaluate the privacy awareness of vision-language models (VLMs) in physical environments. Their study tested 1…
-
New MSI metric reveals nuanced bias in LLMs, with distillation reintroducing bias
Researchers have developed a new metric, the Moral Sensitivity Index (MSI), to evaluate contextual bias in large language models. This index quantifies the probability of biased output across a seven-tier stress test, m…
-
UnAC method enhances LMMs for complex multimodal reasoning with adaptive prompting
Researchers have introduced UnAC, a novel multimodal prompting method designed to enhance the reasoning capabilities of Large Multimodal Models (LMMs) on complex visual tasks. This method employs adaptive visual prompti…
-
New AI methods enhance video reasoning by structuring and selecting visual evidence
Researchers are developing new methods to improve how large vision-language models (VLMs) understand and reason about long videos. Several papers introduce techniques for more efficient frame selection and evidence gath…
-
Google's Gemini 1.5 Pro benchmarks and Meta layoffs highlight AI's complex evolution
The AI development landscape is growing more complex, amid ongoing debate over whether AI could eventually replace human trainers. This is highlighted by events such as Meta's recent layoffs and Google's advanc…
-
GPT-4o and other multimodal models evaluated on computer vision tasks
A new paper evaluates how well multimodal foundation models, including GPT-4o and Gemini 1.5 Pro, perform on standard computer vision tasks. Researchers developed a prompt-chaining method to translate vision tasks into …
-
AI models show low accuracy on Nigerian livestock knowledge, exposing a safety gap
A researcher has developed a benchmark to evaluate AI models on their knowledge of African livestock practices, specifically focusing on Nigeria. The initial test using Meta's Llama 3.1 8B model yielded a 43% accuracy r…
-
GPT-5.5 and Opus 4.7 show systematic reasoning failures on ARC-AGI-3 benchmark
A new benchmark, ARC-AGI-3, has revealed significant reasoning errors in advanced AI models like GPT-5.5 and Opus 4.7. These models achieved a mere 0.8% success rate on the benchmark, highlighting persistent gaps in abs…
-
AI agents gain intelligence via metacognition and prompt optimization
Recent research explores advanced agent architectures that move beyond simple retry loops for complex tasks. Studies like "Supervising Ralph Wiggum" demonstrate that separating metacognitive critique into a distinct age…
-
LLMs excel at extracting data from electricity invoices with prompt engineering
A new study published on arXiv evaluates the effectiveness of general-purpose Large Language Models (LLMs) for extracting structured data from Spanish electricity invoices. Researchers benchmarked Gemini 1.5 Pro and Mis…
-
New DSIPA framework detects LLM text by analyzing sentiment patterns
Researchers have developed DSIPA, a new framework designed to detect text generated by large language models without requiring access to model parameters or extensive labeled datasets. The method analyzes sentiment distribution s…
-
AdaTooler-V research improves multimodal LLMs' adaptive vision tool use
Researchers have introduced AdaTooler-V, a multimodal large language model designed to improve efficiency in visual reasoning tasks. Unlike previous models that sometimes unnecessarily invoke vision tools, AdaTooler-V a…
-
AI chatbots excel at emergency psychiatric triage but over-assign urgency
A new study evaluated 15 advanced AI chatbots on their ability to perform emergency psychiatric triage using 112 clinical vignettes. The chatbots demonstrated high accuracy in identifying true emergencies, with an under…
-
Bankers find AI-generated reports unusable, while software engineers embrace coding agents in 2026
A recent benchmark involving 500 investment bankers found that AI-generated client reports are unusable for professional engagement in the banking sector. Models such as GPT-5.4 and Claude Opus 4.6 produced reports that…
-
LLMs fail 'pass the butter' robot test, scoring far below human performance
A new evaluation called Butter-Bench has revealed that current state-of-the-art large language models struggle significantly with controlling robots for practical tasks. In tests designed to assess their ability to perf…
-
Google and OpenAI advance AI factuality, multilingualism, and safety
Google DeepMind has introduced the FACTS Benchmark Suite, a new set of evaluations designed to systematically assess the factuality of large language models across various use cases. This suite includes benchmarks for p…