ENTITY GPT-4

GPT-4

PulseAugur coverage of GPT-4: every cluster mentioning GPT-4 across labs, papers, and developer communities, ranked by signal.

Total · 30d: 249 (249 over 90d)

Releases · 30d: 0 (0 over 90d)

Papers · 30d: 150 (150 over 90d)

TIER MIX · 90D

frontier release 10
significant 27
research 72
tool 113
commentary 27

SENTIMENT · 30D

8 day(s) with sentiment data

RECENT · PAGE 2/4 · 69 TOTAL

RESEARCH · CL_15409 · May 5 · 05:08

New benchmarks reveal military LLM compliance gaps and jailbreak vulnerabilities

A new military-aligned safety benchmark called ARMOR 2025 has been introduced to evaluate large language models on their compliance with military doctrines such as the Law of War and Rules of Engagement. Initial results…
COMMENTARY · CL_30038 · May 4 · 21:53

Anthropic engineer pushes HTML over Markdown for Claude Code agent outputs

Anthropic's Claude Code team is advocating for a shift from Markdown to HTML for agent outputs, arguing that Markdown's token efficiency is no longer a primary concern with large context windows. A Claude Code engineer,…
TOOL · CL_17217 · May 3 · 07:06

What is Tokenization Drift and How to Fix It?

Tokenization drift occurs when minor formatting changes in input text, such as spacing or line breaks, lead to different token IDs being generated by a model. This can cause unpredictable shifts in model behavior becaus…
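
A minimal Python sketch of the effect and of one common fix (normalizing formatting before tokenizing). It assumes a toy greedy longest-match tokenizer over a hypothetical five-entry vocabulary. `VOCAB`, `tokenize`, and `normalize` are illustrative names, not any real tokenizer's API. Production tokenizers use learned BPE or SentencePiece vocabularies, but the toy shows the same failure: a double space or a line break changes which pieces match, and therefore which IDs the model sees.

```python
import re
import unicodedata

# Hypothetical toy vocabulary (an assumption); real vocabularies are learned and far larger.
VOCAB = {"Hello": 0, " world": 1, " ": 2, "world": 3, "\n": 4}


def tokenize(text: str) -> list[int]:
    """Greedy longest-match tokenization against VOCAB (a simplification of BPE)."""
    ids = []
    i = 0
    max_len = max(len(piece) for piece in VOCAB)
    while i < len(text):
        # Try the longest vocabulary piece that matches at position i.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in VOCAB:
                ids.append(VOCAB[piece])
                i += length
                break
        else:
            # Unknown character: fall back to an ID outside the vocabulary range.
            ids.append(1000 + ord(text[i]))
            i += 1
    return ids


def normalize(text: str) -> str:
    """Canonicalize formatting: Unicode NFC, collapse whitespace runs to one space, trim ends."""
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip()


for variant in ["Hello world", "Hello  world", "Hello\nworld", " Hello world\n"]:
    # Raw IDs drift with formatting; normalized IDs are identical for every variant.
    print(repr(variant), tokenize(variant), tokenize(normalize(variant)))
```

Collapsing all whitespace is only safe when whitespace carries no meaning. For code, tables, or poetry, a narrower normalization (for example, only unifying line endings) would likely be the safer choice.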
COMMENTARY · CL_13298 · May 2 · 21:37

Hacker News commenters rank top coding models by performance

A recent analysis of Hacker News comments reveals that while models like GPT-4 and Claude 3 Opus are highly regarded for their coding capabilities, they are not perceived as the absolute state-of-the-art. Users frequent…
RESEARCH · CL_13057 · May 2 · 13:46

GPT-5.5 and Opus 4.7 show systematic reasoning failures on ARC-AGI-3 benchmark

A new benchmark, ARC-AGI-3, has revealed significant reasoning errors in advanced AI models like GPT-5.5 and Opus 4.7. These models achieved a mere 0.8% success rate on the benchmark, highlighting persistent gaps in abs…
COMMENTARY · CL_12702 · May 2 · 02:30

Developers urged to build on cheap AI before subsidies end

AI companies are currently offering subsidized access to powerful models like GPT-4 and Claude Opus, similar to how Uber and AWS subsidized early adoption. This strategy aims to capture market share by making advanced A…
RESEARCH · CL_12039 · May 1 · 09:34

Google DeepMind's AI Co-Clinician beats GPT-5.4 in medical tests, aids doctors

Google DeepMind has developed an AI co-clinician designed to assist physicians with diagnostics and patient care, aiming to reduce errors and improve efficiency. In blind evaluations, this AI demonstrated superior perfo…
RESEARCH · CL_10517 · Apr 30 · 10:24

IBM's new 8B Granite 4.1 model outperforms older 32B MoE version

IBM has released Granite 4.1, a family of open-source language models designed for enterprise use, featuring three sizes (3B, 8B, and 30B parameters). Notably, the 8B dense model demonstrates performance matching or exc…
COMMENTARY · CL_07403 · Apr 28 · 10:08

The Social Edge of Intelligence: Individual Gain, Collective Loss (https://www.theideasletter.org/essay/the-social-edge-of-intelligence/)

A recent study suggests that while AI tools can enhance individual creativity, they may lead to a collective loss of diversity in output. Researchers found that writers using GPT-4 produced more creative individual stor…
RESEARCH · CL_08320 · Apr 28 · 09:25

AI chatbots excel at emergency psychiatric triage but over-assign urgency

A new study evaluated 15 advanced AI chatbots on their ability to perform emergency psychiatric triage using 112 clinical vignettes. The chatbots demonstrated high accuracy in identifying true emergencies, with an under…
RESEARCH · CL_07230 · Apr 28 · 08:00

AI models achieve 10x intelligence gains via Mixture of Experts and Transformer architectures

The Transformer architecture, introduced in the paper "Attention Is All You Need," revolutionized AI by enabling models to process information more efficiently. This innovation is key to understanding how models like Op…
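
The central operation introduced in "Attention Is All You Need" is scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V. Each query is compared against every key, the scaled scores become weights through a softmax, and the output is a weighted average of the value vectors. Below is a minimal pure-Python sketch of that formula. It omits multi-head projections, masking, and batching, and it does not illustrate Mixture of Experts routing, so treat it as an illustration of the equation rather than as how any production model is implemented.

```python
import math

Matrix = list[list[float]]


def softmax(row: list[float]) -> list[float]:
    # Subtract the row maximum before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q: Matrix, K: Matrix, V: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Dot-product similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Output row: the value vectors averaged under the attention weights.
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out


print(attention([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]], [[2.0, 0.0], [0.0, 4.0]]))
```

With two identical keys the weights split evenly at 0.5 each, so the call above returns `[[1.0, 2.0]]`, the plain average of the two value rows. A query that strongly matches one key instead concentrates nearly all of its weight on that key's value.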
FRONTIER RELEASE · CL_07150 · Apr 28 · 06:25

AI models demonstrate dominance, rewriting human achievement benchmarks

AI models have demonstrated a significant leap in performance, moving from failing exams two years ago to achieving dominance. This rapid advancement suggests that AI is not only mastering existing benchmarks but is als…
RESEARCH · CL_06681 · Apr 28 · 04:00

New N-Gram attack probes black-box LLMs for training data leakage

Researchers have developed a new membership inference attack called N-Gram Coverage Attack, which can be used on black-box language models like GPT-4 by only analyzing their text outputs. This method leverages the obser…
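
The summary above is truncated, so the following is only a sketch of the general idea under stated assumptions: the attacker prompts the target model with the prefix of a candidate document, samples several continuations, and measures what fraction of the true suffix's word n-grams reappear in them, with higher coverage suggesting the document was in the training data. The model call is not shown; the generations are hard-coded strings. The names `ngrams`, `coverage`, and `membership_score`, the whitespace word split, `n = 3`, and aggregation by maximum are all illustrative assumptions, not the paper's exact method.

```python
def ngrams(tokens: list[str], n: int) -> set[tuple[str, ...]]:
    """All contiguous n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def coverage(reference: str, generation: str, n: int = 3) -> float:
    """Fraction of the reference's word n-grams that also occur in one generation."""
    ref = ngrams(reference.split(), n)
    if not ref:
        return 0.0
    gen = ngrams(generation.split(), n)
    return len(ref & gen) / len(ref)


def membership_score(true_suffix: str, generations: list[str], n: int = 3) -> float:
    """Aggregate per-sample coverage; taking the max is one plausible choice (an assumption)."""
    return max(coverage(true_suffix, g, n) for g in generations)


# Hypothetical continuations sampled from a black-box model given the document's prefix.
samples = ["the quick brown fox sleeps", "a slow red cat"]
print(membership_score("the quick brown fox jumps", samples))
```

Only the generated text is used, which is what makes this style of scoring applicable to black-box APIs. In practice a decision threshold would have to be calibrated on documents known to be outside the training data.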
RESEARCH · CL_05815 · Apr 27 · 19:12

AI tools increase self-represented court cases, straining the justice system

A new research paper indicates a significant increase in self-represented litigants in U.S. federal courts since 2022, coinciding with the widespread adoption of generative AI tools. The study, which analyzed millions o…
RESEARCH · CL_05561 · Apr 27 · 14:03

Open-source AI agent surpasses Gemini and GPT-4 on TerminalBench 2.0

An open-source AI agent, developed in Turkey and named OSS Agent I, has achieved a 65.2% success rate on the TerminalBench 2.0 benchmark. This performance surpasses that of established models like Google's Gemini-3-flas…
RESEARCH · CL_05297 · Apr 27 · 08:06

ChatGPT aces Japanese university exams; OpenAI tests ads; Anthropic adds agent learning

ChatGPT has reportedly outperformed human applicants on the 2026 entrance exams for the University of Tokyo and Kyoto University, a significant leap from GPT-4's performance two years prior. Meanwhile, OpenAI is testing…
FRONTIER RELEASE · CL_04875 · Apr 27 · 04:34

Meituan tests trillion-parameter AI model built on domestic compute

Meituan has reportedly initiated a private test of a trillion-parameter AI model, developed using only Chinese computing infrastructure. This model is said to rival GPT-4's performance and was likely trained using Huawe…
RESEARCH · CL_06304 · Apr 26 · 16:49

New RAG methods for medical QA show mixed results, with multimodal approach outperforming fine-tuning at larger scales

Researchers have developed MED-VRAG, a novel iterative multimodal retrieval-augmented generation framework that processes medical document page images, including tables and figures, rather than just text. This system ac…
FRONTIER RELEASE · CL_03573 · Apr 24 · 18:50

DeepSeek V4 model rumored to achieve AGI capabilities

DeepSeek has reportedly released its V4 model, with claims of achieving AGI capabilities. The model is said to have surpassed GPT-4 on several benchmarks, including coding and reasoning tasks. This development suggests …
RESEARCH · CL_04970 · Apr 24 · 14:31

LLMs struggle to detect culturally specific health misinformation on YouTube

Two new research papers explore the limitations of Large Language Models (LLMs) in detecting culturally specific health misinformation, particularly concerning the promotion of cow urine as a remedy on YouTube in India.…