PulseAugur

GPT-4

PulseAugur coverage of GPT-4: every cluster mentioning GPT-4 across labs, papers, and developer communities, ranked by signal.

Total · 30d: 249 (249 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 150 (150 over 90d)
SENTIMENT · 30D: 8 days with sentiment data

RECENT · 69 TOTAL
  1. RESEARCH · CL_15409

    New benchmarks reveal military LLM compliance gaps and jailbreak vulnerabilities

    A new military-aligned safety benchmark called ARMOR 2025 has been introduced to evaluate large language models on their compliance with military doctrines such as the Law of War and Rules of Engagement. Initial results…

  2. COMMENTARY · CL_30038

    Anthropic engineer pushes HTML over Markdown for Claude Code agent outputs

    Anthropic's Claude Code team is advocating for a shift from Markdown to HTML for agent outputs, arguing that Markdown's token efficiency is no longer a primary concern with large context windows. A Claude Code engineer,…

  3. TOOL · CL_17217

    What is Tokenization Drift and How to Fix It?

    Tokenization drift occurs when minor formatting changes in input text, such as spacing or line breaks, lead to different token IDs being generated by a model. This can cause unpredictable shifts in model behavior becaus…

  4. COMMENTARY · CL_13298

    Hacker News commenters rank top coding models by performance

    A recent analysis of Hacker News comments reveals that while models like GPT-4 and Claude 3 Opus are highly regarded for their coding capabilities, they are not perceived as the absolute state-of-the-art. Users frequent…

  5. RESEARCH · CL_13057

    GPT-5.5 and Opus 4.7 show systematic reasoning failures on ARC-AGI-3 benchmark

    A new benchmark, ARC-AGI-3, has revealed significant reasoning errors in advanced AI models like GPT-5.5 and Opus 4.7. These models achieved a mere 0.8% success rate on the benchmark, highlighting persistent gaps in abs…

  6. COMMENTARY · CL_12702

    Developers urged to build on cheap AI before subsidies end

    AI companies are currently offering subsidized access to powerful models like GPT-4 and Claude Opus, similar to how Uber and AWS subsidized early adoption. This strategy aims to capture market share by making advanced A…

  7. RESEARCH · CL_12039

    Google DeepMind's AI Co-Clinician beats GPT-5.4 in medical tests, aids doctors

    Google DeepMind has developed an AI co-clinician designed to assist physicians with diagnostics and patient care, aiming to reduce errors and improve efficiency. In blind evaluations, this AI demonstrated superior perfo…

  8. RESEARCH · CL_10517

    IBM's new 8B Granite 4.1 model outperforms older 32B MoE version

    IBM has released Granite 4.1, a family of open-source language models designed for enterprise use, featuring three sizes (3B, 8B, and 30B parameters). Notably, the 8B dense model demonstrates performance matching or exc…

  9. COMMENTARY · CL_07403

    The Social Edge of Intelligence: Individual Gain, Collective Loss https://www.theideasletter.org/essay/the-social-edge-of-intelligence/

    A recent study suggests that while AI tools can enhance individual creativity, they may lead to a collective loss of diversity in output. Researchers found that writers using GPT-4 produced more creative individual stor…

  10. RESEARCH · CL_08320

    AI chatbots excel at emergency psychiatric triage but over-assign urgency

    A new study evaluated 15 advanced AI chatbots on their ability to perform emergency psychiatric triage using 112 clinical vignettes. The chatbots demonstrated high accuracy in identifying true emergencies, with an under…

  11. RESEARCH · CL_07230

    AI models achieve 10x intelligence gains via Mixture of Experts and Transformer architectures

    The Transformer architecture, introduced in the paper "Attention Is All You Need," revolutionized AI by enabling models to process information more efficiently. This innovation is key to understanding how models like Op…

  12. FRONTIER RELEASE · CL_07150

    AI models demonstrate dominance, rewriting human achievement benchmarks

    AI models have demonstrated a significant leap in performance, moving from failing exams two years ago to achieving dominance. This rapid advancement suggests that AI is not only mastering existing benchmarks but is als…

  13. RESEARCH · CL_06681

    New N-Gram attack probes black-box LLMs for training data leakage

    Researchers have developed a new membership inference attack called N-Gram Coverage Attack, which can be used on black-box language models like GPT-4 by only analyzing their text outputs. This method leverages the obser…

  14. RESEARCH · CL_05815

    AI tools increase self-represented court cases, straining the justice system

    A new research paper indicates a significant increase in self-represented litigants in U.S. federal courts since 2022, coinciding with the widespread adoption of generative AI tools. The study, which analyzed millions o…

  15. RESEARCH · CL_05561

    Open-source AI agent surpasses Gemini and GPT-4 on TerminalBench 2.0

    An open-source AI agent, developed in Turkey and named OSS Agent I, has achieved a 65.2% success rate on the TerminalBench 2.0 benchmark. This performance surpasses that of established models like Google's Gemini-3-flas…

  16. RESEARCH · CL_05297

    ChatGPT aces Japanese university exams; OpenAI tests ads; Anthropic adds agent learning

    ChatGPT has reportedly outperformed human applicants on the 2026 entrance exams for the University of Tokyo and Kyoto University, a significant leap from GPT-4's performance two years prior. Meanwhile, OpenAI is testing…

  17. FRONTIER RELEASE · CL_04875

    Meituan tests trillion-parameter AI model built on domestic compute

    Meituan has reportedly initiated a private test of a trillion-parameter AI model, developed using only Chinese computing infrastructure. This model is said to rival GPT-4's performance and was likely trained using Huawe…

  18. RESEARCH · CL_06304

    New RAG methods for medical QA show mixed results, with multimodal approach outperforming fine-tuning at larger scales

    Researchers have developed MED-VRAG, a novel iterative multimodal retrieval-augmented generation framework that processes medical document page images, including tables and figures, rather than just text. This system ac…

  19. FRONTIER RELEASE · CL_03573

    DeepSeek V4 model rumored to achieve AGI capabilities

    DeepSeek has reportedly released its V4 model, with claims of achieving AGI capabilities. The model is said to have surpassed GPT-4 on several benchmarks, including coding and reasoning tasks. This development suggests …

  20. RESEARCH · CL_04970

    LLMs struggle to detect culturally specific health misinformation on YouTube

    Two new research papers explore the limitations of Large Language Models (LLMs) in detecting culturally specific health misinformation, particularly concerning the promotion of cow urine as a remedy on YouTube in India.…