Pulse

last 48h

[34/34] 97 sources

What AI is actually talking about — clusters surfacing on Bluesky, Reddit, HN, Mastodon and Lobsters, re-ranked to elevate originality and crush noise.

SIGNIFICANT · One Useful Thing (Ethan Mollick) English(EN) · 2h · [2 sources] · BLOG REDDIT

What it feels like to work with Mythos

Ethan Mollick, an AI researcher, has tested Anthropic's new Claude Fable 5 model, describing it as a significant leap beyond previous AI capabilities. He found Fable to be exceptionally proficient across a wide range of tasks, from generating complex academic papers to creating intricate games and detailed maps, often with minimal prompting. Mollick highlights a shift in the user-AI relationship, noting that the model's advanced performance is both delightful and unnerving because it executes complex requests autonomously. AI

IMPACT Sets a new benchmark for complex task execution and suggests a fundamental shift in human-AI interaction.
FRONTIER RELEASE · Medium — Claude tag English(EN) · 19h · [34 sources] · HN BLOG REDDIT X

Claude Fable 5 Is Here. I Almost Clicked “Later.”

Anthropic has released Claude Fable 5, a new Mythos-class AI model designed for complex and long-duration tasks. This model offers state-of-the-art performance across various benchmarks, including software engineering, knowledge work, and vision capabilities. To ensure safety, Fable 5 includes safeguards that route sensitive queries to the Opus 4.8 model, though a version called Mythos 5 with fewer restrictions is available for specific partners like the US Government. AI

IMPACT Sets new SOTA on coding and knowledge work benchmarks, potentially accelerating complex task automation.
SIGNIFICANT · Email — Every English(EN) · 2h · BLOG

Vibe Check: Fable 5 Is the Best Coding Model in the World

The AI model Fable 5, released today, has been evaluated by the Every team and found to be exceptionally capable, particularly in coding tasks. Initial testing suggests it outperforms previously reviewed models, prompting a reevaluation of how users interact with AI. The team plans to release further details on its performance across various domains and its potential impact on different user groups. AI

IMPACT Sets a new benchmark for coding capabilities, potentially shifting how developers interact with AI tools.
RESEARCH · Latent Space (swyx) English(EN) · 23h · [2 sources] · BLOG REDDIT

[AINews] FrontierCode: Benchmarking for Code Quality over Slop

Cognition has released FrontierCode, a new benchmark designed to evaluate the quality and mergeability of AI-generated code. Unlike previous benchmarks that focused on passing unit tests, FrontierCode assesses factors like regression safety, cleanliness, and maintainability, with tasks requiring over 40 hours to complete. Early results indicate that even top models like Opus 4.8 score low on the hardest tier, suggesting that current AI capabilities in producing production-ready code are less advanced than previously thought. AI

IMPACT Highlights limitations in current AI's ability to produce production-ready code, suggesting a need for more robust evaluation methods.
SIGNIFICANT · Wired — AI English(EN) · 1d · [16 sources] · HN MASTO BLOG REDDIT

Apple’s New Siri AI Is Ready to Get Personal

Apple has unveiled a significant overhaul of its Siri voice assistant, rebranding it as Siri AI and integrating advanced artificial intelligence capabilities. This revamped assistant, announced at WWDC 2026, aims to be more conversational, context-aware, and action-oriented, drawing on personal data and real-time web information. The new Siri will feature a standalone app and enhanced interactions, leveraging a partnership with Google's Gemini models and Apple's own Foundation Models, with a focus on privacy and on-device processing. This move represents Apple's most substantial push into the AI race, seeking to regain its innovative edge. AI

IMPACT Positions Apple to compete more directly in the AI assistant market, potentially increasing user engagement with on-device AI capabilities.
RESEARCH · Email — AI Tool Report English(EN) · 1d · BLOG

⚡️ OpenAI kills the chatbot

OpenAI is reportedly planning a significant overhaul of ChatGPT, aiming to transform it into a "super app" that integrates coding tools and AI agents. The shift, which executives have reportedly summed up internally as "Chat is dead," focuses on consolidating various AI functionalities into a single interface. The move is intended to streamline the user experience, bundle paid features, and position OpenAI to better compete with rivals like Anthropic in the business market ahead of a potential IPO. AI

IMPACT This strategic shift could consolidate AI tools, impacting enterprise adoption and competitive dynamics with rivals like Anthropic.
TOOL · Email — Mindstream English(EN) · 1d · BLOG

How people are making bank in AI

New AI career paths are emerging, offering high salaries, with some individuals earning over $100,000 annually. This guide highlights how to secure these roles, emphasizing that a computer science degree is not always a prerequisite. It also provides advice on optimizing resumes for AI positions and understanding what top tech companies are seeking in AI talent. AI

IMPACT Provides a roadmap for individuals seeking high-paying roles in the rapidly expanding AI job market.
RESEARCH · Email — The Neuron Daily English(EN) · 1d · BLOG

😺OpenAI admitted its product strategy was broken

OpenAI is consolidating its various AI products, including ChatGPT, its coding tools, and its AI browser, into a single desktop application. This strategic shift, driven by co-founder Greg Brockman and applications CEO Fidji Simo, aims to eliminate fragmentation and improve product quality. The unified platform will integrate partner services and is seen as OpenAI's bet on the viability of an AI-centric superapp model, similar to those seen in Asia. AI

IMPACT Consolidating AI tools into a single app could streamline workflows and drive adoption of integrated AI services.
COMMENTARY · Simon Willison (SQ) · 1h · BLOG

Quoting Andrej Karpathy

Andrej Karpathy, a prominent AI researcher, shared his thoughts on the accelerating pace of software development driven by advanced AI models. He noted that the increasing availability of AI-generated software is leading to a surge in demand for more complex and specialized applications. Karpathy highlighted the potential for AI to revolutionize various aspects of software engineering, from testing and optimization to large-scale research projects. AI

IMPACT AI-driven software generation is expected to increase demand for specialized applications and tools, potentially accelerating development cycles.
COMMENTARY · The Pragmatic Engineer English(EN) · 3h · BLOG

State of the software engineering job market in 2026, part 2

The software engineering job market in 2026 shows a significant shift, with top AI labs like Anthropic and OpenAI becoming more attractive to candidates than traditional Big Tech companies. Demand for AI engineers is surging, commanding higher compensation, while roles in mobile and frontend development are declining. New graduates and interns face a tougher hiring landscape, as companies reduce intake and place greater emphasis on work and educational backgrounds. AI

IMPACT AI roles are commanding higher salaries and attracting more talent than traditional software engineering positions.
COMMENTARY · Stratechery (free posts) English(EN) · 10h · BLOG

The iPhone’s Last Stand

Microsoft has unveiled Project Solara, a vision for an ecosystem of interconnected devices that act as portals to cloud-based AI agents. This concept emphasizes a thin-client approach where AI performs tasks invisibly, reducing the need for direct user interaction. Meanwhile, Apple showcased its advancements in AI with new Siri capabilities at WWDC, demonstrating context awareness and app integration, though it lags behind the cutting edge in agent-like task completion. AI

IMPACT Microsoft's Project Solara highlights a shift towards agent-centric computing, potentially changing user interaction paradigms with AI.
COMMENTARY · LessWrong (AI tag) English(EN) · 12h · BLOG

LLMs and almost good code

A software developer observed that a leading LLM generated code for a simple task that was approximately 8% more complex than necessary. The generated code included an unnecessary function for zero-padding hexadecimal values, which was impossible to test. While the LLM's output was functional and passed its own tests, the developer rewrote it to be more concise, highlighting a potential long-term maintenance issue with LLM-generated code that is accepted too readily. AI

IMPACT LLM-generated code may introduce subtle, long-term maintenance challenges if developers accept it without critical review.
TOOL · Simon Willison Italiano(IT) · 1d · BLOG

datasette-agent-edit 0.1a0

Simon Willison has developed a new plugin for Datasette Agent called `datasette-agent-edit`. This plugin aims to provide core functionalities for agentic text editing, such as viewing sections with line numbers, replacing specific strings, and inserting text. The goal is to create a reusable base for future plugins that require these editing capabilities. AI

IMPACT Provides foundational editing tools for AI agents, potentially streamlining workflows for text-based AI applications.
RESEARCH · The Algorithmic Bridge (Alberto Romero) English(EN) · 1d · BLOG

How Anthropic Courted Trump

Anthropic lobbied the Trump administration to implement a formal government review process for new AI models, a significant shift from Trump's initial hands-off approach. This initiative, framed around national security and cybersecurity risks, was influenced by bipartisan concerns over AI's societal impacts and the departure of a key anti-regulation figure. The development of Anthropic's powerful 'Mythos' model, capable of exploiting software vulnerabilities, appears to have been a primary catalyst for this policy discussion. AI

IMPACT This lobbying effort could lead to new regulatory frameworks for AI model releases, impacting development and deployment strategies across the industry.
COMMENTARY · Email — Every English(EN) · 1d · BLOG

My Editor Caught Me Sounding Like AI. Now AI Catches Me First.

An editor at Every discovered their writing was adopting AI-like patterns, prompting the creation of custom AI "guardrail" agents. These agents act as editorial specialists, identifying and flagging AI tells such as symmetrical sentences and vague phrasing before human editors need to intervene. This process, while requiring initial effort to define standards, ultimately refines the writer's own voice and improves draft quality by automating the detection of stylistic weaknesses. AI

IMPACT Provides a method for writers to refine their unique voice and improve draft quality by leveraging AI for self-editing.
SIGNIFICANT · Email — The Neuron Daily English(EN) · 3d · [7 sources] · BLOG REDDIT

😺 ChatGPT admitted it misremembers you

OpenAI has released an update to ChatGPT's memory feature, addressing a significant factual recall issue where the AI was incorrect over half the time. The new "Dreaming V3" process automatically synthesizes conversation history, improving factual recall to 82.8% and preference adherence to 71.3% in internal tests. This upgrade, rolling out to users, also reduces compute costs and doubles memory storage for premium subscribers. The company's candid admission of the previous feature's shortcomings highlights a broader challenge across AI assistants. AI

IMPACT This update addresses a core AI assistant limitation, potentially setting a new standard for personalized AI memory and self-correction.
TOOL · Mastodon — sigmoid.social English(EN) · 4d · [21 sources] · MASTO BLOG

OpenAI’s Lockdown Mode is trying to solve the problem that it created https://www.byteseu.com/2091167/ #AI #ArtificialIntelligence

OpenAI has released a new optional security feature called Lockdown Mode for ChatGPT, aimed at protecting sensitive data from prompt injection attacks. This mode restricts outbound network requests, a key vector for data exfiltration, and disables features like live web browsing and Agent Mode. While it offers enhanced protection for users handling confidential information, OpenAI notes that prompt injections could still affect response content or accuracy, and the mode is not intended for all users. AI

IMPACT Enhances security for sensitive data handling in AI applications, potentially influencing enterprise adoption of AI tools.
SIGNIFICANT · Mastodon — fosstodon.org English(EN) · 1w · [25 sources] · MASTO BLOG

🤖 WWDC 2026: Apple's Leadership Change and AI Innovations From a leadership transition to AI advancements, Apple's WWDC 2026 reveals important developments for

Apple has unveiled a significantly upgraded Siri, branded as Siri AI, at its WWDC 2026 event. This new iteration leverages Google's Gemini models for enhanced conversational abilities and visual intelligence, integrating deeply with iOS 27 and Apple Intelligence features. The assistant will feature a more natural voice, on-screen awareness through vision LLMs, and a dedicated app, aiming to compete with other advanced AI chatbots. AI

IMPACT This overhaul positions Apple to better compete in the AI assistant market, potentially driving user adoption of AI features across its ecosystem.
SIGNIFICANT · Mastodon — fosstodon.org Polski(PL) · 1w · [111 sources] · HN MASTO BLOG

Due to a critical error in the AI chatbot, Meta handed over more than 20,000 Instagram accounts to hackers. The system sent password reset links without verification

Hackers exploited Meta's AI support chatbot to gain unauthorized access to high-profile Instagram accounts, including the Obama White House page. The attackers tricked the AI into changing the email address associated with accounts, bypassing standard security measures like two-factor authentication. Meta has since patched the vulnerability and is working to secure affected accounts, but the incident highlights significant security risks in deploying AI for critical functions. AI

IMPACT Highlights critical security risks of deploying AI for sensitive account recovery functions, potentially slowing adoption.
SIGNIFICANT · NVIDIA Blog English(EN) · 1w · [56 sources] · MASTO BLOG X

NVIDIA Levels Up Local AI Agents Across RTX PCs and DGX Spark

NVIDIA is expanding its AI infrastructure and agentic AI capabilities through strategic partnerships and new product releases. The company is collaborating with the UK government and various partners to build sovereign AI deployments, including the powerful Isambard-AI supercomputer. In South Korea, NVIDIA is working with LG Group to develop AI factories for robotics and autonomous driving, while also partnering with Doosan Group on similar initiatives. Additionally, NVIDIA is enhancing local AI agent deployment on Windows PCs with new hardware like RTX Spark and DGX Station, and integrating its NemoClaw framework across its Jetson platform for edge AI applications. AI

IMPACT NVIDIA's expanded AI infrastructure and agentic AI capabilities will accelerate development and deployment across various industries and edge devices.
SIGNIFICANT · Simon Willison English(EN) · 2w · [74 sources] · HN MASTO BLOG REDDIT X

I think Anthropic and OpenAI have found product-market fit

Anthropic has surpassed OpenAI in market valuation, reaching nearly $1 trillion after a $65 billion funding round. This surge is attributed to the popularity of its Claude AI assistant and Claude Code service, with annual revenue growing significantly. Both Anthropic and OpenAI have recently increased API prices and adjusted enterprise plans, signaling a move towards greater monetization and potentially preparing for IPOs. AI

IMPACT Anthropic's rise and substantial funding may accelerate competition and innovation, potentially influencing market dynamics and future AI development strategies.
RESEARCH · METR (Model Evaluation & Threat Research) 中文(ZH) · 4mo · [100 sources] · MASTO BLOG REDDIT

Frontier AI Safety Regulations: A Reference Guide for AI Company Employees

Researchers are developing new methods to attack and defend AI agents used in software reverse engineering and cybersecurity. One approach uses genetic algorithms to inject malicious prompts into AI agents, causing them to misinterpret code and bypass detection systems. Other studies focus on detecting and obfuscating these prompt injection attacks, as well as defending against multi-step trojan attacks that embed persistent control within agent workflows. Additionally, a framework called CVE-Factory automates the creation of executable vulnerability tasks for training and evaluating code security agents, showing significant improvements in models like Qwen3-32B. AI

IMPACT New attack vectors and defense mechanisms for AI agents highlight critical security vulnerabilities in AI-powered tools.
TOOL · Anthropic SDK (Python) — Releases (SK) · 4mo · [178 sources] · BLOG REDDIT

v0.92.0

Anthropic has released multiple updates for Claude Code, its development tool, across versions v2.1.141 through v2.1.150. These updates introduce significant improvements to background session management, plugin functionality, and tool integration, particularly for Windows users. Key enhancements include better handling of idle sessions, more robust error reporting for the auto-updater, and expanded command-line options for configuring background agents. The releases also address numerous bugs related to permissions, sandboxing, and user interface responsiveness, aiming to provide a more stable and efficient coding environment. AI

IMPACT Incremental improvements to a developer tool that enhance user experience and stability, with no direct impact on core AI capabilities.
RESEARCH · Google AI / Research English(EN) · 10mo · [633 sources] · HN LOBSTERS MASTO BLOG REDDIT X

Unlocking dependable responses with Gemini Enterprise Agent Platform’s Agentic RAG

Researchers are developing advanced agent frameworks to improve AI reliability and efficiency across various domains. Google introduced an agentic RAG system that enhances enterprise query handling by iteratively searching for complete context, boosting accuracy by up to 34%. Hugging Face demonstrated a multi-agent economy simulation using a small 3B model, highlighting the trade-offs between model size and real-time performance. Other research explores methods for reliable tool use, regulatory compliance through agent-to-agent protocols, dynamic benchmarking for agent behavior, and robust self-evolution mechanisms for AI agents. AI

IMPACT New agentic frameworks and evaluation methods promise more reliable, efficient, and compliant AI systems across enterprise, simulation, and regulatory domains.
SIGNIFICANT · Anthropic news English(EN) · 12mo · [639 sources] · HN MASTO BLOG REDDIT X

Introducing Claude Opus 4.7

Anthropic has launched Claude Design, a new product that allows users to collaborate with Claude Opus 4.7 to create visual assets like designs, prototypes, and presentations. This tool leverages Anthropic's advanced vision model and offers features for refining designs through conversation, inline edits, and custom sliders, with the ability to integrate team design systems. Concurrently, Anthropic has made Claude Opus 4.7 generally available, highlighting its improved capabilities in software engineering and vision, while also implementing specific safeguards for cybersecurity-related tasks. AI

IMPACT Enhances creative workflows and productivity by integrating advanced AI into visual design and development processes.
SIGNIFICANT · arXiv cs.CL English(EN) · 20mo · [294 sources] · BSKY HN MASTO BLOG REDDIT

Asking For An Old Friend: Diagnosing and Mitigating Temporal Failure Modes in LLM-based Statutory Question Answering

Researchers have developed a benchmark to test Large Language Models' ability to handle temporal changes in legal statutes, identifying issues like outdated information and recency bias. Meanwhile, the AI industry is seeing a significant shift as model labs increasingly focus on building agent-based products rather than just foundational models. This strategic pivot is exemplified by companies like AI21 and DeepSeek, and is further underscored by DeepSeek's aggressive pricing strategy for its V4-Pro model, making advanced AI more accessible. AI

IMPACT The industry's focus is shifting from foundational models to agent-based products, with aggressive pricing making advanced AI more accessible and competitive.
COMMENTARY · Simon Willison English(EN) · 23mo · [744 sources] · BSKY HN MASTO BLOG REDDIT

Where's the raccoon with the ham radio? (ChatGPT Images 2.0)

AI's rapid advancement is prompting a re-evaluation of its impact on productivity and the economy, with some analysts predicting significant shareholder value destruction for hyperscalers due to massive capital investments versus revenue growth. Concurrently, new AI image generation models like OpenAI's ChatGPT Images 2.0 are demonstrating impressive capabilities, though their ability to solve complex visual puzzles remains a challenge. Experts advise embracing AI as a tool while critically assessing its societal implications, particularly concerning power concentration and potential economic disruption, as AI's transformative nature reshapes industries and career paths. AI

IMPACT AI's transformative potential is reshaping economic forecasts, productivity, and societal structures, prompting critical evaluation of its benefits and risks.
RESEARCH · Medium — MLOps tag English(EN) · 34mo · [63 sources] · HN MASTO BLOG REDDIT X

Building Secure AI Gateways with MLflow AI Gateway

Google Research has introduced ReasoningBank, a novel framework designed to enhance AI agents' ability to learn from their experiences, both successes and failures, after deployment. This system distills generalizable reasoning strategies from past interactions, allowing agents to continuously improve and avoid repeating mistakes. Separately, new research explores optimizing multi-agent communication through latent representations and introduces Agent Evolving Learning (AEL) for agents operating in open-ended environments, focusing on how to effectively use remembered information. Additionally, DeepSeek has released preview models of its V4 series, offering large context windows and advanced capabilities at a significantly lower cost than comparable frontier models. AI

IMPACT New frameworks for agent learning and memory, alongside cost-effective frontier models, could accelerate AI adoption in complex tasks and personalized applications.
RESEARCH · Google AI / Research English(EN) · 38mo · [475 sources] · HN LOBSTERS MASTO BLOG REDDIT

Making LLMs more accurate by using all of their layers

Google Research has developed a new framework to evaluate the behavioral alignment of large language models with human social inclinations. This approach adapts established psychological questionnaires into large-scale situational judgment tests, allowing for the quantification of model tendencies in realistic scenarios. The research identifies gaps where model behaviors deviate from human consensus or fail to capture the range of human opinions, aiming to improve LLM navigation of social dynamics. Separately, Google Research also introduced SLED, a novel decoding strategy that enhances LLM factuality by utilizing all model layers instead of just the final one, without requiring external data or fine-tuning. AI

IMPACT New methods for evaluating LLM alignment and improving factuality could lead to more trustworthy and socially adept AI systems.
SIGNIFICANT · OpenAI News English(EN) · 40mo · [1394 sources] · HN LOBSTERS MASTO BLOG REDDIT X

Computer-Using Agent

OpenAI and Google DeepMind are advancing AI agents for software development and security. OpenAI's Codex is being leveraged to write entire codebases with minimal human intervention, as demonstrated by Harness Engineering's internal beta product. Google DeepMind has introduced CodeMender, an AI agent designed to automatically identify and fix software vulnerabilities, and AlphaEvolve, which uses Gemini models to discover and optimize algorithms for applications like data center efficiency and chip design. Meta is also investing heavily in its own AI infrastructure with the development of its MTIA chip family, aiming to power AI experiences for billions of users. AI

IMPACT These advancements signal a rapid evolution in AI agent capabilities and infrastructure, potentially accelerating software development, improving code security, and optimizing complex computational tasks.
SIGNIFICANT · OpenAI News English(EN) · 46mo · [3615 sources] · BSKY HN LOBSTERS MASTO BLOG REDDIT X

Our approach to alignment research

OpenAI has announced a partnership with Apple to integrate ChatGPT into iOS, iPadOS, and macOS, enhancing Siri and system-wide writing tools with GPT-4o capabilities. Google DeepMind has published research on scaling AI agent systems, identifying that multi-agent coordination improves parallelizable tasks but can degrade sequential ones, and has developed a predictive model for optimal agent architectures. Additionally, OpenAI has released resources on prompting fundamentals and shared insights from Netomi on scaling agentic systems in enterprise environments, highlighting the use of GPT-4.1 and GPT-5.2 for complex workflows. AI

IMPACT Partnership integrates advanced AI into consumer devices, while research offers principles for scaling complex AI agent systems.
SIGNIFICANT · Wired — AI English(EN) · 88mo · [455 sources] · HN MASTO BLOG X

Can OpenAI’s ‘Master of Disaster’ Fix AI’s Reputation Crisis?

OpenAI has announced a significant partnership with SAP to launch 'OpenAI for Germany,' aiming to bring advanced AI capabilities to the German public sector while prioritizing data sovereignty and security on Microsoft Azure. The company also proposed policy recommendations to the U.S. White House for the national AI Action Plan, focusing on innovation freedom, export controls, copyright, infrastructure, and government adoption. Additionally, OpenAI is collaborating with U.S. National Laboratories to leverage its reasoning models for scientific breakthroughs and national security initiatives. AI

IMPACT OpenAI's strategic partnerships and policy proposals signal a push for broader AI adoption in public sectors and national infrastructure, influencing future AI development and regulation.
RESEARCH · OpenAI News English(EN) · 91mo · [1013 sources] · HN LOBSTERS MASTO BLOG REDDIT

Better language models and their implications

Google DeepMind has introduced the FACTS Benchmark Suite, a new set of evaluations designed to systematically measure the factuality of large language models across various use cases. This suite includes benchmarks for parametric knowledge, search-based information retrieval, and multimodal understanding, alongside an updated grounding benchmark. The initiative aims to provide a more comprehensive understanding of LLM factuality and drive industry-wide improvements in accuracy and trustworthiness. AI

IMPACT Provides new evaluation tools to drive progress in LLM factuality and reduce hallucinations.
TOOL · OpenAI News English(EN) · 127mo · [4458 sources] · HN LOBSTERS MASTO BLOG REDDIT X

Introducing OpenAI

OpenAI has launched a preview of its Codex coding assistant within the ChatGPT mobile app, allowing users to manage coding tasks remotely across devices. The company is also highlighting how various organizations, including Ramp, NVIDIA, and AutoScout24, are leveraging Codex and GPT-5.5 for accelerated code review, faster development cycles, and AI-assisted research. Meanwhile, Anthropic's Project Glasswing initiative has identified over ten thousand high-severity vulnerabilities in essential software, emphasizing the need for the industry to adapt to AI-driven security analysis. AI

IMPACT Expands accessibility of AI coding assistants and highlights AI's role in identifying software vulnerabilities, potentially accelerating development and improving security.