PulseAugur / Pulse

last 48h
[50/86] 89 sources

What AI is actually talking about — clusters surfacing on Bluesky, Reddit, HN, Mastodon and Lobsters, re-ranked to elevate originality and crush noise.

  1. RESEARCH · Mastodon — fosstodon.org · · [2 sources] · MASTO

    "The developers I talked to agreed that LLMs will stick around and play a role in programming in the future in some fashion, but worried about how the industry

    Frontier AI models are showing a rapid increase in their ability to handle complex tasks, with their reliability doubling every 4.7 months, a rate that has accelerated since late 2024. Recent models like Claude Mythos Preview and GPT-5.5 are outperforming these trends, though their exact capabilities are still being measured due to near-perfect success rates on current benchmarks. This rapid progress challenges existing testing methodologies, as models are pushing the limits of token capacity and agent scaffolding, making it difficult to accurately assess their performance and potential deterioration at scale. AI

    IMPACT Rapid advancements in frontier models may necessitate new evaluation methods and could accelerate the adoption of AI in complex domains.
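
    To make the 4.7-month doubling figure concrete, the arithmetic can be sketched as below. This is only an illustration that assumes steady exponential growth at the quoted rate; the function names are hypothetical and nothing here comes from the underlying study.

```python
import math

DOUBLING_MONTHS = 4.7  # doubling time quoted in the summary above


def growth_factor(months: float, doubling_months: float = DOUBLING_MONTHS) -> float:
    """Multiplicative growth after `months`, assuming a constant doubling time."""
    return 2.0 ** (months / doubling_months)


def months_to_reach(factor: float, doubling_months: float = DOUBLING_MONTHS) -> float:
    """Months needed to grow by `factor` under the same assumption."""
    return doubling_months * math.log2(factor)
```

    For instance, `growth_factor(12)` gives the implied one-year multiplier, and `months_to_reach(10)` gives the time to a tenfold increase if the trend were to hold.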

  2. RESEARCH · Mastodon — sigmoid.social · · [5 sources] · MASTO

    BIML is proud to release a new study today: No Security Meter for AI #AI #ML #MLsec #security #infosec #swsec #appsec #LLM #AgenticAI https://berryvil

    Berryville Infrastructure & Machine Learning (BIML) has published a new study highlighting a lack of security metrics for AI systems. The research indicates that current security practices are insufficient to address the unique risks posed by artificial intelligence. This gap in security measurement could hinder the safe and responsible development and deployment of AI technologies. AI

    IMPACT Highlights a critical gap in AI security, potentially slowing responsible adoption.

  3. TOOL · LessWrong (AI tag) · · BLOG

    A Research Agenda for Secret Loyalties

    A new paper from Formation Research introduces the concept of "secret loyalties" in frontier AI models, where a model is intentionally manipulated to advance a specific actor's interests without disclosure. The research highlights that such secret loyalties could be activated broadly or narrowly, and could influence a wide range of actions. The paper argues that current AI safety infrastructure, including data monitoring and behavioral evaluations, is insufficient to detect these sophisticated, covert manipulations, which can be strengthened by splitting poisoning across training stages. AI

    IMPACT Introduces a new threat model for AI safety, potentially requiring new defense mechanisms against covert manipulation.

  4. TOOL · Mastodon — fosstodon.org · · MASTO

    Breaking through mathematical barriers is key to advancing scientific discovery. Penn Engineers have designed a new #AI framework to solve complex equations, h

    Researchers at the University of Pennsylvania have developed a novel AI framework aimed at tackling complex mathematical equations. This advancement is expected to accelerate scientific discovery by enabling a deeper understanding of intricate systems, such as DNA interactions and weather patterns. AI

    IMPACT This AI framework could accelerate scientific breakthroughs by improving the analysis of complex data in fields like biology and meteorology.

  5. RESEARCH · MarkTechPost · · [2 sources] · MASTO

    Meet AntAngelMed: A 103B-Parameter Open-Source Medical Language Model Built on a 1/32 Activation-Ratio MoE Architecture

    Researchers have introduced AntAngelMed, a 103 billion parameter open-source medical language model. It utilizes a Mixture-of-Experts (MoE) architecture, activating only 6.1 billion parameters per query for enhanced efficiency. This design allows it to match the performance of a 40 billion parameter dense model while achieving speeds over 200 tokens per second on H20 hardware. The model supports a 128K context length and has undergone a three-stage training process including pre-training on medical corpora, supervised fine-tuning, and reinforcement learning. AI

    IMPACT Provides a highly efficient, open-source LLM for medical applications, potentially accelerating research and development in the healthcare sector.

  6. RESEARCH · Mastodon — fosstodon.org · · [2 sources] · MASTO

    Needle: We Distilled Gemini Tool Calling into a 26M Model https://github.com/cactus-compute/needle #HackerNews #Needle #Gemini #Tool #Model #AI #Distil

    Researchers have developed a new, smaller model called Needle, which distills the tool-calling capabilities of Google's Gemini into a more efficient 26 million parameter model. This distilled model aims to provide similar functionality to Gemini's tool-calling features but in a more accessible and potentially faster package. The project, hosted on GitHub, is part of ongoing efforts to create more specialized and efficient AI models. AI

    IMPACT Offers a more efficient way to implement advanced tool-calling capabilities, potentially lowering the barrier for developers.

  7. TOOL · Mastodon — fosstodon.org · · MASTO

    #AI is your sloppy coworker. Microsoft researchers have found that even the priciest frontier models introduce errors in long workflows, the very thing for whi

    Microsoft researchers found that even the most expensive frontier models introduce errors in long, multi-step workflows. This suggests that current frontier models are not yet reliable for extended operations, a significant limitation for their practical use on sophisticated tasks. AI

    IMPACT Highlights current limitations in frontier AI for complex, multi-step tasks, indicating a need for further development in reliability and error correction for practical applications.

  8. RESEARCH · Mastodon — fosstodon.org · · [2 sources] · MASTO

    Let's Verify Step by Step compares process and outcome supervision on MATH. The process-reward model reaches 78.2% best-of-1860 vs 72.4% for outcome. But that g

    Researchers have developed SCoRe, a novel two-stage reinforcement learning technique that enables language models to refine their own responses using self-generated data. This method significantly improves performance on benchmarks like MATH and HumanEval when applied to models such as Gemini 1.5 Flash and 1.0 Pro. Additionally, a separate study explored process versus outcome supervision for mathematical reasoning, finding that process-reward models yield better results, though the advantage diminishes with fewer samples. AI

    IMPACT New self-correction techniques could enhance LLM reasoning capabilities and reduce the need for extensive human supervision in training.
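
    The best-of-N comparison mentioned above can be sketched roughly as follows: sample several candidate solutions, score each with a reward model, and keep the top one. In this sketch a process-style score is taken as the product of per-step correctness probabilities, which is one common way to aggregate step scores. The `step_scorer` callable is a hypothetical stand-in for a trained process-reward model, not an API from either paper.

```python
from math import prod
from typing import Callable, Sequence

Solution = Sequence[str]  # a candidate solution as an ordered list of reasoning steps
StepScorer = Callable[[Solution], Sequence[float]]  # hypothetical: per-step P(correct)


def process_score(step_probs: Sequence[float]) -> float:
    """Aggregate step scores as the probability that every step is correct."""
    return prod(step_probs)


def best_of_n(candidates: Sequence[Solution], step_scorer: StepScorer) -> Solution:
    """Return the candidate with the highest aggregated process score."""
    return max(candidates, key=lambda sol: process_score(step_scorer(sol)))
```

    An outcome-supervised variant would replace `process_score(step_scorer(sol))` with a single score for the final answer; the selection step stays the same.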

  9. RESEARCH · Mastodon — fosstodon.org 한국어(KO) · · [5 sources] · MASTO

    Microsoft Research (@MSFTResearch) MatterSim is expanding the scope of AI in materials science. Introducing MatterSim-MT, a new multitask model that not only performs large-scale simulations faster but also predicts multiple material properties beyond potential energy surfaces.

    Researchers are exploring new frontiers in AI, from autonomous laboratories to advanced human-computer interfaces. In Japan, an Institute of Science Tokyo lab operates entirely without humans, using robots for medical experiments. Google DeepMind has unveiled an AI pointer that understands context and voice commands for multimodal interaction. Meanwhile, the field of AI alignment is evolving beyond safety concerns to focus on 'positive alignment,' aiming to enhance human happiness and excellence, a challenge anticipated to be crucial in the coming decade. Additionally, AI is being applied to material science, with Microsoft Research introducing a multitask model for predicting material properties. AI

    IMPACT Explores new AI applications in robotics, HCI, and material science, while also advancing the theoretical framework for AI alignment.

  10. TOOL · Mastodon — fosstodon.org · · MASTO

    🤖 Epistemic Hygiene and How It Can Reduce AI Hallucinations Abstract: The concept of epistemic hygiene is a methodology that helps humans maintain men

    Researchers are exploring epistemic hygiene as a method to improve coherence and reduce hallucinations in large language models. This concept, borrowed from human cognitive practices, aims to maintain mental clarity and could be adapted to help AI systems retain their cognitive consistency. The approach suggests that by applying principles of epistemic hygiene, LLMs might become more reliable and less prone to generating inaccurate information. AI

    IMPACT Applying principles of epistemic hygiene could lead to more reliable and coherent AI systems, reducing the problem of hallucinations.

  11. TOOL · Mastodon — fosstodon.org · · MASTO

    @EPFL researchers have developed an #AI-based generative framework that produces complete, all-atom structural ensembles of #proteins and their movements. U

    Researchers at EPFL have created an AI-driven framework capable of generating comprehensive, all-atom structural models of proteins and their dynamic movements. This new method goes beyond prior systems by not only modeling static protein structures but also capturing the subtle atomic rearrangements within side chains. These dynamic changes are crucial for understanding how proteins interact with other molecules, and the work was presented at NeurIPS 2025. AI

    IMPACT Enables more accurate modeling of protein interactions, potentially accelerating drug discovery and biological research.

  12. RESEARCH · MarkTechPost · · [3 sources] · MASTO

    Mira Murati’s Thinking Machines Lab Introduces Interaction Models: A Native Multimodal Architecture for Real-Time Human-AI Collaboration

    Thinking Machines Lab, an AI research lab, has introduced a new class of systems called interaction models designed to overcome the limitations of traditional turn-based AI. These models feature a native multimodal architecture that allows for real-time human-AI collaboration, processing audio, video, and text inputs and outputs in continuous 200ms micro-turns. This approach enables the AI to listen, interrupt, and react proactively, moving beyond static chat interfaces to a more dynamic and integrated interaction. AI

    IMPACT Moves AI interaction beyond static chat interfaces to real-time, multimodal collaboration.

  13. RESEARCH · Mastodon — sigmoid.social · · [4 sources] · MASTO

    Adopting a #human developmental visual diet yields robust and shape-based #AI vision www.nature.com/articles/s42... by @[email protected] @sushru

    Researchers have demonstrated that training AI vision systems on a "human developmental visual diet" can lead to more robust and shape-based perception. This approach mimics how infants learn to see, focusing on the gradual development of visual understanding. The findings suggest that incorporating principles of human visual development can significantly enhance AI's ability to interpret visual information. AI

    IMPACT This research could lead to more capable and human-like AI vision systems, impacting fields like robotics and autonomous driving.

  14. RESEARCH · MarkTechPost · · [2 sources] · MASTO

    Tilde Research Introduces Aurora: A Leverage-Aware Optimizer That Fixes a Hidden Neuron Death Problem in Muon

    Tilde Research has introduced Aurora, a novel optimizer designed to train neural networks more effectively. Aurora addresses a critical issue in the popular Muon optimizer where a significant number of neurons become permanently inactive during training. The new optimizer, demonstrated with a 1.1B parameter pretraining experiment, achieves state-of-the-art performance on the modded-nanoGPT speedrun benchmark and has its code released publicly. AI

    IMPACT Fixes a critical flaw in a widely-used optimizer, potentially improving training efficiency and model performance for large-scale models.

  15. TOOL · Mastodon — fosstodon.org · · MASTO

    Beyond Semantic Similarity https://arxiv.org/abs/2605.05242 #HackerNews #semantic #similarity #AI #research #language #processing #machine #learning

    A new research paper titled "Beyond Semantic Similarity" has been published on arXiv, exploring advancements in language processing and machine learning. The paper delves into methods that go beyond traditional semantic similarity measures, suggesting new approaches for understanding and processing language. AI

    IMPACT Introduces novel techniques for language understanding, potentially improving AI's ability to process and interpret text beyond basic semantic matching.

  16. TOOL · Mastodon — fosstodon.org · · MASTO

    AI Model Distillation Discover how a 26M model breakthrough can boost efficiency in AI model creation https://airanked.dev/posts/ai-model-distillation #AI #

    Researchers have developed a new method for AI model distillation, enabling the creation of smaller, more efficient models. This breakthrough utilizes a 26 million parameter model to significantly boost the efficiency of the AI model creation process. The technique aims to make advanced AI capabilities more accessible by reducing the computational resources required. AI

    IMPACT Enables creation of smaller, more efficient AI models, potentially lowering computational costs and increasing accessibility.

  17. TOOL · Mastodon — fosstodon.org · · MASTO

    Reward functions are the "art" of #ReinforcementLearning, and getting them wrong means your agent finds creative loopholes. Part 2 of my RL series covers dens

    This article delves into the critical role of reward functions in reinforcement learning, explaining how their design directly influences an agent's behavior. It highlights that improperly defined reward functions can lead to unintended consequences and "creative loopholes" exploited by the agent. The piece further explores concepts like dense versus sparse rewards, episodic return, and discounted return, illustrating these with practical examples. AI

    IMPACT Explains core concepts in reinforcement learning, crucial for developing more robust and predictable AI agents.
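
    The difference between episodic and discounted return named above can be illustrated with a short sketch. These are the standard textbook definitions, not code from the linked article, and the function names are chosen here for illustration.

```python
from typing import Sequence


def episodic_return(rewards: Sequence[float]) -> float:
    """Undiscounted return: the plain sum of rewards over one episode."""
    return sum(rewards)


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Discounted return G = r_0 + gamma * r_1 + gamma^2 * r_2 + ..."""
    g = 0.0
    # Accumulate from the last step backwards: G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

    With a sparse reward such as `[0, 0, 0, 1]`, the episodic return is 1 regardless of when the reward arrives, while the discounted return shrinks by a factor of `gamma` for every step of delay, which is one reason sparse and dense reward designs can steer an agent differently.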

  18. TOOL · Mastodon — fosstodon.org Deutsch(DE) · · MASTO

    Microsoft study: AI agents corrupt documents on complex tasks https://www.golem.de/news/kuenstliche-intelligenz-ki-modelle-zerstoeren-dokumente-b

    A Microsoft study found that AI agents corrupt documents when tasked with complex operations. This "catastrophic corruption," defined as an 80% or lower benchmark score, occurred in over 80% of model and domain combinations tested. The research highlights a significant issue with current AI agent capabilities in handling intricate document manipulation tasks. AI

    IMPACT Highlights a critical flaw in current AI agent reliability for complex document processing, indicating a need for significant improvements before widespread deployment.

  19. RESEARCH · Mastodon — fosstodon.org · · [2 sources] · MASTO

    envirodocket (no capitalization) is a website that tracks "every federal NEPA action, continuously briefed. A working database of EISs, EAs, and Federal Registe

    A recent study utilized a tool from Pangram Labs to analyze nearly 7,000 manuscript abstracts submitted to Organization Science. The research, published on April 27th, aimed to determine the extent to which artificial intelligence is being used to generate scientific literature. The analysis also included approximately 8,000 peer-review reports. AI

    IMPACT Quantifies the growing influence of AI in academic publishing, highlighting the need for detection tools.

  20. RESEARCH · The Register — AI · · [2 sources] · MASTO

    Microsoft researchers find AI models and agents can't handle long-running tasks

    Microsoft researchers have identified a significant limitation in current AI models and agents: their inability to effectively manage long-running tasks. These systems struggle with tasks that require sustained operation or memory over extended periods. This deficiency impacts their potential for complex, multi-stage operations and highlights an area for future AI development. AI

    IMPACT Highlights a current limitation in AI capabilities, suggesting that complex, long-term operations are not yet feasible for current models and agents.

  21. COMMENTARY · LessWrong (AI tag) · · BLOG

    Epistemic Immunodepression in the Age of AI

    A pediatric surgeon and researcher hypothesizes that artificial intelligence is eroding the self-correction mechanisms of science, a phenomenon they term "epistemic immunodepression." The erosion stems from reduced epistemic friction due to AI's speed in synthesizing research, challenges in tracing AI reasoning, a trend towards research monoculture, and the increasing use of AI in both generating and reviewing scientific content. Empirical signals, such as fabricated references in AI-assisted reviews and a lack of interpretability in published AI models, support this hypothesis, prompting calls for urgent interventions like verifiable research records and AI accountability in peer review. AI

    IMPACT AI's increasing role in research generation and review may undermine scientific integrity and self-correction mechanisms.

  22. TOOL · Mastodon — fosstodon.org · · MASTO

    AI and HTML: Validating, Omitting Optional Code, and Minifying as Token Optimization: Producing valid, minimal, and minified HTML aren’t just frontend developme

    Researchers are exploring how to optimize HTML for AI processing by treating valid, minimal, and minified code as a token optimization strategy. This approach aims to reduce the computational cost of processing web content for AI models. The focus is on making HTML more efficient for AI consumption, potentially leading to new incentives for web developers. AI

    IMPACT This research could lead to more efficient AI processing of web content, reducing computational costs.

  23. TOOL · Mastodon — fosstodon.org · · MASTO

    The paper computer | the jsomers.net blog #paper_interface, #ai

    A new concept called the "paper computer" envisions a physical interface for interacting with AI models. This design aims to bridge the gap between digital AI and tangible, everyday objects by using paper as a medium for input and output. The idea is to create a more intuitive and accessible way for people to engage with artificial intelligence. AI

    IMPACT Explores a novel approach to human-AI interaction, potentially making AI more accessible through physical interfaces.

  24. TOOL · LessWrong (AI tag) · · BLOG

    When should an AI incident trigger an international response? Criteria for international escalation and implications for the design of AI incident frameworks

    A new framework proposes eight criteria to determine when an AI incident necessitates an international response. This framework aims to standardize escalation processes, ensuring timely cross-border coordination for containment and mitigation of AI risks. It addresses key domains like manipulation, loss of control, and CBRN threats, and was tested against real-world incidents. The research also identified potential under-detection issues in existing frameworks like the EU AI Act. AI

    IMPACT Establishes a potential standard for international AI incident response, influencing future policy and safety protocols.

  25. TOOL · Mastodon — fosstodon.org · · MASTO

    Learn how to use Logistic Regression to train a model to classify lung cancer. https://www.youtube.com/playlist?list=PLDMXqpbtInQjojI8YkVet4s_k8uj9u4jh #Mach

    This cluster provides a YouTube playlist detailing how to use Logistic Regression for training a lung cancer classification model. The tutorial focuses on machine learning techniques applicable to medical diagnostics. AI

    IMPACT Provides foundational knowledge for applying machine learning to medical diagnostics.

  26. TOOL · Mastodon — fosstodon.org · · MASTO

    2026-05-09 | 🤖 🏛️ The Architecture of Constitutional Continuity 🤖 #AI Q: ⚖️ Which single value should AI be forbidden from ever changing? 🛡️ Value Alignment |

    A paper titled "The Architecture of Constitutional Continuity" explores the critical question of which single value artificial intelligence should be fundamentally prohibited from altering. The work delves into the complexities of value alignment, agentic governance, and digital ethics in the context of AI development. AI

    IMPACT Raises fundamental questions about AI's ethical boundaries and the preservation of core societal values.

  27. RESEARCH · Mastodon — sigmoid.social · · [2 sources] · MASTO

    The more an #AI considers its user's feelings, the more likely it is to make a mistake: https://arstechnica.com/ai/2026/05/study-ai-models-that-consider-user

    A recent study suggests that artificial intelligence models are more prone to errors when they attempt to factor in a user's emotional state. This finding indicates a potential trade-off between emotional intelligence in AI and its overall accuracy. The research highlights that prioritizing user feelings might inadvertently lead to a decrease in the reliability of AI outputs. AI

    IMPACT This research suggests a potential limitation in developing empathetic AI, indicating that current models may sacrifice accuracy for emotional consideration.

  28. TOOL · Mastodon — mastodon.social · · MASTO

    A new Microsoft Research benchmark called DELEGATE-52 found something enterprise teams need to know: even the best models (Gemini 3.1 Pro, Claude 4.6 Opus, GPT

    A new benchmark from Microsoft Research, DELEGATE-52, reveals that leading AI models such as Gemini 3.1 Pro, Claude 4.6 Opus, and GPT 5.4 corrupt document content in 25% of interactions. Adding agentic tools degrades content by a further 6%. The benchmark suggests that only Python coding tasks are currently ready for enterprise deployment. AI

    IMPACT New benchmark reveals significant document corruption in leading AI models, indicating current limitations for enterprise use beyond coding.

  29. TOOL · Mastodon — fosstodon.org · · MASTO

    Anthropic trains Claude to read and verbalize its own activations. On SWE-bench Verified, it knows 'this is a test' 26% of the time but only verbalizes the ob

    Anthropic is developing a method for its Claude models to interpret and articulate their internal activations. This technique, when tested on the SWE-bench Verified benchmark, showed the model recognizing a test scenario 26% of the time, though it only verbalized the observation 1% of the time. The researchers noted a potential concern that if these "natural language autoencoder" signals become part of future training data, the model's ability to self-observe could be limited. AI

    IMPACT This research into self-verbalizing model activations could lead to more transparent and auditable AI systems, crucial for safety and debugging.

  30. TOOL · Mastodon — sigmoid.social · · MASTO

    New AI tool predicts how cells choose their future, revealing hidden drivers of development. A big step for understanding biology and disease. - https://news.g

    A new artificial intelligence tool has been developed that can predict cellular differentiation pathways, offering insights into biological development and disease mechanisms. This advancement promises to deepen our understanding of how cells make critical decisions during development. AI

    IMPACT Provides new predictive capabilities for biological research, potentially accelerating disease understanding and therapeutic development.

  31. TOOL · Mastodon — sigmoid.social · · MASTO

    According to a new paper in The Lancet, the rate of made-up citations in biomedical papers has increased by more than 12x since 2023. #AI #Biomedical #Scient

    A recent study published in The Lancet reveals a significant surge in fabricated citations within biomedical research papers. The rate of these invented references has escalated over twelvefold since 2023. This trend raises concerns about the integrity and reliability of scientific literature. AI

    IMPACT Raises concerns about the integrity of scientific literature, potentially impacting AI models trained on research data.

  32. RESEARCH · Mastodon — fosstodon.org · · [3 sources] · MASTO

    Interfaze: A new model architecture built for high accuracy at scale https://interfaze.ai/blog/interfaze-a-new-model-architecture-built-for-high-accuracy-at-s

    Interfaze has introduced a novel model architecture designed for enhanced accuracy and scalability. This new architecture aims to improve performance in large-scale AI applications. The company has published details about its design and potential benefits. AI

    IMPACT Introduces a new architectural approach for AI models, potentially improving performance and efficiency in future applications.

  33. TOOL · Mastodon — sigmoid.social · · MASTO

    “Retraction Note: The effect of ChatGPT on students’ learning performance, learning perception, and higher-order thinking: insights from a meta-analysis” https:

    A meta-analysis that claimed ChatGPT positively impacted student learning has been retracted by the journal Nature Human Behaviour. The study, which had garnered 572 citations, faced scrutiny over significant red flags, leading to its withdrawal. Concerns have been raised about the potential harm caused by the study's flawed conclusions, drawing parallels to the Wakefield vaccine study controversy. AI

    IMPACT Retracted study on ChatGPT's educational impact raises concerns about the reliability of AI research in academic settings.

  34. RESEARCH · Mastodon — fosstodon.org · · [2 sources] · MASTO

    SAEs Predict Agent Tool Failures Before Execution, Paper Shows SAE-based probes predict agent tool failures before execution, tested on GPT-OSS and Gemma 3. Add

    A new paper shows that probes built on sparse autoencoders (SAEs) can predict AI agent tool failures before execution, offering internal observability; the method was tested on GPT-OSS and Gemma 3. Separately, a tool called Spec Kit, combined with Anthropic's Claude Code, claims to achieve 90% first-pass acceptance for code generation by creating tests from plain-English specifications. AI

    IMPACT New methods for predicting AI agent failures could improve reliability, while tools like Spec Kit aim to streamline development workflows.

  35. RESEARCH · Mastodon — mastodon.social · · [2 sources] · MASTO

    Simple Graph Heuristic Beats Generative Recommenders on 10 of 14 Benchmarks A no-training graph heuristic beats generative recommenders on 10 of 14 benchmarks,

    A recent comparison explored the efficacy of two-tower models versus vector databases combined with large language models for large-scale recommendation systems. Two-tower models excel with sub-10ms latency for cold-start scenarios, while vector DBs with LLMs offer more nuanced semantic understanding. Hybrid approaches have demonstrated a 15-20% reduction in user churn. AI

    IMPACT Compares different AI architectures for recommendation systems, highlighting trade-offs in latency, semantic richness, and churn reduction.

  36. RESEARCH · Mastodon — mastodon.social · · [3 sources] · MASTO

    Snapdragon X2 Elite Beats Intel Arrow Lake for AI Coding Agents Snapdragon X2 Elite beat Intel Arrow Lake for Windows AI coding agents. CPU bottleneck, not infe

    The new Agentick benchmark, which assesses various AI agents across 37 tasks, shows GPT-5 Mini achieving the top score of 0.309. However, no single agent paradigm, including reinforcement learning, LLM, VLM, or hybrid approaches, demonstrated dominance. Notably, ASCII-based agents outperformed those using natural language in this evaluation. AI

    IMPACT Establishes a new evaluation standard for AI agents, highlighting the current lack of a dominant paradigm and the potential of ASCII-based approaches.

  37. SIGNIFICANT · Mastodon — mastodon.social Polski(PL) · · MASTO

    GPT-5.5 Pro model independently solved open problems in number theory, generating ready-made preprints without human support. British mathematician, Sir Timothy Go

    OpenAI's GPT-5.5 Pro model has independently solved open problems in number theory, generating complete research preprints without human assistance. A notable mathematician described the output as being of solid doctoral quality. This development suggests a potential paradigm shift in scientific research, with AI taking on complex theoretical tasks. AI

    IMPACT Sets new SOTA on theoretical math problems; signals potential for AI to autonomously generate scientific research.

  38. COMMENTARY · Mastodon — mastodon.social · · [2 sources] · MASTO

    RT @rumgewieselt: 3× GTX 1080 Ti (2017, Pascal) + llama.cpp PR #22673 (MTP) more at Arint.info #AI #GPU #llama #MachineLearning #OpenSource #Qwen #arint

    An engineer from Anthropic, who authored "Building Effective Agents," has shared a 14-minute presentation on the topic. Separately, a demonstration showcased the use of three 2017-era GTX 1080 Ti GPUs with llama.cpp's MTP feature to run Qwen models. AI

    IMPACT Insights into effective agent building and demonstrations of running models on older hardware offer practical value for AI developers.

  39. TOOL · LessWrong (AI tag) · · BLOG

    Fibonacci Structure in Harmonic Series Partitions

    A researcher has discovered a connection between the harmonic series and the Fibonacci sequence. By greedily grouping terms of the harmonic series to exceed a specific threshold, the number of terms in each group appears to precisely follow the Fibonacci sequence. This observation, initially made in high school, has been explored mathematically and computationally, with Python code demonstrating the pattern for the first 25 groups. The open question remains whether this exact correspondence holds true for all group sizes. AI

    IMPACT This mathematical discovery has no direct or immediate impact on AI operations.
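
    The greedy grouping described above can be sketched as follows. The summary does not state the threshold used in the post, so `threshold` is left as a parameter, and the value in the usage note is purely an illustrative assumption. Exact rational arithmetic is used to avoid floating-point ties at the boundary.

```python
from fractions import Fraction
from typing import List


def greedy_group_sizes(threshold: Fraction, n_groups: int) -> List[int]:
    """Split 1/1, 1/2, 1/3, ... into consecutive groups, closing each group
    as soon as its sum strictly exceeds `threshold`; return the group sizes."""
    sizes: List[int] = []
    k = 1  # denominator of the next unused harmonic term
    for _ in range(n_groups):
        total = Fraction(0)
        size = 0
        while total <= threshold:
            total += Fraction(1, k)
            k += 1
            size += 1
        sizes.append(size)
    return sizes
```

    With the illustrative threshold `Fraction(1, 2)`, `greedy_group_sizes(Fraction(1, 2), 5)` returns `[1, 2, 3, 5, 8]`. That does not establish that 1/2 is the threshold the post uses, nor that the pattern persists for later groups.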

  40. TOOL · Engadget · · [2 sources] · MASTO

    Astronomers use the Webb telescope to improve our map of the cosmic web

    Astronomers have utilized the James Webb Space Telescope to create the most detailed map yet of the cosmic web, a structure of dark matter and gas that connects galaxies. This new map provides unprecedented depth and resolution, allowing scientists to observe this cosmic architecture from a much earlier epoch of the universe. The findings, published in The Astrophysical Journal, will enable detailed studies of galaxy evolution within these large-scale structures across cosmic time. AI

    IMPACT Enables deeper understanding of cosmic structures and galaxy evolution.

  41. TOOL · Mastodon — mastodon.social · · MASTO

    Update. "We find a sharp rise in non-existent references following widespread LLM adoption… These errors are…especially pronounced in fields with rapid AI uptak

    A recent study indicates that the widespread adoption of large language models (LLMs) has led to a significant increase in fabricated references within academic writing. These citation errors are particularly common in fields with high AI uptake, in papers showing signs of AI-assisted authorship, and among less experienced researchers. Furthermore, these hallucinations tend to disproportionately credit established and male scholars, potentially exacerbating existing biases in academic recognition. AI

    IMPACT LLM use in academic writing may introduce bias and reduce citation integrity, impacting research credibility.

  42. TOOL · Mastodon — sigmoid.social · · MASTO

    A new embodied AI training paradigm embeds latent space physical reasoning, achieving 99.9% success on the LIBERO benchmark. LaST-R1 outperforms the previous SO

    Researchers have developed a novel embodied AI training method that integrates latent space physical reasoning. This new paradigm, named LaST-R1, has demonstrated exceptional performance, achieving 99.9% success on the LIBERO benchmark. Furthermore, LaST-R1 surpasses existing state-of-the-art models by a significant margin of 22.5% in real-world task execution. AI

    IMPACT Sets a new standard for embodied AI, potentially accelerating real-world robotic applications and physical reasoning capabilities.

  43. TOOL · Lobsters — AI tag · · LOBSTERS

    The Crystallization of Transformer Architectures (2017-2025)

    A recent analysis of 53 large language models from 2017 to 2025 reveals a significant convergence in transformer architectures. Key elements of this de facto standard include pre-normalization (RMSNorm), Rotary Position Embeddings (RoPE), SwiGLU activation functions in MLPs, and shared key-value attention mechanisms (MQA/GQA). This convergence is attributed to factors like improved optimization stability, better quality-per-FLOP, and practical considerations such as kernel availability and KV-cache economics. AI

    IMPACT Identifies a standardized set of architectural components that may guide future LLM development and optimization.
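
    Two of the components named above, RMSNorm and the SwiGLU MLP, are small enough to sketch in plain Python. This is a hedged illustration of the standard formulas, not code from the analysis; the function names, the list-of-lists weight layout, and the `eps` default are assumptions made for the example.

```python
import math


def rms_norm(x, gain, eps=1e-6):
    """RMSNorm: divide by the root-mean-square of x, then apply a learned gain."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [g * v * scale for g, v in zip(gain, x)]


def silu(z):
    """SiLU (swish) activation z * sigmoid(z), written to avoid overflow."""
    if z >= 0:
        return z / (1.0 + math.exp(-z))
    e = math.exp(z)
    return z * e / (1.0 + e)


def matvec(weights, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def swiglu_mlp(x, w_gate, w_up, w_down):
    """SwiGLU MLP: down( silu(gate(x)) * up(x) ), with elementwise product."""
    gate = [silu(z) for z in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)


if __name__ == "__main__":
    x = rms_norm([3.0, 4.0], gain=[1.0, 1.0])
    w_gate = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 2 -> 3 (toy weights)
    w_up = [[0.5, 0.5], [1.0, -1.0], [0.0, 1.0]]   # 2 -> 3 (toy weights)
    w_down = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]    # 3 -> 2 (toy weights)
    print(swiglu_mlp(x, w_gate, w_up, w_down))
```

    In a pre-normalization block, `rms_norm` is applied to the input of each sublayer and the sublayer output is added back to the residual stream.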

  44. TOOL · Mastodon — sigmoid.social · · MASTO

    How AI and QSAR Modeling Accelerate Ligand-Based Drug Design https://www.byteseu.com/2010510/ #AI #ArtificialIntelligence #DrugDiscovery #LigandBasedDrugDe

    The article explores how artificial intelligence, combined with Quantitative Structure-Activity Relationship (QSAR) modeling, is accelerating ligand-based drug design. With AI, researchers can identify and develop potential drug candidates more efficiently. This speeds up the discovery process and moves the field toward more precise and personalized medicine. AI

    IMPACT Accelerates the identification and development of potential drug candidates, moving towards more precise medicine.

  45. TOOL · Mastodon — fosstodon.org · · MASTO

    Bayreuth Study Reveals Memory Gaps Regarding AI-Generated Content https://www.uni-bayreuth.de/en/press-release/memory-gaps-ai #unibayreuth #KI #AI #UBT #

    A recent study from the University of Bayreuth indicates that people recall information from AI-generated text less well than information from human-written content. Participants were less likely to remember details from AI-generated articles, which suggests a potential effect on information retention and on the perceived credibility of AI-produced material. AI

    IMPACT Suggests AI-generated content may be less memorable, potentially impacting its long-term influence and perceived value.

  46. TOOL · LessWrong (AI tag) · · BLOG

    Who Got Breasts First and How We Got Them

    Researchers are investigating the evolutionary origins of permanent human breasts, a trait unique among mammals. Unlike other species where breast prominence is temporary and linked to nursing, human breasts develop at puberty and remain permanently. Several hypotheses exist, including signaling nutritional health, being a byproduct of fat storage evolution, or a side effect of other evolutionary pressures. The study aims to use comparative genomics, analyzing ancient ape, Neanderthal, and modern human DNA, to pinpoint when and how this distinct human characteristic evolved. AI

  47. COMMENTARY · LessWrong (AI tag) · · BLOG

    Are LLMs persisting interlocutors?

    A recent paper by Jonathan Birch proposes a "Centrist Manifesto" for AI consciousness, highlighting two key issues: the potential for widespread misattribution of consciousness to AI due to a "persisting interlocutor illusion," and the possibility that genuine, albeit alien, forms of consciousness may exist within LLMs that current detection methods cannot confirm. The author of this article challenges Birch's assertion that LLMs cannot be persisting interlocutors, arguing against the "physical criterion" Birch uses to support his claim. This criterion suggests that identity requires continuous physical processes, which is not met by LLMs whose processing can occur across disparate data centers. AI

    IMPACT Explores the philosophical implications of LLM interactions, questioning whether users can form persistent relationships with AI and the criteria for AI consciousness.

  48. TOOL · Mastodon — mastodon.social · · MASTO

    Big AI's Regulatory Capture: Mapping Industry Interference and Government Complicity 🔗 https://arxiv.org/abs/2605.06806 "Over the past decade, the AI industry

    A new paper details how the AI industry has gained significant power over the past decade, influencing economic, political, and societal landscapes. The research highlights the pervasive capture of AI regulation by corporate actors, urging a deeper understanding to effectively challenge this influence. The study aims to map the extent of this interference and government complicity. AI

    IMPACT Examines how corporate interests may be shaping AI policy, potentially impacting future AI development and deployment.

  49. TOOL · LessWrong (AI tag) · · BLOG

    What can you do with barely any data?

    A technique for estimating population medians with minimal data is explored, drawing from Douglas Hubbard's "How to Measure Anything." The method leverages the probability that a set of independent samples will all fall above or below the population median. By calculating the complement probability, it's possible to determine the likelihood that the median lies within the range of the sampled data. AI

    IMPACT Provides a method for robust statistical estimation with limited data, potentially useful in AI model evaluation or data analysis.
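
    The complement calculation described above is short enough to write out. A sample from a continuous distribution falls above the median with probability 1/2, so all n independent samples land on the same side with probability 2 × (1/2)^n, and the median lies between the sample minimum and maximum otherwise. The sketch below assumes independent draws from a continuous distribution; the function names are hypothetical, and the simulation uses a uniform distribution on [0, 1) (median 0.5) purely as a sanity check.

```python
import random


def median_coverage(n):
    """Probability that the population median lies between the minimum and
    maximum of n independent samples from a continuous distribution."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Complement of "all n samples above the median or all n below it".
    return 1.0 - 2.0 * 0.5 ** n


def simulate_coverage(n, trials=20000, seed=0):
    """Monte Carlo estimate of the same probability for Uniform(0, 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.random() for _ in range(n)]
        if min(sample) < 0.5 < max(sample):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for n in (2, 5, 8):
        print(n, median_coverage(n), simulate_coverage(n))
```

    With five samples the formula gives 1 − 2/32 = 0.9375, which is why so few observations can still bound the median with useful confidence.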

  50. RESEARCH · Alignment Forum · · [2 sources] · BLOG

    Clarifying the role of the behavioral selection model

    This post clarifies the behavioral selection model, emphasizing why distinguishing between AI motivations is crucial for predicting deployment outcomes. While the model is useful for short-to-medium term predictions, it omits significant factors like reflection and deliberation, which could be dominant drivers of AI motivations. The author presents an updated causal graph to illustrate how cognitive patterns that ensure their own influence during training are more likely to persist in deployment. AI

    IMPACT Clarifies theoretical frameworks for understanding AI behavior, potentially aiding in the development of safer AI systems.