KV-Fold: One-Step KV-Cache Recurrence for Long-Context Inference
Researchers have developed KV-Fold, a novel method for extending the context window of large language models without requiring retraining. This technique treats the key-value cache as an accumulator in a functional programming-style fold, allowing the model to process sequential chunks of data while maintaining a stable internal state. KV-Fold has demonstrated 100% exact-match retrieval on needle-in-a-haystack benchmarks across various context lengths and model sizes, operating within the memory constraints of a single GPU.
AI IMPACT: Enables LLMs to process significantly longer contexts without costly retraining, potentially improving performance on tasks requiring extensive background information.
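The fold pattern described above can be sketched in a few lines. KV-Fold's actual update rule is not given here, so this is a generic illustration of treating the KV cache as a fold accumulator: the input is split into chunks, and a step function consumes one chunk at a time while carrying the cache forward. The `attend_chunk` step below is a hypothetical toy stand-in for a transformer forward pass, not the method's real computation.

```python
from functools import reduce

def attend_chunk(cache, chunk):
    # Hypothetical stand-in for a transformer forward pass: fold this
    # chunk's contribution into the cache and return the updated cache.
    # A real implementation would also compact the cache so its size
    # stays bounded (the "one-step recurrence" in the title).
    return cache + [sum(chunk)]

def kv_fold(chunks):
    # The KV cache plays the role of the fold's accumulator:
    #   state_{t+1} = step(state_t, chunk_t)
    # so arbitrarily long inputs are processed one bounded chunk at a time.
    return reduce(attend_chunk, chunks, [])

tokens = list(range(12))
chunks = [tokens[i:i + 4] for i in range(0, len(tokens), 4)]
print(kv_fold(chunks))  # -> [6, 22, 38]
```

Because only the current chunk and the accumulated cache are resident at any step, peak memory is governed by the chunk size plus the cache size rather than the full sequence length, which is what lets such schemes fit long contexts on a single GPU.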