Brief

last 24h

[50/932] 185 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.
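
The brief does not publish its scoring formula. As a rough illustration only, a 0–100 score built from the four named signals could look like the sketch below; the weights, the 24-hour half-life, and the function name are all assumptions, not the feed's actual method.

```python
# Hypothetical weights: the brief names the signals but not how they combine.
WEIGHTS = {"authority": 0.4, "cluster": 0.3, "headline": 0.3}
HALF_LIFE_HOURS = 24.0  # assumed half-life for the time-decay factor


def score_item(authority: float, cluster: float, headline: float, age_hours: float) -> float:
    """Combine three signals in [0, 1] and apply exponential time decay."""
    base = (WEIGHTS["authority"] * authority
            + WEIGHTS["cluster"] * cluster
            + WEIGHTS["headline"] * headline)
    decay = 0.5 ** (age_hours / HALF_LIFE_HOURS)  # halves every HALF_LIFE_HOURS
    return round(100.0 * base * decay, 1)


fresh = score_item(0.9, 0.5, 0.7, age_hours=0)
day_old = score_item(0.9, 0.5, 0.7, age_hours=24)
print(fresh, day_old)
```

Under these assumptions, an item loses half of its score for every day it ages, which is one simple way to keep older clusters from crowding out new ones.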

TOOL · arXiv cs.CV · 1d

Optimizing 4D Wires for Sparse 3D Abstraction

Researchers have developed a framework for 3D geometric abstraction that uses a single, continuous 4D wire. The wire is parameterized as a B-spline with three spatial coordinates and a variable width, so it represents complex volumetric forms with global topological coherence, unlike methods that use collections of independent curve segments. The framework recasts 3D sketching as a global routing problem and optimizes the wire through a differentiable rendering pipeline driven by gradient signals such as Score Distillation Sampling (SDS) and CLIP. Applications include image-to-3D abstraction and multi-view wire art generation, with reported gains in semantic fidelity and structural coherence.

IMPACT Introduces a novel method for 3D abstraction that could improve generative modeling and content creation pipelines.
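
To make the "4D wire" idea concrete, the sketch below evaluates a curve whose control points carry (x, y, z, width). It assumes a uniform cubic B-spline; the summary does not state the degree or knot vector the paper uses, and the function names are illustrative.

```python
from typing import List, Tuple

Point4 = Tuple[float, float, float, float]  # (x, y, z, width)


def cubic_bspline_point(ctrl: List[Point4], seg: int, t: float) -> Point4:
    """Evaluate segment `seg` of a uniform cubic B-spline at t in [0, 1]."""
    # Standard uniform cubic B-spline basis functions; they sum to 1.
    b0 = (1 - t) ** 3 / 6
    b1 = (3 * t**3 - 6 * t**2 + 4) / 6
    b2 = (-3 * t**3 + 3 * t**2 + 3 * t + 1) / 6
    b3 = t**3 / 6
    p0, p1, p2, p3 = ctrl[seg:seg + 4]
    # Blend all four channels, so width varies as smoothly as position.
    return tuple(b0 * a + b1 * b + b2 * c + b3 * d
                 for a, b, c, d in zip(p0, p1, p2, p3))


def sample_wire(ctrl: List[Point4], per_segment: int = 8) -> List[Point4]:
    """Sample every segment of the wire at evenly spaced parameter values."""
    return [cubic_bspline_point(ctrl, seg, i / per_segment)
            for seg in range(len(ctrl) - 3)
            for i in range(per_segment)]
```

Because every output is a smooth function of the control points, a renderer built on such samples can in principle pass gradients back to the wire's shape and thickness.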
TOOL · arXiv cs.CV · 1d

H2G: Hierarchy-Aware Hyperbolic Grouping for 3D Scenes

Researchers have introduced H2G, a method for hierarchical 3D grouping that uses hyperbolic geometry to represent scene structure. It groups elements of a 3D scene at multiple levels, from fine object parts to complete objects, without requiring semantic labels. H2G derives hierarchical supervision from 2D foundation models and embeds this structure in a hyperbolic feature field, enabling multi-level grouping within a single representation.

IMPACT Introduces a new geometric approach for scene understanding, potentially improving 3D data processing and analysis in AI applications.
- H2G
- arXiv
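
Hyperbolic space suits hierarchies because distances grow rapidly toward the boundary, leaving room for many fine-grained leaves around a few coarse parents. The sketch below computes distance in the Poincaré ball model; whether H2G uses this particular model is an assumption, since the summary only says "hyperbolic".

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit ball."""
    def sq_norm(x):
        return sum(c * c for c in x)

    diff = sq_norm([a - b for a, b in zip(u, v)])
    denom = (1 - sq_norm(u)) * (1 - sq_norm(v))
    return math.acosh(1 + 2 * diff / denom)


origin = (0.0, 0.0)
print(poincare_distance(origin, (0.5, 0.0)))  # moderate distance near the center
print(poincare_distance(origin, (0.9, 0.0)))  # much larger near the boundary
```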
TOOL · arXiv cs.CL · 1d

Enhancing Target-Guided Proactive Dialogue Systems via Conversational Scenario Modeling and Intent-Keyword Bridging

Researchers have developed a method to improve proactive dialogue systems that guide conversations toward specific targets. The approach introduces "conversational scenario modeling," which incorporates user profiles and domain knowledge to dynamically shape system responses. In addition, "intent-keyword bridging" predicts keywords for upcoming turns, offering more flexible guidance. Evaluations show these techniques significantly enhance proactivity, fluency, and informativeness in guided conversations.

IMPACT Enhances AI's ability to steer conversations toward specific goals, improving user experience in targeted dialogue applications.
TOOL · arXiv cs.CV · 1d

What Does It Mean for a Medical AI System to Be Right?

A new paper examines what "correctness" means for AI systems in medical contexts, using the diagnosis of multiple myeloma as a case study. It argues that correctness is not determined by benchmark performance alone but also depends on the quality of labeled data, model interpretability, clinically relevant metrics, and accountability in human-AI collaboration. The research highlights challenges such as unstable ground truth labels, opaque AI predictions, inadequate standard metrics, and the risk of automation bias in clinical settings.

IMPACT This research prompts a deeper consideration of how AI performance is measured in critical fields like medicine, moving beyond simple accuracy to encompass data quality, interpretability, and accountability.
- AI
- multiple myeloma
TOOL · arXiv cs.CV · 1d

Chronicles-OCR: A Cross-Temporal Perception Benchmark for the Evolutionary Trajectory of Chinese Characters

Researchers have introduced Chronicles-OCR, a new benchmark designed to test the cross-temporal perception abilities of Vision Large Language Models (VLLMs) on Chinese characters. This benchmark covers the complete evolutionary trajectory of Chinese scripts, from ancient tortoise shells to modern calligraphy, addressing the lack of datasets that capture systematic visual shifts over thousands of years. Chronicles-OCR includes 2,800 balanced images and proposes a novel annotation paradigm to handle drastic morphological variations, offering four tasks to evaluate VLLMs' limitations in historical text perception.

IMPACT Provides a new evaluation tool for VLLMs to assess their robustness on historical scripts, potentially improving AI's utility in digital humanities.
RESEARCH · Hugging Face Daily Papers · 2d · [2 sources]

BabelDOC: Better Layout-Preserving PDF Translation via Intermediate Representation

Researchers have developed BabelDOC, a new framework designed to improve PDF translation by preserving document layout. This system uses an intermediate representation to decouple visual metadata from semantic content, allowing for better handling of terminology, cross-page context, and formulas. BabelDOC's adaptive typesetting engine then re-anchors translated text to the original layout, showing improvements in fidelity, aesthetics, and consistency.

IMPACT Improves cross-lingual communication for visually rich documents, potentially aiding global collaboration and information access.
- BabelDOC
- PDF
- Hugging Face
- arXiv
RESEARCH · Hugging Face Daily Papers · 2d · [2 sources]

Transcoda: End-to-End Zero-Shot Optical Music Recognition via Data-Centric Synthetic Training

Researchers have developed Transcoda, a system for Optical Music Recognition (OMR) that transcribes sheet music into a textual format. It addresses the scarcity of annotated datasets with a synthetic data generation pipeline and a grammar-based decoding approach. With a compact 59M-parameter model, Transcoda achieves state-of-the-art performance, outperforming larger models and significantly reducing error rates on historical music scans.

IMPACT Advances OMR capabilities, potentially enabling new tools for music analysis and digitization.
SIGNIFICANT · dev.to — MCP tag · 2d · [2 sources]

All Data and AI Weekly #241 - 11 May 2026

Snowflake has launched a public preview for its multimodal video and audio analysis capabilities, allowing users to extract insights from rich media directly within the platform. This new feature supports models like Claude 4 Opus and Gemini 3.1 Pro for analyzing various video and audio formats. Additionally, Apache Iceberg v3 has reached general availability with enhanced data type support, and Snowflake is preparing for its upcoming Summit 2026, featuring speakers like Anthropic's Daniela Amodei.

IMPACT Enables direct analysis of video and audio data within Snowflake, potentially reducing data movement and accelerating AI-driven insights from rich media.
TOOL · arXiv cs.CV · 1d

Multimodal Abstractive Summarization of Instructional Videos with Vision-Language Models

Researchers have developed ClipSum, a framework for summarizing instructional videos that builds on CLIP's vision-language features. It uses semantically aligned visual features from CLIP, which was trained on a vast dataset of image-text pairs, to bridge the gap between visual understanding and language generation. On the YouCook2 dataset, ClipSum outperformed traditional methods, achieving a higher ROUGE-1 score with significantly lower feature dimensionality, which suggests that semantic alignment matters more than raw feature capacity.

IMPACT Introduces a novel approach to video summarization by enhancing semantic alignment between visual and language modalities.
- ClipSum
- CLIP
- YouCook2
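
For readers unfamiliar with the metric, ROUGE-1 measures unigram overlap between a generated summary and a reference. The sketch below computes the F1 variant with plain whitespace tokenization; published results typically use a standard ROUGE package with its own tokenization and stemming, so this is illustrative only.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 between a reference and a candidate summary."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # clipped count of shared unigrams
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


print(rouge1_f1("add salt to the pot", "add salt to pot"))
```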
TOOL · arXiv cs.CV · 2d

RealDiffusion: Physics-informed Attention for Multi-character Storybook Generation

Researchers have developed RealDiffusion, a framework for generating coherent multi-character storybooks with diffusion models. The system uses heat diffusion as a prior to average features and stabilize character identity across sequential frames. A region-aware stochastic process then introduces controlled perturbations to maintain narrative dynamism and scene evolution. The approach aims to resolve the trade-off between character coherence and story progression, and it outperforms existing methods in experiments.

IMPACT Introduces a novel framework for improving coherence in AI-generated sequential media, potentially impacting creative content generation.
- RealDiffusion
- arXiv
RESEARCH · arXiv cs.CL · 2d · [3 sources]

DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices

Researchers are exploring new methods to optimize Sparse Mixture-of-Experts (SMoE) models, which are crucial for scaling large language models efficiently. One paper reveals a geometric coupling between routers and experts, suggesting that matched directions accumulate similar routed token histories and that auxiliary load-balancing losses can disrupt this structure. Another study systematically analyzed over 2,000 pretraining runs to optimize design choices like expert count and granularity, finding that these factors have a greater impact than others such as shared experts or load-balancing mechanisms. A third paper introduces DECO, an SMoE architecture designed for end-side devices that matches dense Transformer performance with significantly fewer active parameters and offers hardware acceleration.

IMPACT New research explores architectural optimizations for Mixture-of-Experts models, potentially improving efficiency and performance for large language models.
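
As background for the router and expert terminology, the sketch below shows generic top-k routing for one token: a softmax over router logits, selection of the k highest-scoring experts, and renormalization of their weights. It is a textbook simplification, not the routing scheme of DECO or the other papers.

```python
import math


def top_k_route(logits, k=2):
    """Return {expert_index: weight} for the k experts with the highest router score."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    # Only the chosen experts run, so only their parameters are "active".
    return {i: probs[i] / norm for i in chosen}


print(top_k_route([0.1, 2.0, -1.0, 1.5], k=2))
```

With four experts and k=2, half of the expert parameters are skipped for this token, which is where the savings in active parameters come from.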
RESEARCH · Hugging Face Daily Papers · 2d · [3 sources]

MulTaBench: Benchmarking Multimodal Tabular Learning with Text and Image

Researchers have introduced MulTaBench, a new benchmark designed to evaluate multimodal tabular learning. This benchmark comprises 40 datasets that combine tabular data with either text or images, focusing on tasks where these modalities offer complementary predictive signals. The goal is to encourage the development of foundation models that can effectively integrate and leverage diverse data types for improved performance.

IMPACT Establishes a new standard for evaluating multimodal tabular models, potentially driving advancements in foundation models for diverse data integration.
- MulTaBench
- Hugging Face
RESEARCH · arXiv cs.LG · 3d · [2 sources]

Joint sparse coding and temporal dynamics support context reconfiguration

Researchers have identified joint sparse coding and temporal dynamics as key mechanisms by which the brain reconfigures neural representations to adapt to new contexts without losing prior knowledge. This balance is crucial for lifelong learning in dynamic environments and is relevant to artificial intelligence systems that struggle with catastrophic forgetting. The study found that sparse representations reduce interference between contexts, while temporal dynamics enhance context separation over time, leading to more stable adaptation.

IMPACT Identifies core mechanisms for stable lifelong learning, potentially guiding the development of more robust AI systems.
- mouse medial prefrontal cortex
- spiking neural networks
RESEARCH · arXiv cs.AI · 3d · [2 sources]

MTA-RL: Robust Urban Driving via Multi-modal Transformer-based 3D Affordances and Reinforcement Learning

Researchers have developed MTA-RL, a novel framework that integrates multi-modal transformer-based 3D affordances with reinforcement learning for robust urban autonomous driving. This approach fuses RGB images and LiDAR data to predict explicit, geometry-aware affordances, creating a structured observation space for the RL policy. Evaluations in the CARLA simulator demonstrate MTA-RL's superior performance in sample efficiency, stability, and zero-shot generalization compared to existing baselines.

IMPACT Introduces a novel approach to bridge perception and control for autonomous driving, improving sample efficiency and generalization.
RESEARCH · arXiv cs.AI · 3d · [2 sources]

When Prompts Become Payloads: A Framework for Mitigating SQL Injection Attacks in Large Language Model-Driven Applications

Researchers have developed a new security framework to combat SQL injection attacks in applications that use large language models (LLMs) to interact with databases. These attacks exploit the translation process from natural language prompts to SQL queries, allowing malicious users to generate unsafe commands. The proposed multi-layered system includes prompt sanitization, anomaly detection, and signature-based controls to identify and block these threats, aiming to enhance the security of LLM-driven database applications.

IMPACT Enhances security for LLM-powered database interfaces, enabling safer adoption of natural language querying.
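
To illustrate what a signature-based control might look like in this setting, the sketch below checks model-generated SQL before execution and accepts only a single read-only SELECT. The keyword list and function name are hypothetical and this is not the paper's system; a real deployment would also rely on least-privilege database accounts rather than string checks alone.

```python
import re

# Hypothetical deny-list of write/DDL keywords; deliberately conservative.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|grant|truncate|exec)\b",
    re.IGNORECASE,
)


def is_safe_select(sql: str) -> bool:
    """Accept only a single SELECT statement with no comments or write keywords."""
    stmt = sql.strip().rstrip(";").strip()
    # Reject stacked statements and SQL comments, which are common injection carriers.
    if ";" in stmt or "--" in stmt or "/*" in stmt:
        return False
    if not re.match(r"(?i)^select\b", stmt):
        return False
    return FORBIDDEN.search(stmt) is None


print(is_safe_select("SELECT name FROM users WHERE id = 1"))
print(is_safe_select("SELECT 1; DROP TABLE users"))
```

The check errs on the side of rejection: a legitimate query that mentions a column named `update`, for example, would be refused.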
RESEARCH · arXiv cs.AI · 3d · [2 sources]

Explainability of Recurrent Neural Networks for Enhancing P300-based Brain-Computer Interfaces

Researchers have developed a new Post-Recurrent Module (PRM) to enhance the explainability and performance of Recurrent Neural Networks (RNNs) used in P300-based Brain-Computer Interfaces (BCIs). This module improves classification accuracy by 9% over existing methods while also providing insights into the spatio-temporal patterns of EEG data that contribute to model decisions. The framework aims to make EEG-based models more transparent and can be applied to various neurological tasks beyond P300 detection.

IMPACT Enhances the accuracy and interpretability of AI models for brain-computer interfaces, potentially accelerating their adoption in healthcare and assistive technologies.
RESEARCH · arXiv cs.AI · 3d · [4 sources]

Think as Needed: Geometry-Driven Adaptive Perception for Autonomous Driving

Researchers are developing advanced AI techniques to improve autonomous driving systems. One approach, CaAD, focuses on causality-aware end-to-end modeling to better predict vehicle and agent interactions, showing strong performance on benchmarks. Another method, Enhanced HOPE, uses adaptive perception that adjusts computation based on scene complexity and incorporates temporal memory to track occluded objects. Additionally, generative AI is being used to create diverse synthetic pedestrian data for training more robust perception models, highlighting the benefits and limitations of cross-domain training. Finally, a novel attack paradigm leverages view-induced trajectory manipulation, using static camouflage to trick autonomous vehicles into inferring incorrect paths and triggering unnecessary braking.

IMPACT New AI methodologies promise to enhance the safety, robustness, and efficiency of autonomous driving systems.
- CaAD
- Enhanced HOPE
- StyleGAN2
- nuScenes
- CARLA
- Bench2Drive
- NAVSIM
RESEARCH · arXiv cs.LG · 3d · [3 sources]

The Value of Mechanistic Priors in Sequential Decision Making

Two new arXiv papers explore theoretical frameworks for sequential decision-making in machine learning. The first paper introduces a "mechanistic information" metric to quantify the value of hybrid models that combine physical priors with learned residuals, demonstrating sample-efficiency gains in simulations and cautioning against LLM priors in safety-critical applications. The second paper develops a sequential supersample framework to establish information-theoretic generalization bounds for adaptive data settings, applicable to online learning, streaming active learning, and bandits.

IMPACT These papers offer theoretical advancements in understanding and bounding the performance of sequential decision-making models, potentially impacting the design of future AI systems in data-scarce or safety-critical domains.
- arXiv
- LLM
TOOL · arXiv stat.ML · 1d · [2 sources]

Enhancing a Risk Model by Adding Transient Statistical Factors

Researchers have developed a new method to enhance existing financial risk models by incorporating transient statistical factors. This approach uses maximum likelihood estimation to refine models and add new factors, improving the capture of changing market regimes and temporary influences. The methodology is designed to handle missing asset return data, making it practical for real-world equity datasets, and has been demonstrated on the Barra short-term US risk model.

IMPACT Enhances financial modeling techniques, potentially improving portfolio construction and risk evaluation.
- Barra short-term US risk model
- arXiv
TOOL · arXiv cs.CV · 2d

Interactive State Space Model with Cross-Modal Local Scanning for Depth Super-Resolution

Researchers have introduced a new framework for guided depth super-resolution that utilizes an Interactive State Space Model. This approach aims to efficiently create high-resolution depth maps from low-resolution inputs, using RGB images as guidance. The model incorporates a cross-modal local scanning mechanism to enable detailed semantic interactions between RGB and depth features, leveraging the Mamba architecture for linear complexity. Experiments indicate that this method achieves competitive results compared to existing state-of-the-art techniques.

IMPACT Introduces a novel approach for depth super-resolution, potentially improving efficiency and accuracy in computer vision tasks.
- Interactive State Space Model
- Mamba architecture
TOOL · arXiv cs.CL · 2d

StepCodeReasoner: Aligning Code Reasoning with Stepwise Execution Traces via Reinforcement Learning

Researchers have developed StepCodeReasoner, a new framework designed to improve code reasoning by focusing on intermediate execution states rather than just final outputs. This approach uses structured print statements to create execution-trace anchors, training models to predict runtime states at each step. The framework also incorporates a novel reinforcement learning algorithm, Bi-Level GRPO, for better credit assignment across and within execution paths. Experiments show that StepCodeReasoner achieves state-of-the-art performance on code reasoning benchmarks, with its 7B model surpassing models like GPT-4o and a previous CodeReasoner baseline.

IMPACT This new method for code reasoning could lead to more reliable AI code generation and debugging tools.
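
The idea of execution-trace anchors can be pictured with a small instrumented function: each step emits a structured record of the runtime state, and a model would be asked to predict those records rather than only the return value. The `[STEP n]` format and the function below are invented for illustration and are not taken from the paper.

```python
def running_max(values, trace):
    """Return the maximum of `values`, recording the state after every step."""
    best = values[0]
    trace.append(f"[STEP 0] best={best}")
    for step, v in enumerate(values[1:], start=1):
        if v > best:
            best = v
        # One anchor per iteration: the intermediate state a model must predict.
        trace.append(f"[STEP {step}] best={best}")
    return best


trace = []
result = running_max([3, 1, 4, 1, 5], trace)
for line in trace:
    print(line)
```

Comparing a model's predicted trace against the recorded one shows at which step its reasoning diverged, which a final-output check cannot reveal.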
TOOL · arXiv cs.CV · 2d

Vector Scaffolding: Inter-Scale Orchestration for Differentiable Image Vectorization

Researchers have developed a new framework called Vector Scaffolding to improve the process of converting raster images into editable vector graphics. This method addresses issues like topology collapse and redundant "polygon soup" by employing a hierarchical optimization approach instead of a flat one. Vector Scaffolding stabilizes learning dynamics and progressively densifies vector primitives, leading to faster optimization and better image quality compared to existing methods.

IMPACT Introduces a novel hierarchical optimization framework for image vectorization, potentially improving the quality and efficiency of converting raster images to editable vector formats.
- Vector Scaffolding
- arXiv
TOOL · arXiv cs.CL · 2d

YFPO: A Preliminary Study of Yoked Feature Preference Optimization with Neuron-Guided Rewards for Mathematical Reasoning

Researchers have introduced Yoked Feature Preference Optimization (YFPO), a framework designed to enhance the mathematical reasoning capabilities of large language models. Unlike existing methods that rely solely on external preference data, YFPO incorporates internal neuron activation patterns to guide the optimization process. By identifying neurons associated with mathematical concepts and logical reasoning, YFPO constructs an auxiliary reward signal that complements external supervision. Preliminary experiments on a small-scale model using the GSM8K benchmark indicate that this neuron-guided approach may improve reasoning performance and offers a more interpretable path for model fine-tuning.

IMPACT Introduces a novel neuron-guided approach to LLM fine-tuning, potentially improving mathematical reasoning and interpretability.
TOOL · arXiv cs.CV · 2d

Beyond Point-wise Neural Collapse: A Topology-Aware Hierarchical Classifier for Class-Incremental Learning

Researchers have developed a classifier called Hierarchical-Cluster SOINN (HC-SOINN) to improve Class-Incremental Learning (CIL). It addresses the limitations of traditional Nearest Class Mean (NCM) classifiers by capturing the topological structure of class manifolds rather than representing each class as a single point. HC-SOINN is further enhanced by the Structure-Topology Alignment via Residuals (STAR) method, which actively adapts the learned topology to complex feature drift. Integrating HC-SOINN into existing CIL methods has shown consistent performance improvements.

IMPACT Introduces a novel classifier that improves performance in class-incremental learning by better handling complex data topologies.
RESEARCH · arXiv cs.CV · 2d · [2 sources]

Personal Visual Context Learning in Large Multimodal Models

Two new benchmarks, MMCL-Bench and Personal-VCL-Bench, have been introduced to evaluate the multimodal context learning capabilities of large multimodal models. MMCL-Bench focuses on learning from visual rules, procedures, and evidence, while Personal-VCL-Bench assesses the ability of models to use user-specific visual context for personalized queries. Both benchmarks reveal significant limitations in current frontier multimodal models, indicating a substantial gap in their ability to extract, reason over, and apply visual information.

IMPACT Highlights a critical bottleneck in current multimodal models, suggesting future research directions for personalized AI assistants.
RESEARCH · arXiv cs.CV · 2d · [2 sources]

BenchCAD: A Comprehensive, Industry-Standard Benchmark for Programmatic CAD

Two new benchmarks, CADBench and BenchCAD, have been released to evaluate AI's ability to generate Computer-Aided Design (CAD) programs from various inputs. These benchmarks aim to standardize the assessment of multimodal AI systems in tasks like reconstructing editable CAD programs from images or 3D models. Early evaluations show that while specialized models perform better on mesh-to-CAD tasks, current general-purpose vision-language models struggle with complex geometric details and industrial design parameters, indicating a gap in their industrial readiness.

IMPACT Establishes new evaluation standards for AI in CAD, highlighting current limitations in generating industrially relevant parametric programs.
- CADBench
- BenchCAD
- AI
- DeepCAD
- Fusion 360
- ABC
- MCB
- Objaverse
- CadQuery
RESEARCH · Hugging Face Daily Papers · 2d · [2 sources]

Muown: Row-Norm Control for Muon Optimization

Researchers have developed Muown, an optimization method designed to improve the training of large language models. Muown addresses an issue with the Muon optimizer, specifically the upward drift of spectral norms in weight matrices during training. By treating row-magnitude vectors as explicit variables, Muown improves perplexity and learning rate stability across various model scales, outperforming existing optimizers like AdamW and Lion.

IMPACT Improves LLM training efficiency and stability, potentially enabling larger models and faster development cycles.
- Muown
- Muon
- AdamW
- Lion
- Hugging Face
- arXiv
- FineWeb-Edu
RESEARCH · Hugging Face Daily Papers · 2d · [2 sources]

Break the Brake, Not the Wheel: Untargeted Jailbreak via Entropy Maximization

Researchers have developed a new method called Untargeted Jailbreak via Entropy Maximization (UJEM-KL) to bypass safety measures in vision-language models (VLMs). This technique focuses on manipulating high-entropy tokens during decoding to flip refusal outcomes, rather than relying on fixed patterns. UJEM-KL demonstrates improved transferability across different VLMs and remains effective against common defenses, suggesting that previous limitations in multimodal jailbreaks were due to overly constrained optimization objectives.

IMPACT This research highlights a novel vulnerability in vision-language models, potentially impacting the security and reliability of AI systems.
RESEARCH · arXiv stat.ML · 3d · [6 sources]

One-Shot Generative Flows: Existence and Obstructions

Researchers are exploring new methods for generative modeling, focusing on Wasserstein gradient flows to improve efficiency and sample quality. One approach, W-Flow, achieves state-of-the-art one-step generation for images with significantly faster sampling times compared to traditional diffusion models. Other papers investigate optimizing outputs from generative models and the theoretical underpinnings of score-difference flows, linking different generative modeling techniques and identifying potential obstructions for certain flow types.

IMPACT Advances in Wasserstein gradient flows and one-step generation promise faster, more efficient AI models for complex tasks.
TOOL · arXiv cs.CV · 2d

$h$-control: Training-Free Camera Control via Block-Conditional Gibbs Refinement

Researchers have introduced "$h$-control," a novel method for training-free camera control in video generation models. This approach enhances existing flow-matching techniques by incorporating block-conditional pseudo-Gibbs refinement within the sampling process. The method aims to improve the balance between adherence to camera trajectories and overall visual quality, outperforming previous methods on benchmarks like RealEstate10K and DAVIS.

IMPACT Introduces a new method for improved camera control in video generation, potentially enhancing realism and trajectory adherence.
RESEARCH · arXiv cs.AI · 3d · [2 sources]

NCO: A Versatile Plug-in for Handling Negative Constraints in Decoding

Researchers have developed NCO, a new decoding strategy designed to enhance control over Large Language Model (LLM) outputs. This plug-in addresses the challenge of preventing multiple forbidden patterns, such as profanity or personally identifiable information (PII), from appearing in generated text. NCO achieves this by performing efficient online pattern matching, avoiding the state explosion issues common with converting multiple constraints into a single automaton. The strategy is compatible with standard inference methods and has demonstrated effectiveness in practical applications.

IMPACT Provides a more efficient method for LLMs to avoid generating harmful or sensitive content.
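
The general mechanism of negative-constrained decoding can be sketched as a filter applied at each step: before sampling, drop any candidate token whose text would complete a forbidden pattern. The naive per-pattern suffix check below only illustrates that idea; it is not NCO's matching algorithm, and all names are hypothetical.

```python
def allowed_tokens(generated, candidates, forbidden):
    """Return the candidate token strings that would not complete a forbidden pattern."""
    allowed = []
    for tok in candidates:
        blocked = False
        for pat in forbidden:
            # Only the last len(pat) - 1 characters of the text so far can
            # combine with the new token to form a fresh match.
            tail = generated[-(len(pat) - 1):] if len(pat) > 1 else ""
            if pat in tail + tok:
                blocked = True
                break
        if not blocked:
            allowed.append(tok)
    return allowed


print(allowed_tokens("this is a bad", [" word", " idea"], ["bad word"]))
```

The cost of this check grows with the number of patterns at every step, which is the kind of overhead an efficient online matcher is meant to reduce.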
TOOL · arXiv cs.CL · 2d

Concordance Comparison as a Means of Assembling Local Grammars

Researchers have developed a new method for Named Entity Recognition (NER) specifically for identifying person names. This technique involves comparing concordances from different local grammars to highlight differences, which aids in selecting the most effective grammar. In a case study on Portuguese texts, this approach improved the F-Measure for person name extraction by 6 points, reaching 76.86 and surpassing the previous state-of-the-art.

IMPACT Introduces a novel technique for improving Named Entity Recognition, potentially enhancing information extraction systems.
TOOL · arXiv cs.CL · 2d

More Edits, More Stable: Understanding the Lifelong Normalization in Sequential Model Editing

Researchers have developed a theoretical framework to understand Lifelong Normalization (LN), a key strategy for continuously updating Large Language Models without causing catastrophic forgetting or model collapse. Their analysis reveals that LN creates a self-reinforcing stability loop, ensuring parameter updates are orthogonal and bounded, which directly combats forgetting. Building on this, they introduce StableEdit, a method that enhances this stability through an explicit warm-up stage and full whitening, demonstrating improved long-horizon stability with minimal overhead.

IMPACT Provides theoretical grounding and a new method for stable, continuous LLM updates, potentially improving model maintainability.
RESEARCH · arXiv cs.CV · 2d · [2 sources]

MMVIAD: Multi-view Multi-task Video Understanding for Industrial Anomaly Detection

Two new research papers challenge the current direction of video anomaly detection (VAD). The first paper argues that the field's emphasis on general models and multi-modal large language models (MLLMs) has drawn attention away from scene-specific, context-dependent anomaly identification. The second paper introduces MMVIAD, a new dataset and benchmark for industrial VAD, and presents a model called VISTA that improves performance on multi-task evaluation, outperforming GPT-5.4.

IMPACT Challenges current LLM-based approaches in video anomaly detection, potentially redirecting research towards more scene-specific and explainable methods.
RESEARCH · arXiv cs.CV · 2d · [2 sources]

Qwen-Image-2.0 Technical Report

Alibaba's Qwen team has released technical reports for two new image models: Qwen-Image-VAE-2.0 and Qwen-Image-2.0. Qwen-Image-VAE-2.0 is a high-compression Variational Autoencoder designed for improved reconstruction fidelity and diffusability, incorporating architectural enhancements and large-scale training. Qwen-Image-2.0 is an omni-capable image generation model that unifies high-fidelity generation and precise editing within a single framework, addressing limitations in text rendering, multilingual fidelity, and photorealism.

IMPACT These models advance image generation and editing capabilities, particularly for text-rich content and high-compression scenarios.
RESEARCH · arXiv cs.AI · 3d · [2 sources]

MAGE: Multi-Agent Self-Evolution with Co-Evolutionary Knowledge Graphs

Researchers have developed MAGE, a framework that uses a co-evolutionary knowledge graph to manage self-evolving language model agents. This approach externalizes the agent's knowledge into a graph, allowing it to learn and adapt without altering its core model. The framework has demonstrated strong performance across nine diverse benchmarks, outperforming existing methods that rely on natural language feedback or implicit reinforcement signals.

IMPACT Introduces a novel method for stable AI agent evolution, potentially improving performance on complex reasoning and navigation tasks.
RESEARCH · arXiv cs.AI · 3d · [2 sources]

From Single-Step Edit Response to Multi-Step Molecular Optimization

Researchers have developed new AI frameworks for molecular optimization, which aims to improve a molecule's properties while maintaining structural similarity. One approach, FORGE, uses a two-stage process that ranks and generates fragment replacements, outperforming larger models by leveraging explicit fragment-level supervision. Another method, SMER-Opt, employs a response-oriented discrete edit strategy in which a single-step predictor and a multi-step planner steer optimization trajectories with guided tree search.

IMPACT These new AI methods offer more efficient and accurate ways to design molecules with desired properties, potentially accelerating drug discovery and materials science.
- FORGE
- SMER-Opt
- arXiv
TOOL · arXiv cs.CL · 2d

ROMER: Expert Replacement and Router Calibration for Robust MoE LLMs on Analog Compute-in-Memory Systems

Researchers have introduced ROMER, a post-training calibration framework designed to enhance the robustness of Mixture-of-Experts (MoE) Large Language Models (LLMs) when deployed on analog Compute-in-Memory (CIM) systems. This framework addresses hardware imperfections in CIM by replacing underutilized experts and recalibrating router decisions to maintain load balance and optimal routing under noisy conditions. Experiments show ROMER significantly reduces perplexity for models like DeepSeek-MoE, Qwen-MoE, and OLMoE when subjected to real-chip noise.

IMPACT Improves the viability of deploying LLMs on energy-efficient analog hardware by mitigating noise-induced performance degradation.
- ROMER
- LLMs
- MoE
- CIM
- DeepSeek-MoE
- Qwen-MoE
- OLMoE
TOOL · arXiv cs.CL · 2d

Choosing features for classifying multiword expressions

A new research paper proposes an improved method for classifying multiword expressions (MWEs), which are challenging linguistic units. The study focuses on selecting the most effective features to ensure reliable and computationally useful classifications across various languages. The proposed classification aims to enhance the suitability of MWE analysis for diverse linguistic applications. AI

IMPACT Introduces a refined approach to linguistic feature selection, potentially improving NLP model performance on tasks involving complex word structures.
- arXiv
- Multiword expressions
RESEARCH · arXiv stat.ML · 2d · [2 sources]

A Stable Distance Persistence Homology for Dynamic Bayesian Network Clustering

Researchers have developed a new topological method for analyzing dynamic Bayesian networks (DBNs). This approach associates a time-varying graph with each DBN, highlighting strong dependencies between variables. By applying persistent homology, the method generates a barcode that tracks the evolution of these dependency structures over time, offering a stable and noise-resistant summary. AI

IMPACT Introduces a novel analytical framework for time-series probabilistic models, potentially improving the understanding of complex evolving systems.
- Dynamic Bayesian Networks
- Kim and Mémoli
RESEARCH · Hugging Face Daily Papers · 2d · [2 sources]

CausalGS: Learning Physical Causality of 3D Dynamic Scenes with Gaussian Representations

Researchers have developed CausalGS, a new framework capable of learning the physical causality of 3D dynamic scenes directly from multi-view videos. This approach avoids the need for explicit physical priors or high-quality geometry reconstruction, instead inferring initial velocities and intrinsic material properties. The system then uses this inferred information within a differentiable physics simulator to achieve state-of-the-art performance in long-term future frame extrapolation and novel view interpolation. AI

IMPACT Enables learning complex physical interactions and causal relationships in 3D scenes solely from visual observations, advancing AI's understanding of the physical world.
RESEARCH · arXiv cs.LG · 3d · [2 sources]

Anchor-guided Hypergraph Condensation with Dual-level Discrimination

Two new research papers explore advances in hypergraph neural networks (HGNNs), a type of AI model designed to learn from complex, higher-order interactions. The first paper introduces the "WidthWall" concept, establishing a fundamental hierarchy of expressivity for HGNNs based on their ability to detect and count structural patterns. The second paper presents "Anchor-guided Hypergraph Condensation with Dual-level Discrimination" (AHGCDD), a method that distills large hypergraphs into smaller synthetic ones so that HGNNs can be trained efficiently. Both studies aim to improve the capabilities and efficiency of HGNNs for various applications. AI

IMPACT These papers advance the theoretical understanding and practical efficiency of hypergraph neural networks, potentially enabling more sophisticated AI models for complex relational data.
RESEARCH · arXiv cs.LG · 3d · [2 sources]

Chebyshev Center-Based Direction Selection for Multi-Objective Optimization and Training PINNs

Researchers have developed a novel method for training physics-informed neural networks (PINNs) by formulating the update-direction selection as a Chebyshev-center problem. PINN training must optimize several loss terms simultaneously, which often complicates it, and this approach aims to simplify that step. The new method selects a normalized direction that maximizes the minimum distance to cone facets, offering a unified geometric principle that recovers desirable properties of existing techniques without imposing them explicitly. Experiments indicate strong empirical performance on PINN benchmarks. AI

IMPACT Offers a more interpretable and unified approach to training complex neural networks used in scientific simulations.
- PINNs
- Chebyshev Center
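To make the "maximize the minimum distance to cone facets" idea concrete, here is a minimal sketch for the two-loss case only. It assumes (this is not stated in the summary) that the cone is the set of directions with non-negative inner product with every loss gradient, so the distance from a unit direction d to facet i is the inner product of d with the unit-normalized gradient. Under that assumption, with two gradients the maximizing direction is their normalized bisector. The function names are hypothetical and this is not the authors' implementation.

```python
import math

def normalize(v):
    """Scale a vector to unit Euclidean length."""
    n = math.sqrt(sum(x * x for x in v))
    if n == 0.0:
        raise ValueError("zero vector has no direction")
    return [x / n for x in v]

def bisector_direction(g1, g2):
    """Two-loss case: unit direction maximizing the smaller facet distance.

    Assumption: facet i is the hyperplane orthogonal to gradient g_i.
    Raises ValueError when the gradients point in exactly opposite directions.
    """
    u1, u2 = normalize(g1), normalize(g2)
    return normalize([a + b for a, b in zip(u1, u2)])

def facet_margins(d, grads):
    """Distance from unit direction d to each facet hyperplane."""
    return [sum(a * b for a, b in zip(normalize(g), d)) for g in grads]

# Two loss gradients with very different magnitudes (illustrative values).
g1, g2 = [3.0, 0.0], [0.0, 0.5]
d = bisector_direction(g1, g2)
margins = facet_margins(d, [g1, g2])

# Baseline for comparison: the direction of the plain gradient sum.
naive = normalize([a + b for a, b in zip(g1, g2)])
naive_margins = facet_margins(naive, [g1, g2])
```

In this sketch the bisector gives both losses the same margin regardless of gradient scale, whereas the plain gradient sum is dominated by the larger gradient. The general case with more than two losses needs a small optimization problem rather than a closed form.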
TOOL · arXiv cs.CL · 2d

From Token to Token Pair: Efficient Prompt Compression for Large Language Models in Clinical Prediction

Researchers have developed a new method called Medical Token-Pair Encoding (MedTPE) to efficiently compress long electronic health record sequences for large language models. This technique merges frequently occurring medical token pairs into single composite tokens, achieving lossless compression without adding computational overhead or sacrificing predictive accuracy. MedTPE has demonstrated significant reductions in input token length and inference latency across various clinical prediction tasks and LLMs, while also showing robustness and generalizability to other domains and languages. AI

IMPACT Introduces a novel compression technique for LLMs processing lengthy clinical data, potentially reducing costs and improving efficiency in healthcare AI applications.
- MedTPE
- LLMs
- EHRs
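The pair-merging idea can be illustrated with a minimal sketch, assuming a single merge step over already-tokenized records. The function names and the toy records are hypothetical, and this is not the MedTPE implementation: it only shows why replacing a frequent adjacent pair with one composite token shortens the sequence while staying reversible.

```python
from collections import Counter

def most_frequent_pair(sequences):
    """Return the most common adjacent token pair across all sequences, or None."""
    counts = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))
    return counts.most_common(1)[0][0] if counts else None

def merge_pair(seq, pair):
    """Replace each non-overlapping occurrence of `pair` with one composite token."""
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(pair)  # the tuple itself stands in for the composite token
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged

def expand(seq):
    """Invert merging: unpack composite (tuple) tokens back into their parts."""
    out = []
    for tok in seq:
        if isinstance(tok, tuple):
            out.extend(expand(tok))
        else:
            out.append(tok)
    return out

# Toy tokenized records (made-up tokens, for illustration only).
records = [
    ["dx", "E11", "rx", "metformin"],
    ["dx", "E11", "lab", "hba1c"],
    ["dx", "E11", "rx", "insulin"],
]
pair = most_frequent_pair(records)
compressed = [merge_pair(r, pair) for r in records]
restored = [expand(r) for r in compressed]
```

Because every composite token maps back to exactly one pair, expanding the compressed records reproduces the originals, which is the sense in which such a merge is lossless.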
TOOL · arXiv cs.CL · 2d

DreamAvoid: Critical-Phase Test-Time Dreaming to Avoid Failures in VLA Policies

Researchers have developed DreamAvoid, a novel framework designed to prevent failures in Vision-Language-Action (VLA) models during critical manipulation tasks. The system uses a "dreaming" process at test time to anticipate and avoid potential errors that can lead to irrecoverable failures. By identifying critical phases, proposing candidate actions, and evaluating their potential short-horizon futures, DreamAvoid aims to improve overall task success rates in real-world robotics and simulation benchmarks. AI

IMPACT Introduces a novel method to enhance the reliability and success rate of VLA models in complex manipulation tasks.
- DreamAvoid
- Vision-Language-Action (VLA) models
RESEARCH · arXiv stat.ML · 3d · [2 sources]

Consolidation-Expansion Operator Mechanics: A Unified Framework for Adaptive Learning

Researchers have introduced Consolidation-Expansion Operator Mechanics (OpMech), a new framework for precisely defining adaptive learning systems. OpMech uses an 'order-gap' metric, computable from a system's trajectory, to signal how sensitive the system is to the order in which learning operations are applied. This metric can serve as a real-time control signal for deciding when a system has converged, with provable guarantees in various learning settings. AI

IMPACT Introduces a theoretical framework for adaptive learning systems, potentially improving convergence guarantees in areas like reinforcement learning and recursive language models.
- Consolidation-Expansion Operator Mechanics
- Debashis Guha
TOOL · arXiv cs.CL · 2d

Robust LLM Unlearning Against Relearning Attacks: The Minor Components in Representations Matter

Researchers have identified a key vulnerability in current large language model (LLM) unlearning techniques: models can quickly recover forgotten information through relearning attacks. This fragility arises because existing methods primarily alter the dominant components of model representations, where changes are easily reversed, and leave the minor components intact. To address this, the authors propose Minor Component Unlearning (MCU), which modifies the minor components instead, since changes there are more resistant to reversal. Experiments show significant improvements in resistance to relearning attacks. AI

IMPACT Enhances LLM security by making it harder to recover sensitive data after unlearning, crucial for privacy and copyright.
- Large language model
- Minor Component Unlearning
TOOL · arXiv cs.CL · 2d

Human-Grounded Multimodal Benchmark with 900K-Scale Aggregated Student Response Distributions from Japan's National Assessment of Academic Ability

Researchers have developed a new multimodal benchmark using data from Japan's National Assessment of Academic Ability, which includes approximately 900,000 aggregated student responses. This dataset features real exam materials from science, mathematics, and Japanese language subjects, preserving authentic layouts and diagrams. It aims to provide a human-grounded evaluation framework for multimodal large language models (MLLMs) by allowing direct comparison between model and human performance. AI

IMPACT Establishes a new, human-grounded benchmark for evaluating multimodal LLMs in educational contexts, particularly on Japanese-language exam materials.
- Japan's National Assessment of Academic Ability
- multimodal large language models
RESEARCH · Hugging Face Daily Papers · 2d · [2 sources]

Phoenix-VL 1.5 Medium Technical Report

Researchers have developed Phoenix-VL 1.5 Medium, a 123-billion-parameter multimodal and multilingual foundation model specifically adapted for the Singaporean context. The model was pre-trained on a 1-trillion-token multimodal corpus, extended for long-context understanding, and further refined with Singapore-specific cultural, legal, and legislative data. Phoenix-VL 1.5 Medium demonstrates state-of-the-art performance on localized benchmarks while maintaining global competitiveness in general intelligence and STEM fields. AI

IMPACT Sets a new benchmark for localized multimodal AI adaptation, potentially influencing future domain-specific model development.
RESEARCH · arXiv cs.LG · 3d · [2 sources]

Scaling the Memory of Balanced Adam

Two new research papers explore the nuances of the Adam optimizer, a popular tool in deep learning. The first paper proposes a "refresh rule" for Adam's momentum parameter, suggesting it should scale with training data size to optimize performance and robustness across different scales. The second paper delves into how mini-batch noise, influenced by batch size and Adam's hyperparameters, affects the optimizer's implicit bias and generalization capabilities, particularly in multi-epoch training scenarios. AI

IMPACT These studies offer theoretical insights and practical tuning strategies for the Adam optimizer, potentially improving model training efficiency and generalization across various deep learning tasks.