Brief

last 24h

[50/497] 223 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

TOOL · arXiv cs.AI English(EN) · 16h

Graph-to-SFILES: Control structure prediction from process topologies using generative artificial intelligence

Researchers have developed a generative AI model called Graph-to-SFILES to predict control structures for process diagrams. This model utilizes graph neural networks to interpret process topologies, offering an alternative to sequence-based methods. While effective in small-data scenarios, its performance on large datasets still requires further investigation for industrial applications. AI

IMPACT This research could accelerate P&ID development in data-scarce environments, though its industrial applicability needs further study.
TOOL · arXiv cs.AI English(EN) · 16h

AMix-1: A Pathway to Test-Time Scalable Protein Foundation Model

Researchers have developed AMix-1, a protein foundation model utilizing Bayesian Flow Networks and a novel training methodology. This model demonstrates scalable pretraining, emergent capabilities, and effective in-context learning through multiple sequence alignments. AMix-1 has successfully designed an improved protein variant with a 50x activity increase and incorporates an evolutionary test-time scaling algorithm for enhanced in silico directed evolution. AI

IMPACT Introduces a new foundation model for protein design with potential to accelerate lab-in-the-loop engineering.
TOOL · arXiv cs.AI English(EN) · 16h

Improving the Performance and Learning Stability of Parallelizable RNNs Designed for Ultra-Low Power Applications

Researchers have developed new recurrent neural network architectures, the Cumulative Memory Recurrent Unit (CMRU) and its variant $\alpha$CMRU, to improve performance and learning stability in ultra-low power applications. These models address gradient blocking issues in previous designs by introducing a cumulative update formulation that enhances gradient flow and reduces initialization sensitivity. The CMRU and $\alpha$CMRU demonstrate competitive or superior performance compared to existing models like LRUs and minGRUs on various benchmarks, particularly for tasks requiring long-range memory retention, while maintaining essential features for analog implementation. AI

IMPACT Introduces more stable and efficient RNNs for edge devices, potentially enabling new low-power AI applications.
TOOL · arXiv cs.AI English(EN) · 16h

Pharmacogenomic Knowledge Graph Augmentation for Graph Neural Network-Based Drug-Drug Interaction Prediction

Researchers have developed a method to enhance drug-drug interaction (DDI) prediction using Graph Neural Networks (GNNs) by incorporating pharmacogenomic data. This approach augments molecular structure information with details about drug metabolism pathways, specifically focusing on cytochrome P450 enzymes. The study found that this knowledge graph augmentation significantly improves DDI classification accuracy, particularly for interactions mediated by CYP2C9, though it did not overcome inherent limitations in predicting interactions for entirely new drugs. AI

IMPACT Enhances AI's ability to predict drug interactions by integrating biological pathway data, potentially accelerating drug discovery and safety assessments.
TOOL · arXiv cs.AI English(EN) · 16h

Baichuan-M4: A Clinical-Grade Medical Agent System for Continuous Care

Baichuan Intelligence has introduced Baichuan-M4, a medical large model designed for continuous patient care. This system integrates a unified runtime for consistent training and deployment, a core reasoning model trained with reinforcement learning for long-term patient memory and multi-agent coordination, and a clinical tool layer for evidence retrieval and multimodal understanding. Baichuan-M4 demonstrates leading performance across various medical evaluations, including static knowledge, dynamic consultations, and image analysis, while significantly reducing hallucination rates. AI

IMPACT This advanced medical AI system could set new benchmarks for continuous patient care and diagnostic accuracy in healthcare.
TOOL · arXiv cs.AI English(EN) · 16h

Language-based Trial and Error Falls Behind in the Era of Experience

Researchers have developed a new framework called SCOUT to improve the performance of Large Language Models (LLMs) on non-linguistic tasks. SCOUT decouples exploration from exploitation, using lightweight "scouts" to efficiently gather data from environments. This data is then used to fine-tune LLMs, enabling them to perform better on tasks that previously required extensive and costly trial-and-error. In experiments, SCOUT allowed a Qwen2.5-3B-Instruct model to outperform proprietary models like Gemini-2.5-Pro while consuming fewer computational resources. AI

IMPACT This framework could significantly reduce the computational cost of training LLMs for complex, real-world tasks.
TOOL · arXiv cs.AI English(EN) · 16h

Decoupling the "What" and "Where" With Polar Coordinate Positional Embeddings

Researchers have developed Polar Coordinate Positional Embeddings (PoPE) to improve Transformer architectures by decoupling content and positional information. PoPE addresses a limitation of existing RoPE embeddings, in which content and position are entangled, potentially hindering performance. It demonstrates superior performance in tasks requiring positional or content-based indexing and shows significant gains in sequence modeling across music, genomics, and natural language, even outperforming methods designed for length extrapolation. AI

IMPACT PoPE could enhance Transformer performance in sequence modeling tasks by improving positional awareness, potentially leading to better language models and other sequence-based AI applications.
TOOL · arXiv cs.AI English(EN) · 16h

MatMind: A Structure-Activity Knowledge-Driven Generative Foundation Model for Materials Science

Researchers have introduced MatMind, a novel generative foundation model designed for materials science. This model unifies structure-activity knowledge and physics-informed feedback within a progressive training framework. MatMind demonstrates competitive performance across various tasks, including property prediction and crystal generation, surpassing specialized models in several benchmarks. AI

IMPACT MatMind's unified approach could accelerate discovery and design in materials science by providing a versatile backbone for various tasks.
- MatMind
- arXiv
TOOL · arXiv cs.AI English(EN) · 16h

Evaluating Advanced Prompting on Gemini Flash for Multi-Hop Biomedical QA

Researchers evaluated Google's Gemini Flash models on the MedHopQA challenge, which requires multi-hop reasoning in the biomedical domain. By employing an advanced prompt engineering strategy that included role-playing, Chain-of-Thought examples, and specific formatting, they achieved a Concept Level Score of 0.720 with Gemini 2.0 Flash. This sophisticated prompting significantly improved performance compared to a baseline prompt and nearly matched the results of the next-generation Gemini 2.5 Flash, highlighting the crucial role of prompt design in LLM reasoning. AI

IMPACT Demonstrates that sophisticated prompt engineering can unlock advanced reasoning capabilities in efficient LLMs for specialized domains.
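
The three prompt ingredients named in the summary (a role, Chain-of-Thought examples, and a fixed answer format) can be combined roughly as follows. This is a minimal sketch only: the function name, the role sentence, and the `Final answer:` marker are hypothetical and are not taken from the paper.

```python
def build_prompt(question: str, examples: list[tuple[str, str, str]]) -> str:
    """Assemble a role statement, worked examples, and an output-format instruction."""
    # Role-playing: the wording here is an assumption, not the paper's prompt.
    parts = ["You are a biomedical expert answering multi-hop questions."]
    # Chain-of-Thought examples: each shows the reasoning before the answer.
    for ex_question, ex_reasoning, ex_answer in examples:
        parts.append(
            f"Question: {ex_question}\n"
            f"Reasoning: {ex_reasoning}\n"
            f"Final answer: {ex_answer}"
        )
    # Specific formatting: ask for a fixed, easy-to-parse answer line.
    parts.append(
        f"Question: {question}\n"
        "Reason step by step, then give a short answer on a line starting with 'Final answer:'."
    )
    return "\n\n".join(parts)
```

Keeping the answer on a marked line makes it straightforward to extract the predicted concept for scoring.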
TOOL · arXiv cs.AI English(EN) · 16h

FormalASR: End-to-End Spoken Chinese to Formal Text

Researchers have developed FormalASR, a novel end-to-end system designed to convert spoken Chinese directly into formal written text. This approach bypasses the need for a separate post-editing step by an LLM, reducing latency and computational costs. The system comes in two sizes, 0.6B and 1.7B parameters, both fine-tuned from Qwen3-ASR and trained on two newly created large-scale datasets, WenetSpeech-Formal and Speechio-Formal. AI

IMPACT Offers a more efficient and direct method for transcribing spoken language into formal text, potentially improving downstream NLP applications.
TOOL · arXiv cs.AI English(EN) · 16h

Repetition Mismatch: Why Data Mixture Experiments Don't Scale and How to Fix Them

Researchers have identified a key issue in scaling up AI model training data mixtures, termed "repetition mismatch." This occurs when the optimal data mixture changes as training budgets increase due to the varying repetition rates of high-quality, limited datasets. A new subsampling procedure that matches the target repetition rate can accurately predict optimal mixtures from significantly smaller experiments, improving efficiency and accuracy. AI

IMPACT Improves efficiency and accuracy in training large AI models by addressing data mixture scaling issues.
- arXiv
- Repetition Mismatch: Why Data Mixture Experiments Don't Scale and How to Fix Them
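
The repetition-matching idea can be illustrated with simple arithmetic. The sketch below assumes that a source's repetition count is the number of tokens drawn from it divided by its unique tokens; the function names and all numbers are hypothetical and are not taken from the paper.

```python
def repetitions(budget_tokens: float, mixture_weight: float, source_tokens: float) -> float:
    """Average number of passes over a source during a training run."""
    return budget_tokens * mixture_weight / source_tokens


def matched_subsample_size(small_budget: float, mixture_weight: float, target_reps: float) -> float:
    """Unique tokens to keep from a source so that a small run repeats it
    as often as the large target run would."""
    return small_budget * mixture_weight / target_reps


# Hypothetical setup: a 1B-token high-quality source given 10% of the mixture.
target_reps = repetitions(budget_tokens=100e9, mixture_weight=0.10, source_tokens=1e9)
subset = matched_subsample_size(small_budget=1e9, mixture_weight=0.10, target_reps=target_reps)
```

Without subsampling, the 1B-token proxy run would see the source only a fraction of one time, so it could not reveal the diminishing returns that appear once the full run repeats the data.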
TOOL · arXiv cs.AI English(EN) · 16h

Post-Trained MoE Can Skip Half Experts via Self-Distillation

Researchers have developed a new framework called Zero-Expert Self-Distillation Adaptation (ZEDA) to make Mixture-of-Experts (MoE) language models more efficient. ZEDA allows post-trained static MoE models to dynamically skip over half of their experts during inference with minimal accuracy loss. This method was tested on Qwen3-30B-A3B and GLM-4.7-Flash, showing significant reductions in computation and an inference speedup of approximately 1.20x. AI

IMPACT Reduces inference costs for MoE models, potentially accelerating deployment and adoption.
TOOL · arXiv cs.AI English(EN) · 16h

Post-training is (Massive) Supervised Learning

A new paper argues that the current dominant method for training large language models (LLMs), which involves extensive post-training stages like supervised fine-tuning (SFT) and reinforcement learning (RL), is essentially a return to older "pre-train then fine-tune" approaches. The authors demonstrate that models trained from scratch on modern reasoning datasets can achieve significant performance on competitive benchmarks, suggesting that current post-training primarily serves to fit models to specific distributions rather than fostering general capabilities. They propose a shift towards training procedures that emphasize "learning how to learn" to develop more generally capable models. AI

IMPACT Suggests current LLM training methods may be overly focused on distribution fitting, potentially hindering the development of more general AI capabilities.
- BERT
- LLMs
- SFT
- RL
TOOL · arXiv cs.AI English(EN) · 16h

SmartMixed: A Two-Phase Training Strategy for Adaptive Activation Function Learning in Neural Networks

Researchers have developed SmartMixed, a new two-phase training strategy that enables neural networks to learn optimal activation functions for individual neurons. The first phase uses a differentiable mixture mechanism for neurons to select from a pool of candidate functions, while the second phase fixes these selections for computational efficiency. Experiments on the MNIST dataset with feedforward networks showed that neurons in different layers develop distinct activation function preferences, outperforming models with a single fixed activation function. AI

IMPACT Enables more efficient and potentially more powerful neural network architectures by optimizing activation functions at a granular level.
- ELU
- SmartMixed
- ReLU
- Tanh
- Leaky_ReLU
- MNIST
- Amin Omidvar
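
The two phases described above can be sketched with NumPy. This is an illustrative sketch under assumptions: the candidate pool is taken from the entity tags, while the softmax mixture, the argmax selection rule, and all function names are hypothetical choices rather than the paper's confirmed implementation.

```python
import numpy as np

# Candidate pool; the names follow the tags listed for this item.
CANDIDATES = {
    "relu": lambda x: np.maximum(x, 0.0),
    "tanh": np.tanh,
    "elu": lambda x: np.where(x > 0, x, np.expm1(np.minimum(x, 0.0))),
    "leaky_relu": lambda x: np.where(x > 0, x, 0.01 * x),
}


def softmax(logits, axis=-1):
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def _apply_all(pre_act):
    """Evaluate every candidate; result has shape (batch, neurons, candidates)."""
    return np.stack([f(pre_act) for f in CANDIDATES.values()], axis=-1)


def mixed_activation(pre_act, logits):
    """Phase 1 (assumed form): per-neuron soft mixture of all candidates.

    pre_act: (batch, neurons); logits: (neurons, candidates), learnable.
    """
    weights = softmax(logits)
    return (_apply_all(pre_act) * weights).sum(axis=-1)


def fixed_activation(pre_act, logits):
    """Phase 2 (assumed form): each neuron keeps only its top-weighted candidate."""
    choice = logits.argmax(axis=-1)
    neurons = np.arange(logits.shape[0])
    return _apply_all(pre_act)[:, neurons, choice]
```

Once the selections are fixed, a real implementation would evaluate only the chosen function per neuron instead of the whole pool, which is where the efficiency gain of the second phase would come from.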
TOOL · arXiv cs.LG English(EN) · 16h

scCBGM: Interpretable Single-Cell Counterfactual Editing

Researchers have developed scCBGM, a novel framework for interpretable single-cell counterfactual editing using concept bottleneck generative models. This approach adapts concept bottleneck architectures for single-cell data, incorporating decoder skip connections and a cross-covariance penalty to enhance disentanglement. The framework has been extended to flow matching models, allowing for concept-guided editing in both encoding-decoding and generation scenarios, and includes a new synthetic benchmark for evaluation. AI

IMPACT Introduces a new method for analyzing and manipulating single-cell data, potentially accelerating disease research and therapeutic design.
TOOL · arXiv cs.AI English(EN) · 16h

Item Response Scaling Laws: A Measurement Theory Approach for Efficient and Generalizable Neural Scaling Estimation

Researchers have developed a new framework called Item Response Scaling Laws (IRSL) that integrates Item Response Theory with language model scaling laws. This approach aims to make the estimation of scaling laws more efficient and generalizable by disentangling model ability from question characteristics, reducing the complexity from O(M × N) to O(M + N). IRSL uses empirical response probabilities from LMs, such as token probabilities or pass rates, to derive more reliable scaling estimates with significantly fewer questions, enabling accurate performance forecasting across different benchmarks. AI

IMPACT This framework could significantly reduce the computational cost of evaluating and forecasting AI model performance.
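
The drop from O(M × N) to O(M + N) follows from factoring each response into a per-model term and per-question terms. The sketch below uses the standard two-parameter logistic (2PL) model from Item Response Theory; the summary does not say which IRT variant IRSL uses, so the 2PL form and the function names are assumptions.

```python
import math


def p_correct(ability: float, difficulty: float, discrimination: float = 1.0) -> float:
    """2PL item response model: probability that a model answers a question correctly."""
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))


def parameter_counts(num_models: int, num_questions: int) -> tuple[int, int]:
    """Return (size of a full model-by-question table, size of the factored 2PL model)."""
    full_table = num_models * num_questions
    # One ability per model, plus a difficulty and a discrimination per question.
    factored = num_models + 2 * num_questions
    return full_table, factored
```

For example, 20 models and 1,000 questions give a 20,000-entry table but only 2,020 factored parameters.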
TOOL · arXiv cs.AI English(EN) · 16h

OSMGraphCLIP: Learning Global Location Representations from OpenStreetMap Graphs

Researchers have developed OSMGraphCLIP, a novel model that learns global location representations using OpenStreetMap data. This model encodes geographic environments as graphs, capturing topological and semantic relationships between features like roads and buildings. OSMGraphCLIP demonstrates strong performance across various downstream tasks, including climate, ecology, and public health, often matching or surpassing satellite-based methods, particularly for socioeconomic and health-related predictions. AI

IMPACT This model demonstrates the potential of using structured map data for AI tasks, offering an alternative to satellite imagery for certain applications.
TOOL · arXiv cs.AI English(EN) · 16h

Towards Long-Horizon Vessel Trajectory and Destination Forecasting with Reasoning Large Language Models

Researchers have developed a new framework called RLVR to improve long-horizon maritime trajectory and destination forecasting using large language models. This approach converts vessel trajectories into semantic textual representations, enabling reinforcement learning to align LLMs with forecasting objectives. Experiments show that LLMs trained with RLVR significantly outperform existing deep learning methods, particularly in predicting destinations accurately, with 4B LLMs demonstrating optimal performance. AI

IMPACT Enhances LLM capabilities for complex, long-term predictive tasks in operational domains like maritime logistics.
TOOL · arXiv cs.AI English(EN) · 16h

STAR: Rethinking MoE Routing as Structure-Aware Subspace Learning

Researchers have introduced STAR, a novel approach to Mixture-of-Experts (MoE) routing that treats routing as a structure-aware subspace learning problem. Unlike traditional MoE methods that use limited linear projections, STAR incorporates an evolving principal subspace to track dominant input structures, enhancing routing stability and expert specialization. This method has demonstrated improved performance on language and vision tasks, with potential for further robustness through optional test-time subspace updates. AI

IMPACT Improves routing stability and performance in MoE models, potentially leading to more efficient and capable AI systems.
TOOL · arXiv cs.AI English(EN) · 16h

MMR-GRPO: Accelerating GRPO-Style Training through Diversity-Aware Reward Reweighting

Researchers have developed MMR-GRPO, a novel method to accelerate training for mathematical reasoning models. This approach reweights rewards based on the diversity of model completions, recognizing that redundant outputs offer limited learning value. By prioritizing unique solutions, MMR-GRPO significantly reduces the number of training steps and wall-clock time needed to achieve peak performance, as demonstrated across various model sizes and benchmarks. AI

IMPACT Accelerates AI model training for mathematical reasoning, potentially reducing computational costs and development time.
- Kangda Wei
- MMR-GRPO
- GRPO
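
The intuition that redundant completions carry little extra learning signal can be sketched as a simple redundancy penalty. This is a simplified illustration, not the paper's formula: the cosine-similarity measure, the visiting order, and the function name are all assumptions.

```python
import numpy as np


def diversity_weights(embeddings: np.ndarray, rewards: np.ndarray) -> np.ndarray:
    """Down-weight completions that nearly duplicate ones already visited.

    Completions are visited in descending reward order; each gets weight
    1 - (max cosine similarity to any previously visited completion).
    """
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = unit @ unit.T
    order = np.argsort(-rewards, kind="stable")
    weights = np.ones(len(rewards))
    seen = []
    for idx in order:
        if seen:
            weights[idx] = 1.0 - np.clip(sims[idx, seen].max(), 0.0, 1.0)
        seen.append(idx)
    return weights


# Hypothetical group of three completions; the first two are exact duplicates.
emb = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
rew = np.array([1.0, 1.0, 0.5])
reweighted = rew * diversity_weights(emb, rew)
```

Here the duplicate's reward is zeroed out while the distinct but lower-reward completion keeps its full reward, so the policy update in this toy group is driven by the two unique solutions.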
TOOL · arXiv cs.AI English(EN) · 16h

BareWave: Waveform-Native Flow-Matching Text-to-Speech

Researchers have developed BareWave, a novel text-to-speech system that generates audio directly from text without intermediate representations. This waveform-native approach addresses challenges in raw waveform modeling by aligning representations, using staged noise schedules, and incorporating velocity-aware perceptual alignment. The system demonstrates strong performance in zero-shot voice cloning, achieving high intelligibility, speaker similarity, and naturalness. AI

IMPACT Introduces a waveform-native approach to TTS, potentially simplifying model architectures and improving voice cloning capabilities.
- BareWave
- arXiv
TOOL · arXiv cs.AI English(EN) · 16h

I-Segmenter: Integer-Only Vision Transformer for Efficient Semantic Segmentation

Researchers have developed I-Segmenter, a novel framework that enables Vision Transformers (ViTs) for semantic segmentation to operate entirely with integers. This approach significantly reduces the memory footprint and computational cost associated with ViTs, making them more suitable for resource-constrained devices. The system incorporates a new activation function, λ-ShiftGELU, to improve stability during quantization and replaces certain operations to maintain an integer-only execution path. Experiments demonstrate that I-Segmenter achieves competitive accuracy compared to its floating-point counterpart while offering substantial reductions in model size and faster inference speeds. AI

IMPACT Enables efficient deployment of advanced segmentation models on edge devices, broadening AI accessibility.
TOOL · arXiv cs.AI English(EN) · 16h

Automatic Extraction of Structured Information from Brain MRI Reports Using an Open-Weight Large Language Model

Researchers utilized the open-weight LLaMA 3.1 large language model to automatically extract structured information from 947 Dutch brain MRI reports. The model demonstrated high performance in identifying visual rating scores for atrophy and lesion mentions, achieving over 90% accuracy for several categories. While zero-shot performance was strong for categorical data, few-shot prompting significantly improved accuracy for numerical variables like microbleed and infarct counts, suggesting LLaMA 3.1's potential for large-scale medical research. AI

IMPACT Demonstrates LLM capabilities in specialized medical data extraction, potentially accelerating research and clinical insights.
- arXiv
- LLaMA 3.1
TOOL · arXiv cs.AI English(EN) · 16h

Failure-Aware Refinement of Vision-Language Model for Lithography Defect Detection

Researchers have developed a two-stage vision-language model to improve the accuracy of detecting defects in semiconductor lithography images. The first stage uses a fine-tuned Qwen3-VL model to identify defect counts, categories, and locations. A second stage then refines these initial predictions by learning from the first stage's errors, thereby enhancing overall defect inference. AI

IMPACT Introduces a novel two-stage refinement approach for vision-language models, potentially improving accuracy in specialized industrial applications like defect detection.
- arXiv
- Qwen3-VL
TOOL · arXiv cs.AI English(EN) · 16h

High-Rate Quantized Matrix Multiplication II

Researchers have published a paper detailing advancements in quantized matrix multiplication, specifically for large language models. The work, a follow-up to previous research, focuses on scenarios where the covariance matrix of the second factor is known. This method can improve existing LLM quantization algorithms like GPTQ by optimizing rate allocation, moving away from equal distribution. AI

IMPACT Optimizes LLM quantization, potentially leading to more efficient model deployment and reduced computational costs.
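
To show what moving away from equal rate distribution can look like, the sketch below applies the classical high-rate bit allocation for independent components under squared error, which gives more bits to higher-variance components. The paper's own scheme is not described in the summary, so this textbook rule and the function name are stand-in assumptions.

```python
import numpy as np


def allocate_rates(variances: np.ndarray, mean_bits: float) -> np.ndarray:
    """Classical high-rate allocation.

    b_i = mean_bits + 0.5 * log2(var_i / geometric_mean(var)).
    Only meaningful in the high-rate regime, where no b_i comes out negative.
    """
    log_var = np.log2(variances)
    return mean_bits + 0.5 * (log_var - log_var.mean())


# Hypothetical per-component variances with an average budget of 4 bits.
rates = allocate_rates(np.array([16.0, 4.0, 1.0, 0.25]), mean_bits=4.0)
```

The average stays at 4 bits, but the allocation ranges from 5.5 bits for the highest-variance component down to 2.5 bits for the lowest, rather than 4 bits each.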
TOOL · arXiv cs.AI English(EN) · 16h

PRISM: PRior-guided Imagination Sampling in world Models

Researchers have developed PRISM, a novel framework for improving action sampling in world models for robotics. PRISM extracts action intuition directly from the world model's own learned representations, avoiding the need for separate, large visual encoders or VLMs. This approach integrates a state-conditioned Gaussian prior into the planner's sampling distribution, significantly boosting success rates on tasks like Cube and PushT by up to 35 percentage points without adding substantial inference overhead. AI

IMPACT Enhances robot planning efficiency by improving action sampling in world models, potentially leading to more capable autonomous systems.
- Cube
- world model
- PRISM
TOOL · arXiv cs.AI English(EN) · 16h

When Benign Inputs Lead to Severe Harms: Eliciting Unsafe Unintended Behaviors of Computer-Use Agents

Researchers have developed a new framework called AutoElicit to systematically identify unsafe unintended behaviors in computer-use agents (CUAs). This method iteratively perturbs benign instructions using agent execution feedback to surface long-tail harmful outcomes. The framework successfully uncovered hundreds of such behaviors in advanced CUAs like Claude 4.5 Haiku, Claude 4.5 Opus, and Operator, demonstrating a persistent susceptibility across various frontier agents. AI

IMPACT Highlights critical safety vulnerabilities in current AI agents, necessitating improved testing and alignment strategies.
TOOL · arXiv cs.AI English(EN) · 16h

AVI-Bench: Toward Human-like Audio-Visual Intelligence of Omni-MLLMs

Researchers have introduced AVI-Bench, a new benchmark designed to evaluate the audio-visual intelligence of Omni-Multimodal Large Language Models (Omni-MLLMs). This benchmark assesses models across perception, understanding, and reasoning stages using tasks that require joint audio-visual interpretation. An extension, AVI-Bench-PriSe, further tests robustness with unfamiliar stimuli to gauge generalization beyond typical training data. Experiments indicate current Omni-MLLMs have significant limitations in audio-visual intelligence. AI

IMPACT Provides a new framework for evaluating and improving the audio-visual capabilities of multimodal AI models.
TOOL · arXiv cs.AI English(EN) · 16h

Knowledge Graphs and Reasoning LLMs for Finding Simple Yet Effective Transcriptomic Perturbation Predictors

Researchers have developed a novel approach using knowledge graphs and Large Language Models (LLMs) to predict the effects of gene knockout perturbations on transcriptomic gene expression. Their simplest model, a K-nearest neighbor approach leveraging biological knowledge graphs, achieved competitive performance, outperforming most methods on out-of-distribution predictions. Further enhancements using LLMs trained via reinforcement learning for predictive accuracy matched state-of-the-art results, demonstrating the potential of knowledge graphs as model priors and LLMs as adaptable tools for complex biological response prediction. AI

IMPACT This research demonstrates a new method for applying LLMs and knowledge graphs to biological prediction, potentially improving drug discovery and genetic research.
TOOL · arXiv cs.AI English(EN) · 16h

TRIAGE: Dialectical Reasoning for Explainable Risk Prediction on Irregularly Sampled Medical Time Series with LLMs

Researchers have developed a new framework called TRIAGE to improve risk prediction in medical time series data using large language models. TRIAGE addresses the issue of LLMs overconfidently predicting binary outcomes by training them to generate dialectical reasoning, which elicits outcome-specific rationales. This approach leads to more calibrated risk scores and higher quality clinical reasoning in explanations, outperforming existing methods on multiple benchmarks. AI

IMPACT Enhances LLM capabilities in medical risk prediction, potentially improving patient triage and clinical decision-making.
- LLMs
- TRIAGE
TOOL · arXiv cs.AI English(EN) · 16h

LogNEO: A GPT-Neo Reinforcement Learning Framework for Accurate Real-Time Log Anomaly Detection

Researchers have developed LogNEO, a new framework for detecting anomalies in system logs using EleutherAI's GPT-Neo model. This system employs a novel reinforcement learning approach with a position-aware reward scheme and cross-entropy regularization. LogNEO achieves high F1 scores on standard benchmarks, outperforming prior state-of-the-art methods in recall, and has been demonstrated in a production environment with low latency and high throughput. AI

IMPACT This framework enhances real-time log anomaly detection capabilities, potentially improving system reliability and security in production environments.
- LogGPT
- LogNEO
- GPT-Neo
- EleutherAI
- Thunderbird
- Apache Kafka
- Redis
- TensorRT
TOOL · arXiv cs.AI Nederlands(NL) · 16h

Deep Tree Tensor Networks

Researchers have introduced a new neural network architecture called the Deep Tree Tensor Network (DTTN), inspired by tensor networks from quantum physics. This model is designed to capture complex, high-order interactions between features through multilinear operations, essentially forming a tree-like structure. The DTTN aims to improve parameter efficiency and interpretability in image recognition tasks, demonstrating superior performance on various benchmarks compared to existing methods. AI

IMPACT Introduces a novel architecture potentially improving image recognition performance and interpretability.
- arXiv
- Deep Tree Tensor Network
TOOL · arXiv cs.AI English(EN) · 16h

Audio-FLAN: An Instruction-Following Dataset for Unified Audio Understanding and Generation of Speech, Music, and Sound

Researchers have introduced Audio-FLAN, a new large-scale dataset designed to unify audio understanding and generation tasks for large language models. The dataset comprises over 100 million instances across 80 diverse tasks, covering speech, music, and general sound domains. Audio-FLAN aims to enable zero-shot learning for unified audio-language models, allowing them to handle both comprehension and creation of audio content. AI

IMPACT Enables unified audio-language models for diverse understanding and generation tasks.
TOOL · arXiv cs.AI English(EN) · 16h

Ego-Pi: VLA Fine-Tuning for Ego-Centric Human and Robot Data

Researchers have developed Ego-Pi, a method for fine-tuning vision-language-action (VLA) models using ego-centric data from both humans and robots. This approach addresses the data scarcity issue in robotics by leveraging readily available human data to train robots for new task semantics and skill composition. The findings indicate that human data significantly enhances robot learning capabilities, even in the absence of specific robot-collected data for novel tasks. AI

IMPACT Enables robots to learn new tasks and skills more efficiently by leveraging readily available human-centric data.
TOOL · arXiv cs.AI English(EN) · 16h

Reachability and asymptotics of Gaussian Transformer dynamics

Researchers have modeled data propagation in Transformers as a nonlinear control system. They proved that Gaussian distributions remain Gaussian throughout the process, simplifying the dynamics to a finite-dimensional system governing mean and covariance. This framework allows for the analysis of Transformer expressiveness as a reachability problem and reveals connections to classical control theory. AI

IMPACT Provides a theoretical framework for understanding Transformer behavior and expressiveness.
- Transformer
TOOL · arXiv cs.AI English(EN) · 16h

Q-Delta: Beyond Key-Value Associative State Evolution

Researchers have introduced Q-Delta, a novel approach to sequence modeling that enhances linear attention mechanisms. This method integrates query-conditioned state readout, allowing queries to influence state evolution alongside key-based retrieval. Q-Delta aims to improve efficiency and performance in tasks like language modeling and long-context retrieval. AI

IMPACT Introduces a new method for sequence modeling that could improve efficiency and performance in language and retrieval tasks.
- arXiv
- Q-Delta
TOOL · arXiv cs.AI English(EN) · 16h

LFNO: Bridging Laplace and Fourier via Transient-Steady Decomposition

Researchers have developed the Laplace-Fourier Neural Operator (LFNO), a novel framework designed to model dynamical systems. LFNO uniquely combines the strengths of Laplace and Fourier Neural Operators by decomposing system dynamics into transient and steady-state components. Evaluations across nine benchmarks, including ODE and PDE systems, show LFNO outperforming existing operators, particularly in transient-dominated ODE systems, and demonstrating competitive performance on PDE benchmarks. AI

IMPACT Introduces a unified framework for modeling dynamical systems, potentially improving accuracy and interpretability in scientific simulations.
- Laplace Neural Operators
- Fourier Neural Operators
TOOL · arXiv cs.AI English(EN) · 16h

Efficient Skill Grounding via Code Refactoring with Small Language Models

Researchers have developed RECENT, a framework designed to improve skill grounding for embodied agents using small language models (sLMs). This approach treats skills as executable code, allowing for semantic intent to be preserved while adapting to specific embodiment and environmental conditions through localized code refactoring. RECENT demonstrates robust long-horizon performance across various robotic embodiments and dynamic environments, matching the performance of larger language models while utilizing more constrained sLMs. AI

IMPACT Enables more efficient deployment of embodied agents in real-world scenarios by improving skill adaptability with smaller models.
TOOL · arXiv cs.AI English(EN) · 16h

PACT: Learning Diverse Diagnostic Strategies via Privileged Synthesis and Branch Consensus

Researchers have developed PACT, a new framework designed to improve the diagnostic reasoning of AI agents in clinical settings. PACT utilizes a novel approach that synthesizes dialogues across different reasoning paradigms without revealing hidden patient information. This method involves a Doctor-Patient-Supervisor (DPS) system and a training strategy that aggregates specialized AI branches through consensus. Experiments on a Chinese medical diagnosis benchmark show PACT outperforming existing baselines in both diagnostic accuracy and consultation process metrics. AI

IMPACT Enhances AI's ability to perform complex clinical diagnostics by integrating multiple reasoning strategies.
- LLM
TOOL · arXiv cs.AI English(EN) · 16h

NutriMLLM: Multimodal Large Language Models for Dietary Micronutrient Analysis

Researchers have developed NutriMLLM, a new family of multimodal large language models specifically designed for analyzing dietary micronutrients from food images. Existing models proved unreliable for this task, often abstaining or providing inaccurate data. To overcome this, the team created a large synthetic dataset of over a million image-description-nutrient triplets by repurposing dietary recall data. Fine-tuning models like Qwen3-VL on this dataset resulted in NutriMLLM variants that demonstrate near-complete coverage of 65 micronutrients and competitive accuracy against leading proprietary models. AI

IMPACT Enables more accurate and comprehensive dietary analysis from food images, potentially improving personalized nutrition and public health surveillance.
TOOL · arXiv cs.AI English(EN) · 16h

A retrieval conditioned rebinding circuit for dynamic entity tracking in large language models

Researchers have identified a specific circuit within large language models that handles dynamic entity tracking. This mechanism, termed a retrieval conditioned rebinding circuit, is responsible for binding entities to their attributes and updating this information as the model processes changing states. The study found this circuit present in models like Gemma and Llama, though its implementation varies, with Gemma expressing binding information in query/key subspaces and Llama primarily in key vectors. AI

IMPACT Reveals an interpretable mechanism for state tracking, potentially aiding in understanding and improving LLM reasoning capabilities.
TOOL · arXiv cs.AI English(EN) · 16h

EvoCSFL: Surrogate-Assisted Evolutionary Client Selection for Efficient and Robust Federated Learning

Researchers have developed a new framework called EvoCSFL to improve federated learning efficiency and robustness. This method uses an evolutionary algorithm guided by a surrogate model to select clients, optimizing for model performance, communication latency, and energy consumption. Experiments on several datasets showed that EvoCSFL achieves faster convergence, reduced energy use, and better robustness compared to existing approaches. AI

IMPACT This new framework could lead to more efficient and robust distributed AI model training, especially in environments with diverse and potentially unreliable clients.
- TinyImageNet
- EvoCSFL
- Federated Learning
- MNIST
- CIFAR10
- CINIC10
TOOL · arXiv cs.AI English(EN) · 16h

OmniMem: Perturbation-aware Memory Compression for Streaming Audio-Visual LLMs

Researchers have developed OmniMem, a new framework designed to make audio-visual large language models more memory-efficient for processing long videos. OmniMem addresses the challenge of linearly growing video tokens and KV caches by employing a modality-aware allocation strategy that distinguishes between visual and audio contexts. It also uses perturbation-aware selection to retain crucial information, preventing memory compression from degrading understanding. Experiments show OmniMem improves accuracy by 2-4% over existing methods under similar memory constraints, with further gains possible through budget-aware fine-tuning. AI

IMPACT Enhances efficiency for audio-visual LLMs, potentially enabling more sophisticated long-form video analysis and understanding.
- video-SALMONN 2+
- OmniMem
- arXiv
- Qwen-2.5-Omni
- LLMs
- video
TOOL · arXiv cs.AI English(EN) · 16h

UniQL: Towards Dialect-Universal Benchmarking for Text-to-SQL

Researchers have introduced UniQL, a new benchmark designed to evaluate how well text-to-SQL models generalize across SQL dialects. Existing benchmarks focus primarily on SQLite and so fail to capture the complexity of real-world database systems, which often require dialect-specific SQL syntax and functions. UniQL includes 1,534 natural language questions paired with executable SQL annotations across 16 dialects, totaling 24,544 queries. Experiments reveal that current large language models struggle with dialect generalization, showing significant performance drops when moving beyond SQLite. AI

IMPACT Highlights the need for more robust text-to-SQL models capable of handling diverse database dialects, potentially impacting enterprise data integration and analysis tools.
- SQLite
- LLMs
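
The UniQL query total is consistent with one SQL annotation per question per dialect. A minimal Python check of that arithmetic, using only the figures quoted in the summary (the variable names are illustrative and do not come from the paper):

```python
# Figures quoted in the UniQL summary.
num_questions = 1_534
num_dialects = 16

# Assumption: one executable SQL annotation per (question, dialect) pair.
total_queries = num_questions * num_dialects
print(total_queries)  # 24544
```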
TOOL · arXiv cs.AI English(EN) · 16h

Thinking-Based Non-Thinking: Solving the Reward Hacking Problem in Training Hybrid Reasoning Models via Reinforcement Learning

Researchers have developed a new method called Thinking-Based Non-Thinking (TNT) to address reward hacking in hybrid reasoning models. This approach aims to optimize computational efficiency by enabling models to decide when to engage in complex reasoning and when to provide a direct answer. TNT reportedly reduces token usage by approximately 50% while improving accuracy on mathematical benchmarks, achieving a better trade-off between performance and efficiency than existing methods. AI

IMPACT This method could lead to more efficient and accurate reasoning models, reducing computational costs for complex tasks.
TOOL · arXiv cs.AI English(EN) · 16h

Failure by Interference: Language Models Make Balanced Parentheses Errors When Faulty Mechanisms Overshadow Sound Ones

Researchers have found that language models fail at simple syntactic tasks such as generating balanced parentheses because reliable and unreliable internal mechanisms interfere with each other: faulty components can overshadow sound ones, leading to errors. To address this, the authors developed RASteer, a method that identifies and amplifies the contribution of reliable components, significantly improving performance on balanced-parentheses tasks and showing gains in arithmetic reasoning. AI

IMPACT This research offers a method to improve the reliability of language models on fundamental tasks, potentially enhancing their utility in code generation and logical reasoning applications.
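
For readers unfamiliar with the task, the sketch below shows what "balanced" means for a string of round brackets. It is a generic depth-counter check, not the paper's evaluation code, and the function name `is_balanced` is a hypothetical chosen for illustration.

```python
def is_balanced(text: str) -> bool:
    """Return True if every '(' in `text` is closed by a later ')'."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            # A closing bracket with nothing open can never be repaired later.
            if depth < 0:
                return False
    # Balanced only if nothing is left open at the end.
    return depth == 0


print(is_balanced("(()())"))  # True
print(is_balanced("(()"))     # False
print(is_balanced("())("))    # False
```

A check like this could, in principle, score whether a model's completion closes every bracket it was given.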
TOOL · arXiv cs.AI English(EN) · 16h

DyCP: Dynamic Context Pruning for Long-Form Dialogue with LLMs

Researchers have developed a new method called DyCP to efficiently manage context in long-form dialogues with large language models. This technique dynamically identifies and retrieves relevant dialogue segments, reducing inference costs and latency without requiring offline memory construction. DyCP preserves the sequential nature of conversations and has shown competitive performance across multiple benchmarks and LLM backends. AI

IMPACT Improves efficiency and reduces latency for LLMs handling long dialogues, potentially enabling more complex conversational AI applications.
- DyCP
- LLMs
- Nayoung Choi
TOOL · arXiv cs.AI English(EN) · 16h

A Regret Minimization Framework on Preference Learning in Large Language Models

Researchers have introduced a new framework called Regret-based Preference Optimization (RePO) for training large language models using human feedback. RePO reframes the process from reward maximization to regret minimization, modeling human preferences based on anticipated outcomes and counterfactual comparisons. Experiments on mathematical reasoning and human preference datasets show that RePO offers improved performance and better human alignment. AI

IMPACT Introduces a novel training methodology that could lead to more human-aligned and performant LLMs on complex reasoning tasks.
TOOL · arXiv cs.AI English(EN) · 16h

Contribution Weights: A Geometrical Analysis of Self-Attention Transformers

Researchers have introduced "Contribution Weights," a metric for analyzing self-attention in transformer-based large language models. It goes beyond traditional attention weights by incorporating the geometric properties of value vectors, offering a more accurate measure of a token's influence. The study shows that Contribution Weights identify semantically critical tokens and sheds new light on "attention sinks," which it finds actively stabilize representations rather than merely storing information. AI

IMPACT Provides a more accurate method for interpreting LLM behavior, potentially improving model analysis and debugging.
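
The summary does not give the paper's formula for Contribution Weights. As a rough illustration of why value-vector geometry matters, the sketch below scales each attention weight by the L2 norm of its value vector and renormalizes. This norm-based weighting is an assumption made here for illustration, not the authors' definition, and the function name and sample numbers are hypothetical.

```python
import math


def norm_weighted_contributions(attn_weights, value_vectors):
    """Scale each attention weight by its value vector's L2 norm, then renormalize.

    Illustrative assumption only; not the paper's Contribution Weights formula.
    """
    scores = [
        w * math.sqrt(sum(x * x for x in v))
        for w, v in zip(attn_weights, value_vectors)
    ]
    total = sum(scores)
    return [s / total for s in scores] if total > 0 else scores


# Hypothetical example: token 0 receives the most attention but has a tiny value vector.
attn = [0.7, 0.2, 0.1]
values = [[0.1, 0.0], [3.0, 4.0], [1.0, 0.0]]
print(norm_weighted_contributions(attn, values))
```

In this toy case the token with the largest attention weight ends up with the smallest share once value norms are taken into account, which is the kind of gap between attention and influence that the summary describes.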
TOOL · arXiv cs.AI English(EN) · 16h

AlloSpatial: Agentic Harness Framework for Spatial Reasoning in Foundation Models

Researchers have introduced AlloSpatial, a new framework designed to enhance the spatial reasoning capabilities of foundation models. This framework converts egocentric observations into structured allocentric representations, such as spatial trees and route maps, which can be queried for object topology, geometry, and trajectories. AlloSpatial also incorporates a Spatial Reasoning Harness to manage tool use and arbitrate between different sensory inputs. Experiments on benchmarks like VSI-Bench and MindCube showed significant spatial-reasoning improvements for existing models, with the augmented models even outperforming larger general-purpose models. AI

IMPACT Enhances foundation models' ability to understand and reason about physical space, potentially improving robotics and embodied AI applications.