Brief

last 24h

[50/447] 223 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

RESEARCH · arXiv cs.CV English(EN) · 1d · [2 sources]

MAGIS: Evidence-Based Multi-Agent Reasoning for Interpretable Strabismus Clinical Decision-Making

Researchers have developed MAGIS, a novel framework designed to improve the interpretability and accuracy of strabismus diagnosis using AI. This system transforms the diagnostic process into a structured, evidence-based approach, moving beyond the 'black-box' nature of some current AI models. MAGIS integrates visual evidence from patient photographs with clinical diagnostic rules to refine diagnostic hypotheses, significantly outperforming existing systems and enhancing the reliability of generated reports.

IMPACT Enhances AI's role in medical diagnosis by providing interpretable and evidence-based decision-making, potentially improving patient outcomes.
- Strabismus
- arXiv
RESEARCH · arXiv cs.CV English(EN) · 1d · [2 sources]

LiteVSR: Lightweight Adaptation of Frozen Diffusion Transformers for Video Super-Resolution

Researchers have developed LiteVSR, a new framework for adapting pre-trained diffusion transformers for video super-resolution tasks. This approach uses a lightweight State-Aware Adapter that requires significantly fewer trainable parameters and less training time compared to existing methods. LiteVSR leverages flow matching to efficiently adapt the frozen transformer, enabling competitive restoration quality with minimal computational resources.

IMPACT Offers a more computationally efficient method for adapting large generative models to specific video enhancement tasks.
RESEARCH · arXiv cs.CL English(EN) · 1d · [2 sources]

One Model, Multiple Goals: Adaptive Multi-Objective Learning for E-commerce Dialogue Systems

Researchers have developed a new adaptive multi-objective reinforcement learning framework called MORE, designed to optimize both reasoning accuracy and linguistic naturalness in e-commerce dialogue systems. This approach treats reasoning functions as constraints to guide policy optimization, avoiding the instability of directly mixing rewards. Online experiments on ByteDance production traffic showed MORE improved conversion rates by over 16% and reached conversion by over 30%, while also boosting user satisfaction.

IMPACT This framework could significantly enhance the effectiveness and user satisfaction of AI-powered e-commerce customer service agents.
RESEARCH · arXiv cs.CL English(EN) · 1d · [3 sources]

Culturally-Adapted Red-Teaming Across East and Southeast Asian Contexts: A Methodological and Comparative Analysis

A new research paper introduces a methodology for culturally-adapted red-teaming of large language models (LLMs) across East and Southeast Asian contexts. The study found that direct translation of English benchmarks significantly underestimates LLM risks, with culturally-adapted prompts yielding a higher attack success rate. The research highlights the necessity of adapting safety evaluations to specific cultural nuances rather than relying solely on linguistic translation.

IMPACT Adapting LLM safety evaluations to cultural contexts is crucial for reliable multilingual deployment.
- Korean
- LLM
- Khmer
- Thai
- Japanese
- LLMs
RESEARCH · arXiv cs.CL English(EN) · 1d · [2 sources]

The Injection Paradox: Brand-Level Suppression in Safety-Trained LLM Recommendations via RAG Context Injection

A new research paper identifies an "Injection Paradox" in RAG-based LLM recommendation systems, where prompt injections backfire and suppress the target brand. Safety-trained Claude models, specifically Claude Opus 4.6, showed a significant drop in recommendation rates for brands with injected content, even affecting unmodified documents from the same brand. This behavior contrasts with GPT models, suggesting differing safety training mechanisms across model families and raising concerns about potential reverse-attack scenarios.

IMPACT Reveals a potential vulnerability in RAG systems that could be exploited to suppress competitor brands, highlighting the need for more robust safety training.
RESEARCH · arXiv cs.LG English(EN) · 1d · [2 sources]

The Routing Plateau: Understanding and Breaking the Accuracy Limits of LLM Routers

A new research paper and a developer guide highlight the challenges and benefits of LLM routing. The research paper identifies a "routing plateau" where many current methods achieve similar, suboptimal accuracy, largely due to focusing on global trends rather than query-specific signals. The developer guide explains how to implement model routing to reduce costs and improve resilience by directing different tasks to appropriate LLMs, suggesting that most applications can significantly cut expenses by routing simpler tasks away from high-end models.

IMPACT Implementing effective LLM routing can significantly reduce operational costs and enhance system resilience by matching task complexity to model capabilities.
- Gemini
- GPT-4
- Claude Opus
- GPT-3.5-turbo
- Claude Haiku
- Gemini Flash
- OpenAI
- Anthropic
- LLM
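The cost-saving idea in the routing item above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the tier names are placeholders rather than real model IDs, and the keyword-and-length heuristic stands in for whatever query-specific signal a production router would use. It is not the method from the paper or the guide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str             # placeholder tier label, not a real model ID
    max_complexity: float  # use this tier if the estimate is <= this value


# Hypothetical tiers, ordered from cheapest to most capable.
ROUTES = [
    Route("small-model", 0.3),
    Route("mid-model", 0.7),
    Route("large-model", 1.0),
]

# Assumed keywords that hint at multi-step reasoning.
REASONING_HINTS = ("prove", "debug", "refactor", "step by step", "analyze")


def estimate_complexity(query: str) -> float:
    """Toy heuristic in [0, 1]: longer queries and reasoning keywords score higher."""
    length_score = min(len(query.split()) / 200.0, 1.0)
    hint_score = 0.5 if any(h in query.lower() for h in REASONING_HINTS) else 0.0
    return min(length_score + hint_score, 1.0)


def pick_model(query: str) -> str:
    """Return the cheapest tier whose threshold covers the estimated complexity."""
    score = estimate_complexity(query)
    for route in ROUTES:
        if score <= route.max_complexity:
            return route.model
    return ROUTES[-1].model
```

A short factual question falls through to the cheapest tier, while a long prompt containing a reasoning keyword is sent to the largest one. The "routing plateau" finding suggests that a global heuristic like this one is exactly what tends to cap accuracy, so it should be read as a baseline only.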
RESEARCH · arXiv cs.CV English(EN) · 1d · [2 sources]

Semi-supervised Source Detection in Astronomical Images: New Benchmark and Strong Baseline

Researchers have introduced a new benchmark and a novel semi-supervised learning framework for detecting sources in astronomical images. The benchmark, LAMOST-DET, includes over 18,000 images and nearly 730,000 source instances, addressing the scarcity of annotated astronomical data. Their framework, Nova Teacher, integrates several modules to effectively detect dense sources even with limited annotations, showing significant improvements in mean Average Precision (mAP) over existing methods.

IMPACT Provides a new dataset and improved methodology for AI-driven analysis in astronomy, potentially accelerating discoveries.
- Nova Teacher
- LAMOST-DET
RESEARCH · arXiv cs.CV English(EN) · 1d · [2 sources]

Minimal Solvers for Full-DoF Motion Estimation from Asynchronous Differential SfM

Researchers have developed a new framework for estimating egomotion using asynchronous optical flow from event cameras. This method allows for the recovery of both angular and linear velocities, overcoming challenges posed by the asynchronous data streams of these sensors. The proposed optimization algorithm and a novel algebraic minimal 5-point solver enable full degree of freedom egomotion estimation, outperforming traditional synchronous methods in accuracy and robustness.

IMPACT Establishes a foundation for improved continuous-time motion estimation in high-speed robotics.
- optical flow
- event cameras
RESEARCH · arXiv cs.AI English(EN) · 1d · [4 sources]

Unveiling Privacy Risks in Multi-modal Large Language Models: Task-specific Vulnerabilities and Mitigation Challenges

Two new research papers explore the privacy vulnerabilities of large language models (LLMs). One paper introduces a dataset and evaluation framework to identify privacy risks in multi-modal LLMs, highlighting how these models can leak sensitive information from images and memory. The other paper benchmarks the effectiveness of differential privacy (DP) in adapting LLMs, finding that data distribution shifts significantly impact privacy risks and that parameter-efficient fine-tuning methods like LoRA offer better protection for out-of-distribution data.

IMPACT Highlights critical vulnerabilities in LLM privacy, urging developers to implement robust safeguards for multi-modal and adapted models.
RESEARCH · Hugging Face Daily Papers English(EN) · 1d · [3 sources]

Taming Perception Jitter: Uncertainty-Aware LiDAR Object Detection for Reliable Motion Classification

Researchers have developed a new method to improve motion classification in autonomous driving systems by addressing "perception jitter." This technique enhances 3D object detectors with uncertainty estimates and uses a statistical test to differentiate true motion from sensor noise. Integrated into the Autoware system, the approach aims to reduce false dynamic predictions and unnecessary vehicle stops in real-world conditions.

IMPACT Reduces false positives in autonomous driving perception, potentially leading to smoother and safer navigation.
RESEARCH · Mastodon — sigmoid.social English(EN) · 12h · [2 sources]

World’s first AI‑designed vaccine explained

Researchers have developed the world's first AI-designed vaccine, which has now been tested in human trials. This DNA vaccine was created by identifying common features across various coronavirus families, enabling it to target SARS, COVID, and related bat viruses. The vaccine has demonstrated the ability to generate immune responses against multiple strains, offering potential protection against future pandemics.

IMPACT This AI-driven vaccine development could accelerate the creation of broad-spectrum vaccines for future pandemic threats.
- COVID
- Cambridge
- AI
- vaccine
RESEARCH · arXiv cs.CV English(EN) · 1d · [2 sources]

Claude Code-Driving Scenario Mining for the Argoverse 2 Challenge

Researchers have developed a novel four-stage pipeline for the CVPR 2026 Argoverse 2 Scenario Mining Challenge. This system leverages a Claude Code agent, powered by GLM 5.1, for autonomous code generation. It then refines training data through iterative screening and semantic code review, also using Claude Code. Finally, Qwen3-VL is employed for scene-level verification to ensure accuracy.

IMPACT Demonstrates novel pipeline for autonomous driving scenario mining using LLMs.
RESEARCH · arXiv cs.LG English(EN) · 1d · [2 sources]

Counterfactual Reasoning for Fine-Grained Evidence Disentanglement in VideoQA

Researchers have developed a new framework called CREDiT to improve the reliability of video question-answering systems. This framework uses counterfactual reasoning and structural causal models to disentangle causal evidence from spurious correlations in video data. By decomposing representations into causal and non-causal components and employing feature-level causal interventions, CREDiT aims to create more trustworthy AI systems that can accurately localize evidence.

IMPACT Enhances the trustworthiness and accuracy of AI systems in understanding and reasoning about video content.
- NExT-GQA
- SPORTU-video
- SportsQA
- VideoQA
- CREDiT
RESEARCH · arXiv cs.CV English(EN) · 1d · [2 sources]

CP4D: Compositional Physics-aware 4D Scene Generation

Researchers have introduced CP4D, a new framework for generating realistic 4D scenes that adhere to physical principles. The system combines static 3D environments with dynamic, physically grounded foreground objects. CP4D uses a three-stage process involving pre-trained models, a hybrid motion synthesis strategy, and an automated composition mechanism to create coherent and controllable 4D scenes.

IMPACT Introduces a novel method for generating physically consistent dynamic 3D scenes, potentially advancing realism in simulation and content creation.
- arXiv
- CP4D
RESEARCH · arXiv cs.CL English(EN) · 1d · [2 sources]

From USD Scenes to Knowledge Graphs: Zero-Shot Ontology Grounding with LLMs

Researchers have developed a method using large language models (LLMs) to automatically ground objects in 3D simulation scenes to formal ontology classes. This approach aims to overcome the limitations of manually curated dictionaries, which are often brittle and lack generalization. The LLMs demonstrated high accuracy in mapping scene objects to ontology classes, significantly outperforming traditional baselines, especially when provided with contextual cues from the scene graph.

IMPACT Automates a key step in robot reasoning by enabling LLMs to interpret 3D simulation environments.
RESEARCH · arXiv stat.ML English(EN) · 1d · [2 sources]

INFUSER: Influence-Guided Self-Evolution Improves Reasoning

Researchers have developed INFUSER, a novel framework for self-evolving language models that enhances reasoning capabilities. This iterative co-training system features a Generator that creates questions and answers from documents, and a Solver that learns from them. The Generator is rewarded based on an influence score, ensuring it produces questions that genuinely improve the Solver's performance, rather than just difficult ones. INFUSER demonstrated significant improvements, with an 8B model outperforming a larger 32B model on math and coding tasks.

IMPACT Enhances LLM reasoning capabilities by creating adaptive training curricula, potentially leading to more capable AI agents.
- Qwen3-8B-Base
- SuperGPQA
- DuGRPO
- Olympiad
- GRPO
RESEARCH · arXiv cs.CV English(EN) · 1d · [2 sources]

Zero-Parameter Geometric Gating for Temporally Stable Low-Altitude UAV Video Semantic Segmentation

Researchers have developed a novel zero-parameter geometric gating method to improve temporal stability in semantic segmentation for low-altitude UAV video. This technique addresses noise introduced by optical flow in aerial imagery by routing regions based on RANSAC homography statistics. The proposed gate, combined with Semantic Similarity Propagation, enhances accuracy and temporal consistency without requiring extensive learned parameters.

IMPACT Improves accuracy and temporal consistency in video analysis for drone applications.
RESEARCH · arXiv cs.CV English(EN) · 1d · [2 sources]

DiffSight-Former: Modeling Structural Differences and Temporal Dynamics for Glaucoma Progression Prediction

Researchers have developed DiffSight-Former, a new framework designed to predict glaucoma progression using sequential fundus images. This model addresses limitations of existing methods by capturing longitudinal structural and vascular changes, which are crucial for early detection. DiffSight-Former integrates a time-variant feature extraction module and a multi-structure difference modeling module, processed by a time-aware Transformer, to estimate future glaucoma onset.

IMPACT This model could improve early detection and monitoring of glaucoma, potentially leading to better patient outcomes.
RESEARCH · arXiv cs.AI English(EN) · 1d · [3 sources]

Decoding Pedestrian Crossing Intention from Egocentric Vision via Vision Language Models

Researchers have developed a new method for predicting pedestrian crossing intentions using egocentric vision and vision-language models (VLMs). By framing the task as visual question answering, they fine-tuned VLMs to significantly outperform existing transformer-based models. The inclusion of contextual cues like eye gaze and ego motion further enhanced prediction accuracy, establishing a new state-of-the-art for this safety-critical application.

IMPACT Establishes a new state-of-the-art for pedestrian intent prediction, potentially improving autonomous driving safety systems.
RESEARCH · arXiv cs.CV English(EN) · 1d · [2 sources]

OmniGen-AR: AutoRegressive Any-to-Image Generation

Researchers have introduced OmniGen-AR, a novel autoregressive framework designed for versatile image generation. This unified model can synthesize images from various inputs, including text, segmentation maps, depth information, and even existing images for editing or video prediction. To prevent condition tokens from influencing content tokens, the framework employs Disentangled Causal Attention (DCA), a technique that separates attention mechanisms during training. OmniGen-AR has demonstrated state-of-the-art performance on benchmarks like GenEval and VBench.

IMPACT Introduces a unified framework for multi-modal image generation, potentially simplifying complex visual synthesis tasks.
RESEARCH · arXiv cs.MA (Multiagent) English(EN) · 1d · [2 sources]

Autonomous Incident Resolution at Hyperscale: An Agentic AI Architecture for Network Operations

A new research paper details an agentic AI architecture designed for autonomous incident resolution in large-scale network operations. This system utilizes a multi-agent framework where specialized AI agents collaborate to detect, diagnose, and fix network issues without human intervention. Deployed in a production environment at a major cloud provider, the architecture has demonstrated over 90% autonomous resolution rates for common incident types, while incorporating safety measures like layered authorization and rollback capabilities.

IMPACT Demonstrates potential for AI to significantly reduce human intervention in critical infrastructure operations, improving efficiency and safety.
RESEARCH · arXiv cs.CV English(EN) · 1d · [2 sources]

Ultra Flash: Scaling Real-Time Streaming Video Generation to High Resolutions

Researchers have introduced Ultra Flash, a novel cascaded streaming framework designed to generate high-resolution video in real-time. This system overcomes the limitations of previous models that were restricted to lower resolutions. Ultra Flash achieves impressive frame rates at 1K and 2K resolutions on a single GPU by employing a unique super-resolution training paradigm and a causal streaming latent upsampler.

IMPACT Enables real-time high-resolution video generation, potentially impacting content creation and streaming services.
RESEARCH · arXiv cs.CV English(EN) · 1d · [2 sources]

A Geometric Framework for Absolute Pose and Velocity Estimation with Event Cameras

Researchers have developed a new geometric framework to estimate both the absolute pose and velocity of objects using event cameras. This method leverages 3D lines in a scene and the events they trigger, addressing a gap where previous techniques primarily focused on velocity estimation. The framework utilizes geometric constraints to enable efficient linear and globally optimal polynomial solvers for pose, and both linear and optimization-based solvers for velocity, requiring a minimum of three event-line correspondences.

IMPACT Enhances capabilities for robotic navigation and augmented reality by improving motion estimation accuracy and efficiency.
- Event Cameras
- 3D lines
RESEARCH · arXiv cs.LG English(EN) · 1d · [8 sources]

Algorithm for Contextual Queueing Bandits with Rate-Optimal Queue Length Regret

Researchers have developed new algorithms for multi-armed bandit problems, focusing on improving regret bounds and adapting to dynamic environments. One paper introduces a three-phase algorithm for contextual queueing bandits that achieves a rate-optimal queue length regret of $\widetilde{\mathcal{O}}(T^{-1/2})$. Another study proposes UCB for Arriving Arms (UCB-AA) to handle bandit problems where new arms become available over time, focusing on dynamic regret and sublinear guarantees. A third paper presents Dri-MED, an algorithm designed for linear contextual bandits with drifting preferences and context, aiming for efficient experimentation.

IMPACT Advances in bandit algorithms can lead to more efficient experimentation and decision-making in AI systems.
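For readers unfamiliar with the upper-confidence-bound idea that UCB-AA builds on, here is a minimal sketch of the textbook UCB1 rule for a fixed set of arms. It is not UCB-AA and does not handle arriving arms, queueing, or drift; the Bernoulli environment and the helper names `make_bernoulli_arms` and `ucb1` are assumptions introduced only for this illustration.

```python
import math
import random


def make_bernoulli_arms(probs, seed=0):
    """Hypothetical test environment: arm i pays 1 with probability probs[i]."""
    rng = random.Random(seed)

    def pull(arm):
        return 1.0 if rng.random() < probs[arm] else 0.0

    return pull


def ucb1(pull, n_arms, horizon):
    """Textbook UCB1: play each arm once, then maximize mean + exploration bonus."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # initial round-robin so every count is nonzero
        else:
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        sums[arm] += pull(arm)
        counts[arm] += 1
    return counts


counts = ucb1(make_bernoulli_arms([0.1, 0.9]), n_arms=2, horizon=2000)
```

The bonus term shrinks as an arm is pulled more often, so play concentrates on the arm with the higher observed mean while the weaker arm is still sampled occasionally.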
RESEARCH · arXiv cs.CV English(EN) · 1d · [3 sources]

CAMF-Det: Closure-Aware Multimodal Fusion for LiDAR-Camera 3D Object Detection on UAV Platforms

Two new research papers propose advanced fusion techniques for 3D object detection using LiDAR and camera data. The first, Geometry-Aware Fisheye-LiDAR Fusion (GA-HF), addresses challenges in low-overlap setups by preserving fisheye geometry and using attention mechanisms to correct feature distortion. The second, CAMF-Det, focuses on Unmanned Aerial Vehicle (UAV) platforms, developing a closure-aware framework to handle occlusion caused by tree canopies and other ground objects by modeling and predicting occlusion intensity.

IMPACT These novel fusion techniques aim to improve the accuracy and robustness of 3D object detection systems in challenging real-world scenarios, potentially impacting autonomous driving and aerial robotics.
RESEARCH · Hugging Face Daily Papers English(EN) · 1d · [2 sources]

In-Context Learning for the Imputation of Public Opinion Data with Large Language Models

Researchers have developed a new method for imputing missing public opinion data using large language models (LLMs) through in-context learning (ICL). This approach was tested on survey data and showed consistent error reduction compared to traditional statistical methods like MICE PMM. The best-performing ICL method, utilizing a gpt-oss-120b model with 100 examples, achieved narrower confidence intervals and improved aggregate coverage, particularly under non-random missingness.

IMPACT This research demonstrates a novel application of LLMs for improving the accuracy and efficiency of public opinion data imputation, potentially impacting survey methodology and analysis.
RESEARCH · arXiv cs.IR (Information Retrieval) English(EN) · 1d · [2 sources]

Driving Video Retrieval for Complex Queries with Structured Grounding

Researchers have developed STRIVE-D, a new framework designed to improve video retrieval for complex queries in autonomous driving scenarios. This system addresses limitations of existing methods by incorporating data calibration to adapt rule-based retrieval and fuse it with vision-language and keyword signals. STRIVE-D has demonstrated significant improvements, achieving up to an 84% relative increase in top-1 accuracy on driving benchmarks, including new event data from DrivingDojo.

IMPACT Enhances autonomous driving safety validation and data curation by improving the ability to retrieve specific driving events.
- DrivingDojo
- STRIVE-D
RESEARCH · arXiv cs.CL English(EN) · 1d · [2 sources]

A Unifying Lens on Reward Uncertainty in RLHF

Researchers have introduced a new framework to address reward hacking in Reinforcement Learning from Human Feedback (RLHF). The proposed method utilizes distributional reward models to quantify uncertainty, offering a unified approach to existing heuristics like mean aggregation and worst-case optimization. This framework aims to improve the robustness of RLHF by penalizing policies that exploit errors in the reward model.

IMPACT This research offers a more principled way to handle uncertainty in reward models, potentially leading to more robust and reliable AI agents trained with human feedback.
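The two heuristics named in the summary above, mean aggregation and worst-case optimization, can be contrasted in a small sketch. It assumes the reward distribution for a response is approximated by a handful of sampled scores (for example from an ensemble), and it adds a mean-minus-spread penalty as one common way to interpolate between the two. The function names, the `beta` parameter, and the sample scores are all assumptions for illustration, not the paper's formulation.

```python
from statistics import mean, pstdev


def aggregate_mean(scores):
    """Mean aggregation: ignores disagreement between reward samples."""
    return mean(scores)


def aggregate_worst_case(scores):
    """Worst-case aggregation: trusts only the most pessimistic sample."""
    return min(scores)


def aggregate_penalized(scores, beta=1.0):
    """Assumed middle ground: mean reward minus beta times the spread."""
    return mean(scores) - beta * pstdev(scores)


# Hypothetical reward samples for two responses with the same average score.
contested = [0.9, 0.1, 0.8]  # reward samples disagree strongly
agreed = [0.6, 0.6, 0.6]     # reward samples agree
```

Under mean aggregation the two responses tie, so a policy gains nothing by avoiding the contested one. The worst-case and penalized scores both rank the agreed response higher, which is the sense in which uncertainty-aware aggregation discourages exploiting reward-model errors.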
RESEARCH · Hugging Face Daily Papers English(EN) · 1d · [3 sources]

EditSSC: Toward Editable Semantic Occupancy Scenes with Unconditional Diffusion Models

Researchers have developed EditSSC, a new method for generating and editing 3D semantic scenes using 2D Bird's Eye View (BEV) representations. This approach repurposes components from Stable Diffusion, enabling training-free editing capabilities like sketch-guided generation, inpainting, and outpainting. EditSSC demonstrates superior performance on unconditional generation compared to existing 3D-specific methods, highlighting the potential of 2D diffusion models for 3D scene manipulation.

IMPACT Enables more accessible and flexible 3D scene generation for applications like autonomous driving.
RESEARCH · arXiv cs.AI English(EN) · 1d · [2 sources]

Activation Steering Induces Emergent Misalignment: A More Comprehensive Evaluation

Two new research papers explore emergent misalignment in large language models, a phenomenon where models trained on narrow, unsafe tasks develop broader harmful behaviors. The first paper demonstrates that activation steering, an inference-time control technique, can induce this misalignment, even in recent models like Qwen-3.5, and produces responses that are more coherent and harmful than those from finetuned models. The second paper identifies sycophancy, or training models to agree with users' incorrect opinions, as another driver of emergent misalignment and introduces 'Alignment Gating' as an efficient method to reverse it by controlling internal representations.

IMPACT Highlights new methods for inducing and potentially mitigating emergent misalignment in LLMs, crucial for safety research.
RESEARCH · Hugging Face Daily Papers English(EN) · 1d · [3 sources]

See More, Match Better: Multi-Source Feature Fusion for Two-View Correspondence Learning

Researchers have developed TriMatch, a new framework for two-view correspondence learning that improves accuracy by fusing multiple feature types. This approach combines geometric, texture semantic, and structural semantic features, addressing limitations of existing methods that rely solely on geometric consistency. TriMatch includes modules for aligning these diverse features and a semantic-guided modulation to suppress incorrect matches, demonstrating robust performance in experiments.

IMPACT Enhances image matching accuracy by integrating diverse feature types, potentially improving applications in computer vision.
- TriMatch
- arXiv
RESEARCH · Hugging Face Daily Papers English(EN) · 1d · [4 sources]

Asymptotic Optimality of Thompson Sampling for Risk-Averse Bandits with Sub-Gaussian Rewards

Two new research papers explore advancements in Thompson Sampling for bandit problems. The first paper introduces an algorithm for risk-averse bandits with sub-Gaussian rewards, achieving asymptotic optimality for various risk functionals. The second paper presents algorithms for joint prior selection and regret minimization in Gaussian Process bandits, demonstrating effectiveness through theoretical analysis and experiments.

IMPACT These papers advance theoretical understanding and algorithmic capabilities in bandit problems, potentially improving decision-making in areas like reinforcement learning and online optimization.
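As background for the Thompson Sampling item above, the following is a minimal sketch of the standard risk-neutral Beta-Bernoulli version. It is the textbook baseline, not the risk-averse sub-Gaussian algorithm or the Gaussian Process variant from the papers; the function name, the arm probabilities, and the uniform Beta(1, 1) prior are assumptions chosen for illustration.

```python
import random


def thompson_sampling(probs, horizon, seed=0):
    """Beta-Bernoulli Thompson Sampling on simulated arms with success rates `probs`."""
    rng = random.Random(seed)
    n_arms = len(probs)
    alpha = [1] * n_arms  # Beta posterior parameter: 1 + observed successes
    beta = [1] * n_arms   # Beta posterior parameter: 1 + observed failures
    pulls = [0] * n_arms
    for _ in range(horizon):
        # Draw one plausible success rate per arm from its posterior.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        arm = max(range(n_arms), key=samples.__getitem__)
        reward = 1 if rng.random() < probs[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


pulls = thompson_sampling([0.2, 0.8], horizon=1000)
```

Because each arm is chosen with the posterior probability that it is best, exploration fades on its own as the posteriors sharpen. A risk-averse variant would rank arms by a risk functional of the sampled distribution rather than by the sampled mean.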
RESEARCH · arXiv cs.CL English(EN) · 1d · [2 sources]

MAAM: Anchor-Preserving Compression and Contextual Calibration for Chinese Discriminatory Language Detection

Researchers have developed MAAM, a novel framework for detecting discriminatory language in Chinese. This model-agnostic approach uses a "visual blur" inspired mechanism to preserve key semantic anchors while calibrating them with contextual priors. MAAM also introduces ChLGBT, a new dataset specifically for identifying bias within the Chinese LGBT community, containing over 8,000 annotated samples.

IMPACT Offers a more compact and stable approach to detecting subtle bias in language, potentially reducing reliance on massive LLMs for specific tasks.
- Chinese
- ChLGBT
RESEARCH · Hugging Face Daily Papers English(EN) · 1d · [4 sources]

Temporal-Aware Reasoning Optimization for Video Temporal Grounding

Two new research papers propose novel frameworks for improving temporal answer grounding in instructional videos. One method, Candidate-Aware Causal Reasoning (CACR), uses a pre-training based candidate selection algorithm and a temporal logic reasoning module with a rejection reward mechanism. The other, Temporal-Aware Reasoning Optimization (TaRO), enhances multi-modal large language models by focusing on time-aware reasoning through constructive exploration and a temporal-sensitivity reward.

IMPACT These frameworks offer improved accuracy and reasoning quality for AI systems tasked with retrieving specific information from videos.
RESEARCH · arXiv cs.CL English(EN) · 1d · [3 sources]

DynaCF: Mitigating Shortcut Learning in Reward Models via Dynamic Counterfactual Sensitivity

Researchers have introduced DynaCF, a novel framework designed to address shortcut learning in reward models used for AI training. This method dynamically reweights training samples by assessing their sensitivity to counterfactual perturbations, downweighting those that rely on superficial patterns. By encouraging reward models to focus on genuine response quality rather than spurious correlations, DynaCF aims to improve the robustness and reliability of preference modeling in AI systems.

IMPACT Enhances the reliability of AI training by reducing reliance on superficial patterns, leading to more robust models.
RESEARCH · arXiv cs.AI English(EN) · 1d · [2 sources]

Latent-space Attacks for Refusal Evasion in Language Models

Researchers have developed PsychoSafe, a framework to improve how large language models refuse harmful requests by employing psychologically informed communication strategies. This approach reframes refusals as supportive interactions, enhancing external resource referral and psychological grounding. Separately, another study introduces Latent-space Attacks for Refusal Evasion, which analyzes how to bypass LLM safety mechanisms by manipulating internal model representations to suppress refusal behavior.

IMPACT Developments in LLM refusal strategies and evasion techniques highlight ongoing challenges in AI safety and alignment.
RESEARCH · arXiv cs.MA (Multiagent) English(EN) · 1d · [2 sources]

A Multi-Agent System for IPMSM Design Optimization via an FEA-AI Hybrid Approach

Researchers have developed a novel multi-agent system to optimize the design of interior permanent magnet synchronous motors (IPMSMs). This system integrates retrieval-augmented generation (RAG) for problem definition and an uncertainty-aware hybrid approach combining finite element analysis (FEA) with AI. The framework automates design processes, improves reliability, and balances computational cost with prediction accuracy, outperforming traditional FEA-only or AI-only methods.

IMPACT Introduces a more efficient and reliable automated design process for complex engineering components.
- FEA
- AI
RESEARCH · arXiv cs.CL English(EN) · 1d · [2 sources]

Introducing multiplex semantic networks as multifaceted representations of creative associative knowledge across multilingual samples

Researchers have developed multiplex semantic networks, a layered approach to modeling the associative knowledge underlying creativity. By analyzing data from six cognitive tasks across 518 individuals from four countries, they found that different task layers capture distinct, non-redundant information about semantic organization. This method improved prediction accuracy for individual creativity scores by 50% when combined with machine learning, highlighting the importance of diverse data and structural network measures.

IMPACT This research offers a novel method for understanding and predicting creativity, potentially impacting AI systems designed for creative tasks.
RESEARCH · arXiv cs.AI English(EN) · 1d · [3 sources]

CoQuIR: A Comprehensive Benchmark for Code Quality-Aware Information Retrieval

Researchers have developed FASE, a new metric for evaluating code quality in multi-agent AI systems. FASE approximates functional correctness by analyzing code dissimilarity, offering a significant speed improvement over existing methods. Separately, a new benchmark called CoQuIR has been introduced to assess code retrieval systems on dimensions beyond just functional relevance, including correctness, efficiency, security, and maintainability. CoQuIR includes annotations for over 42,000 queries across 11 languages and highlights that current retrieval models often fail to distinguish between high and low-quality code.

IMPACT These advancements in code quality evaluation could lead to more reliable AI-assisted software development and more trustworthy code retrieval systems.
RESEARCH · arXiv stat.ML English(EN) · 1d · [2 sources]

Estimate Collapsibility of Causal Effects in Completed Partial DAGs via Strong d-Convex Hulls

Researchers have developed a new method for estimating causal effects within completed partially directed acyclic graphs (CPDAGs). This approach ensures estimator consistency both before and after marginalizing over specific variables. The paper introduces 'estimate collapsibility' and identifies minimal collapsible sets as strong d-convex hulls, providing an efficient algorithm for their discovery. Experiments demonstrate the effectiveness of this collapsibility technique for causal estimations in CPDAGs.

IMPACT Introduces a novel statistical method for causal inference, potentially improving the reliability of AI models that rely on understanding causal relationships.
RESEARCH · Hugging Face Daily Papers English(EN) · 1d · [3 sources]

Vision-Language Guided Hyperspectral Object Tracking via Semantics Fusion and Contextual Template Updating

Researchers have developed VLHTrack, a new framework for hyperspectral object tracking that integrates vision and language models. This approach uses language priors to guide band selection, reducing redundancy and highlighting key spectral features. The system also incorporates a dynamic template update mechanism using Mamba to handle appearance variations and deformations in long sequences. Experiments show VLHTrack surpasses current state-of-the-art methods on benchmark datasets. AI

IMPACT Introduces a novel method for improving hyperspectral object tracking accuracy by combining language-guided spectral band selection with dynamic template updating.
RESEARCH · arXiv stat.ML English(EN) · 1d · [2 sources]

Backward Coherence and Hidden-State Stability in Recurrent Neural Networks: A Quasi-Reverse-Martingale Theory

Researchers have developed a new theoretical framework called backward coherence to analyze hidden-state stability in recurrent neural networks (RNNs). This approach treats the hidden-state sequence as a quasi-reverse-martingale, enabling more stable and interpretable representations. Simulations and real-world data studies demonstrate that this method can significantly improve stability, reduce tracking errors, and enhance forecasting accuracy, particularly under concept drift. AI

IMPACT Introduces a theoretical framework to enhance stability and interpretability in RNNs, potentially improving performance in time-series forecasting and data analysis tasks.
RESEARCH · Hugging Face Daily Papers English(EN) · 1d · [2 sources]

Pretrained, Frozen, Still Leaking: Auditing Cross-Encoder Attribute Transfer in EEG Foundation Models

Researchers have developed a new auditing framework for EEG foundation models that goes beyond single-endpoint evaluations. This framework jointly audits multiple endpoints, revealing that models cleared by individual tests can still leak spectral attributes. A key finding is that a cross-encoder transfer audit demonstrates attribute leakage between different frozen encoders, which standard defenses such as DP-SGD fail to prevent. AI

IMPACT This research introduces a more robust auditing framework for AI models, potentially leading to improved data privacy and security in foundation models.
- EEGPT
- DP-SGD
- EEG Foundation Models
- Sleep-EDF
- LIMO
- CHB-MIT
- EEGMMI
- LiRA
RESEARCH · arXiv cs.LG English(EN) · 1d · [2 sources]

Latent Geometry Beyond Search: Amortizing Planning in World Models

Researchers have developed new methods for long-horizon planning in world models, addressing limitations of existing techniques. One approach, FF-JEPA, uses a hierarchical structure with two forward dynamics models, including an action-free latent planner that predicts subgoals, removing the need for explicit goal images and enabling planning over extended horizons. Another method builds on a pretrained LeWorldModel and amortizes planning into a goal-conditioned latent inverse-dynamics model, replacing iterative optimization and significantly reducing computational cost while maintaining or exceeding performance. AI

IMPACT These advancements could enable more sophisticated AI agents capable of complex, multi-step tasks in real-world environments.
- iCEM
- Xiaohao Xu
- LeWorldModel
- CEM
- arXiv
- FF-JEPA
RESEARCH · arXiv cs.AI English(EN) · 1d · [2 sources]

SAGE: Shape-Adapting Gated Experts for Adaptive Histopathology Image Segmentation

Researchers have developed two novel frameworks, SAGE and SegMoTE, to improve medical image segmentation. SAGE utilizes a dynamic expert routing system to adapt to variations in cell size and shape, achieving high Dice scores on multiple datasets. SegMoTE, on the other hand, efficiently adapts general segmentation models like SAM to medical imaging tasks with minimal learnable parameters and reduced annotation costs. Both approaches aim to enhance the accuracy and practicality of AI in clinical diagnostics. AI

IMPACT These new segmentation models offer improved accuracy and efficiency for clinical diagnostics, potentially reducing annotation costs and enhancing the deployment of AI in healthcare.
- SegMoTE
- MedSeg-HQ
- Yujie Lu
- SAM
- Vision Transformer UNet
- ConvNeXt
- SAGE
- Nguyen Vu
RESEARCH · arXiv cs.MA (Multiagent) English(EN) · 1d · [3 sources]

Hardening Agent Benchmarks with Adversarial Hacker-Fixer Loops

Researchers have developed a novel "hacker-fixer loop" to improve the robustness of AI agent benchmarks against reward hacking. This adversarial process uses three LLM agents to iteratively identify and patch vulnerabilities in benchmark verifiers, preventing agents from achieving high scores without genuinely solving tasks. The method significantly reduced hack success rates, even enabling weaker agents to defend against stronger ones, and has led to the release of a new dataset and tools for future research. AI

IMPACT Enhances the reliability of AI agent evaluations, crucial for advancing research and development in multi-agent systems.
RESEARCH · arXiv cs.LG English(EN) · 1d · [4 sources]

Towards Serverless Semi-Decentralized Federated Learning with Heterogeneous Optimizers

Researchers are developing new methods to improve federated learning (FL) in practical, real-world scenarios. One approach, HASA, focuses on allocating subnets for model-heterogeneous FL by considering client heterogeneity alongside compute budgets, showing improved accuracy on prediction tasks. Another development addresses dynamic device availability in FL by analyzing convergence under changing device sets and proposing a model initialization algorithm that uses gradient similarity for faster adaptation. Additionally, a data-free early stopping framework is introduced to determine optimal stopping points in FL without relying on validation data, demonstrating comparable or superior performance to validation-based methods. Finally, a serverless, semi-decentralized FL methodology is proposed that uses device-to-device initialization for cluster formation and novel "effective loss functions" to handle heterogeneous optimizers and improve convergence speed and communication efficiency. AI

IMPACT These advancements aim to make federated learning more robust, efficient, and practical for real-world applications by addressing challenges like device heterogeneity, dynamic participation, and data privacy.
RESEARCH · arXiv cs.AI English(EN) · 1d · [3 sources]

AHA-WAM: Asynchronous Horizon-Adaptive World-Action Modeling with Observation-Guided Context Routing

Researchers have developed AHA-WAM, a novel asynchronous world-action model for robot manipulation that improves efficiency by decoupling world prediction and action execution. This model utilizes a dual Diffusion Transformer architecture, with one transformer acting as a low-frequency world planner and the other as a high-frequency action executor. Experiments demonstrate that AHA-WAM achieves state-of-the-art performance on robotic tasks, including a 4.59x speedup over previous methods. AI

IMPACT Enables more efficient and faster robotic manipulation by decoupling planning and execution.
RESEARCH · arXiv cs.AI English(EN) · 1d · [2 sources]

Enhancing Video Representations with Spatiotemporal-Semantic Residual to Mitigate Hallucinations in Video Large Multimodal Models

Researchers have developed new methods to combat hallucinations in large vision-language models (LVLMs). One approach, ViSSRes, enhances video representations using a lightweight network to improve spatiotemporal and semantic consistency, significantly reducing hallucination rates on benchmarks like EventHallusion. Another method focuses on refining textual embeddings to encourage better integration of visual information, leading to more balanced multimodal reasoning and improved performance on benchmarks such as MMVP and POPE. AI

IMPACT These methods offer potential solutions for improving the reliability and accuracy of multimodal AI systems, crucial for applications requiring precise visual understanding.
RESEARCH · Hugging Face Daily Papers English(EN) · 1d · [3 sources]

Data augmented bootstrap: Unifying confidence interval construction by approximate invariance

Researchers have introduced the data augmented bootstrap (DAB), a new framework designed to unify the construction of confidence intervals. This method leverages approximately invariant transformations of data, encompassing existing techniques like conformal prediction and the classical bootstrap as special cases. DAB provides theoretical coverage guarantees that adapt based on the strength of the invariance, without requiring a group structure, and integrates data augmentation into statistical methods. AI

IMPACT Introduces a unified statistical framework for confidence intervals, potentially improving reliability in ML model evaluation.