PulseAugur / Brief

Last 24h · 185 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

  1. What is Learnable in Valiant's Theory of the Learnable?

    Researchers have revisited Valiant's original 1984 learnability model, which differs from the more common PAC learning model by allowing learners to issue membership queries and requiring hypotheses with no false positives. They established a new characterization for learnability in Valiant's model, showing it is strictly between PAC learning and a variant without queries. The study also presents the first algorithm for learning $d$-dimensional halfspaces within Valiant's framework, demonstrating their learnability with queries. AI

    IMPACT Refines theoretical understanding of learnability, potentially influencing future algorithm design.

  2. Tight Sample Complexity Bounds for Entropic Best Policy Identification

    Researchers have developed a new algorithm that tightens sample complexity bounds for identifying optimal policies in risk-sensitive reinforcement learning. The work addresses a gap between theoretical lower bounds and existing upper bounds, specifically for problems involving the entropic risk measure. By employing novel technical innovations, including sharper concentration bounds and a new stopping rule, the algorithm achieves a sample complexity that matches the established lower bound. AI

    IMPACT This research refines theoretical understanding of reinforcement learning, potentially leading to more sample-efficient algorithms for complex decision-making tasks.
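
    For context, the entropic risk measure the paper targets has a simple sampled form: U_β(X) = (1/β) log E[exp(βX)], which penalizes variance when β < 0 and recovers the plain mean as β → 0. A minimal numpy sketch (illustration only, not the paper's algorithm):

```python
import numpy as np

def entropic_risk(returns, beta):
    """U_beta(X) = (1/beta) * log E[exp(beta * X)], computed stably.

    beta < 0 is risk-averse for returns; beta -> 0 recovers the plain mean.
    """
    z = beta * np.asarray(returns, dtype=float)
    m = z.max()                                   # log-sum-exp shift
    return (m + np.log(np.mean(np.exp(z - m)))) / beta

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=200_000)

# For X ~ N(mu, sigma^2) the measure is exactly mu + beta * sigma^2 / 2.
print(entropic_risk(x, beta=-0.5))   # ~ 1.0 + (-0.5) * 4 / 2 = 0.0
print(entropic_risk(x, beta=1e-6))   # ~ mean = 1.0
```

    The gap between the β < 0 value and the plain mean is exactly the variance penalty that makes best-policy identification under this measure harder than in the risk-neutral case.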

  3. Sampling from Flow Language Models via Marginal-Conditioned Bridges

    Researchers have introduced a new sampling method for Flow Language Models (FLMs) called marginal-conditioned bridges. This technique adapts continuous flow matching for token sequences, addressing limitations in standard diffusion model samplers. The proposed method samples endpoints from FLM token marginals and then uses an analytic Ornstein-Uhlenbeck bridge, offering improved quality-diversity tradeoffs and principled control over decoding. AI

    IMPACT Introduces a novel sampling technique that enhances the quality-diversity balance in Flow Language Models.

  4. CHAL: Hierarchical Memory Standard in AI Agents (2026)

    Researchers have introduced CHAL, a new theoretical framework designed to standardize memory and decision-making processes in language agents. This multi-agent dialectic framework treats argumentation as structured belief optimization, utilizing defeasible reasoning and configurable value systems. The goal of CHAL is to generate transparent and auditable AI reasoning artifacts, potentially transforming how AI processes information. AI

    IMPACT Standardizes memory and decision-making in AI agents, potentially transforming information processing.

  5. Is Fine-Tuning Always Necessary? When Pretrained Models Are Enough

    Two articles discuss the nuances of fine-tuning AI models. One guide explores how to build specialized, smaller models that are efficient and outperform general-purpose ones. The other article questions the necessity of fine-tuning, suggesting that pre-trained models are often sufficient for many AI tasks. AI

    IMPACT Explores efficient methods for specialized AI model development and questions the universal need for fine-tuning, guiding practitioners on model selection.

  6. Conformal Anomaly Detection in Python: Moving Beyond Heuristic Thresholds with 'nonconform'

    Researchers have developed a new Python package called 'nonconform' to improve anomaly detection methods. This tool integrates with existing machine learning libraries to provide statistically calibrated p-values, moving beyond heuristic thresholding. The package aims to make conformal anomaly detection more accessible and reproducible for both experimental and production environments. AI

    IMPACT Enhances statistical rigor in anomaly detection, making it more reliable for production systems.
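
    The core mechanism behind conformal anomaly detection is a split-conformal p-value: rank a test point's anomaly score against scores from a held-out calibration set of normal data. A generic sketch of that scheme (the 'nonconform' API itself is not shown here):

```python
import numpy as np

def conformal_pvalue(calib_scores, test_score):
    """Split-conformal p-value: rank of the test score among calibration
    scores from normal data. A small p-value flags an anomaly with a
    calibrated false-alarm rate, instead of a heuristic threshold."""
    calib_scores = np.asarray(calib_scores)
    n = calib_scores.size
    return (1 + np.sum(calib_scores >= test_score)) / (n + 1)

rng = np.random.default_rng(1)
train = rng.normal(size=(500, 2))   # "normal" training data
calib = rng.normal(size=(200, 2))   # held-out calibration split

center = train.mean(axis=0)
score = lambda x: np.linalg.norm(x - center, axis=-1)  # toy anomaly score

calib_scores = score(calib)
print(conformal_pvalue(calib_scores, score(np.array([0.1, 0.0]))))  # large p: inlier
print(conformal_pvalue(calib_scores, score(np.array([6.0, 6.0]))))  # small p: outlier
```

    Under exchangeability the p-value is uniform for normal points, so thresholding at α controls the false-alarm rate at α; that is the statistical calibration the package provides on top of arbitrary detectors.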

  7. Meet AntAngelMed: A 103B-Parameter Open-Source Medical Language Model Built on a 1/32 Activation-Ratio MoE Architecture

    Researchers have introduced AntAngelMed, a 103-billion-parameter open-source medical language model. It utilizes a Mixture-of-Experts (MoE) architecture, activating only 6.1 billion parameters per query for enhanced efficiency. This design allows it to match the performance of a 40-billion-parameter dense model while achieving speeds over 200 tokens per second on H20 hardware. The model supports a 128K context length and has undergone a three-stage training process including pre-training on medical corpora, supervised fine-tuning, and reinforcement learning. AI

    IMPACT Provides a highly efficient, open-source LLM for medical applications, potentially accelerating research and development in the healthcare sector.

  8. Microsoft researchers find AI models and agents can't handle long-running tasks

    Microsoft researchers have identified a significant limitation in current AI models and agents: their inability to effectively manage long-running tasks. These systems struggle with tasks that require sustained operation or memory over extended periods. This deficiency impacts their potential for complex, multi-stage operations and highlights an area for future AI development. AI

    IMPACT Highlights a current limitation in AI capabilities, suggesting that complex, long-term operations are not yet feasible for current models and agents.

  9. Beyond Perplexity: A Geometric and Spectral Study of Low-Rank Pre-Training

    Two new research papers explore efficient pre-training methods for large language models. The first paper compares dense and sparse Mixture-of-Experts (MoE) transformer architectures at a small scale, finding that MoE models improve validation loss when matching active parameters but do not surpass dense models at equal total parameter capacity. The second paper investigates various low-rank pre-training techniques, demonstrating that even when validation perplexity is similar, these methods converge to geometrically distinct solutions and do not fully replicate the generalization or internal representations of full-rank training. AI

    IMPACT These studies offer insights into optimizing LLM training efficiency and understanding the trade-offs of different architectural and optimization approaches.

  10. Achieving $ε^{-2}$ Sample Complexity for Single-Loop Actor-Critic under Minimal Assumptions

    Researchers have established a new theoretical sample complexity guarantee for off-policy actor-critic methods in reinforcement learning. The paper proves the first $\tilde{\mathcal{O}}(\epsilon^{-2})$ sample complexity for finding an $\epsilon$-optimal policy under minimal assumptions, specifically requiring only an irreducible Markov chain. This achievement contrasts with prior work that necessitated nested-loop updates or stronger, algorithm-dependent policy assumptions. AI

    IMPACT Establishes a new theoretical benchmark for reinforcement learning algorithms, potentially improving sample efficiency in future applications.

  11. Rescaled Asynchronous SGD: Optimal Distributed Optimization under Data and System Heterogeneity

    Researchers have introduced Rescaled Asynchronous SGD, a novel variant of asynchronous SGD (ASGD) for optimizing distributed machine learning models under heterogeneous conditions. The approach corrects the bias in standard ASGD that arises when faster workers contribute more updates, by rescaling worker-specific stepsizes. The method theoretically guarantees convergence to the correct global objective and matches the known lower bound for time complexity in the non-convex setting. AI

    IMPACT Introduces a more efficient optimization method for distributed AI training, potentially improving performance on heterogeneous hardware.
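
    The rescaling idea can be illustrated with a toy model (an assumption-laden sketch, not the paper's algorithm): if worker i's updates arrive with probability p_i, weighting its step by 1/(K·p_i) makes the expected update follow the average of all K local objectives instead of over-weighting the fast worker's.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two workers with heterogeneous local objectives f_i(x) = 0.5 * (x - m_i)^2.
minima = np.array([0.0, 10.0])     # global optimum of the average loss: 5.0
rates  = np.array([0.9, 0.1])      # the fast worker fires 9x more often

def run(rescale, steps=200_000, eta=0.005):
    x = 0.0
    idx = rng.choice(2, size=steps, p=rates)   # which worker's update arrives
    for i in idx:
        g = x - minima[i]                      # gradient of that worker's loss
        w = 1.0 / (2 * rates[i]) if rescale else 1.0
        x -= eta * w * g
    return x

# Unscaled stationary point: sum_i p_i * m_i = 1.0 (pulled to the fast worker).
# Rescaled stationary point: mean(m_i) = 5.0 (the global optimum).
print(run(rescale=False))  # ~ 1.0
print(run(rescale=True))   # ~ 5.0
```

    This stationary-arrival model ignores true asynchronous delays, which is where the paper's analysis does the real work; the sketch only shows why frequency-dependent rescaling removes the bias.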

  12. Delightful Exploration

    Researchers have introduced Delight-gated exploration (DE), a novel algorithm designed to optimize decision-making in scenarios with vast action spaces. DE prioritizes exploratory actions based on their potential "delight," a metric combining expected improvement and surprisal, rather than broadly searching until uncertainty is resolved. This approach aims to be more efficient than traditional methods like ε-greedy, especially when exploration budgets are limited. The algorithm has demonstrated consistent performance across various bandit and MDP problems, showing reduced regret compared to Thompson Sampling and ε-greedy. AI

    IMPACT Offers a more efficient approach to decision-making in complex environments, potentially improving AI agent performance.

  13. Reframing preprocessing selection as model-internal calibration in near-infrared spectroscopy: A large-scale benchmark of operator-adaptive PLS and Ridge models

    Researchers have developed a new framework called operator-adaptive calibration to streamline the selection of spectral preprocessing methods in near-infrared spectroscopy (NIRS). This approach integrates preprocessing selection directly into the calibration model, reducing the need for costly and time-consuming external pipeline searches. The new models offer faster, more robust, and auditable NIRS method development by producing traceable operator choices and retaining interpretable coefficients. AI

    IMPACT Offers a more efficient and auditable approach to method development in NIRS, potentially impacting fields relying on spectral analysis.

  14. Unified generalization analysis for physics informed neural networks

    Researchers have developed a unified framework for analyzing the generalization capabilities of Physics-Informed Neural Networks (PINNs). This new approach relaxes previous restrictive assumptions and uses Taylor expansion to represent differential operators as linear operators in a high-dimensional space. The analysis reveals that while high-rank networks can generalize well, the nonlinearity of differential operators significantly impacts and potentially enlarges generalization bounds. AI

    IMPACT Provides a theoretical advancement for understanding the generalization of specialized neural networks used in scientific applications.

  15. The Sample Complexity of Multiple Change Point Identification under Bandit Feedback

    Researchers have developed a new adaptive algorithm for identifying multiple change points in data under bandit feedback. This algorithm aims to precisely locate discontinuities in a piecewise-constant function with minimal samples. The study establishes theoretical bounds on the algorithm's sample complexity, revealing that it depends not only on the magnitude of the jumps but also on the relative positions of these change points. AI

    IMPACT Provides a theoretical framework for analyzing data with discontinuities, potentially improving models that rely on sequential data analysis.

  16. Coupling-Informed Transport Maps for Bayesian Filtering in Nonlinear Dynamical Systems

    Researchers have developed a new likelihood-free transport filtering method that leverages couplings between state and observation variables. This approach reformulates the filtering analysis step as a minimization of the maximum mean discrepancy (MMD) between true and approximated joint measures. The method offers an analytic computation for the transport map, avoiding particle collapse and accurately approximating non-Gaussian filtering posteriors, with demonstrated superior performance in nonlinear, non-Gaussian scenarios. AI

    IMPACT Introduces a novel statistical method for approximating complex probability distributions, potentially improving AI systems that rely on accurate state estimation in dynamic environments.
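
    The maximum mean discrepancy at the heart of the method is easy to compute for two samples; a minimal RBF-kernel version (the paper minimizes MMD between joint state-observation measures, which this sketch does not attempt):

```python
import numpy as np

def mmd2(x, y, sigma=1.0):
    """Squared MMD (V-statistic) between samples x and y with an RBF kernel:
    MMD^2 = E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)]."""
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma**2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

rng = np.random.default_rng(3)
a = rng.normal(0.0, 1.0, size=(400, 1))
b = rng.normal(0.0, 1.0, size=(400, 1))   # same distribution as a
c = rng.normal(3.0, 1.0, size=(400, 1))   # shifted distribution

print(mmd2(a, b))  # ~ 0: same distribution
print(mmd2(a, c))  # clearly > 0: distributions differ
```

    MMD is zero exactly when two distributions coincide (for a characteristic kernel), which is what makes it a usable objective for matching a transported ensemble to the true filtering posterior.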

  17. Generative Modeling of Approximately Periodic Time Series by a Posterior-Weighted Gaussian Process

    Researchers have developed a new generative model for time series data that exhibits approximately periodic behavior. This model utilizes a Gaussian Process (GP) with a novel kernel to effectively capture both the common structure across repetitions and the subtle variations between them. The approach decouples intra-repetition dynamics from inter-repetition variability, enabling the generation of realistic synthetic trajectories. AI

    IMPACT Introduces a novel method for modeling complex, repetitive patterns in data, potentially improving generative capabilities for industrial and cyber-physical systems.
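
    The paper's posterior-weighted construction is specific to its setting, but the underlying idea, a GP kernel that couples a periodic term with a slower drift term so repetitions share structure while still varying, can be sketched directly:

```python
import numpy as np

def quasi_periodic_kernel(t1, t2, period=1.0, ell_p=0.7, ell_d=3.0):
    """Periodic kernel times a slow RBF 'drift' term: repetitions share
    structure, but trajectories may vary over longer time scales."""
    d = t1[:, None] - t2[None, :]
    per = np.exp(-2 * np.sin(np.pi * np.abs(d) / period) ** 2 / ell_p**2)
    drift = np.exp(-(d ** 2) / (2 * ell_d**2))
    return per * drift

t = np.linspace(0, 5, 300)
K = quasi_periodic_kernel(t, t) + 1e-6 * np.eye(t.size)  # jitter for stability

rng = np.random.default_rng(4)
L = np.linalg.cholesky(K)
samples = L @ rng.normal(size=(t.size, 3))   # three synthetic trajectories

print(samples.shape)  # (300, 3)
```

    Each column is an approximately periodic trajectory: the periodic factor fixes the within-cycle shape, while the drift factor lets successive repetitions deviate, which is the decoupling of intra- and inter-repetition variability described above.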

  18. On Hallucinations in Inverse Problems: Fundamental Limits and Provable Assessment Methods

    Researchers have developed a theoretical framework to understand and quantify "hallucinations" in AI models used for inverse problems, such as medical imaging. The study shows that these realistic but incorrect details can stem from the inherent ill-posed nature of the problem itself, not just specific models. The new approach provides computable bounds on hallucination magnitudes and algorithms to assess reconstruction faithfulness, demonstrating broad applicability across various imaging tasks and modern generative models. AI

    IMPACT Provides a theoretical basis and practical tools for understanding and mitigating AI-generated inaccuracies in critical imaging applications.

  19. Amortized Neural Clustering of Time Series based on Statistical Features

    Researchers have developed a novel algorithm-agnostic approach for time series clustering using amortized neural inference. This method trains neural networks to approximate optimal partitioning rules from simulated data, reducing reliance on traditional clustering techniques. The framework leverages statistical features to learn a data-driven affinity structure, enabling automated determination of cluster numbers and achieving competitive or superior accuracy compared to existing methods, with a demonstrated application in financial time series analysis. AI

    IMPACT Introduces a new method for automated, adaptive, and data-driven clustering of temporal data across scientific and industrial domains.

  20. State-of-art minibatches via novel DPP kernels: discretization, wavelets, and rough objectives

    Researchers have developed new Determinantal Point Processes (DPPs) using wavelets to improve minibatch generation for machine learning tasks. These novel DPPs offer provably better accuracy guarantees and a general method to convert continuous DPPs into discrete kernels suitable for subsampling. This approach enhances variance reduction and computational efficiency, expanding the applicability of DPP-based methods to objective functions with low regularity. AI

    IMPACT Introduces a novel method for generating more efficient and accurate minibatches in machine learning, potentially improving training performance and reducing computational costs.

  21. Adaptive Kernel Density Estimation with Pre-training

    Researchers have introduced a novel approach to density estimation in high-dimensional spaces by leveraging pre-training, a technique common in advanced AI. This method utilizes a pre-trained neural network to suggest suitable location-adaptive kernels for each data point, thereby improving efficiency and accuracy. The effectiveness of this strategy is demonstrated in numerical experiments, particularly when the target distribution aligns with the pre-training distribution, with options for fine-tuning to adapt to different distributions. AI

    IMPACT Introduces a novel application of AI pre-training to improve statistical density estimation in high-dimensional data.
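
    The paper's kernels are proposed by a pre-trained network; as a classical stand-in, a sample-point adaptive KDE picks each training point's bandwidth from its k-th nearest-neighbor distance, so kernels widen in sparse regions (a sketch of location-adaptive kernels, not the paper's method):

```python
import numpy as np

def adaptive_kde(train, query, k=10):
    """Sample-point adaptive KDE: each training point contributes a Gaussian
    kernel whose bandwidth is its distance to its k-th nearest neighbor."""
    d_train = np.abs(train[:, None] - train[None, :])
    h = np.sort(d_train, axis=1)[:, k]                  # per-point bandwidths
    d = query[:, None] - train[None, :]
    kernels = np.exp(-0.5 * (d / h) ** 2) / (h * np.sqrt(2 * np.pi))
    return kernels.mean(axis=1)

rng = np.random.default_rng(5)
train = rng.normal(size=500)
grid = np.linspace(-4, 4, 801)
dens = adaptive_kde(train, grid)

print(dens[400])                          # density at 0; ~0.4 for N(0, 1)
print(dens.sum() * (grid[1] - grid[0]))   # ~1: the estimate integrates to one
```

    Replacing the k-NN rule with bandwidths (or full kernels) suggested by a pre-trained network is the step the paper contributes; the estimator's form stays the same.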

  22. When Attention Closes: How LLMs Lose the Thread in Multi-Turn Interaction

    A new research paper introduces a "channel-transition" framework to explain why large language models struggle to maintain context and instructions over extended multi-turn conversations. The study proposes the Goal Accessibility Ratio (GAR) as a metric to quantify the degradation of attention to key instructions. Researchers found that while attention to instructions may close, relevant information can persist in residual representations, leading to varied failure modes across different model architectures. AI

    IMPACT Identifies a core limitation in LLM conversational ability, potentially guiding future architectural improvements for better long-term memory.

  23. Scaling few-shot spoken word classification with generative meta-continual learning

    Researchers have explored the effectiveness of generative meta-continual learning for spoken word classification across multiple languages. Their findings indicate that while multilingual models perform best, the performance differences between models trained on various language combinations are surprisingly small. The amount of unique training data appears to be a more significant factor in performance than the number of languages included. AI

    IMPACT Investigates scaling few-shot spoken word classification, potentially improving efficiency and adaptability in multilingual environments.

  24. When Should an AI Workflow Release? Always-Valid Inference for Black-Box Generate-Verify Systems

    Researchers have developed a new statistical method to determine when AI workflows should release their outputs, particularly for systems that use iterative generate-evaluate-revise loops. This "always-valid release wrapper" addresses the challenge of making release decisions with adaptively generated evaluator scores, where traditional calibration models are unavailable. The proposed wrapper creates a reference pool of failures to calibrate scores and uses an e-process for validity, aiming to control the probability of releasing on infeasible tasks while still allowing for releases on feasible ones. AI

    IMPACT Provides a statistical framework to improve the reliability of AI system outputs by optimizing release decisions.

  25. The Mechanism of Weak-to-Strong Generalization: Feature Elicitation from Latent Knowledge

    Researchers have theoretically analyzed the mechanism of weak-to-strong generalization, a method for aligning advanced AI systems. Their work, focusing on reward-model learning with two-layer neural networks, demonstrates how a strong model can efficiently learn a new task by eliciting its pre-trained knowledge without catastrophic forgetting. This approach establishes that the strong model acquires target feature directions through this training process, preserving its general capabilities. AI

    IMPACT Establishes a theoretical foundation for aligning advanced AI systems by demonstrating efficient knowledge transfer without catastrophic forgetting.

  26. Digital Twins as Synthetic Controls in Single-Arm Trials

    Researchers have published a paper detailing the use of digital twins as synthetic control arms in single-arm clinical trials. These advanced machine learning models can generate personalized predictions of disease progression, offering a more robust alternative to traditional methods. The paper discusses how these digital twins can overcome limitations of existing synthetic control approaches and provides guidance on their practical deployment, including considerations for FDA draft guidelines on AI in drug development. AI

    IMPACT This research could lead to more efficient and ethical clinical trials by leveraging AI for synthetic control arms.

  27. Bayesian Surrogate Training on Multiple Data Sources: A Hybrid Modeling Strategy

    Researchers have developed new strategies for training surrogate models by integrating data from multiple sources, including simulations and real-world measurements. One approach involves training separate models for each data type and then combining their predictions, while another trains a single model incorporating both data types. These hybrid methods aim to improve predictive accuracy and coverage, and to identify potential issues within existing simulation models, ultimately aiding in system understanding and future development. AI

    IMPACT Enhances AI model training by enabling more accurate predictions and better diagnostics through multi-source data integration.

  28. When to Trust Confidence Thresholding: Calibration Diagnostics for Pseudo-Labelled Regression

    Researchers have developed a new diagnostic tool to assess the reliability of confidence thresholding in pseudo-labeling pipelines for regression tasks. This method provides a way to predict the bias introduced by thresholding calibrated classifier scores, using the residual score variance on unlabelled data. The proposed $(V^{*}, \kappa)$ decision rule aims to help practitioners determine when confidence thresholding is a safe practice. AI

    IMPACT Provides a new operational tool for practitioners to improve the reliability of pseudo-labelled regression models.

  29. ISOMORPH: A Supply Chain Digital Twin for Simulation, Dataset Generation, and Forecasting Benchmarks

    Researchers have introduced ISOMORPH, a novel digital twin designed for supply chain logistics, addressing a gap in existing time-series forecasting benchmarks. This simulator offers a configurable, multi-echelon network with interpretable parameters, allowing for realistic dataset generation and the study of phenomena like the bullwhip effect. Initial evaluations show that several foundation models, including Chronos and TimesFM, perform comparably to existing benchmarks when used with ISOMORPH, demonstrating its utility for both simulation and model evaluation. AI

    IMPACT Provides a new benchmark for evaluating time-series forecasting models in complex supply chain environments.

  30. From Generalist to Specialist Representation

    Researchers have published a paper detailing a new method for extracting task-specific representations from generalist AI models. The work establishes theoretical guarantees for identifying and disentangling relevant latent information without requiring interventions or specific model structures. This approach aims to provide a provable foundation for moving from broad, generalist models to more specialized and efficient ones for downstream applications. AI

    IMPACT Establishes theoretical guarantees for creating more specialized AI models from generalist ones, potentially improving efficiency and performance in specific applications.

  31. Plan Before You Trade: Inference-Time Optimization for RL Trading Agents

    Researchers have developed FPILOT, a framework that enhances reinforcement learning agents for trading by incorporating price forecasts at inference time. This approach, inspired by Model Predictive Control, allows agents to optimize their trading strategies based on predicted future price trajectories without requiring retraining. Evaluations on the TradeMaster DJ30 benchmark demonstrated consistent improvements in total return and risk-adjusted metrics across various policy learning algorithms. AI

    IMPACT Enhances financial trading strategies by enabling RL agents to leverage price forecasts for better decision-making.

  32. Population Risk Bounds for Kolmogorov-Arnold Networks Trained by DP-SGD with Correlated Noise

    Researchers have established new theoretical bounds for training Kolmogorov-Arnold Networks (KANs), a structured alternative to standard MLPs. The work analyzes KANs trained with mini-batch stochastic gradient descent (SGD), including differentially private variants with correlated noise. These findings reveal a gap between non-private and private training regimes, suggesting that polylogarithmic network width is necessary for differential privacy. AI

    IMPACT Establishes theoretical underpinnings for KANs, potentially guiding future research in privacy-preserving machine learning.

  33. Robust Sequential Experimental Design for A/B Testing

    Researchers have developed a new framework for robust sequential experimental design in A/B testing, specifically addressing challenges posed by model misspecification. This approach aims to improve sample efficiency by bounding the worst-case mean squared error of estimated treatment effects. The framework's effectiveness has been demonstrated through both synthetic data and real-world datasets from a major technology company. AI

    IMPACT Introduces a more reliable method for evaluating product changes, potentially improving decision-making in tech companies.

  34. Pion: A Spectrum-Preserving Optimizer via Orthogonal Equivalence Transformation

    Researchers have introduced Pion, a novel spectrum-preserving optimizer designed for training large language models. Unlike traditional additive optimizers such as Adam, Pion updates weight matrices with orthogonal transformations, preserving their singular values and spectral norm. This approach offers a stable and competitive alternative for both LLM pretraining and finetuning, as demonstrated by empirical results. AI

    IMPACT Introduces a new optimization method that could improve LLM training stability and performance.
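
    Why spectrum preservation works is easy to demonstrate: multiplying a weight matrix by any orthogonal matrix leaves its singular values untouched, and the Cayley transform turns a skew-symmetric "step" into an orthogonal factor. An illustrative sketch (not Pion's actual update rule):

```python
import numpy as np

def cayley(S):
    """Map a skew-symmetric S to the orthogonal matrix (I + S)^-1 (I - S)."""
    I = np.eye(S.shape[0])
    return np.linalg.solve(I + S, I - S)

rng = np.random.default_rng(6)
W = rng.normal(size=(8, 8))            # "weight matrix"
A = rng.normal(size=(8, 8))
S = 0.01 * (A - A.T)                   # small skew-symmetric step
Q = cayley(S)

W_new = Q @ W                          # multiplicative, orthogonal update

sv_old = np.linalg.svd(W, compute_uv=False)
sv_new = np.linalg.svd(W_new, compute_uv=False)
print(np.allclose(sv_old, sv_new))     # True: singular values preserved
```

    An additive update W - eta * G offers no such guarantee, which is why spectral norms can drift under Adam-style optimizers.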

  35. A proximal gradient algorithm for composite log-concave sampling

    Researchers have developed a new proximal gradient algorithm designed to sample from composite log-concave distributions. This algorithm assumes access to gradient evaluations for one part of the distribution and a restricted Gaussian oracle for the other. The proposed method achieves state-of-the-art iteration counts for sampling, matching previous results for simpler cases and extending to non-log-concave distributions and non-smooth functions. AI

    IMPACT Introduces a novel sampling technique that could improve efficiency in statistical modeling and machine learning applications.
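
    The paper assumes a restricted Gaussian oracle, but a simpler relative, proximal Langevin, shows the shape of such samplers: a gradient step on the smooth part f, injected Gaussian noise, then a proximal step on the non-smooth part g. A sketch for the composite target π(x) ∝ exp(-x²/2 - λ|x|), where the prox of λ|x| is soft-thresholding:

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * |x|."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def proximal_langevin(steps=50_000, eta=0.01, lam=1.0, seed=7):
    """x <- prox_{eta*g}( x - eta * f'(x) + sqrt(2*eta) * noise ),
    with f(x) = x^2/2 (smooth) and g(x) = lam * |x| (non-smooth)."""
    rng = np.random.default_rng(seed)
    x, out = 0.0, []
    for _ in range(steps):
        x = x - eta * x + np.sqrt(2 * eta) * rng.normal()
        x = soft_threshold(x, eta * lam)
        out.append(x)
    return np.array(out[5_000:])   # drop burn-in

s = proximal_langevin()
print(s.mean())  # ~ 0: the target is symmetric
print(s.std())   # < 1: the |x| penalty shrinks the Gaussian
```

    The restricted Gaussian oracle in the paper replaces the plain prox with an exact sampling step for the non-smooth part, which is what enables the stronger iteration-count guarantees.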

  36. Model-based Bootstrap of Controlled Markov Chains

    Researchers have developed a new model-based bootstrap method for controlled Markov chains, particularly useful in offline reinforcement learning scenarios where the data-generating policy is unknown. This technique establishes distributional consistency for transition estimators and extends to policy evaluation and recovery, providing asymptotically valid confidence intervals for value and Q-functions. Experimental results on the RiverSwim problem demonstrate that the proposed confidence intervals offer improved calibration and coverage compared to existing methods, especially with limited data. AI

    IMPACT Improves confidence interval calibration for offline reinforcement learning, aiding in more reliable policy evaluation and recovery.
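
    For a finite, uncontrolled chain the recipe reduces to: estimate the transition matrix from one trajectory, then simulate bootstrap trajectories from the estimate and re-estimate to get a sampling distribution (a minimal sketch; the paper's controlled-chain and Q-function machinery is not reproduced here):

```python
import numpy as np

def estimate_P(traj, n_states):
    """Maximum-likelihood transition matrix from one observed trajectory."""
    counts = np.zeros((n_states, n_states))
    for s, s2 in zip(traj[:-1], traj[1:]):
        counts[s, s2] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def simulate(P, start, length, rng):
    traj = [start]
    for _ in range(length - 1):
        traj.append(rng.choice(P.shape[0], p=P[traj[-1]]))
    return traj

rng = np.random.default_rng(8)
P_true = np.array([[0.9, 0.1],
                   [0.3, 0.7]])

data = simulate(P_true, 0, 1_000, rng)
P_hat = estimate_P(data, 2)

# Bootstrap: resample trajectories from P_hat and re-estimate P[0, 1].
boot = [estimate_P(simulate(P_hat, 0, 1_000, rng), 2)[0, 1] for _ in range(200)]
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"P[0,1] = {P_hat[0, 1]:.3f}, 95% bootstrap CI ({lo:.3f}, {hi:.3f})")
```

    The paper's contribution is proving that this plug-in resampling is distributionally consistent, and extending the interval construction from transition probabilities to value and Q-functions.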

  37. Overview of the MedHopQA track at BioCreative IX: track description, participation and evaluation of systems for multi-hop medical question answering

    Researchers have introduced MedHopQA, a new benchmark designed to evaluate the multi-hop reasoning capabilities of large language models in the biomedical domain. This benchmark consists of 1,000 expert-curated question-answer pairs, each requiring information synthesis from two distinct Wikipedia articles, with answers provided in free text. The MedHopQA dataset was presented as a shared task at BioCreative IX, attracting 48 submissions from 13 teams, and highlighted the effectiveness of retrieval-augmented generation strategies for improved performance. AI

    IMPACT Establishes a new standard for evaluating complex biomedical reasoning in LLMs, pushing for more robust and contamination-resistant benchmarks.

  38. Multi-Variable Conformal Prediction: Optimizing Prediction Sets without Data Splitting

    Two new research papers introduce advanced conformal prediction techniques to improve the accuracy and efficiency of prediction sets. The first paper, "Multi-Variable Conformal Prediction (MCP)," extends conformal prediction to handle vector-valued score functions, allowing for more flexible prediction set shapes without sacrificing coverage guarantees and eliminating the need for data splitting. The second paper, "Shape-Adaptive Conditional Calibration for Conformal Prediction via Minimax Optimization," presents the Minimax Optimization Predictive Inference (MOPI) framework, which optimizes over a flexible class of set-valued mappings to achieve superior shape adaptivity and more efficient prediction sets, even for complex conditional distributions. AI

    IMPACT These new methods could lead to more reliable and efficient predictive models in machine learning by improving the calibration of prediction sets.

  39. QAP-Router: Tackling Qubit Routing as Dynamic Quadratic Assignment with Reinforcement Learning

    Researchers have developed new reinforcement learning (RL) methods to address the qubit allocation problem in quantum computing compilation. Two distinct approaches, CO-MAP and QAP-Router, frame the problem as a combinatorial optimization or dynamic quadratic assignment task, respectively. Both methods leverage RL policies trained on real-world quantum circuit datasets, demonstrating significant reductions in SWAP gate overhead and CNOT gate counts compared to existing compilers. AI

    IMPACT These RL-based approaches offer significant improvements in quantum circuit compilation, potentially accelerating the development and practical application of quantum computing.

  40. Online Learning-to-Defer with Varying Experts

    Researchers have developed a new online algorithm for Learning-to-Defer (L2D) methods, designed to handle streaming data and dynamic expert availability. This algorithm is the first of its kind for multiclass classification with bandit feedback and a varying pool of experts. It offers theoretical regret guarantees and has demonstrated effectiveness in experiments on both synthetic and real-world datasets, extending L2D capabilities to more complex, dynamic environments. AI

    IMPACT Introduces a novel algorithmic approach for dynamic expert selection in machine learning, potentially improving efficiency in real-time decision-making systems.

  41. Safety-Oriented Evaluation of Language Understanding Systems for Air Traffic Control

    Researchers are exploring the use of large language models (LLMs) for enhancing safety in air traffic control (ATC) and around non-towered airports. One study proposes a vision-language model approach to analyze radio communications, weather data, and flight trajectories for safety assessments, achieving high F1 scores with open-source models. Another paper introduces a safety-oriented evaluation framework that highlights the critical need for consequence-aware metrics, as standard accuracy measures can mask severe risks in ATC operations. AI

    IMPACT LLM analysis could improve safety and efficiency in critical air traffic control operations.

  42. Optimal Policy Learning under Budget and Coverage Constraints

    Researchers have developed a new framework for optimal policy learning that addresses combined budget and minimum coverage constraints. The study reveals a knapsack-type structure within the problem, allowing the optimal policy to be defined by an affine threshold rule. Two algorithms, Greedy-Lagrangian (GLC) and rank-and-cut (RC), are proposed to implement this approach, with GLC offering close approximation and RC showing near-optimality under specific conditions. AI

    IMPACT Introduces a novel algorithmic approach for optimizing resource allocation in policy learning scenarios.

  43. Self-Supervised Laplace Approximation for Bayesian Uncertainty Quantification

    Researchers have developed a new method called Self-Supervised Laplace Approximation (SSLA) to directly approximate the posterior predictive distribution in Bayesian models. This approach draws inspiration from self-training techniques and quantifies predictive uncertainty by refitting the model on its own predictions. The SSLA method offers a deterministic, sampling-free approximation that outperforms classical Laplace approximations in predictive calibration on regression tasks, including with Bayesian neural networks, while maintaining computational efficiency. AI

    IMPACT Offers a more computationally efficient and accurate method for assessing uncertainty in Bayesian models, potentially improving reliability in AI applications.
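
    For context, the classical baseline that SSLA is compared against can be written down exactly for 1-D linear regression with Gaussian noise, where the Laplace approximation coincides with the true Gaussian posterior. This sketches the baseline only, not SSLA itself; the function name and defaults are illustrative:

```python
def laplace_predictive(xs, ys, sigma2=1.0, prior_var=10.0):
    """Classical Laplace approximation for y = w*x + noise: for a
    Gaussian likelihood and Gaussian prior it is exact, with posterior
    precision sum(x^2)/sigma2 + 1/prior_var. The predictive variance at
    x* is the noise variance plus x*^2 * var_w."""
    precision = sum(x * x for x in xs) / sigma2 + 1.0 / prior_var
    var_w = 1.0 / precision
    mean_w = var_w * sum(x * y for x, y in zip(xs, ys)) / sigma2

    def predict(x_star):
        # Returns (predictive mean, predictive variance) at x_star.
        return mean_w * x_star, sigma2 + x_star ** 2 * var_w

    return predict
```

Note how predictive variance grows with distance from the data; SSLA's contribution is to approximate this kind of predictive distribution directly, without the quadratic expansion around a point estimate.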

  44. Keeping Score: Efficiency Improvements in Neural Likelihood Surrogate Training via Score-Augmented Loss Functions

    Researchers have developed a new method to improve the efficiency of training neural likelihood surrogates for stochastic process models. By augmenting the standard loss function with exact score information and adaptive weighting, the approach significantly reduces the computational cost associated with parameter inference. This technique demonstrates improved surrogate quality and can achieve performance comparable to a tenfold increase in training data with only a marginal increase in training time. AI

    IMPACT Reduces computational cost for parameter inference in stochastic process models, potentially accelerating research and development in fields relying on such models.
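
    The shape of a score-augmented objective can be illustrated with a Gaussian surrogate, whose score has the closed form -(x - mu)/var: the usual negative log-likelihood is augmented with a penalty matching the model score to the simulator's exact score. The adaptive weighting is replaced by a fixed coefficient here, and all details are illustrative:

```python
import math

def score_augmented_loss(samples, exact_scores, mu, var, weight=1.0):
    """NLL of a Gaussian surrogate N(mu, var) plus a penalty matching
    the model score -(x - mu)/var to the exact score at each sample.
    `weight` stands in for the paper's adaptive weighting."""
    nll = score_err = 0.0
    for x, s in zip(samples, exact_scores):
        nll += 0.5 * math.log(2 * math.pi * var) + (x - mu) ** 2 / (2 * var)
        score_err += (-(x - mu) / var - s) ** 2
    n = len(samples)
    return nll / n + weight * score_err / n
```

The score term supplies gradient information per sample, which is why it can substitute for a large amount of extra training data.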

  45. Frontier Model Task Reliability Doubling Every 4.7 Months, Outpacing Benchmarks

    Frontier AI models are showing a rapid increase in their ability to handle complex tasks, with their reliability doubling every 4.7 months, a rate that has accelerated since late 2024. Recent models like Claude Mythos Preview and GPT-5.5 are outperforming these trends, though their exact capabilities are still being measured due to near-perfect success rates on current benchmarks. This rapid progress challenges existing testing methodologies, as models are pushing the limits of token capacity and agent scaffolding, making it difficult to accurately assess their performance and potential deterioration at scale. AI

    IMPACT Rapid advancements in frontier models may necessitate new evaluation methods and could accelerate the adoption of AI in complex domains.
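
    The arithmetic behind the headline figure: a 4.7-month doubling time compounds to roughly a 5.9x capability gain per year, since capability multiplies by 2 raised to the elapsed time over the doubling time:

```python
def capability_multiplier(months, doubling_months=4.7):
    """Growth implied by a fixed doubling time: over `months`,
    capability multiplies by 2 ** (months / doubling_months)."""
    return 2 ** (months / doubling_months)
```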

  46. Yield Curves Dynamics Using Variational Autoencoders Under No-arbitrage

    Researchers have developed a novel physics-informed generative framework to model yield curve dynamics, addressing the conflict between deep learning's flexibility and fixed-income modeling's theoretical constraints. The proposed two-stage architecture, featuring a Student-t Conditional Variational Autoencoder with Dynamic Level Injection (CVAEsT+LS) and a Neural Stochastic Differential Equation penalized by a No-Arbitrage PDE, significantly reduces forecasting errors. This approach demonstrates superior performance in predicting term structures across various macroeconomic regimes and currencies, outperforming traditional models like HJM. AI

    IMPACT Enhances financial modeling accuracy and scenario generation capabilities for term structure prediction.
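
    The two-term structure of such a physics-informed objective, a data-fit term plus a penalty on violations of the no-arbitrage constraint, can be sketched generically. The function name, weighting, and residual inputs are illustrative, not the paper's formulation:

```python
def no_arbitrage_penalized_loss(recon_err, pde_residuals, penalty=1.0):
    """Two-term physics-informed objective: reconstruction error plus a
    penalty on the mean squared residual of a no-arbitrage PDE evaluated
    at sampled points."""
    msr = sum(r * r for r in pde_residuals) / len(pde_residuals)
    return recon_err + penalty * msr
```

Driving the residual term to zero is what pulls the flexible generative model back toward arbitrage-free term structures.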

  47. Approximation Theory of Laplacian-Based Neural Operators for Reaction-Diffusion System

    Researchers have developed a new theoretical framework for neural operators, a type of AI model used to learn solutions for complex systems like partial differential equations. This work specifically addresses the approximation analysis for nonlinear reaction-diffusion systems, which are crucial for modeling pattern formation. The study establishes explicit error bounds and demonstrates that their proposed Laplacian eigenfunction-based architecture can significantly reduce the parameter complexity required for accurate predictions. AI

    IMPACT Provides a theoretical foundation for using neural operators to model complex physical systems more efficiently.
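
    The core mechanism of a Laplacian-eigenfunction architecture, projecting onto eigenmodes, scaling each mode, and reconstructing, can be sketched for the 1-D Dirichlet Laplacian on [0, L], whose eigenfunctions are sines. The per-mode multipliers stand in for learned parameters; this is a generic spectral sketch, not the paper's operator:

```python
import math

def spectral_apply(f_vals, multipliers, L=math.pi):
    """Apply a diagonal operator in the Laplacian eigenbasis: on [0, L]
    with Dirichlet boundaries the eigenfunctions are
    sqrt(2/L) * sin(k*pi*x/L). Project the sampled function onto the
    first K modes, scale each coefficient, and reconstruct."""
    n = len(f_vals)
    xs = [(i + 0.5) * L / n for i in range(n)]  # midpoint grid
    out = [0.0] * n
    for k, m in enumerate(multipliers, start=1):
        phi = [math.sqrt(2 / L) * math.sin(k * math.pi * x / L) for x in xs]
        coef = sum(f * p for f, p in zip(f_vals, phi)) * (L / n)  # inner product
        for i in range(n):
            out[i] += m * coef * phi[i]
    return out
```

Truncating to a few well-chosen modes is the source of the parameter savings the paper quantifies: the operator only needs one scalar per retained mode rather than a dense mapping between grids.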

  48. EgoEV-HandPose: Egocentric 3D Hand Pose Estimation and Gesture Recognition with Stereo Event Cameras

    Researchers have developed two new frameworks for improving 3D hand pose estimation from egocentric camera views. EgoForce utilizes a differentiable forearm representation and a unified transformer to achieve state-of-the-art accuracy across various camera types, reducing MPJPE by up to 28%. EgoEV-HandPose, on the other hand, employs stereo event cameras and a novel KeypointBEV fusion module to jointly estimate bimanual hand poses and recognize gestures, achieving an MPJPE of 30.54mm and 86.87% gesture recognition accuracy. Both methods aim to enhance applications in AR/VR and human-computer interaction by providing more robust and accurate hand tracking. AI

    IMPACT These advancements in egocentric hand tracking could significantly improve the realism and interactivity of AR/VR experiences and human-computer interfaces.
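
    MPJPE, the accuracy figure quoted for both frameworks, is simply the mean Euclidean distance between predicted and ground-truth joint positions:

```python
def mpjpe(pred, gt):
    """Mean Per-Joint Position Error: average Euclidean distance (in the
    inputs' units, typically mm) between predicted and ground-truth 3-D
    joint positions, given as (x, y, z) tuples."""
    assert len(pred) == len(gt)
    total = 0.0
    for (px, py, pz), (gx, gy, gz) in zip(pred, gt):
        total += ((px - gx) ** 2 + (py - gy) ** 2 + (pz - gz) ** 2) ** 0.5
    return total / len(pred)
```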

  49. Random-Set Graph Neural Networks

    Researchers have introduced Random-Set Graph Neural Networks (RS-GNNs) to address uncertainty quantification in graph learning. This new framework models node-level epistemic uncertainty using a belief function formalism. Experiments on nine datasets, including autonomous driving benchmarks, show RS-GNNs offer improved uncertainty estimation capabilities. AI

    IMPACT Improves reliability of graph-based AI systems by quantifying uncertainty in predictions.
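
    The belief-function formalism builds on Dempster-Shafer theory, where mass is assigned to sets of labels rather than single labels. A minimal sketch of belief and plausibility, whose gap is one reading of epistemic uncertainty (the mass assignment in the test is invented):

```python
def belief_and_plausibility(masses, event):
    """Dempster-Shafer basics: `masses` maps focal sets (frozensets of
    labels) to mass. Belief of an event sums masses of its subsets;
    plausibility sums masses of sets that intersect it."""
    bel = sum(m for s, m in masses.items() if s <= event)
    pl = sum(m for s, m in masses.items() if s & event)
    return bel, pl
```

Mass placed on non-singleton sets widens the belief-plausibility interval, which is how this formalism expresses "the model does not know" separately from "the classes are genuinely ambiguous".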

  50. QDSB: Quantized Diffusion Schrödinger Bridges

    Researchers have introduced Quantized Diffusion Schrödinger Bridges (QDSB), a novel method for learning generative models from unpaired data. QDSB addresses the computational challenges of traditional Schrödinger bridges by quantizing endpoint distributions and using cell-wise sampling to reconstruct the data plan. This approach significantly reduces training time while maintaining sample quality comparable to existing methods. AI

    IMPACT Accelerates generative model training by reducing computational costs and time.
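
    A toy version of quantized endpoints with cell-wise sampling: histogram the endpoint samples into equal-width cells, then draw a cell by its mass and a point uniformly inside it. This is illustrative only; QDSB's actual construction couples two endpoint distributions, which this one-distribution sketch does not capture:

```python
import random

def quantize(samples, n_cells, lo, hi):
    """Quantize an endpoint distribution into equal-width cells,
    returning per-cell probability mass."""
    counts = [0] * n_cells
    width = (hi - lo) / n_cells
    for x in samples:
        i = min(int((x - lo) / width), n_cells - 1)  # clamp x == hi
        counts[i] += 1
    return [c / len(samples) for c in counts]

def cellwise_sample(masses, lo, hi, rng):
    """Sample a cell by its mass, then a point uniformly inside it."""
    width = (hi - lo) / len(masses)
    r, acc = rng.random(), 0.0
    for i, m in enumerate(masses):
        acc += m
        if r <= acc:
            return lo + (i + rng.random()) * width
    return hi - rng.random() * width  # guard against float round-off
```

Replacing a continuous endpoint distribution with a small table of cell masses is what makes the repeated resampling inside bridge training cheap.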