Markov decision process
PulseAugur coverage of Markov decision process — every cluster mentioning Markov decision process across labs, papers, and developer communities, ranked by signal.
1 day with sentiment data
-
New Q-value iteration analysis uses switching geometry
This paper introduces a new framework for analyzing Q-value iteration in Markov decision processes, focusing on a technique called rank-one deflation. The authors interpret the algorithm's behavior through the geometry …
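The paper's rank-one deflation analysis is not detailed in this summary, but plain Q-value iteration, the algorithm it studies, can be sketched on a toy two-state MDP (the MDP below is illustrative, not taken from the paper):

```python
import numpy as np

# Toy MDP: 2 states, 2 actions. P[a][s][s'] = transition probability,
# R[s][a] = expected reward. Values here are arbitrary for illustration.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # transitions under action 0
    [[0.5, 0.5], [0.1, 0.9]],   # transitions under action 1
])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])
gamma = 0.9

def q_value_iteration(P, R, gamma, tol=1e-8):
    """Iterate the Bellman optimality operator on Q until convergence."""
    n_actions, n_states, _ = P.shape
    Q = np.zeros((n_states, n_actions))
    while True:
        V = Q.max(axis=1)                              # greedy state values
        Q_new = R + gamma * np.einsum('asn,n->sa', P, V)
        if np.abs(Q_new - Q).max() < tol:
            return Q_new
        Q = Q_new

Q_star = q_value_iteration(P, R, gamma)
policy = Q_star.argmax(axis=1)   # greedy policy w.r.t. the fixed point
```

Because the Bellman operator is a gamma-contraction, the loop converges geometrically to the unique fixed point Q*; the paper's geometric interpretation operates on the spectrum of this iteration.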
-
New protocol optimizes drug trial subsidies to boost social utility
Researchers have developed a new statistical protocol for sequential experimentation that aims to optimize social utility in high-stakes domains like drug development. This protocol involves a product developer conducti…
-
Q-MMR framework offers novel approach to off-policy evaluation
Researchers have introduced Q-MMR, a new theoretical framework for off-policy evaluation in Markov Decision Processes (MDPs). This method learns weights for data points to approximate expected returns under a target pol…
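Q-MMR's specific weight-learning scheme is not given in this summary, but the general idea of reweighting behavior-policy data to estimate a target policy's return can be sketched with classical importance sampling on a toy one-step problem (a simplification, not the Q-MMR method itself):

```python
import numpy as np

rng = np.random.default_rng(0)

# One-step toy problem: behavior policy b logs the data, target policy pi
# is the one we want to evaluate. Probabilities and rewards are made up.
b = np.array([0.5, 0.5])             # behavior action probabilities
pi = np.array([0.9, 0.1])            # target action probabilities
true_reward = np.array([1.0, 0.0])   # expected reward of each action

# Log data under b, then weight each point by the ratio pi(a) / b(a).
actions = rng.choice(2, size=100_000, p=b)
rewards = true_reward[actions] + rng.normal(0.0, 0.1, size=actions.size)
weights = pi[actions] / b[actions]

ois_estimate = np.mean(weights * rewards)                  # ordinary IS
wis_estimate = np.sum(weights * rewards) / weights.sum()   # weighted IS
```

Both estimates approximate the target value 0.9 without ever running pi; methods like Q-MMR replace the fixed likelihood ratios with learned per-point weights to reduce variance over multi-step MDP trajectories.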
-
Reinforcement learning enhances autonomous target tracking accuracy and robustness
Researchers have developed a deep reinforcement learning approach for autonomous bearings-only tracking of moving targets. The system formulates the observer maneuver problem as a belief Markov decision process, using a…
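The belief-MDP formulation means the agent acts on a posterior over target states rather than the state itself. A minimal discrete Bayes-filter belief update, the core primitive behind such formulations (the three-cell grid is a toy stand-in, not the paper's tracker), looks like this:

```python
import numpy as np

def belief_update(belief, likelihood, transition):
    """One step of a discrete Bayes filter: predict with the transition
    model, then correct with the observation likelihood, then normalize."""
    predicted = transition.T @ belief      # predict: sum_s P(s'|s) b(s)
    posterior = likelihood * predicted     # correct with p(obs | s')
    return posterior / posterior.sum()

# 3 possible target cells; the target tends to stay where it is.
T = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])
b0 = np.array([1/3, 1/3, 1/3])           # uninformed prior
lik = np.array([0.1, 0.2, 0.7])          # bearing-like observation favoring cell 2
b1 = belief_update(b0, lik, T)
```

In the bearings-only setting the observation likelihood is informative only about direction, which is why the observer's own maneuvers (the actions of the belief MDP) matter for making the belief collapse.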
-
Reinforcement learning uses symmetry and data augmentation for faster aircraft control
Researchers have developed a new method for offline reinforcement learning that leverages the symmetry of dynamical systems to improve sample efficiency. This approach uses symmetric data augmentation to enhance the sta…
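The summary does not specify which symmetry the authors exploit, but the general mechanism of symmetric data augmentation is simple: if the dynamics are invariant under a mirror map, every logged transition yields a second valid one for free. A hedged toy sketch (the sign-flip mirror map below is an assumption for illustration):

```python
import numpy as np

def augment_with_symmetry(transitions, sign):
    """Double an offline RL dataset using a reflection symmetry of the
    dynamics: if (s, a, r, s') is feasible, so is (Ms, -a, r, Ms')
    for the mirror map M (here, sign-flipping selected state coordinates)."""
    s, a, r, s2 = transitions
    mirrored = (s * sign, a * -1.0, r, s2 * sign)
    return tuple(np.concatenate([x, y]) for x, y in zip(transitions, mirrored))

# Toy lateral dynamics assumed symmetric under flipping both coordinates.
s  = np.array([[0.5, 1.0], [0.2, -0.3]])
a  = np.array([[0.1], [-0.4]])
r  = np.array([0.0, 1.0])
s2 = np.array([[0.6, 1.1], [0.1, -0.2]])
sign = np.array([-1.0, -1.0])   # hypothetical mirror map for this toy system

aug = augment_with_symmetry((s, a, r, s2), sign)   # dataset size doubles
```

Rewards are left unchanged because a true dynamical symmetry preserves them; for aircraft control, lateral (left/right) symmetry is the typical candidate for such a map.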
-
New metric-normalized posterior leakage (mPL) enhances privacy for joint AI consumption
Researchers have developed a new privacy metric called Metric-Normalized Posterior Leakage (mPL) to address limitations in existing differential privacy methods, particularly for machine learning systems used under join…
-
New research advances adversarial imitation learning theory and practice
Two new papers explore the theoretical underpinnings of adversarial imitation learning (AIL), a technique that uses neural networks to learn from expert demonstrations. The first paper introduces OPT-AIL, a framework de…
-
RAST-MoE-RL framework enhances ride-hailing efficiency with specialized AI experts
Researchers have developed a new framework called RAST-MoE-RL to improve efficiency in ride-hailing services. This framework utilizes a Mixture-of-Experts (MoE) approach within deep reinforcement learning to better hand…
-
Researchers find random data deletion improves adaptive RL policies
Researchers have discovered that randomly deleting a portion of training data can significantly improve the performance of adaptive reinforcement learning policies. This counterintuitive technique helps by implicitly do…
-
DRL framework optimizes NR-U/Wi-Fi coexistence for fairness and throughput
Researchers have developed a policy-driven deep reinforcement learning framework to manage resource allocation between NR-U and Wi-Fi networks operating in unlicensed spectrum. This framework uses a deep Q-network to le…
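The framework's deep Q-network learns the same temporal-difference target as tabular Q-learning, just with a neural function approximator. The underlying update can be sketched on a toy transmit/defer problem (illustrative only, not the paper's NR-U/Wi-Fi environment):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy coexistence-style problem: state = channel busy (0) / idle (1),
# action = defer (0) / transmit (1).
n_states, n_actions = 2, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.1

def step(s, a, rng):
    """Transmitting on an idle channel succeeds; on a busy one it collides."""
    r = 1.0 if (a == 1 and s == 1) else (-1.0 if a == 1 else 0.0)
    s2 = int(rng.integers(2))        # channel occupancy flips at random
    return r, s2

s = 0
for _ in range(20_000):
    # epsilon-greedy action selection
    a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
    r, s2 = step(s, a, rng)
    # standard Q-learning temporal-difference update
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
    s = s2
```

After training, the greedy policy transmits only when the channel is idle; a DQN replaces the Q table with a network so the same update scales to the high-dimensional spectrum-state observations the paper deals with.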
-
AutoREC platform uses RL agents to generate circuit models from EIS data
Researchers have developed AutoREC, an open-source Python package designed to automate the generation of equivalent circuit models (ECMs) from electrochemical impedance spectroscopy (EIS) data. This platform utilizes re…
-
Yann LeCun clarifies technical definition of 'world models' in AI
Yann LeCun shared a technical discussion regarding the term "world models" in AI. He clarified that in control theory and the context of Markov Decision Processes (MDPs), "world models" specifically refers to transition…
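In the MDP sense LeCun refers to, a "world model" is just the transition kernel P(s' | s, a) (typically together with a reward model). A minimal tabular version makes the definition concrete (the numbers are arbitrary):

```python
import numpy as np

# Tabular world model: P[s][a][s'] = probability of landing in s'
# after taking action a in state s.
P = np.array([
    [[0.7, 0.3], [0.4, 0.6]],   # transitions out of state 0
    [[0.1, 0.9], [0.5, 0.5]],   # transitions out of state 1
])

def predict_next_state_dist(P, s, a):
    """The world model answers: what distribution over next states
    does taking action a in state s induce?"""
    return P[s, a]

dist = predict_next_state_dist(P, 0, 1)
```

Everything a model-based planner does, from rollouts to value backups, reduces to queries against this object; the debate is over what counts as a "world model" when it is a learned latent-space predictor rather than an explicit kernel.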
-
AsyncShield: A Plug-and-Play Edge Adapter for Asynchronous Cloud-based VLA Navigation
Researchers have developed AsyncShield, a new framework designed to improve the navigation capabilities of Vision-Language-Action (VLA) models on mobile robots. This system addresses the latency and network jitter issue…
-
New algorithm identifies near-optimal policies in robust constrained Markov decision processes
Researchers have developed a novel algorithm to identify near-optimal policies in robust constrained Markov decision processes (RCMDPs). This new method addresses limitations in existing policy gradient approaches that …
-
Researchers develop MDP and POMDP for error mitigation in digital twins
Researchers have developed a new framework for mitigating error propagation in modular digital twins by treating it as a sequential decision-making problem. They formulated this using a Markov Decision Process (MDP) and…