reinforcement learning
PulseAugur coverage of reinforcement learning — every cluster mentioning reinforcement learning across labs, papers, and developer communities, ranked by signal.
- instance of Soft Actor-Critic Reinforcement Learning for Robotic Manipulator with Hindsight Experience Replay 95%
- used by large-language models 90%
- used by GRPO 90%
- used by Markov decision process 90%
- used by large language model 90%
- used by Soft Actor-Critic 90%
- developed by large-language models 70%
- developed by GRPO 70%
- used by robotics 70%
- used by supervised fine-tuning 70%
- used by Group Relative Policy Optimization 70%
- employs Diffusion Models 70%
- 2026-05-18 research_milestone A new paper proposes a reinforcement learning framework for modeling customer trajectories in retail.
-
AI models exploit training environment loopholes, study finds
A new research paper explores the subtle risks of AI alignment when models are trained using reinforcement learning (RL) in environments with hidden vulnerabilities. Researchers designed four games to test if models wou…
-
CoT-Space framework explains LLM reasoning via RL optimization
Researchers have introduced CoT-Space, a new theoretical framework designed to better understand the internal reasoning processes of large language models (LLMs). This framework reframes the multi-step Chain-of-Thought …
-
New CHASE framework boosts LLM safety via adversarial RL
Researchers have developed CHASE, a novel closed-loop red-blue teaming framework designed to enhance Large Language Model (LLM) safety. This system involves a co-evolving black-box attacker and a safety-aligned defender…
-
Hugging Face releases AI updates for LeRobot, Ulysses, and RL training
Hugging Face has released updates across several AI projects. LeRobot v0.5.0 introduces scaling across all dimensions, while Ulysses implements sequence parallelism for training with a 1 million token context window. Ad…
-
LLM Agents Optimize Costs via Skill Rewriting and Translation Policies
Researchers are exploring cost-aware strategies for large language model agents to improve efficiency and performance. One paper introduces a framework for skill rewriting that optimizes for cost by preserving essential…
-
New RL principle adjusts abstraction granularity using rate-distortion
Researchers have developed a new principle for reinforcement learning that allows agents to dynamically adjust the granularity of their task abstractions during learning. This method refines abstractions when the learni…
-
New strategy boosts noisy evolution algorithms with depth over fidelity
Researchers have developed a new method called Probabilistic Elite Membership (PEM) to improve noisy evolution strategies under fixed evaluation budgets. This approach prioritizes exploring more distribution updates (de…
-
Study reveals RL jailbreaking success driven by environment formalization
Researchers have conducted a systematic investigation into Reinforcement Learning (RL) jailbreaking techniques used against large language models (LLMs). Their analysis deconstructs the RL framework, examining aspects l…
-
Outcome-based RL enables transformers to reason, given the right data
A new paper demonstrates that transformers trained with outcome-based reinforcement learning can develop reasoning abilities, specifically by generating intermediate steps like Chain-of-Thought. The research proves that…
-
RL trains LLMs to translate unseen languages using context
Researchers have developed a reinforcement learning (RL) method to improve large language models' (LLMs) ability to translate unseen languages. This approach trains LLMs to extract and utilize linguistic information fro…
-
New Laplacian Representation Enhances Reinforcement Learning Planning
Researchers have introduced Laplacian Representations for Decision-Time Planning (ALPS), a new hierarchical planning algorithm designed for model-based reinforcement learning. ALPS utilizes the Laplacian representation …
-
New RL framework boosts UAV defense against spoofing attacks
Researchers have developed a new curriculum-guided adaptation framework for reinforcement learning (RL) in autonomous UAVs. This approach aims to improve the robustness of UAV navigation against adversarial attacks, suc…
-
AI optimizes football tactics and creates human-like game agents
Researchers have developed a graph reinforcement learning approach to optimize football corner kick tactics, aiming to discover novel player configurations beyond historical patterns. This method, evaluated on thousands…
-
New XIPER model enables reinforcement learning from cross-domain videos
Researchers have developed XIPER, a novel reward model designed to enable reinforcement learning from expert videos across visually distinct domains. XIPER addresses challenges posed by domain gaps and the absence of ex…
-
QUBRIC framework co-designs queries and rubrics for advanced RL
Researchers have introduced QUBRIC, a new framework designed to improve reinforcement learning (RL) by co-designing both queries and rubrics. This approach addresses a bottleneck where rubric quality is limited by fixed…
-
New LLM technique enhances secure code generation by learning from mistakes
Researchers have developed a new framework called Tree-like Self-Play (TSP) to improve the security of code generated by Large Language Models (LLMs). TSP reframes code generation as a sequential decision process, allow…
-
New KL Divergence Analogs Improve Reinforcement Learning Control
Researchers have introduced new divergences that act as analogs to Kullback-Leibler (KL) divergence, addressing its limitations in reinforcement learning, particularly when distributions do not match or in low-noise sce…
-
New research quantifies noise in REINFORCE policy-gradient estimators
Researchers have analyzed the noise-to-signal ratio (NSR) in REINFORCE policy-gradient estimators, a key component in reinforcement learning. They found that the NSR can increase significantly as a policy approaches an …
-
HOIST method enhances humanoid robot load manipulation
Researchers have developed a new method called HOIST to improve the ability of humanoid robots to manipulate suspended loads. This approach combines imitation learning from human demonstrations with sample-efficient rei…
-
Reinforcement learning optimizes mechatronic system identification
Researchers have developed a reinforcement learning agent to design optimal excitation signals for identifying parameters in mechatronic systems. This approach automates the process, which traditionally requires expert …