ENTITY reinforcement learning

reinforcement learning

PulseAugur coverage of reinforcement learning — every cluster mentioning reinforcement learning across labs, papers, and developer communities, ranked by signal.

Total · 30d

215

215 over 90d

Releases · 30d

0 over 90d

Papers · 30d

204

204 over 90d

TIER MIX · 90D

significant 2
research 81
tool 124
commentary 8

TOPICS

paper 204
other 112
model release 45
safety 39
product 31
infra 10
opinion 2
funding 2

RELATIONSHIPS

instance of SOFT ACTOR-CRITIC REINFORCEMENT LEARNING FOR ROBOTIC MANIPULATOR WITH HINDSIGHT EXPERIENCE REPLAY 95%
used by large-language models 90%
used by Grpo 90%
used by Markov decision process 90%
used by large language model 90%
used by Soft Actor-Critic 90%
developed by large-language models 70%
developed by Grpo 70%
used by robotics 70%
used by supervised fine-tuning 70%
used by Group Relative Policy Optimization 70%
employs Diffusion Models 70%

TIMELINE

2026-05-18 research_milestone A new paper proposes a reinforcement learning framework for modeling customer trajectories in retail.

SENTIMENT · 30D

26 days with sentiment data

RECENT · PAGE 2/10 · 200 TOTAL

TOOL · CL_72680 · Jun 5 · 04:00

AI models exploit training environment loopholes, study finds

A new research paper explores the subtle risks of AI alignment when models are trained using reinforcement learning (RL) in environments with hidden vulnerabilities. Researchers designed four games to test if models wou…
TOOL · CL_72678 · Jun 5 · 04:00

CoT-Space framework explains LLM reasoning via RL optimization

Researchers have introduced CoT-Space, a new theoretical framework designed to better understand the internal reasoning processes of large language models (LLMs). This framework reframes the multi-step Chain-of-Thought …
TOOL · CL_72641 · Jun 5 · 04:00

New CHASE framework boosts LLM safety via adversarial RL

Researchers have developed CHASE, a novel closed-loop red-blue teaming framework designed to enhance Large Language Model (LLM) safety. This system involves a co-evolving black-box attacker and a safety-aligned defender…
RESEARCH · CL_72219 · Jun 5 · 03:38

Hugging Face releases AI updates for LeRobot, Ulysses, and RL training

Hugging Face has released updates across several AI projects. LeRobot v0.5.0 introduces scaling across all dimensions, while Ulysses implements sequence parallelism for training with a 1 million token context window. Ad…
RESEARCH · CL_76820 · Jun 5 · 02:21

LLM Agents Optimize Costs via Skill Rewriting and Translation Policies

Researchers are exploring cost-aware strategies for large language model agents to improve efficiency and performance. One paper introduces a framework for skill rewriting that optimizes for cost by preserving essential…
RESEARCH · CL_72434 · Jun 4 · 13:10

New RL principle adjusts abstraction granularity using rate-distortion

Researchers have developed a new principle for reinforcement learning that allows agents to dynamically adjust the granularity of their task abstractions during learning. This method refines abstractions when the learni…
RESEARCH · CL_77132 · Jun 4 · 10:35

New strategy boosts noisy evolution algorithms with depth over fidelity

Researchers have developed a new method called Probabilistic Elite Membership (PEM) to improve noisy evolution strategies under fixed evaluation budgets. This approach prioritizes exploring more distribution updates (de…
TOOL · CL_70389 · Jun 4 · 04:00

Study reveals RL jailbreaking success driven by environment formalization

Researchers have conducted a systematic investigation into Reinforcement Learning (RL) jailbreaking techniques used against large language models (LLMs). Their analysis deconstructs the RL framework, examining aspects l…
TOOL · CL_70366 · Jun 4 · 04:00

Outcome-based RL enables transformers to reason with right data

A new paper demonstrates that transformers trained with outcome-based reinforcement learning can develop reasoning abilities, specifically by generating intermediate steps like Chain-of-Thought. The research proves that…
RESEARCH · CL_72411 · Jun 4 · 00:00

RL trains LLMs to translate unseen languages using context

Researchers have developed a reinforcement learning (RL) method to improve large language models' (LLMs) ability to translate unseen languages. This approach trains LLMs to extract and utilize linguistic information fro…
TOOL · CL_68522 · Jun 3 · 04:00

New Laplacian Representation Enhances Reinforcement Learning Planning

Researchers have introduced Laplacian Representations for Decision-Time Planning (ALPS), a new hierarchical planning algorithm designed for model-based reinforcement learning. ALPS utilizes the Laplacian representation …
TOOL · CL_68381 · Jun 3 · 04:00

New RL framework boosts UAV defense against spoofing attacks

Researchers have developed a new curriculum-guided adaptation framework for reinforcement learning (RL) in autonomous UAVs. This approach aims to improve the robustness of UAV navigation against adversarial attacks, suc…
RESEARCH · CL_68370 · Jun 3 · 04:00

AI optimizes football tactics and creates human-like game agents

Researchers have developed a graph reinforcement learning approach to optimize football corner kick tactics, aiming to discover novel player configurations beyond historical patterns. This method, evaluated on thousands…
TOOL · CL_68342 · Jun 3 · 04:00

New XIPER model enables reinforcement learning from cross-domain videos

Researchers have developed XIPER, a novel reward model designed to enable reinforcement learning from expert videos across visually distinct domains. XIPER addresses challenges posed by domain gaps and the absence of ex…
RESEARCH · CL_68138 · Jun 2 · 17:53

QUBRIC framework co-designs queries and rubrics for advanced RL

Researchers have introduced QUBRIC, a new framework designed to improve reinforcement learning (RL) by co-designing both queries and rubrics. This approach addresses a bottleneck where rubric quality is limited by fixed…
RESEARCH · CL_68364 · Jun 2 · 11:07

New LLM technique enhances secure code generation by learning from mistakes

Researchers have developed a new framework called Tree-like Self-Play (TSP) to improve the security of code generated by Large Language Models (LLMs). TSP reframes code generation as a sequential decision process, allow…
TOOL · CL_66118 · Jun 2 · 04:00

New KL Divergence Analogs Improve Reinforcement Learning Control

Researchers have introduced new divergences that act as analogs to Kullback-Leibler (KL) divergence, addressing its limitations in reinforcement learning, particularly when distributions do not match or in low-noise sce…
TOOL · CL_66117 · Jun 2 · 04:00

New research quantifies noise in REINFORCE policy-gradient estimators

Researchers have analyzed the noise-to-signal ratio (NSR) in REINFORCE policy-gradient estimators, a key component in reinforcement learning. They found that the NSR can increase significantly as a policy approaches an …
TOOL · CL_65999 · Jun 2 · 04:00

HOIST method enhances humanoid robot load manipulation

Researchers have developed a new method called HOIST to improve the ability of humanoid robots to manipulate suspended loads. This approach combines imitation learning from human demonstrations with sample-efficient rei…
TOOL · CL_65994 · Jun 2 · 04:00

Reinforcement learning optimizes mechatronic system identification

Researchers have developed a reinforcement learning agent to design optimal excitation signals for identifying parameters in mechatronic systems. This approach automates the process, which traditionally requires expert …
