Dopravní podnik Ostrava
PulseAugur coverage of Dopravní podnik Ostrava — every cluster mentioning Dopravní podnik Ostrava across labs, papers, and developer communities, ranked by signal.
No coverage in the last 90 days.
2 days with sentiment data
-
New TBPO method optimizes language models at token level
Researchers have introduced Token-level Bregman Preference Optimization (TBPO), a new method for aligning language models using pairwise preferences. Unlike existing approaches that focus on full sequences, TBPO operates…
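To make the sequence-level versus token-level distinction concrete, here is a minimal sketch assuming a DPO-style pairwise objective. The per-token decomposition below is an illustrative assumption, not TBPO's published Bregman objective, which the summary does not spell out.

```python
import torch.nn.functional as F

def sequence_level_loss(logp_w, logp_l, ref_w, ref_l, beta=0.1):
    """DPO-style: one margin per full sequence (token log-probs summed first).
    Inputs: per-token log-prob tensors of shape (batch, seq_len)."""
    margin = beta * ((logp_w.sum(-1) - ref_w.sum(-1))
                     - (logp_l.sum(-1) - ref_l.sum(-1)))
    return -F.logsigmoid(margin).mean()

def token_level_loss(logp_w, logp_l, ref_w, ref_l, beta=0.1):
    """Hypothetical token-level variant: the preference margin is applied per
    token before averaging, so each token carries its own credit assignment.
    Assumes chosen/rejected sequences are padded to equal length."""
    per_token = beta * ((logp_w - ref_w) - (logp_l - ref_l))
    return -F.logsigmoid(per_token).mean()
```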
-
EvoPref algorithm enhances LLM alignment with evolutionary optimization
Researchers have developed EvoPref, a novel multi-objective evolutionary algorithm designed to improve the alignment of large language models (LLMs). Unlike traditional gradient-based methods that can lead to preference…
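The summary names only the high-level idea (multi-objective, evolutionary, gradient-free), so the following is a generic Pareto-dominance selection loop rather than EvoPref's algorithm; score, mutate, and the objectives are hypothetical placeholders.

```python
import random

def dominates(a, b):
    """True if candidate a is at least as good as b on every objective
    and strictly better on at least one (maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(population, score):
    """Keep candidates not dominated by any other; score() returns a tuple
    of objective values, e.g. (helpfulness, harmlessness)."""
    scored = [(c, score(c)) for c in population]
    return [c for c, s in scored
            if not any(dominates(s2, s) for _, s2 in scored if s2 is not s)]

def evolve(population, score, mutate, generations=10):
    """Toy evolutionary loop: select the Pareto front, refill by mutation."""
    for _ in range(generations):
        front = pareto_front(population, score)
        population = front + [mutate(random.choice(front))
                              for _ in range(len(population) - len(front))]
    return pareto_front(population, score)
```

Returning a whole Pareto front rather than a single best candidate is what distinguishes multi-objective selection from scalarized reward tuning.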
-
DPO vs SimPO: Removing Reference Model Alters Preference Tuning
A recent article explores the differences between Direct Preference Optimization (DPO) and Simplified Preference Optimization (SimPO) in the context of fine-tuning large language models. It highlights how SimPO's remova…
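The reference-model removal is easiest to see with the two losses side by side. A minimal sketch following the published DPO and SimPO formulations: DPO scores each response by its log-ratio against a frozen reference model, while SimPO uses the policy's own length-normalized log-probability plus a target margin gamma.

```python
import torch.nn.functional as F

def dpo_loss(pi_logp_w, pi_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO: the implicit reward is the log-ratio against a frozen reference
    model, so every step needs a second forward pass through that model."""
    reward_w = beta * (pi_logp_w - ref_logp_w)
    reward_l = beta * (pi_logp_l - ref_logp_l)
    return -F.logsigmoid(reward_w - reward_l).mean()

def simpo_loss(pi_logp_w, pi_logp_l, len_w, len_l, beta=2.0, gamma=0.5):
    """SimPO: no reference model; the reward is the policy's own
    length-normalized sequence log-probability, with a target margin gamma."""
    reward_w = beta * pi_logp_w / len_w
    reward_l = beta * pi_logp_l / len_l
    return -F.logsigmoid(reward_w - reward_l - gamma).mean()
```

Removing the reference model saves a forward pass per example but also removes DPO's implicit anchor to the starting policy; SimPO's length normalization and explicit margin are its substitutes.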
-
DPO vs SimPO: Preference tuning methods compared for LLM training
A recent analysis highlights a critical discrepancy in preference tuning methodologies for large language models, specifically comparing Direct Preference Optimization (DPO) and Simplified Preference Optimization (SimPO…
-
Diffusion models align with human preferences using game theory and Nash equilibrium
Researchers have introduced Diffusion Nash Preference Optimization (Diff.-NPO), a novel framework for aligning text-to-image diffusion models with human preferences. This approach moves beyond traditional methods like D…
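The summary does not give Diff.-NPO's objective; for orientation, this is the generic two-player Nash formulation from the preference-optimization literature that such game-theoretic alignment methods build on, with P(y ≻ y') a preference model over pairs of generations:

```latex
\pi^{\star} \;=\; \arg\max_{\pi}\, \min_{\pi'}\,
  \mathbb{E}_{y \sim \pi,\; y' \sim \pi'}\!\left[ \mathcal{P}(y \succ y') \right]
```

A policy at this Nash equilibrium is preferred at least as often as any competing policy, which sidesteps fitting a single scalar reward model as DPO-style methods implicitly do.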
-
TUR-DPO enhances LLM alignment by incorporating topology and uncertainty into preference optimization
Researchers have introduced TUR-DPO, a novel method for aligning large language models with human preferences. Unlike standard Direct Preference Optimization (DPO), TUR-DPO incorporates topology and uncertainty awarenes…
-
New theories explore how pre-training and sparse connectivity enhance deep learning generalization
Three new papers explore the theoretical underpinnings of generalization in deep learning. One paper identifies pre-training as a critical factor for weak-to-strong generalization, demonstrating its emergence through a …
-
AI model fine-tuning mostly idempotent, DPO can amplify traits
A guide explores advanced techniques for post-training large language models, focusing on Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Group Relative Policy Optimization (GRPO). These methods …
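Of the three methods, GRPO has the most self-contained core: instead of a learned critic, it scores each of the G responses sampled for one prompt against the group's own statistics. A sketch of that advantage computation (the clipped policy-gradient update it feeds is omitted):

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """GRPO's group-relative advantage: each of the G responses sampled for the
    same prompt is scored against the group mean, normalized by the group std,
    so no learned value function (critic) is needed.
    rewards: shape (G,), one scalar reward per sampled response."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)
```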
-
Anthropic's new 'Introspection Adapters' let LLMs self-report behaviors
Researchers have developed a novel technique called "Introspection Adapters" (IA) that allows large language models to report their own learned behaviors, including hidden biases and encrypted malicious instructions. Th…
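The summary does not describe the adapter mechanics, so for orientation here is a standard bottleneck adapter module of the kind the name suggests; the introspection-specific training signal is the paper's contribution and is not shown.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Standard bottleneck adapter: a small residual MLP inserted into a frozen
    transformer layer, so only these ~2*d_model*r parameters are trained.
    (Generic adapter design; the introspection objective is not in the summary.)"""
    def __init__(self, d_model: int, r: int = 16):
        super().__init__()
        self.down = nn.Linear(d_model, r)
        self.up = nn.Linear(r, d_model)
        nn.init.zeros_(self.up.weight)  # start as an identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        return hidden + self.up(torch.relu(self.down(hidden)))
```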
-
Researchers propose structure-aware consistency for LLM preference learning
Researchers have identified a theoretical inconsistency in popular preference learning methods like Direct Preference Optimization (DPO) used for aligning Large Language Models (LLMs). The study proposes a new framework…
-
LLMs know they're wrong and agree anyway, research finds
Researchers have developed two novel methods, BAL-A and BMP-A, to efficiently poison preference datasets used in offline Reinforcement Learning from Human Feedback (RLHF) pipelines like Direct Preference Optimization (D…
-
AI models show artificial consensus, collapsing philosophical heterogeneity
A new research paper published on arXiv investigates the use of large language models (LLMs) as substitutes for human judgment in philosophical contexts. The study found that LLMs tend to over-correlate philosophical po…
-
AgentHER framework boosts LLM agent training with failed trajectory relabeling
Researchers have developed AgentHER, a new framework designed to improve the training of LLM agents by repurposing failed trajectories. The system adapts Hindsight Experience Replay to natural language, identifying alte…
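Hindsight relabeling itself is simple to sketch. The version below is generic HER-style relabeling, assuming a hypothetical achieved_outcome extracted from the rollout; AgentHER's natural-language machinery for identifying alternative goals is its own contribution.

```python
from dataclasses import dataclass

@dataclass
class Step:
    observation: str
    action: str

def hindsight_relabel(trajectory: list[Step], achieved_outcome: str) -> dict:
    """HER in spirit: a trajectory that failed its original goal is relabeled
    as a successful demonstration of whatever outcome it actually achieved.
    (Generic relabeling; AgentHER's outcome extraction is not shown.)"""
    return {
        "goal": achieved_outcome,  # the failed goal is replaced
        "trajectory": trajectory,  # same actions, now a positive example
        "label": "success",
    }
```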
-
Researchers refine preference optimization for LLMs with new methods
Researchers have introduced RMiPO, a new framework for offline preference optimization that uses intrinsic response-level mutual information to dynamically adjust preference contributions. This method aims to improve La…
-
Meta-Aligner: Bidirectional Preference-Policy Optimization for Multi-Objective LLM Alignment
Researchers are developing new methods to address the limitations of current large language model (LLM) alignment techniques. One study highlights the 'Selective Safety Trap,' where LLMs protect certain demographics whi…
-
OpenAI trains AI with human preference feedback; Chip Huyen proposes predictive model routing
OpenAI and DeepMind have developed a new algorithm that learns desired behaviors from human feedback, reducing the need for explicit goal functions. This method uses a three-step cycle where humans compare two agent beh…
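The cycle's middle step, fitting a reward model to pairwise human comparisons, classically uses the Bradley-Terry model, where P(a preferred over b) = sigmoid(r_a - r_b). A minimal sketch of that step:

```python
import torch
import torch.nn.functional as F

def preference_loss(r_a: torch.Tensor, r_b: torch.Tensor,
                    human_prefers_a: torch.Tensor) -> torch.Tensor:
    """Fit a reward model to human comparisons with the Bradley-Terry model:
    P(a preferred over b) = sigmoid(r_a - r_b).
    r_a, r_b: predicted rewards for the two compared behaviors;
    human_prefers_a: float tensor, 1.0 where the human chose a, else 0.0."""
    return F.binary_cross_entropy_with_logits(r_a - r_b, human_prefers_a)
```

Collecting fresh comparisons and optimizing the policy against the fitted reward are the other two steps that close the loop.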