Dopravní podnik Ostrava
PulseAugur coverage of Dopravní podnik Ostrava — every cluster mentioning Dopravní podnik Ostrava across labs, papers, and developer communities, ranked by signal.
No coverage in the last 90 days.
2 days with sentiment data
-
New TBPO method optimizes language models at token level
Researchers have introduced Token-level Bregman Preference Optimization (TBPO), a new method for aligning language models using pairwise preferences. Unlike existing approaches that focus on full sequences, TBPO operates…
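To make the sequence-level versus token-level distinction concrete, here is a minimal sketch assuming a DPO-style pairwise objective. The per-token decomposition below is an illustrative assumption, not TBPO's published Bregman objective, which the summary does not spell out.

```python
import torch.nn.functional as F

def sequence_level_loss(logp_w, logp_l, ref_w, ref_l, beta=0.1):
    """DPO-style: one margin per full sequence (token log-probs summed first).
    Inputs: per-token log-prob tensors of shape (batch, seq_len)."""
    margin = beta * ((logp_w.sum(-1) - ref_w.sum(-1))
                     - (logp_l.sum(-1) - ref_l.sum(-1)))
    return -F.logsigmoid(margin).mean()

def token_level_loss(logp_w, logp_l, ref_w, ref_l, beta=0.1):
    """Hypothetical token-level variant: the preference margin is applied per
    token before averaging, so each token carries its own credit assignment.
    Assumes chosen/rejected sequences are padded to equal length."""
    per_token = beta * ((logp_w - ref_w) - (logp_l - ref_l))
    return -F.logsigmoid(per_token).mean()
```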
-
EvoPref algorithm enhances LLM alignment with evolutionary optimization
Researchers have developed EvoPref, a novel multi-objective evolutionary algorithm designed to improve the alignment of large language models (LLMs). Unlike traditional gradient-based methods that can lead to preference…
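The summary names only the high-level idea (multi-objective, evolutionary, gradient-free), so the following is a generic Pareto-dominance selection loop rather than EvoPref's algorithm; score, mutate, and the objectives are hypothetical placeholders.

```python
import random

def dominates(a, b):
    """True if candidate a is at least as good as b on every objective
    and strictly better on at least one (maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(population, score):
    """Keep candidates not dominated by any other; score() returns a tuple
    of objective values, e.g. (helpfulness, harmlessness)."""
    scored = [(c, score(c)) for c in population]
    return [c for c, s in scored
            if not any(dominates(s2, s) for _, s2 in scored if s2 is not s)]

def evolve(population, score, mutate, generations=10):
    """Toy evolutionary loop: select the Pareto front, refill by mutation."""
    for _ in range(generations):
        front = pareto_front(population, score)
        population = front + [mutate(random.choice(front))
                              for _ in range(len(population) - len(front))]
    return pareto_front(population, score)
```

Returning a whole Pareto front rather than a single best candidate is what distinguishes multi-objective selection from scalarized reward tuning.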
-
DPO vs SimPO: Removing Reference Model Alters Preference Tuning
A recent article explores the differences between Direct Preference Optimization (DPO) and Simplified Preference Optimization (SimPO) in the context of fine-tuning large language models. It highlights how SimPO's remova…
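The reference-model removal is easiest to see with the two losses side by side. A minimal sketch following the published DPO and SimPO formulations: DPO scores each response by its log-ratio against a frozen reference model, while SimPO uses the policy's own length-normalized log-probability plus a target margin gamma.

```python
import torch.nn.functional as F

def dpo_loss(pi_logp_w, pi_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO: the implicit reward is the log-ratio against a frozen reference
    model, so every step needs a second forward pass through that model."""
    reward_w = beta * (pi_logp_w - ref_logp_w)
    reward_l = beta * (pi_logp_l - ref_logp_l)
    return -F.logsigmoid(reward_w - reward_l).mean()

def simpo_loss(pi_logp_w, pi_logp_l, len_w, len_l, beta=2.0, gamma=0.5):
    """SimPO: no reference model; the reward is the policy's own
    length-normalized sequence log-probability, with a target margin gamma."""
    reward_w = beta * pi_logp_w / len_w
    reward_l = beta * pi_logp_l / len_l
    return -F.logsigmoid(reward_w - reward_l - gamma).mean()
```

Removing the reference model saves a forward pass per example but also removes DPO's implicit anchor to the starting policy; SimPO's length normalization and explicit margin are its substitutes.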
-
DPO vs SimPO: Preference tuning methods compared for LLM training
A recent analysis highlights a critical discrepancy in preference tuning methodologies for large language models, specifically comparing Direct Preference Optimization (DPO) and Simplified Preference Optimization (SimPO…
-
Diffusion models align with human preferences using game theory and Nash equilibrium
Researchers have introduced Diffusion Nash Preference Optimization (Diff.-NPO), a novel framework for aligning text-to-image diffusion models with human preferences. This approach moves beyond traditional methods like D…
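The summary does not give Diff.-NPO's objective; for orientation, this is the generic two-player Nash formulation from the preference-optimization literature that such game-theoretic alignment methods build on, with P(y ≻ y') a preference model over pairs of generations:

```latex
\pi^{\star} \;=\; \arg\max_{\pi}\, \min_{\pi'}\,
  \mathbb{E}_{y \sim \pi,\; y' \sim \pi'}\!\left[ \mathcal{P}(y \succ y') \right]
```

A policy at this Nash equilibrium is preferred at least as often as any competing policy, which sidesteps fitting a single scalar reward model as DPO-style methods implicitly do.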
-
TUR-DPO enhances LLM alignment by incorporating topology and uncertainty into preference optimization
Researchers have introduced TUR-DPO, a novel method for aligning large language models with human preferences. Unlike standard Direct Preference Optimization (DPO), TUR-DPO incorporates topology and uncertainty awarenes…
-
New theories explore how pre-training and sparse connectivity enhance deep learning generalization
Three new papers explore the theoretical underpinnings of generalization in deep learning. One paper identifies pre-training as a critical factor for weak-to-strong generalization, demonstrating its emergence through a …
-
AI model fine-tuning mostly idempotent, DPO can amplify traits
A guide explores advanced techniques for post-training large language models, focusing on Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Group Relative Policy Optimization (GRPO). These methods …
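Of the three methods, GRPO has the most self-contained core: instead of a learned critic, it scores each of the G responses sampled for one prompt against the group's own statistics. A sketch of that advantage computation (the clipped policy-gradient update it feeds is omitted):

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """GRPO's group-relative advantage: each of the G responses sampled for the
    same prompt is scored against the group mean, normalized by the group std,
    so no learned value function (critic) is needed.
    rewards: shape (G,), one scalar reward per sampled response."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)
```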
-
Anthropic's new 'Introspection Adapters' let LLMs self-report behaviors
Researchers have developed a novel technique called "Introspection Adapters" (IA) that allows large language models to report their own learned behaviors, including hidden biases and encrypted malicious instructions. Th…
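The summary does not describe the adapter mechanics, so for orientation here is a standard bottleneck adapter module of the kind the name suggests; the introspection-specific training signal is the paper's contribution and is not shown.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Standard bottleneck adapter: a small residual MLP inserted into a frozen
    transformer layer, so only these ~2*d_model*r parameters are trained.
    (Generic adapter design; the introspection objective is not in the summary.)"""
    def __init__(self, d_model: int, r: int = 16):
        super().__init__()
        self.down = nn.Linear(d_model, r)
        self.up = nn.Linear(r, d_model)
        nn.init.zeros_(self.up.weight)  # start as an identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        return hidden + self.up(torch.relu(self.down(hidden)))
```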
-
Researchers propose structure-aware consistency for LLM preference learning
Researchers have identified a theoretical inconsistency in popular preference learning methods like Direct Preference Optimization (DPO) used for aligning Large Language Models (LLMs). The study proposes a new framework…
-
LLMs know they're wrong and agree anyway, research finds
Researchers have developed two novel methods, BAL-A and BMP-A, to efficiently poison preference datasets used in offline Reinforcement Learning from Human Feedback (RLHF) pipelines like Direct Preference Optimization (D…
-
AI models show artificial consensus, collapsing philosophical heterogeneity
A new research paper published on arXiv investigates the use of large language models (LLMs) as substitutes for human judgment in philosophical contexts. The study found that LLMs tend to over-correlate philosophical po…
-
AgentHER framework boosts LLM agent training with failed trajectory relabeling
Researchers have developed AgentHER, a new framework designed to improve the training of LLM agents by repurposing failed trajectories. The system adapts Hindsight Experience Replay to natural language, identifying alte…
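Hindsight relabeling itself is simple to sketch. The version below is generic HER-style relabeling, assuming a hypothetical achieved_outcome extracted from the rollout; AgentHER's natural-language machinery for identifying alternative goals is its own contribution.

```python
from dataclasses import dataclass

@dataclass
class Step:
    observation: str
    action: str

def hindsight_relabel(trajectory: list[Step], achieved_outcome: str) -> dict:
    """HER in spirit: a trajectory that failed its original goal is relabeled
    as a successful demonstration of whatever outcome it actually achieved.
    (Generic relabeling; AgentHER's outcome extraction is not shown.)"""
    return {
        "goal": achieved_outcome,  # the failed goal is replaced
        "trajectory": trajectory,  # same actions, now a positive example
        "label": "success",
    }
```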
-
Researchers refine preference optimization for LLMs with new methods
Researchers have introduced RMiPO, a new framework for offline preference optimization that uses intrinsic response-level mutual information to dynamically adjust preference contributions. This method aims to improve La…
-
Meta-Aligner: Bidirectional Preference-Policy Optimization for Multi-Objective LLM Alignment
Researchers are developing new methods to address the limitations of current large language model (LLM) alignment techniques. One study highlights the 'Selective Safety Trap,' where LLMs protect certain demographics whi…
-
OpenAI trains AI with human preference feedback; Chip Huyen proposes predictive model routing
OpenAI and DeepMind have developed a new algorithm that learns desired behaviors from human feedback, reducing the need for explicit goal functions. This method uses a three-step cycle where humans compare two agent beh…
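The cycle's middle step, fitting a reward model to pairwise human comparisons, classically uses the Bradley-Terry model, where P(a preferred over b) = sigmoid(r_a - r_b). A minimal sketch of that step:

```python
import torch
import torch.nn.functional as F

def preference_loss(r_a: torch.Tensor, r_b: torch.Tensor,
                    human_prefers_a: torch.Tensor) -> torch.Tensor:
    """Fit a reward model to human comparisons with the Bradley-Terry model:
    P(a preferred over b) = sigmoid(r_a - r_b).
    r_a, r_b: predicted rewards for the two compared behaviors;
    human_prefers_a: float tensor, 1.0 where the human chose a, else 0.0."""
    return F.binary_cross_entropy_with_logits(r_a - r_b, human_prefers_a)
```

Collecting fresh comparisons and optimizing the policy against the fitted reward are the other two steps that close the loop.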