LLaMA-3-8B-Instruct
PulseAugur coverage of LLaMA-3-8B-Instruct — every cluster mentioning LLaMA-3-8B-Instruct across labs, papers, and developer communities, ranked by signal.
-
New method uses model's own outputs for safety fine-tuning
Researchers have developed a novel method for safety fine-tuning language models by identifying and using the most challenging prompts. This technique scores prompts by the frequency of harmful model…
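The selection step hinted at above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `generate` and `is_harmful` are hypothetical stand-ins for the model's sampler and a safety classifier, and the exact scoring rule is not given in the snippet.

```python
def harmful_rate(prompt, generate, is_harmful, n_samples=8):
    """Fraction of sampled completions judged harmful.

    `generate` and `is_harmful` are hypothetical stand-ins for the
    model's sampler and a safety classifier.
    """
    completions = [generate(prompt) for _ in range(n_samples)]
    return sum(is_harmful(c) for c in completions) / n_samples

def hardest_prompts(prompts, generate, is_harmful, k=100):
    """Rank prompts by harmful-completion frequency and keep the top k
    as the fine-tuning set (one plausible reading of the summary)."""
    ranked = sorted(
        prompts,
        key=lambda p: harmful_rate(p, generate, is_harmful),
        reverse=True,
    )
    return ranked[:k]
```

The top-ranked prompts would then be paired with safe refusals for the fine-tuning stage.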
-
New attack redirects LLM attention to bypass safety alignment
Researchers have developed a new white-box adversarial attack called the Attention Redistribution Attack (ARA) that targets the internal attention mechanisms of safety-aligned large language models. This attack crafts n…
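The core idea, redistributing a query's attention away from safety-relevant positions, can be shown with a toy single-head example. This is a sketch under assumed details: the real ARA presumably optimizes token embeddings end to end, while here a finite-difference step perturbs one controllable key vector directly; all function names are illustrative.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def attention(q, keys):
    """Attention weights of one query vector over a list of key vectors."""
    logits = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    return softmax(logits)

def ara_step(q, keys, adv_idx, safety_idx, lr=0.5, eps=1e-4):
    """One toy attack step: perturb the adversarial key so the query's
    attention mass on the safety-relevant key drops (finite-difference
    gradient; the paper's actual optimizer is not described in the snippet)."""
    base = attention(q, keys)[safety_idx]
    new_key = list(keys[adv_idx])
    for i in range(len(new_key)):
        bumped = [list(k) for k in keys]
        bumped[adv_idx][i] += eps
        grad = (attention(q, bumped)[safety_idx] - base) / eps
        new_key[i] -= lr * grad  # descend on safety-attention mass
    out = [list(k) for k in keys]
    out[adv_idx] = new_key
    return out
```

Iterating the step drains attention from the safety position, which is the redistribution effect the headline describes.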
-
The Measure of Deception: An Analysis of Data Forging in Machine Unlearning
Two new research papers explore vulnerabilities and detection methods in machine unlearning, a process designed to remove specific data from trained models for privacy compliance. One paper, "DurableUn," reveals that lo…
-
DPN-LE method precisely edits LLM personalities with minimal neuron intervention
Researchers have developed DPN-LE, a novel method for editing the "personality" of large language models by targeting specific neurons. Existing techniques often degrade overall model performance by modifying too many n…
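A minimal sketch of neuron-targeted editing, under assumptions: the summary does not say how DPN-LE selects neurons, so this stands in a simple activation-gap criterion (neurons whose mean activation differs most between trait-exhibiting and neutral prompts), and the intervention shown is a plain scaling. All names are hypothetical.

```python
def top_personality_neurons(acts_pos, acts_neg, k=3):
    """Rank neurons by mean activation gap between trait-exhibiting and
    neutral prompts; a hypothetical stand-in for DPN-LE's selection step.

    acts_pos / acts_neg: lists of activation vectors, one per prompt.
    """
    n = len(acts_pos[0])
    def mean(i, acts):
        return sum(a[i] for a in acts) / len(acts)
    gaps = [(abs(mean(i, acts_pos) - mean(i, acts_neg)), i) for i in range(n)]
    return [i for _, i in sorted(gaps, reverse=True)[:k]]

def edit_neurons(activations, neuron_idxs, scale=0.0):
    """Intervene on only the selected neurons, leaving the rest untouched,
    which is the minimal-intervention property the summary emphasizes."""
    out = list(activations)
    for i in neuron_idxs:
        out[i] *= scale
    return out
```

Keeping `k` small is the point: by touching only the few neurons with the largest trait-specific gap, the edit avoids the broad performance degradation attributed to prior methods.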