GPT-5.4 Nano
PulseAugur coverage of GPT-5.4 Nano — every cluster mentioning GPT-5.4 Nano across labs, papers, and developer communities, ranked by signal.
-
LLM agents struggle to patch security bugs, leaving vulnerabilities open
A new benchmark, CVE-Bench, was developed to evaluate LLM agents' ability to patch security vulnerabilities in Python projects. Across 18 projects and 20 real-world CVEs, the best-performing models achieved only a 50% s…
-
Qwen2.5 fine-tuned for SRE post-mortems outperforms larger models
A developer has fine-tuned the Qwen2.5-0.5B model to generate summaries for SRE post-mortems. This approach uses a 700-sample training set and LoRA with 4-bit quantization, allowing it to run on consumer hardware. The fine-t…
-
New benchmark tests LLMs on math text continuations
Researchers have developed a new self-supervised benchmark for evaluating language models on mathematical text continuations. This benchmark uses likelihood scoring to assess how well a model's auxiliary forecast string…
-
PIIGuard shields webpages from LLM PII harvesting via adversarial fragments
Researchers have developed PIIGuard, a webpage-level defense designed to prevent large language models (LLMs) from harvesting personally identifiable information (PII). This system embeds hidden HTML fragme…
-
[GRPO Explained] DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Researchers are developing new benchmarks and evaluation methods for large language models (LLMs) in mathematical reasoning and educational assessment. New datasets like ESTBook and Math-PT aim to go beyond simple accur…