PulseAugur

OpenAI, Anthropic, Google, Meta, and Alibaba release new models and agent platforms

Several AI labs have released new open-weight models, including Alibaba's Qwen3.6-27B, which reportedly outperforms the larger Qwen3.5-397B-A17B on coding benchmarks, and Xiaomi's MiMo-V2.5 series, featuring enhanced agentic capabilities and multimodality. OpenAI has also open-sourced a privacy filter model for PII detection, targeting infrastructure needs. Additionally, Anthropic has launched Claude Design, a tool for generating prototypes and presentations powered by Claude Opus 4.7, signaling a move into design tooling.

Summary written by gemini-2.5-flash-lite from 198 sources.

IMPACT New open-source models and agentic tools are increasing competition and lowering barriers for AI development and deployment.

RANK_REASON Multiple open-weight model releases and new product features from major AI labs.

COVERAGE [198]

  1. vLLM — Releases TIER_1 · simon-mo ·

    v0.20.1rc0: Add system_fingerprint field to OpenAI-compatible API responses (#40537)

    Co-authored-by: Claude [email protected]

  2. Smol AINews TIER_1 ·

    not much happened today

    **Alibaba** released **Qwen3.6-27B**, a dense, Apache 2.0 open coding model with thinking and non-thinking modes, outperforming the larger Qwen3.5-397B-A17B on multiple coding benchmarks including SWE-bench and Terminal-Bench. It supports native vision-language reasoning over ima…

  3. Smol AINews TIER_1 ·

    not much happened today

    **Moonshot's Kimi K2.6** is a major open-weight **1T-parameter MoE** model featuring **32B active parameters**, **384 experts**, **MLA attention**, **256K context window**, native multimodality, and **INT4 quantization**. It supports day-0 integration with platforms like **vLLM**…

  4. Smol AINews TIER_1 ·

    not much happened today

    **Anthropic** launched **Claude Design**, a prototyping tool powered by **Claude Opus 4.7**, targeting design workflows and competing with **Figma** and others. Benchmarks show **Opus 4.7** leading in coding and text tasks, with improved efficiency and adaptive reasoning, though …

  5. Smol AINews TIER_1 ·

    not much happened today

    **OpenAI** expanded its Agents SDK by separating the agent harness from compute/storage, enabling long-running, durable agents with features like file/computer use, skills, memory, and compaction. The harness is now open-source and supports execution via partner sandboxes, foster…

  6. Smol AINews TIER_1 ·

    not much happened today

    **Harness engineering** is emerging as a key discipline in AI agent development, emphasizing components like filesystems, memory, and retries beyond just models. **OpenAI's Codex** is expanding agentic coding workflows beyond software engineering, including codebase understanding…

  7. Smol AINews TIER_1 ·

    not much happened today

    **GLM-5.1** has reached **#3 on Code Arena**, surpassing **Gemini 3.1** and **GPT-5.4**, and matching **Claude Sonnet 4.6** in coding performance. **Z.ai** now holds the **#1 open model rank** close to the top overall. The advisor pattern, combining a cheap executor with an expen…

  8. Smol AINews TIER_1 ·

    not much happened today

    **Anthropic's Mythos** and **OpenAI's** upcoming restricted cyber-capable models are central to recent discussions, with debates on their security realism and evaluation methods. **LangChain's Deep Agents deploy** introduces an open memory, model-agnostic agent harness architectu…

  9. Smol AINews TIER_1 ·

    not much happened today

    **Meta Superintelligence Labs** launched **Muse Spark**, a natively multimodal reasoning model featuring tool use, visual chain of thought, and multi-agent orchestration. It is live on **meta.ai** and the Meta AI app with a private API preview and plans for open-sourcing future v…

  10. Smol AINews TIER_1 ·

    not much happened today

    **Google** introduced **Skills in Chrome**, enabling reusable browser workflows with Gemini prompts and a library of ready-made Skills, enhancing end-user agentization. **Tencent** teased **HYWorld 2.0**, an open-source 3D world model generating editable scenes from a single imag…

  11. Smol AINews TIER_1 ·

    not much happened today

    **Hermes Agent** is gaining attention as a leading open agent stack with features like self-improving skills, persistent memory, and a self-improvement loop. Its new **Manim skill** enables generation of math/technical animations, expanding agent capabilities. The Hermes ecosyste…

  12. Smol AINews TIER_1 ·

    not much happened today

    **Gemma 4** was launched by **Google** under an **Apache 2.0 license**, marking a significant open-model release focused on **reasoning, agentic workflows, multimodality, and on-device use**. It outperforms models 10x larger and has immediate ecosystem support including **vLLM**,…

  13. Smol AINews TIER_1 ·

    not much happened today

    **Arcee’s Trinity-Large-Thinking** was released with **open weights under Apache 2.0**, featuring a **400B total / 13B active** model size and strong agentic performance, ranking **#2 on PinchBench**. **Z.ai’s GLM-5V-Turbo** is a **vision coding model** with **native multimodal f…

  14. Smol AINews TIER_1 ·

    not much happened today

    **Anthropic** introduced **computer use inside Claude Code** for closed-loop verification in a research preview for Pro/Max users, enhancing reliable app iteration. **OpenAI** released a **Codex plugin for Claude Code**, enabling cross-agent composition and signaling a shift towa…

  15. Smol AINews TIER_1 ·

    not much happened today

    **Anthropic** is reportedly introducing a new AI model tier called **Capybara**, which is larger and more intelligent than **Claude Opus 4.6**, showing improved performance in coding, academic reasoning, and cybersecurity. The model is speculated to be around **10 trillion parame…

  16. Smol AINews TIER_1 ·

    not much happened today

    **ARC-AGI-3** benchmark introduced by **@arcprize** and **François Chollet** resets the frontier for general agentic reasoning with humans solving 100% of tasks versus under 1% for current models, focusing on zero-preparation generalization and human-like learning efficiency. The…

  17. Smol AINews TIER_1 ·

    not much happened today

    **Google** launched **Gemini 3.1 Flash Live**, a realtime voice and vision agent model with **2x longer conversation memory**, supporting **70 languages** and **128k context**. **Mistral AI** released **Voxtral TTS**, a low-latency, open-weight text-to-speech model supporting **9…

  18. Smol AINews TIER_1 ·

    not much happened today

    **Anthropic** advances agent infrastructure with a multi-agent harness emphasizing orchestration and "computer use" for complex software environments. **Figma**, **GitHub**, and **Cursor** launch design canvases with direct AI editing, showcasing tool-calling becoming product-nat…

  19. Smol AINews TIER_1 ·

    not much happened today

    **Anthropic** introduced **Claude Cowork** and **Claude Code** enabling desktop control of mouse, keyboard, and screen in a **macOS research preview**, expanding agent capabilities beyond APIs and browsers. The agent ecosystem is evolving towards long-running, parallel, tool-rich…

  20. Smol AINews TIER_1 ·

    not much happened today

    **Cursor's Composer 2**, built on **Kimi K2.5**, sparked discussion over model attribution and licensing, highlighting a shift toward post-trained derivatives of open-source models with domain-specific fine-tuning and reinforcement learning. **Claude Code** is expanding into thir…

  21. Smol AINews TIER_1 ·

    not much happened today

    **Cursor** launched **Composer 2**, a frontier-class coding model with major cost reductions and strong benchmark scores like **61.3 on CursorBench** and **73.7 on SWE-bench Multilingual**. The model was improved via a **first continued pretraining run** feeding into reinforcemen…

  22. Smol AINews TIER_1 ·

    not much happened today

    **OpenAI** released **GPT-5.4 mini** and **GPT-5.4 nano**, their most capable small models optimized for coding, multimodal understanding, and subagents, featuring a **400k context window** and over **2x speed** compared to GPT-5 mini. The mini model approaches larger GPT-5.4 per…

  23. Smol AINews TIER_1 ·

    not much happened today

    **Moonshot's Attention Residuals** paper introduced an input-dependent attention mechanism over prior layers with a **1.25x compute advantage** and less than **2% inference latency overhead**, validated on **Kimi Linear 48B total / 3B active**. The paper sparked debate on novelty…

  24. Smol AINews TIER_1 ·

    not much happened today

    **MCP tools** remain relevant for deterministic APIs despite ergonomic criticisms, with new **web MCP support in Chrome v146** enabling continuous browsing agents. Persistent memory is emerging as a key differentiator for agents, with IBM improving task completion rates and multi…

  25. Smol AINews TIER_1 ·

    not much happened today

    **Harnesses, agent infrastructure, and the MCP protocol** are central themes, with emphasis on how **harnesses, sandboxes, filesystem access, skills, memory, and observability** shape agent UI/UX and runtime environments. Despite jokes about MCP's demise, it remains vital in prod…

  26. Smol AINews TIER_1 ·

    not much happened today

    **NVIDIA’s Nemotron 3 Super** is a **120B parameter / ~12B active** open model featuring a **hybrid Mamba-Transformer / SSM Latent MoE** architecture and **1M context window**, delivering up to **2.2x faster inference than GPT-OSS-120B** in FP4 with strong throughput gains. It su…

  27. Smol AINews TIER_1 ·

    not much happened today

    **OpenAI** rolled out **GPT-5.4**, which ties **Gemini 3.1 Pro Preview** for **#1** on the **Artificial Analysis Intelligence Index** with a score of **57** (up from 51 for GPT-5.2 xhigh). GPT-5.4 features a larger **~1.05M token** context window and higher per-token prices ($2.50/$…

  28. Smol AINews TIER_1 ·

    not much happened today

    **Gemini 3.1 Flash-Lite** is highlighted by **Demis Hassabis** for its speed and cost-efficiency, focusing on latency and cost per capability rather than raw performance. **NotebookLM Studio** introduces a new feature for generating immersive cinematic video overviews. Rumors abo…

  29. Smol AINews TIER_1 ·

    not much happened today

    **Google DeepMind** launched **Gemini 3.1 Flash-Lite**, emphasizing *dynamic thinking levels* for adjustable compute, with notable metrics like **$0.25/M input**, **$1.50/M output**, **1432 Elo on LMArena**, and **2.5× faster time-to-first-token** than Gemini 2.5 Flash. It suppor…

  30. Smol AINews TIER_1 ·

    not much happened today

    **Alibaba** released the **Qwen 3.5** series with models ranging from **0.8B to 9B** parameters, featuring **native multimodality**, **scaled reinforcement learning**, and targeting **edge and lightweight agent** deployments. The models support very long context windows up to **2…

  31. Smol AINews TIER_1 ·

    not much happened today

    **Gemini 3.1 Pro** demonstrates strong retrieval capabilities and cost efficiency compared to **GPT-5.2** and **Opus 4.6**, though users report tooling and UI issues. The **SWE-bench Verified** evaluation methodology is under scrutiny for consistency, with updates bringing result…

  32. Smol AINews TIER_1 ·

    not much happened today

    **Anthropic** released **Claude Opus/Sonnet 4.6**, showing a significant intelligence index jump but with increased token usage and cost. **Anthropic** also shared insights on AI agent autonomy, highlighting human-in-the-loop prevalence and software engineering tool calls. **Alib…

  33. Smol AINews TIER_1 ·

    not much happened today

    **OpenAI** launched **GPT-5.3-Codex** with a Super Bowl ad emphasizing "You can just build things" as a product strategy, focusing on builder tooling over chat interfaces. The model is rolling out across **Cursor, VS Code, and GitHub** with phased API access and is flagged as the…

  34. Smol AINews TIER_1 ·

    not much happened today

    **AI News** for early February 2026 highlights a detailed comparison between **GPT-5.3-Codex** and **Claude Opus 4.6**, with users noting **Codex's** strength in detailed scoped tasks and **Opus's** ergonomic advantage for exploratory work. Benchmarks on Karpathy's **nanochat GPT…

  35. Smol AINews TIER_1 ·

    not much happened today

    **AI News for 1/27/2026-1/28/2026** highlights a quiet day with deep dives into frontier model "personality split" where **GPT-5.2** excels at *exploration* and **Claude Opus 4.5** at *exploitation*, suggesting **OpenAI** suits research workflows and **Anthropic** commercial reli…

  36. Smol AINews TIER_1 ·

    not much happened today

    **Anthropic** launches "Claude in Excel Pro" with enhanced features. **OpenAI** reveals upcoming **Codex** agent loop and cybersecurity measures. **Google** boosts **Gemini App** quotas and partners with **Sakana AI** for advanced AI Scientist projects in Japan. **Cursor** introd…

  37. Smol AINews TIER_1 ·

    not much happened today

    **X Engineering** open-sourced its new transformer-based recommender algorithm, sparking community debate on transparency and fairness. **GLM-4.7-Flash (30B-A3B)** gains momentum as a strong local inference model with efficient KV-cache management and quantization tuning strategi…

  38. Smol AINews TIER_1 ·

    not much happened today

    **AI News for 1/16/2026-1/19/2026** covers new architectures for scaling Transformer memory and context, including **STEM** from **Carnegie Mellon** and **Meta AI**, which replaces part of the FFN with a token-indexed embedding lookup enabling CPU offload and asynchronous prefetc…

  39. Smol AINews TIER_1 ·

    not much happened today.

    **OpenAI** launched **GPT-5.2-Codex** API, touted as their strongest coding model for long-running tasks and cybersecurity. **Cursor** integrated GPT-5.2-Codex to autonomously run a browser for a week, producing over 3 million lines of Rust code. **GitHub** incorporated it into t…

  40. Smol AINews TIER_1 ·

    not much happened today

    **Anthropic** tightens usage policies for **Claude Max** in third-party apps, prompting builders to adopt **model-agnostic orchestration** and **BYO-key** defaults to mitigate platform risks. The **Model Context Protocol (MCP)** is evolving into a key tooling plane with **OpenAI …

  41. Smol AINews TIER_1 ·

    not much happened today

    **Stanford paper** reveals **Claude 3.7 Sonnet** memorized **95.8% of Harry Potter 1**, highlighting copyright extraction risks compared to **GPT-4.1**. **Google AI Studio** sponsors **TailwindCSS** amid OSS funding debates. **Google** and **Sundar Pichai** launch **Gmail Gemini …

  42. Smol AINews TIER_1 ·

    not much happened today

    **AI News for 1/6/2026-1/7/2026** highlights a quiet day with key updates on **LangChain DeepAgents** introducing **Ralph Mode** for persistent agent loops, **Cursor** improving context management by reducing token usage by **46.9%**, and operational safety measures for coding ag…

  43. Smol AINews TIER_1 ·

    not much happened today

    **AI News** from early January 2026 highlights a viral economic prediction about **Vietnam** surpassing Thailand, **Microsoft**'s reported open-sourcing of **bitnet.cpp** for 1-bit CPU inference promising speed and energy gains, and a new research partnership between **Google Dee…

  44. Smol AINews TIER_1 ·

    not much happened today

    **DeepSeek** released a new paper on **mHC: Manifold-Constrained Hyper-Connections**, advancing residual-path design as a key scaling lever in neural networks. Their approach constrains residual mixing matrices to the **Birkhoff polytope** to improve stability and performance, wi…

  45. Smol AINews TIER_1 ·

    not much happened today

    **South Korea's Ministry of Science** launched a coordinated program with **5 companies** to develop sovereign foundation models from scratch, featuring large-scale MoE architectures like **SK Telecom A.X-K1 (519B total / 33B active)** and **LG K-EXAONE (236B MoE / 23B active)**,…

  46. Smol AINews TIER_1 ·

    not much happened today

    **Z.ai (GLM family) IPO in Hong Kong on Jan 8, 2026**, aiming to raise **$560M** at **HK$4.35B**, marking it as the "first AI-native LLM company" public listing. The IPO highlights **GLM-4.7** as a starting point. **Meta AI** acquired **Manus** for approximately **$4–5B**, with M…

  47. Smol AINews TIER_1 ·

    not much happened today

    **MiniMax M2.1** launches as an **open-source** agent and coding Mixture-of-Experts (MoE) model with **~10B active / ~230B total parameters**, claiming to outperform **Gemini 3 Pro** and **Claude Sonnet 4.5**, and supports local inference including on **Apple Silicon M3 Ultra** w…

  48. Smol AINews TIER_1 ·

    not much happened today

    **GLM-4.7** and **MiniMax M2.1** open-weight model releases highlight day-0 ecosystem support, coding throughput, and agent workflows, with GLM-4.7 achieving a +9.5% improvement over GLM-4.6 and MiniMax M2.1 positioned as an OSS Claude-like MoE model with 230B total parameters an…

  49. Smol AINews TIER_1 ·

    not much happened today

    **Zhipu AI's GLM-4.7** release marks a significant improvement in **coding, complex reasoning, and tool use**, quickly gaining ecosystem adoption via Hugging Face and OpenRouter. **Xiaomi's MiMo-V2-Flash** is highlighted as a practical, cost-efficient mixture-of-experts model opt…

  50. Smol AINews TIER_1 ·

    not much happened today

    **Alibaba** released **Qwen-Image-Layered**, an open-source model enabling Photoshop-grade layered image decomposition with recursive infinite layers and prompt-controlled structure. **Kling 2.6** introduced advanced motion control for image-to-video workflows, supported by a cre…

  51. Smol AINews TIER_1 ·

    not much happened today

    **GPT-5.2** shows mixed performance in public evaluations, excelling in agentic tasks but at a significantly higher cost (~**$620/run**) compared to **Opus 4.5** and **GPT-5.1**. It performs variably on reasoning and coding benchmarks, with some improvements on long-context tasks…

  52. Smol AINews TIER_1 ·

    not much happened today

    **NousResearch's Nomos 1** is a 30B open math model achieving a top Putnam score with only ~3B active parameters, enabling consumer Mac inference. **AxiomProver** also posts top Putnam results using ThinkyMachines' RL stack. **Mistral's Devstral 2 Small** outperforms DeepSeek v3.…

  53. Smol AINews TIER_1 ·

    not much happened today

    **Claude Code Skills** gains attention with a published talk and Hugging Face's new "skill" enabling one-line fine-tuning pipelines for models from ~0.5B to 70B parameters, supporting SFT, DPO, and GRPO, costing as low as ~$0.30 for small runs. **Zhipu AI** launches multimodal mo…

  54. Smol AINews TIER_1 ·

    not much happened today

    **vLLM 0.12.0** introduces DeepSeek support, GPU Model Runner V2, and quantization improvements with PyTorch 2.9.0 and CUDA 12.9. **NVIDIA** launches CUDA Tile IR and cuTile Python for advanced GPU tensor operations targeting Blackwell GPUs. **Hugging Face** releases Transformers…

  55. Smol AINews TIER_1 ·

    not much happened today

    **OpenAI's Code Red response** and **Anthropic's IPO** are major highlights. In AI video and imaging, **Kling 2.6** introduces native audio co-generation with coherent lip-sync, partnered with platforms like **ElevenLabs** and **OpenArt**. **Runway Gen-4.5** enhances lighting fid…

  56. Smol AINews TIER_1 ·

    not much happened today

    **Anthropic** introduces durable agents and MCP tasks for long-running workflows, with practical engineering patterns and integrations like Prefect. **Booking.com** deploys a large-scale agent system improving customer satisfaction using LangGraph, Kubernetes, GPT-4 Mini, and Wea…

  57. Smol AINews TIER_1 ·

    not much happened today

    **OpenAI** launched **GPT-5.1** featuring "adaptive reasoning" and developer-focused API improvements, including prompt caching and a reasoning_effort toggle for latency/cost tradeoffs. Independent analysis shows a minor intelligence bump with significant gains in agentic coding …

  58. Smol AINews TIER_1 ·

    not much happened today

    **GPT-5** leads Sudoku-Bench solving 33% of puzzles but 67% remain unsolved, highlighting challenges in meta-reasoning and spatial logic. New training methods like **GRPO fine-tuning** and "Thought Cloning" show limited success. Research on "looped LLMs" suggests pretrained model…

  59. Smol AINews TIER_1 ·

    not much happened today

    **Moonshot AI's Kimi K2 Thinking** AMA revealed a hybrid attention stack using **KDA + NoPE MLA** outperforming full MLA + RoPE, with the **Muon optimizer** scaling to ~1T parameters and native **INT4** QAT for cost-efficient inference. K2 Thinking ranks highly on **LisanBench** …

  60. Smol AINews TIER_1 ·

    not much happened today

    **Kimi-K2 Reasoner** has been integrated into **vLLM** and will soon be supported by **SGLang**, featuring a massive **1.2 trillion parameter MoE** configuration. **Perplexity AI** released research on cloud-portable trillion-parameter MoE kernels optimized for **AWS EFA**, with …

  61. Smol AINews TIER_1 ·

    not much happened today

    **Google's Project Suncatcher** prototypes scalable ML compute systems in orbit using solar energy with Trillium-generation TPUs surviving radiation, aiming for prototype satellites by 2027. **China's 50% electricity subsidies** for datacenters may offset chip efficiency gaps, wi…

  62. Smol AINews TIER_1 ·

    not much happened today

    **OpenAI** and **AWS** announced a strategic partnership involving a $38B compute deal to deploy hundreds of thousands of NVIDIA GB200 and GB300 chips, while **Microsoft** secured a license to ship NVIDIA GPUs to the UAE with a planned $7.9B datacenter investment. A 3-month NVFP4…

  63. Smol AINews TIER_1 ·

    not much happened today

    **Poolside** raised **$1B** at a **$12B valuation**. **Eric Zelikman** raised **$1B** after leaving **xAI**. **Weavy** joined **Figma**. New research highlights that **FP16** precision reduces training-inference mismatch in **reinforcement-learning** fine-tuning compared to **BF16**. …

  64. Smol AINews TIER_1 ·

    not much happened today

    **Moonshot AI** released **Kimi Linear (KDA)** with day-0 infrastructure and strong long-context metrics, achieving up to **75% KV cache reduction** and **6x decoding throughput**. **MiniMax M2** pivoted to full attention for multi-hop reasoning, maintaining strong agentic coding…

  65. Smol AINews TIER_1 ·

    not much happened today

    **vLLM** announced support for **NVIDIA Nemotron Nano 2**, featuring a hybrid Transformer–Mamba design and tunable "thinking budget" enabling up to 6× faster token generation. **Mistral AI Studio** launched a production platform for agents with deep observability. **Baseten** rep…

  66. Smol AINews TIER_1 ·

    not much happened today

    **LangSmith** launched the **Insights Agent** with multi-turn evaluation for agent ops and observability, improving failure detection and user intent clustering. **Meta PyTorch** and **Hugging Face** introduced **OpenEnv**, a Gymnasium-style API and hub for reproducible agentic e…

  67. Smol AINews TIER_1 ·

    not much happened today

    **LangChain & LangGraph 1.0** released with major updates for reliable, controllable agents and unified docs, emphasizing "Agent Engineering." **Meta** introduced **PyTorch Monarch** and **TorchForge** for distributed programming and reinforcement learning, enabling large-scale a…

  68. Smol AINews TIER_1 ·

    not much happened today

    **Alibaba** released compact dense **Qwen3-VL** models at 4B and 8B sizes with FP8 options, supporting up to 1M context and open vocabulary detection, rivaling larger models like **Qwen2.5-VL-72B**. Ecosystem support includes **MLX-VLM**, **LM Studio**, **vLLM**, **Kaggle models*…

  69. Smol AINews TIER_1 ·

    not much happened today

    **FrontierMath Tier 4** results show **GPT-5 Pro** narrowly outperforming **Gemini 2.5 Deep Think** in reasoning accuracy, with concerns about problem leakage clarified by **Epoch AI Research**. **Mila** and **Microsoft** propose **Markovian Thinking** to improve reasoning effici…

  70. Smol AINews TIER_1 ·

    not much happened today

    **Samsung's 7M Tiny Recursive Model (TRM)** achieves superior reasoning on ARC-AGI and Sudoku with fewer layers and MLP replacing self-attention. **LeCun's team** introduces **JEPA-SCORE**, enabling density estimation from encoders without retraining. **AI21 Labs** releases **Jam…

  71. Smol AINews TIER_1 ·

    not much happened today

    **Anthropic** announces a new CTO. Frontier coding agents see updates with **Claude Sonnet 4.5** showing strong cybersecurity and polished UX but trailing **GPT-5 Codex** in coding capability. **xAI Grok Code Fast** claims higher edit success at lower cost. **Google's Jules** cod…

  72. Smol AINews TIER_1 ·

    not much happened today

    **Kling 2.5 Turbo** leads in text-to-video and image-to-video generation with competitive pricing. **OpenAI Sora 2** shows strong instruction-following but has physics inconsistencies. **Google Gemini 2.5 Flash** "Nano Banana" image generation is now generally available with mult…

  73. Smol AINews TIER_1 ·

    not much happened today

    **Google** released a dense September update including **Gemini Robotics 1.5** with enhanced spatial/temporal reasoning, **Gemini Live**, **EmbeddingGemma**, and **Veo 3 GA** powering creative workflows. They also introduced agentic features like restaurant-reservation agents and…

  74. Smol AINews TIER_1 ·

    not much happened today

    **Alibaba** unveiled the **Qwen3** model family including **Qwen3-Max** and **Qwen3-VL** with a native 256K context window expandable to 1M, strong OCR in 32 languages, and rapid release velocity (~3.5 releases/month) backed by a $52B infrastructure roadmap. **OpenAI** launched *…

  75. Smol AINews TIER_1 ·

    not much happened today

    **Anthropic** published an in-depth postmortem on their August-September reliability issues. **OpenAI**'s GPTeam achieved a perfect 12/12 score at the **ICPC 2025** World Finals, showcasing rapid progress in general-purpose reasoning and introducing controllable "thinking time" t…

  76. Smol AINews TIER_1 ·

    not much happened today

    **GPT-5 Codex** rollout shows strong agentic coding capabilities with some token bloat issues. IDEs like **VS Code Insiders** and **Cursor 1.6** enhance context windows and model integration. **vLLM 0.10.2** supports aarch64 and NVIDIA GB200 with performance improvements. **AMD R…

  77. Smol AINews TIER_1 ·

    not much happened today

    **Meta** released **MobileLLM-R1**, a sub-1B parameter reasoning model family on Hugging Face with strong small-model math accuracy, trained on 4.2T tokens. **Alibaba** introduced **Qwen3-Next-80B-A3B** with hybrid attention, 256k context window, and improved long-horizon memory,…

  78. Smol AINews TIER_1 ·

    not much happened today

    **Cognition** raised **$400M** at a **$10.2B** valuation to advance AI coding agents, with **swyx** joining to support the "Decade of Agents" thesis. **Vercel** launched an OSS "vibe coding platform" using a tuned **GPT-5** agent loop. **Claude Code** emphasizes minimalism in age…

  79. Smol AINews TIER_1 ·

    not much happened today

    **Google DeepMind** released **EmbeddingGemma (308M)**, a small multilingual embedding model optimized for on-device retrieval-augmented generation and semantic search, supporting over 100 languages and running efficiently with quantization and EdgeTPU latency under 15ms. **Jina …

  80. Smol AINews TIER_1 ·

    not much happened today

    **Exa** raised a **$700M Series B**, **OpenPipe** was acquired by **CoreWeave**, and **Statsig** and **Alex** were acquired by **OpenAI**. The **Agent/Client Protocol (ACP)** was introduced by the **Zed** team to standardize IDE-agent interoperability, supporting **Claude Code** …

  81. Smol AINews TIER_1 ·

    not much happened today

    **OpenAI** integrates **GPT-5** into Xcode 26 with improved coding latency, though some UX trade-offs are noted. **xAI's Grok Code Fast 1** gains momentum, surpassing **Claude Sonnet** in usage and praised for fast debugging. **Zhipu's GLM-4.5** offers a cost-effective coding pla…

  82. Smol AINews TIER_1 ·

    not much happened today

    **Apple** released three real-time vision-language models (**FastVLM**, **MobileCLIP2**) on Hugging Face with significant speed and size improvements, supporting WebGPU and Core ML. Their MLX framework now supports **MXFP4** format, competing with **NVFP4** for FP4 quantization. …

  83. Smol AINews TIER_1 ·

    not much happened today

    **xAI** released open weights for **Grok-2** and **Grok-2.5** with a novel MoE residual architecture and μP scaling, sparking community excitement and licensing concerns. **Microsoft** open-sourced **VibeVoice-1.5B**, a multi-speaker long-form TTS model with streaming support and…

  84. Smol AINews TIER_1 ·

    not much happened today

    **DeepMind** released **Genie 3**, an interactive multimodal world simulator with advanced spatial memory and real-time avatar control, and **SIMA**, an embodied training agent operating inside generated worlds. **Alibaba** introduced **Qwen-Image-Edit**, an open-weights image ed…

  85. Smol AINews TIER_1 ·

    not much happened today

    **Gemma 3 270M**, an ultra-small model optimized for edge and mobile use, was released and is gaining adoption. **NVIDIA** launched two open multilingual ASR models, **Canary 1B** and **Parakeet-TDT 0.6B**, trained on 1 million hours of data with CC-BY licensing, plus the efficie…

  86. Smol AINews TIER_1 ·

    not much happened today

    **OpenAI** rolled out **GPT-5** as the default in ChatGPT with new modes and a "warmer" personality, plus expanded message limits for Plus/Team users and Enterprise/Edu access. Performance rankings show **gpt-5-high** leading, with smaller variants also ranked, though critiques n…

  87. Smol AINews TIER_1 ·

    not much happened today

    **OpenAI** continues small updates to **GPT-5**, introducing "Auto/Fast/Thinking" modes with **196k token context**, **3,000 messages/week**, and dynamic routing to cheaper models for cost efficiency. The **MiniMax AI Agent Challenge** offers **$150,000** in prizes for AI agent d…

  88. Smol AINews TIER_1 ·

    not much happened today

    **OpenAI** released the **GPT-5** series including **GPT-5-mini** and **GPT-5-nano**, with mixed user feedback on performance and API behavior. **Anthropic** extended **Claude Sonnet 4** context window to **1 million tokens**, a 5x increase, enhancing large document processing. *…

  89. Smol AINews TIER_1 ·

    not much happened today

    **OpenAI** launched **GPT-5** with a unified user experience removing manual model selection, causing initial routing and access issues for Plus users that are being addressed with fixes including restored model options and increased usage limits. **GPT-5** introduces "Priority P…

  90. Smol AINews TIER_1 ·

    not much happened today

    **OpenAI** released its first open models since GPT-2, **gpt-oss-120b** and **gpt-oss-20b**, which quickly trended on **Hugging Face**. **Microsoft** supports these models via **Azure AI Foundry** and **Windows Foundry Local**. Key architectural innovations include **sliding wind…

  91. Smol AINews TIER_1 ·

    not much happened today

    **Chinese AI labs** have released powerful open-source models like **GLM-4.5** and **GLM-4.5-Air** from **Zhipu AI**, **Qwen3 Coder** and **Qwen3-235B** from **Alibaba**, and **Kimi K2** from **Moonshot AI**, highlighting a surge in permissively licensed models. **Zhipu AI's GLM-…

  92. Smol AINews TIER_1 ·

    not much happened today

    **Chinese labs** have released a wave of powerful, permissively licensed models in July, including **Zhipu AI's GLM-4.5** and **GLM-4.5-Air**, **Alibaba's Qwen3 Coder** and **Qwen3-235B**, and **Moonshot AI's Kimi K2**. These models feature large-scale Mixture of Experts architec…

  93. Smol AINews TIER_1 ·

    not much happened today

    **OpenAI** has fully rolled out its ChatGPT agent to all Plus, Pro, and Team users and is building hype for the upcoming **GPT-5**, which reportedly outperforms **Grok-4** and can build a cookie clicker game in two minutes. **Alibaba's Qwen** team released the open-source reasoni…

  94. Smol AINews TIER_1 ·

    not much happened today

    **Alibaba** announced the release of **Qwen3-Coder-480B-A35B-Instruct**, an open agentic code model with **480B** parameters and **256K** context length, praised for rapid development and strong coding performance. Benchmark claims of **41.8% on ARC-AGI-1** faced skepticism from …

  95. Smol AINews TIER_1 ·

    not much happened today

    **Moonshot AI** released the **Kimi K2**, a 1-trillion parameter ultra-sparse Mixture-of-Experts (MoE) model with the **MuonClip** optimizer and a large-scale agentic data pipeline using over **20,000 tools**. Shortly after, **Alibaba** updated its **Qwen3** model with the **Qwen…

  96. Smol AINews TIER_1 ·

    not much happened today

    **Mistral** released **Voxtral**, claimed as the world's best open speech recognition models, available via API and Hugging Face. **Moonshot AI** launched **Kimi K2**, a trillion-parameter **Mixture-of-Experts (MoE)** model, outperforming **GPT-4.1** on benchmarks with 65.4% on S…

  97. Smol AINews TIER_1 ·

    not much happened today

    **Cognition** is acquiring the remaining assets of **Windsurf** after a significant weekend deal. **Moonshot AI** released **Kimi K2**, an open-source, MIT-licensed agentic model with **1 Trillion total / 32B active parameters** using a Mixture-of-Experts architecture, trained on…

  98. Smol AINews TIER_1 ·

    not much happened today

    **LangChain** is nearing unicorn status, while **OpenAI** and **Google DeepMind's Gemini 3 Pro** models are launching soon. **Perplexity** rolls out its agentic browser **Comet** to waitlists, offering multitasking and voice command features. **xAI's Grok-4** update sparked contr…

  99. Smol AINews TIER_1 ·

    not much happened today

    Over the holiday weekend, key AI developments include the upcoming release of **Grok 4**, **Perplexity** teasing new projects, and community reactions to **Cursor** and **Dia**. Research highlights feature a paper on **Reinforcement Learning (RL)** improving generalization and re…

  100. Smol AINews TIER_1 ·

    not much happened today

    **Ilya Sutskever** confirmed his role as CEO of **Safe Superintelligence Inc. (SSI)** with **Daniel Levy** as President, dismissing acquisition rumors and emphasizing their strong team and compute resources. **Perplexity AI** expanded its data integrations by adding **Morningstar…

  101. Smol AINews TIER_1 ·

    not much happened today

    **Meta** has hired **Scale AI CEO Alexandr Wang** as its new **Chief AI Officer**, acquiring a **49% non-voting stake** in **Scale AI** for **$14.3 billion**, doubling its valuation to **~$28 billion**. This move is part of a major talent shuffle involving **Meta**, **OpenAI**, a…

  102. Smol AINews TIER_1 ·

    not much happened today

    **Meta** makes a major AI move by hiring **Scale AI** founder **Alexandr Wang** as Chief AI Officer and acquiring a 49% non-voting stake in **Scale AI** for **$14.3 billion**, doubling its valuation to about **$28 billion**. **Chai Discovery** announces **Chai-2**, a breakthrough…

  103. Smol AINews TIER_1 ·

    not much happened today

    **Meta** has poached top AI talent from **OpenAI**, including **Alexandr Wang** joining as Chief AI Officer to work towards superintelligence, signaling a strong push for the next **Llama** model. The AI job market shows polarization with high demand and compensation for top-tier…

  104. Smol AINews TIER_1 ·

    not much happened today

    **Google** released **Gemma 3n**, a multimodal model for edge devices available in **2B and 4B** parameter versions, with support across major frameworks like **Transformers** and **Llama.cpp**. **Tencent** open-sourced **Hunyuan-A13B**, a **Mixture-of-Experts (MoE)** model with …

  105. Smol AINews TIER_1 ·

    Not much happened today

    **Sakana AI** released **Reinforcement-Learned Teachers (RLTs)**, a novel technique using smaller 7B parameter models trained via reinforcement learning to teach reasoning through step-by-step explanations, accelerating **Chain-of-Thought** learning. **Mistral AI** updated **Mist…

  106. Smol AINews TIER_1 ·

    not much happened today

    **ByteDance** showcased an impressive state-of-the-art video generation model called **Seedance 1.0** without releasing it, while **Morph Labs** announced **Trinity**, an autoformalization system for Lean. **Hugging Face Transformers** deprecated TensorFlow/JAX support. **Andrew N…

  107. Smol AINews TIER_1 ·

    not much happened today

    **China's Xiaohongshu (Rednote) released dots.llm1**, a **142B parameter open-source Mixture-of-Experts (MoE) language model** with **14B active parameters** and a **32K context window**, pretrained on **11.2 trillion high-quality, non-synthetic tokens**. The model supports effic…

  108. Smol AINews TIER_1 ·

    not much happened today

    **OpenAI** rolled out **Codex** to ChatGPT Plus users with internet access and fine-grained controls, improving memory features for free users. **Anthropic's Claude 4 Opus and Sonnet** models lead coding benchmarks, while **Google's Gemini 2.5 Pro and Flash** models gain recognit…

  109. Smol AINews TIER_1 ·

    not much happened today

    **DeepSeek R1-0528** release brings major improvements in reasoning, hallucination reduction, JSON output, and function calling, matching or surpassing closed models like **OpenAI o3** and **Gemini 2.5 Pro** on benchmarks such as **Artificial Analysis Intelligence Index**, **Live…

  110. Smol AINews TIER_1 ·

    not much happened today

    **DeepSeek R1 v2** model released with availability on Hugging Face and inference partners. The **Gemma model family** continues prolific development including **PaliGemma 2**, **Gemma 3**, and others. **Claude 4** and its variants like **Opus 4** and **Claude Sonnet 4** show top…

  111. Smol AINews TIER_1 ·

    not much happened today

    **OpenAI** plans to evolve **ChatGPT** into a **super-assistant** by 2025 with models like **o3** and **o4** enabling agentic tasks and supporting a billion users. Recent multimodal and reasoning model releases include ByteDance's **BAGEL-7B**, Google's **MedGemma**, and NVIDIA's…

  112. Smol AINews TIER_1 ·

    not much happened today

    **Anthropic's Claude 4 models (Opus 4, Sonnet 4)** demonstrate strong coding abilities, with Sonnet 4 achieving **72.7%** on SWE-bench and Opus 4 at **72.5%**. Claude Sonnet 4 excels in codebase understanding and is considered **SOTA on large codebases**. Criticism arose over Ant…

  113. Smol AINews TIER_1 ·

    not much happened today

    **Meta** released **KernelLLM 8B**, outperforming **GPT-4o** and **DeepSeek V3** on KernelBench-Triton Level 1. **Mistral Medium 3** debuted strongly in multiple benchmarks. **Qwen3** models introduced a unified framework with multilingual support. **DeepSeek-V3** features hardwa…

  114. Smol AINews TIER_1 ·

    not much happened today

    **Tencent's Hunyuan-Turbos** has risen to #8 on the LMArena leaderboard, showing strong performance across major categories and significant improvement since February. The **Qwen3 model family**, especially the **Qwen3 235B-A22B (Reasoning)** model, is noted for its intelligence …

  115. Smol AINews TIER_1 ·

    not much happened today

    **Gemini 2.5 Flash** shows a **12 point increase** in the Artificial Analysis Intelligence Index but costs **150x more** than Gemini 2.0 Flash due to **9x more expensive output tokens** and **17x higher token usage** during reasoning. **Mistral Medium 3** competes with **Llama 4 …

  116. Smol AINews TIER_1 ·

    not much happened today

    **OpenAI** launched both **Reinforcement Finetuning** and **Deep Research on GitHub repos**, drawing comparisons to **Cognition's DeepWiki**. **Nvidia** open-sourced **Open Code Reasoning models (32B, 14B, 7B)** with Apache 2.0 license, showing 30% better token efficiency and com…

  117. Smol AINews TIER_1 ·

    not much happened today

    **Qwen model family** released quantized versions of Qwen3 models including **14B**, **32B**, and **235B** parameters, with promising coding capabilities in Qwen3-235B. **Microsoft** launched **Phi-4-reasoning**, a **14B** parameter model distilled from OpenAI's o3-mini, emphasiz…

  118. Smol AINews TIER_1 ·

    not much happened today

    **Microsoft** released **Phi-reasoning 4**, a finetuned 14B reasoning model slightly behind QwQ but limited by data transparency and token efficiency issues. **Anthropic** introduced remote MCP server support and a 45-minute Research mode in **Claude**. **Cursor** published a mod…

  119. Smol AINews TIER_1 ·

    not much happened today

    AI news for April 23-24, 2025, covering new model releases, benchmarks, and research developments from companies like OpenAI, Google DeepMind, Anthropic, and Epoch AI Research.

  120. Smol AINews TIER_1 ·

    not much happened today

    **Nemotron-H** model family introduces hybrid Mamba-Transformer models with up to **3x faster inference** and variants including **8B**, **56B**, and a compressed **47B** model. **Nvidia Eagle 2.5** is a frontier VLM for long-context multimodal learning, matching **GPT-4o** and *…

  121. Smol AINews TIER_1 ·

    not much happened today

    The AI news recap highlights independent evaluations showing **Grok-3** outperforming models like **GPT-4.5** and **Claude 3.7 Sonnet** on reasoning benchmarks, while **Grok-3 mini** excels in reasoning tasks. Research on **reinforcement learning (RL)** fine-tuning reveals potent…

  122. Smol AINews TIER_1 ·

    not much happened today

    **OpenAI** teased a *Memory update in ChatGPT* with limited technical details. Evidence suggests upcoming releases of **o3** and **o4-mini** models, alongside a press leak about **GPT-4.1**. **X.ai** launched the **Grok 3** and **Grok 3 mini** APIs, confirmed as **o1** level mode…

  123. Smol AINews TIER_1 ·

    not much happened today

    **OpenAI** announced that **o3** and **o4-mini** models will be released soon, with **GPT-5** expected in a few months, delayed for quality improvements and capacity planning. **DeepSeek** introduced **Self-Principled Critique Tuning (SPCT)** to enhance inference-time scalability…

  124. Smol AINews TIER_1 ·

    not much happened today

    **Gemini 2.5 Pro** shows strengths and weaknesses, notably lacking LaTeX math rendering unlike **ChatGPT**, and scored **24.4%** on the **2025 US AMO**. **DeepSeek V3** ranks 8th and 12th on recent leaderboards. **Qwen 2.5** models have been integrated into the **PocketPal** app.…

  125. Smol AINews TIER_1 ·

    not much happened today

    **OpenAI** plans to release its first open-weight language model since **GPT-2** in the coming months, signaling a move towards more open AI development. **DeepSeek** launched its open-source **R1 model** earlier this year, challenging perceptions of China's AI progress. **Gemma …

  126. Smol AINews TIER_1 ·

    not much happened today

    **GPT-4o** was praised for its improved coding, instruction following, and freedom, becoming the leading non-reasoning coding model surpassing **DeepSeek V3** and **Claude 3.7 Sonnet** in coding benchmarks, though it still lags behind reasoning models like **o3-mini**. Concerns a…

  127. Smol AINews TIER_1 ·

    not much happened today

    **OpenAI** announced the new **GPT-4o** model with enhanced instruction-following, complex problem-solving, and native image generation capabilities. The model shows improved performance in math, coding, and creativity, with features like transparent background image generation. …

  128. Smol AINews TIER_1 ·

    not much happened today

    At Nvidia GTC Day 1, several AI updates were highlighted: **Google's Gemini 2.0 Flash** introduces image input/output but is not recommended for text-to-image tasks, with **Imagen 3** preferred for that. **Mistral AI** released **Mistral Small 3.1** with 128k token context window…

  129. Smol AINews TIER_1 ·

    not much happened today

    **Google DeepMind** announced updates to **Gemini 2.0**, including an upgraded **Flash Thinking model** with stronger reasoning and native image generation capabilities. **Cohere** launched **Command A**, a **111B** parameter dense model with a **256K context window** and competi…

  130. Smol AINews TIER_1 ·

    not much happened today

    **DeepSeek R1** demonstrates significant efficiency using **FP8** precision, outperforming **Gemma 3 27B** in benchmarks with a **Chatbot Arena Elo Score** of **1363** vs. **1338**, requiring substantial hardware like **32 H100 GPUs** and **2,560GB VRAM**. **OpenAI** labels **Dee…

  131. Smol AINews TIER_1 ·

    not much happened today

    The AI news recap highlights several key developments: **nanoMoE**, a PyTorch implementation of a mid-sized Mixture-of-Experts (MoE) model inspired by Andrej Karpathy's nanoGPT, enables pretraining on commodity hardware within a week. An agentic leaderboard ranks LLMs powering **…

  132. Smol AINews TIER_1 ·

    not much happened today

    **AI21 Labs launched Jamba 1.6**, touted as the **best open model for private enterprise deployment**, outperforming **Cohere, Mistral, and Llama** on benchmarks like **Arena Hard**. **Mistral AI** released a state-of-the-art **multimodal OCR model** with multilingual and structu…

  133. Smol AINews TIER_1 ·

    not much happened today

    **Weights and Biases** announced a **$1.7 billion acquisition by CoreWeave** ahead of CoreWeave's IPO. **CohereForAI** released the **Aya Vision models (8B and 32B parameters)** supporting **23 languages**, outperforming larger models like **Llama-3.2 90B Vision** and **Molmo 72B…

  134. Smol AINews TIER_1 ·

    not much happened today

    **GPT-4.5** sparked mixed reactions on Twitter, with **@karpathy** noting users preferred **GPT-4** in a poll despite his personal favor for GPT-4.5's creativity and humor. Critics like **@abacaj** highlighted **GPT-4.5's slowness** and questioned its practical value and pricing …

  135. Smol AINews TIER_1 ·

    not much happened today

    **Claude 3.7 Sonnet** demonstrates exceptional coding and reasoning capabilities, outperforming models like **DeepSeek R1**, **o3-mini**, and **GPT-4o** on benchmarks such as **SciCode** and **LiveCodeBench**. It is available on platforms including **Perplexity Pro**, **Anthropic…

  136. Smol AINews TIER_1 ·

    not much happened today

    **Grok-3**, a new family of LLMs from **xAI** using **200,000 Nvidia H100 GPUs** for advanced reasoning, outperforms models from **Google, Anthropic, and OpenAI** on math, science, and coding benchmarks. **DeepSeek-R1** from **ByteDance Research** achieves top accuracy on the cha…

  137. Smol AINews TIER_1 ·

    not much happened today

    The **Smolagents** library by **Hugging Face** continues trending. The latest **ChatGPT-4o** version, "chatgpt-4o-latest-20250129", was released. **DeepSeek R1 671B** sets speed record at **198 t/s**, fastest reasoning model, recommended with specific prompt settings. **Perplexity Deep Research…

  138. Smol AINews TIER_1 ·

    not much happened today

    **Zyphra AI** launched **Zonos-v0.1**, a leading open-weight text-to-speech model supporting multiple languages and zero-shot voice cloning. **Meta FAIR** released the open-source **Audiobox Aesthetics** model trained on 562 hours of audio data. **Kyutai Labs** introduced **Moshi…

  139. Smol AINews TIER_1 ·

    not much happened today

    **Google** released **Gemini 2.0 Flash Thinking Experimental 1-21**, a vision-language reasoning model with a **1 million-token context window** and improved accuracy on science, math, and multimedia benchmarks, surpassing **DeepSeek-R1** but trailing **OpenAI's o1**. **ZyphraAI*…

  140. Smol AINews TIER_1 ·

    not much happened today

    **DeepSeek-R1 surpasses OpenAI in GitHub stars**, marking a milestone in open-source AI with rapid growth in community interest. **AlphaGeometry2 achieves gold-medalist level performance with an 84% solving rate on IMO geometry problems**, showcasing significant advancements in A…

  141. Smol AINews TIER_1 ·

    not much happened today

    **DeepSeek-R1 and DeepSeek-V3** models have made significant advancements, trained on an **instruction-tuning dataset of 1.5M samples** with **600,000 reasoning** and **200,000 non-reasoning SFT data**. The models demonstrate strong **performance benchmarks** and are deployed on-…

  142. Smol AINews TIER_1 ·

    not much happened today

    **Huawei chips** are highlighted in a diverse AI news roundup covering **NVIDIA's** stock rebound, new open music foundation models like **Local Suno**, and competitive AI models such as **Qwen 2.5 Max** and **DeepSeek V3**. The release of **DeepSeek Janus Pro**, a multimodal LLM…

  143. Smol AINews TIER_1 ·

    not much happened today

    **DeepSeek-V3**, a **671 billion parameter mixture-of-experts model**, surpasses **Llama 3.1 405B** and **GPT-4o** in coding and math benchmarks. **OpenAI** announced the upcoming release of **GPT-5** on **April 27, 2023**. **MiniMax-01 Coder mode** in **ai-gradio** enables build…

  144. Smol AINews TIER_1 ·

    not much happened today

    **Harvey** secured a new **$300M funding round**. **OuteTTS 0.3 1B & 500M** text-to-speech models were released featuring **zero-shot voice cloning**, **multilingual support** (en, jp, ko, zh, fr, de), and **emotion control**, powered by **OLMo-1B** and **Qwen 2.5 0.5B**. The **H…

  145. Smol AINews TIER_1 ·

    not much happened today

    **Helium-1 Preview** by **kyutai_labs** is a **2B-parameter multilingual base LLM** outperforming **Qwen 2.5**, trained on **2.5T tokens** with a **4096 context size** using token-level distillation from a **7B model**. **Phi-4 (4-bit)** was released in **lmstudio** on an **M4 ma…

  146. Smol AINews TIER_1 ·

    not much happened today

    **rStar-Math** surpasses **OpenAI's o1-preview** in math reasoning with **90.0% accuracy** using a **7B LLM** and **MCTS** with a **Process Reward Model**. **Alibaba** launches **Qwen Chat** featuring **Qwen2.5-Plus** and **Qwen2.5-Coder-32B-Instruct** models enhancing vision-lan…

  147. Smol AINews TIER_1 ·

    not much happened today

    **Sebastien Bubeck** introduced **REINFORCE++**, enhancing classical REINFORCE with **PPO-inspired techniques** for **30% faster training**. **Microsoft** released **Phi-4** under the **MIT License**, accessible via **Ollama**. **François Chollet** announced plans for **ARC-AGI-2…

  148. Smol AINews TIER_1 ·

    not much happened today

    **NVIDIA** has launched **Cosmos**, an open-source video world model trained on **20 million hours of video**, aimed at advancing **robotics** and **autonomous driving**. The release sparked debate over its open-source status and technical approach. Additionally, **NVIDIA** annou…

  149. Smol AINews TIER_1 ·

    not much happened today

    **Olmo 2** released a detailed tech report showcasing full pre, mid, and post-training details for a frontier fully open model. **PRIME**, an open-source reasoning solution, achieved **26.7% pass@1**, surpassing **GPT-4o** in benchmarks. Performance improvements include **Qwen 32…

  150. Smol AINews TIER_1 ·

    not much happened today

    **Sam Altman** publicly criticizes **DeepSeek** and **Qwen** models, sparking debate about **OpenAI**'s innovation claims and reliance on foundational research like the **Transformer architecture**. **DeepSeek V3** shows significant overfitting issues in the **Misguided Attention…

  151. Smol AINews TIER_1 ·

    not much happened today

    **ChatGPT**, **Sora**, and the **OpenAI API** experienced a >5 hour outage but are now restored. Updates to **vLLM** enable **DeepSeek-V3** to run with enhanced **parallelism** and **CPU offloading**, improving **model deployment flexibility**. Discussions on **gradient descent**…

  152. Smol AINews TIER_1 ·

    not much happened today

    The **Qwen team** launched **QVQ**, a vision-enabled version of their experimental **QwQ o1 clone**, benchmarking comparably to **Claude 3.5 Sonnet**. Discussions include **Bret Taylor's** insights on autonomous software development distinct from the Copilot era. The **Latent Spa…

  153. Smol AINews TIER_1 ·

    not much happened this weekend

    **o3** model gains significant attention with discussions around its capabilities and implications, including an OpenAI board member referencing "AGI." **LangChain** released their **State of AI 2024** survey. **Hume** announced **OCTAVE**, a **3B parameter** API-only speech-lang…

  154. Smol AINews TIER_1 ·

    not much happened today

    **OpenAI** announced their "12 Days of OpenAI" event with daily livestreams and potential releases including the **full o1 model**, **Sora video model**, and **GPT-4.5**. **Google DeepMind** released the **GenCast weather model** capable of **15-day forecasts in 8 minutes** using…

  155. Smol AINews TIER_1 ·

    not much happened today

    **AI News for 11/29/2024-12/2/2024** highlights several developments: **Nvidia** introduced **Puzzle**, a distillation-based neural architecture search for inference-optimized large language models, enhancing efficiency. The **IC-Light V2** model was released for varied illuminat…

  156. Smol AINews TIER_1 ·

    not much happened to end the week

    **AI News for 11/29/2024-11/30/2024** covers key updates including the **Gemini multimodal model** advancing in musical structure understanding, a new **quantized SWE-Bench** for benchmarking at **1.3 bits per task**, and the launch of the **DeepSeek-R1 model** focusing on transp…

  157. Smol AINews TIER_1 ·

    not much happened today

    This week in AI news, **Anthropic** launched **Claude Sonnet 3.5**, enabling desktop app control via natural language. **Microsoft** introduced **Magentic-One**, a multi-agent system built on the **AutoGen framework**. **OpenCoder** was unveiled as an AI-powered code cookbook for…

  158. Smol AINews TIER_1 ·

    not much happened today

    This week in AI news highlights **Ollama 0.4** supporting **Meta's Llama 3.2 Vision** models (11B and 90B), with applications like handwriting recognition. **Self-Consistency Preference Optimization (ScPO)** was introduced to improve model consistency without human labels. Discus…

  159. Smol AINews TIER_1 ·

    Not much happened today

    **Grok Beta** surpasses **Llama 3.1 70B** in intelligence but is less competitive due to its pricing at **$5/1M input tokens** and **$15/1M output tokens**. **Defense Llama**, developed with **Meta AI** and **Scale AI**, targets American national security applications. **SWE-Kit*…

  160. Smol AINews TIER_1 ·

    not much happened today

    **ChatGPT Search** was launched by **Sam Altman**, who called it his favorite feature since ChatGPT's original launch, doubling his usage. Comparisons were made between ChatGPT Search and **Perplexity** with improvements noted in Perplexity's web navigation. **Google** introduced…

  161. Smol AINews TIER_1 ·

    not much happened this weekend

    **Moondream**, a **1.6b vision language model**, secured seed funding, highlighting a trend in moon-themed tiny models alongside **Moonshine** (27-61m ASR model). **Claude 3.5 Sonnet** was used for AI Twitter recaps. Discussions included **pattern recognition** vs. **intelligence…

  162. Smol AINews TIER_1 ·

    not much happened today

    **Liquid AI** held a launch event introducing new foundation models. **Anthropic** shared follow-up research on social bias and feature steering with their "Golden Gate Claude" feature. **Cohere** released multimodal Embed 3 embeddings models following Aya Expanse. There was misi…

  163. Smol AINews TIER_1 ·

    not much happened today

    **Anthropic** released upgraded **Claude 3.5 Sonnet** and **Claude 3.5 Haiku** models featuring a new **computer use capability** that allows interaction with computer interfaces via screenshots and actions like mouse movement and typing. The **Claude 3.5 Sonnet** achieved state-…

  164. Smol AINews TIER_1 ·

    not much happened today

    **Answer.ai** launched **fastdata**, a synthetic data generation library using "claudette" and Tencent's Billion Persona paper. **NotebookLM** became customizable, and **Motherduck** introduced notable LLMs in SQL implementations. **Perplexity** and **Dropbox** announced competit…

  165. Smol AINews TIER_1 ·

    not much happened today

    **Vertical SaaS agents** are gaining rapid consensus as the future of AI applications, highlighted by **Decagon's $100m funding** and **Sierra's $4b round**. **OpenAI alumni** are actively raising venture capital and forming new startups, intensifying competition in the AI market…

  166. Smol AINews TIER_1 ·

    not much happened today

    **Rhymes AI** released **Aria**, a new **25.3B** parameter multimodal MoE model supporting text, code, image, and video with a **64k token context window** and Apache-2.0 license. **OpenAI**'s **o1-preview** and **o1-mini** models show consistent improvement over **Anthropic** an…

  167. Smol AINews TIER_1 ·

    not much happened today

    **Geoffrey Hinton** and **John Hopfield** won the **Nobel Prize in Physics** for foundational work on neural networks linking AI and physics. **Meta AI** introduced a **13B parameter audio generation model** as part of Meta Movie Gen for video-synced audio. **Anthropic** launched…

  168. Smol AINews TIER_1 ·

    Not much technical happened today

    **OpenAI** announced raising **$6.6B** in new funding at a **$157B valuation**, with ChatGPT reaching *250M weekly active users*. **Poolside** raised **$500M** to advance AGI development. **LiquidAI** introduced three new MoE models (1B, 3B, 40B) with a **32k context window** and…

  169. Smol AINews TIER_1 ·

    not much happened today

    **Meta** released **Llama 3.2**, including lightweight 1B and 3B models for on-device AI with capabilities like summarization and retrieval-augmented generation. **Molmo**, a new multimodal model, was introduced with a large dense captioning dataset. **Google DeepMind** announced…

  170. Smol AINews TIER_1 ·

    not much happened today

    **Meta AI** released **Llama 3.2** models including **1B, 3B text-only** and **11B, 90B vision** variants with **128K token context length** and adapter layers for image-text integration. These models outperform competitors like **Gemma 2** and **Phi 3.5-mini**, and are supported…

  171. Smol AINews TIER_1 ·

    not much happened today

    **Anthropic** introduced a RAG technique called Contextual Retrieval that reduces retrieval failure rates by 67% using prompt caching. **Meta** is teasing multimodal **Llama 3** ahead of Meta Connect. **OpenAI** is hiring for a multi-agent research team focusing on improved AI re…

  172. Smol AINews TIER_1 ·

    not much happened today

    **OpenAI's o1-preview and o1-mini models** lead benchmarks in Math, Hard Prompts, and Coding. **Qwen 2.5 72B** model shows strong performance close to **GPT-4o**. **DeepSeek-V2.5** tops Chinese LLMs, rivaling **GPT-4-Turbo-2024-04-09**. **Microsoft's GRIN MoE** achieves good resu…

  173. Smol AINews TIER_1 ·

    nothing much happened today

    **OpenAI's o1 model** faces skepticism about open-source replication due to its extreme restrictions and unique training advances like RL on CoT. **ChatGPT-4o** shows significant performance improvements across benchmarks. **Llama-3.1-405b** fp8 and bf16 versions perform similarl…

  174. Smol AINews TIER_1 ·

    not much happened today + AINews Podcast?

    **Glean** doubled its valuation again. **Dan Hendrycks' Superforecaster AI** generates plausible election forecasts with interesting prompt engineering. A **Stanford** study found that **LLM-generated research ideas** are statistically more novel than those by expert humans. **Sa…

  175. Smol AINews TIER_1 ·

    not much happened today

    **Meta** announced significant adoption of **Llama 3.1** with nearly **350 million downloads** on Hugging Face. **Magic AI Labs** introduced **LTM-2-Mini**, a long context model with a **100 million token context window**, and a new evaluation method called HashHop. **LMSys** add…

  176. Smol AINews TIER_1 ·

    not much happened today

    **OpenAI** launched **GPT-4o finetuning** with a case study on Cosine. **Anthropic** released **Claude 3.5 Sonnet** with 8k token output. **Microsoft Phi** team introduced **Phi-3.5** in three variants: Mini (3.8B), MoE (16x3.8B), and Vision (4.2B), noted for sample efficiency. *…

  177. Smol AINews TIER_1 ·

    not much happened today

    **Anthropic** rolled out **prompt caching** in its API, reducing input costs by up to **90%** and latency by **80%**, enabling instant fine-tuning with longer prompts. **xAI** released **Grok-2**, a new model competing with frontier models from **Google DeepMind**, **OpenAI**, **…

  178. Smol AINews TIER_1 ·

    not much happened today

    **GPT-5** delayed again amid a quiet news day. **Nous Research** released Hermes 3 finetune of **Llama 3** base models, rivaling FAIR's instruct tunes but sparking debate over emergent existential crisis behavior with 6% roleplay data. **Nvidia** introduced Minitron finetune of *…

  179. Smol AINews TIER_1 ·

    not much happened today

    **Qwen2-Math-72B** outperforms **GPT-4o**, **Claude-3.5-Sonnet**, **Gemini-1.5-Pro**, and **Llama-3.1-405B** on math benchmarks using synthetic data and advanced optimization techniques. **Google AI** cuts pricing for **Gemini 1.5 Flash** by up to 78%. **Anthropic** expands its b…

  180. Smol AINews TIER_1 ·

    not much happened today

    **OpenAI** introduced structured outputs in their API with a new "strict" mode and a "response_format" parameter, supporting models like **gpt-4-0613**, **gpt-3.5-turbo-0613**, and the new **gpt-4o-2024-08-06**. They also halved the price of **gpt-4o** to $2.50 per million tokens…

  181. Smol AINews TIER_1 ·

    not much happened today

    **Meta** released **SAM 2**, a unified model for real-time object segmentation with a new dataset 4.5x larger and 53x more annotated than previous ones. **FastHTML**, a new Python web framework by **Jeremy Howard**, enables easy creation and deployment of interactive web apps. **…

  182. Smol AINews TIER_1 ·

    Nothing much happened today

    **Hugging Face** released a browser-based timestamped Whisper using transformers.js. A Twitter bot by **truth_terminal** became the first "semiautonomous" bot to secure VC funding. **Microsoft** and **Apple** abruptly left the **OpenAI** board amid regulatory scrutiny. **Meta** is…

  183. Smol AINews TIER_1 ·

    Not much happened today.

    **Meta** introduced **Meta 3D Gen**, a system for end-to-end generation of 3D assets from text in under 1 minute, producing high-quality 3D assets with detailed textures. **Perplexity AI** updated Pro Search to handle deeper research with multi-step reasoning and code execution. …

  184. Smol AINews TIER_1 ·

    Not much happened today

    **Twelve Labs** raised **$50m** in Series A funding co-led by NEA and **NVIDIA's NVentures** to advance multimodal AI. **Livekit** secured **$22m** in funding. **Groq** announced running at **800k tokens/second**. OpenAI saw a resignation from Daniel Kokotajlo. Twitter users high…

  185. Smol AINews TIER_1 ·

    Not much happened today

    **Ilya Sutskever** steps down as Chief Scientist at **OpenAI** after nearly a decade, with **Jakub Pachocki** named as his successor. **Google DeepMind** announces **Gemini 1.5 Pro** and **Gemini 1.5 Flash** models featuring 2 million token context and improved multimodal capabil…

  186. Smol AINews TIER_1 ·

    Not much happened today

    **Anthropic** released a team plan and iOS app about 4 months after **OpenAI**. The **Command-R 35B** model excels at creative writing, outperforming larger models like **Goliath-120** and **Miqu-120**. The **Llama-3 8B** model now supports a 1 million token context window, impro…

  187. Smol AINews TIER_1 ·

    Not much happened today

    **RAGFlow** open sourced, a deep document understanding RAG engine with **16.3k context length** and natural language instruction support. **Jamba v0.1**, a **52B parameter** MoE model by Lightblue, released but with mixed user feedback. **Command-R** from **Cohere** available on…

  188. Smol AINews TIER_1 ·

    not much happened today

    The Reddit community /r/LocalLlama discusses **fine-tuning and training LLMs**, including tutorials and questions on training models with specific data like dictionaries and synthetic datasets with **25B+ tokens**. Users explore **retrieval-augmented generation (RAG)** challenges…

  189. Smol AINews TIER_1 ·

    Not much happened piday

    **DeepMind** announces **SIMA**, a generalist AI agent capable of following natural language instructions across diverse 3D environments and video games, advancing embodied AI agents. **Anthropic** releases **Claude 3 Haiku**, their fastest and most affordable model, now availabl…

  190. Smol AINews TIER_1 ·

    Not much happened today

    **Anthropic** released **Claude 3**, replacing Claude 2.1 as the default on Perplexity AI, with **Claude 3 Opus** surpassing **GPT-4** in capability. Debate continues on whether Claude 3's performance stems from emergent properties or pattern matching. **LangChain** and **LlamaIn…

  191. Smol AINews TIER_1 ·

    12/26/2023: not much happened today

    **LM Studio** users extensively discussed its performance, installation issues on macOS, and upcoming features like **Exllama2 support** and multimodality with the **Llava model**. Conversations covered **GPU offloading**, **vRAM utilization**, **MoE model expert selection**, and…

  192. Smol AINews TIER_1 ·

    12/10/2023: not much happened today

    **Nous Research AI** Discord community discussed attending **NeurIPS** and organizing future AI events in Australia. Highlights include interest in open-source and decentralized AI projects, with **Richard Blythman** seeking co-founders. Users shared projects like **Photo GPT AI*…

  193. Artificial Intelligence News TIER_1 · AI News ·

    Hugging Face hosted malicious software masquerading as OpenAI release

    A malicious Hugging Face repository that posed as an OpenAI release delivered infostealer malware to Windows machines and recorded about 244,000 downloads before removal, according to research from AI security firm HiddenLayer. The number of downloads may have been artificiall…

  194. Mastodon — sigmoid.social TIER_1 · [email protected] ·

    Fake #OpenAI repository on #HuggingFace pushes #infostealer #malware https://www.bleepingcomputer.com/news/security/fake-openai-repository-on-hugging-face

    Fake #OpenAI repository on #HuggingFace pushes #infostealer #malware https://www.bleepingcomputer.com/news/security/fake-openai-repository-on-hugging-face-pushes-infostealer-malware/ #ChatGPT #AI #cybersecurity

  195. Mastodon — mastodon.social TIER_1 · winbuzzer ·

    https://winbuzzer.com/2026/05/11/fake-openai-repository-on-hugging-face-pushes-info-xcxwbn/ A fake Hugging Face repository copied OpenAI's Privacy Filter bran

    https://winbuzzer.com/2026/05/11/fake-openai-repository-on-hugging-face-pushes-info-xcxwbn/ A fake Hugging Face repository copied OpenAI's Privacy Filter branding and delivered infostealer malware to Windows users. #AI #AIModels #HuggingFace #OpenAI #Infostealer #Cybersec…

  196. Mastodon — mastodon.social TIER_1 · [email protected] ·

    Fake #OpenAI Privacy Filter Repo Hits #1 on Hugging Face, Draws 244K Downloads https://thehackernews.com/2026/05/fake-openai-privacy-filter-repo-hits-1.html?

    Fake #OpenAI Privacy Filter Repo Hits #1 on Hugging Face, Draws 244K Downloads https://thehackernews.com/2026/05/fake-openai-privacy-filter-repo-hits-1.html?m=1 #ai #llm #security

  197. Mastodon — mastodon.social TIER_1 · [email protected] ·

    A fake Hugging Face repo impersonating OpenAI’s Privacy Filter model reportedly reached #1 trending while distributing infostealer malware. Researchers say it h

    A fake Hugging Face repo impersonating OpenAI’s Privacy Filter model reportedly reached #1 trending while distributing infostealer malware. Researchers say it hit ~244K downloads before removal. AI supply chain attacks are accelerating fast. Source: https://thehackernews.com/202…

  198. Mastodon — mastodon.social TIER_1 Italiano(IT) · tomshw ·

    ⚠️ Fake OpenAI profiles on Hugging Face are spreading malware: verify author, repository, and files before downloading. Trust is not enough. #Cybersecurity #AI 🔗 ht

    ⚠️ Fake OpenAI profiles on Hugging Face are spreading malware: verify the author, repository, and files before downloading. Trust is not enough. #Cybersecurity #AI 🔗 https://www.tomshw.it/hardware/openai-falsa-hugging-face-malware-trend