PulseAugur

GPTQ

PulseAugur coverage of GPTQ — every cluster mentioning GPTQ across labs, papers, and developer communities, ranked by signal.

Total · 30d: 6 (6 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 5 (5 over 90d)
TIER MIX · 90D
RELATIONSHIPS
SENTIMENT · 30D: 2 days with sentiment data

RECENT · PAGE 1/1 · 6 TOTAL
  1. TOOL · CL_30718

    New paper details improved quantization for LLM matrix multiplication

    Researchers have published a paper detailing advancements in quantized matrix multiplication, specifically for large language models (LLMs). This second part of their work focuses on scenarios where the covariance matri…

  2. TOOL · CL_27223

    ExLlamaV3, Unsloth Qwen, and Phi3 agent see major local AI updates

    This week's local AI news highlights significant updates to the ExLlamaV3 inference library, enhancing efficiency for running quantized Llama models on consumer GPUs. Additionally, new GGUF-quantized versions of Qwen 3.…

  3. RESEARCH · CL_15961

    New methods accelerate LLMs via efficient sparsification, quantization, and compression

    Researchers have developed several new methods for compressing and optimizing large language models (LLMs) to improve efficiency and reduce computational costs. SparseForge focuses on efficient semi-structured sparsific…

  4. RESEARCH · CL_11807

    New methods tackle LLM quantization for improved efficiency and accuracy

    Researchers have developed several new methods to improve the efficiency of large language models (LLMs) through quantization. OSAQ focuses on suppressing weight outliers using a low-rank Hessian property for accurate l…

  5. RESEARCH · CL_01274

    Hugging Face introduces advanced quantization techniques for efficient LLMs

    Researchers are developing advanced quantization techniques to make large language models (LLMs) more efficient. New methods like AutoRound, LATMiX, and GSQ aim to reduce model size and computational requirements, enabl…

  6. RESEARCH · CL_01035

    Optimizing transformer inference: techniques for faster, cheaper large models

    Large transformer models present significant inference challenges due to their substantial memory footprint and computation costs, which scale quadratically with input length. Researchers and practitioners are exploring…
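The clusters above all revolve around weight quantization for LLM inference. As a rough illustration of the basic idea (not the Hessian-based GPTQ scheme or any specific paper's method — all shapes and values here are illustrative), a minimal per-channel round-to-nearest int8 sketch in NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy activations and weights; real LLM layers are far larger, and methods
# like GPTQ quantize more carefully than plain round-to-nearest.
X = rng.standard_normal((4, 64)).astype(np.float32)   # activations
W = rng.standard_normal((64, 32)).astype(np.float32)  # weights

# Symmetric int8 quantization with one scale per output channel (column).
scale = np.abs(W).max(axis=0) / 127.0
W_q = np.clip(np.round(W / scale), -127, 127).astype(np.int8)

# Quantized matmul: integer weights, per-channel scales folded back in after.
Y_q = (X @ W_q.astype(np.float32)) * scale
Y = X @ W

rel_err = np.linalg.norm(Y - Y_q) / np.linalg.norm(Y)
print(f"relative error: {rel_err:.4f}")
```

Storing `W_q` instead of `W` cuts weight memory 4x versus fp32 (2x versus fp16) at the cost of a small reconstruction error, which is the trade-off every method listed above tries to improve.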