Cutlass
PulseAugur coverage of Cutlass — every cluster mentioning Cutlass across labs, papers, and developer communities, ranked by signal.
-
CUDA/C++ inference engine built for NVIDIA's DVLT 3D model
dvlt.cu is a new inference engine written from scratch in CUDA/C++ for NVIDIA's DVLT 3D transformer model. The standalone 5 MB binary has minimal dependencies, relying only on cuBLASLt and the header-o…
-
TileLang simplifies GPU kernel writing with Python interface
TileLang, a new programming language, aims to simplify GPU kernel development by offering a middle ground between high-level frameworks like Triton and low-level libraries like CUTLASS. TileLang allows developers to …
-
CuTeDSL emerges as new GPU kernel path for LLM inference, challenging CUTLASS
The landscape of GPU kernel engineering for LLM inference is shifting, with CuTeDSL emerging as a potential successor to C++ CuTe/CUTLASS. This evolution is highlighted by industry trends in technologies like FlashAtten…
-
Moonshot AI open-sources FlashKDA, boosting Kimi Delta Attention 2.5x on H200 GPUs
Moonshot AI has released FlashKDA, an open-source implementation of Kimi Delta Attention. The kernel runs inference up to 2.5 times faster on NVIDIA H200 GPUs. It is built using CUTLASS and optimized for…