SWAN: Sparse Winnowed Attention for Reduced Inference Memory via Decompression-Free KV-Cache Compression
Source: https://wsai.iitm.ac.in/preprints/swan-sparse-winnowed-attention-for-reduced-inference-memory-via-decompression-free-kv-cache-compression-31/ Parent: https://wsai.iitm.ac.in/preprints/
https://doi.org/10.48550/arXiv.2511.18936
Authors
Santhosh G S, Saurav Prakash, Balaraman Ravindran
Preprint Server
arXiv
Santhosh G S, Saurav Prakash, Balaraman Ravindran, SWAN: Sparse Winnowed Attention for Reduced Inference Memory via Decompression-Free KV-Cache Compression
Preprint link: https://arxiv.org/abs/2511.18936