GGML and llama.cpp join HF to ensure the long-term progress of Local AI Article • 13 days ago • 474
DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference Paper • 2401.08671 • Published Jan 9, 2024 • 15