Publications
(* indicates equal contribution)
AdaServe: SLO-Customized LLM Serving with Fine-Grained Speculative Decoding
Zikun Li*, Zhuofu Chen*, Remi Delacourt, Gabriele Oliaro, Zeyu Wang, Qinghan Chen, Shuhuai Lin, April Yang, Zhihao Zhang, Zhuoming Chen, Sean Lai, Xupeng Miao, Zhihao Jia
under review.
[paper]
[code]
Large language model serving; SLO customization; Speculative decoding.
Characterizing Network Requirements for GPU API Remoting in AI Applications
Tianxia Wang*, Zhuofu Chen*, Xingda Wei, Jinyu Gu, Rong Chen, Haibo Chen
under review.
[paper]
[code]
GPU disaggregation; Transparent API remoting; Proxy optimization.
TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
Lijie Yang*, Zhihao Zhang*, Zhuofu Chen, Zikun Li, Zhihao Jia
International Conference on Learning Representations (ICLR), 2025.
[paper]
[code]
Sparse attention; Efficient decoding.