Publications
(* indicates equal contribution)
Kimi K2: Open Agentic Intelligence
Kimi Team (I took part in the project while interning at Kimi in spring 2025)
[paper]
[code]
Large model architecture; Training.
AdaServe: SLO-Customized LLM Serving with Fine-Grained Speculative Decoding
Zikun Li*, Zhuofu Chen*, Remi Delacourt, Gabriele Oliaro, Zeyu Wang, Qinghan Chen, Shuhuai Lin, April Yang, Zhihao Zhang, Zhuoming Chen, Sean Lai, Xupeng Miao, Zhihao Jia
Proceedings of the European Conference on Computer Systems (EuroSys), 2026.
[paper]
[code]
Large language model serving; SLO customization; Speculative decoding.
Characterizing Network Requirements for GPU API Remoting in AI Applications
Tianxia Wang*, Zhuofu Chen*, Xingda Wei, Jinyu Gu, Rong Chen, Haibo Chen
Under review.
[paper]
[code]
GPU disaggregation; Transparent API remoting; Proxy optimization.
TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
Lijie Yang*, Zhihao Zhang*, Zhuofu Chen, Zikun Li, Zhihao Jia
International Conference on Learning Representations (ICLR), 2025.
[paper]
[code]
Sparse attention; Efficient decoding.