Publications
(* indicates equal contribution)
Aragog: Just-in-Time Model Routing for Scalable Serving of Agentic Workflows
Yinwei Dai, Zhuofu Chen, Anand Iyer, Ravi Netravali
Under review.
[paper]
Kimi K2: Open Agentic Intelligence
Kimi Team (I was part of the project while interning at Kimi in spring 2025)
[paper]
[code]
AdaServe: SLO-Customized LLM Serving with Fine-Grained Speculative Decoding
Zikun Li*, Zhuofu Chen*, Remi Delacourt, Gabriele Oliaro, Zeyu Wang, Qinghan Chen, Shuhuai Lin, April Yang, Zhihao Zhang, Zhuoming Chen, Sean Lai, Xupeng Miao, Zhihao Jia
Proceedings of the European Conference on Computer Systems (EuroSys), 2026.
[paper]
[code]
Characterizing Network Requirements for GPU API Remoting in AI Applications
Tianxia Wang*, Zhuofu Chen*, Xingda Wei, Jinyu Gu, Rong Chen, Haibo Chen
Under review.
[paper]
[code]
TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
Lijie Yang*, Zhihao Zhang*, Zhuofu Chen, Zikun Li, Zhihao Jia
International Conference on Learning Representations (ICLR), 2025.
[paper]
[code]