University of Connecticut | Focusing on Machine Learning Systems, LLM Infrastructure, System Security, and Disaggregated Memory
Connecticut, USA
Open to Collaboration! If you're interested in MLSys, Multi-Agent Systems, Disaggregated Memory, or Security, feel free to reach out; let's build something exciting together!
Efficient inference, KV-cache optimization, and serving systems for large language models
Security and privacy in ML systems, timing side-channel mitigation
RDMA, disaggregated memory, CXL, and memory-tiered architectures
Multi-tier KV cache management system for efficient large language model inference.
Read Paper →
Runtime system for MoE inference that combines adaptive expert prefetching and cache-aware routing to optimize inference under memory constraints.
Read Paper →
Security-focused approach to prevent timing attacks in LLM serving systems.
Read Paper →
eBPF-based tracing framework for distributed LLM inference systems.
Read Paper →
Privacy-preserving KV-cache sharing mechanism for multi-tenant LLM serving.
Paper | Presentation
University of Connecticut, USA | 2024 - Present
Research Focus: ML Systems, KV-cache optimization, RDMA-backed storage, and disaggregated memory architectures.
Predoctoral Fellowship Recipient
Hefei University of Technology, China
Co-supervised by Assoc. Prof. Ying Wang and Assoc. Prof. Cheng Liu
Specialized in computer architecture and AI acceleration
Hefei University of Technology, China
Foundation in digital circuit design, computer organization, and system integration
National Scholarship Recipient (2018, 2019)
Baidu Inc., Beijing, China
Department: Search R&D Platform - Focus on large-scale backend systems
Career Progression: T3 → T4 (2021) → T5 (2023)
Key Contributions:
University of Connecticut | 2025
Baidu Inc. | 2022
China Ministry of Education | 2018, 2019
Interested in collaboration or have questions about my research?
Send me an email →