Focus
Research themes I keep returning to
I am most interested in the systems layer beneath modern AI: how memory, scheduling, retrieval, verification, and runtime policy interact when workloads become large, heterogeneous, and failure-prone.
LLM serving and inference systems
I work on efficient large-model inference, with an emphasis on throughput, tail latency, fairness, and behavior under memory pressure. This includes KV-cache management, batching strategy, runtime observability, and deterministic or verification-aware decoding.
Hybrid retrieval systems
I treat hybrid retrieval, which combines SQL predicates with vector search, as a coordination problem. The important question is not only which index to build, but how a system should choose among SQL-first, vector-first, cooperative execution, and exact fallback as the data and query regime changes.
Trust, security, and recoverability
Another recurring theme in my work is making AI infrastructure more dependable. This includes secure multi-tenant serving, integrity verification or attestation for model execution, and recoverability-aware planning when agents interact with stateful tools.