Focus

Research themes that I keep returning to

I am most interested in the systems layer beneath modern AI: how memory, scheduling, retrieval, verification, and runtime policy interact when workloads become large, heterogeneous, and failure-prone.

LLM serving and inference systems

I work on efficient large-model inference with an emphasis on throughput, tail behavior, fairness, and memory pressure. That includes KV-cache management, batching strategy, runtime observability, and deterministic or verification-aware decoding.

Hybrid retrieval systems

I am interested in hybrid SQL-plus-vector retrieval as a coordination problem. The important question is not only which index to build, but how a system chooses among SQL-first, vector-first, cooperative execution, and exact fallback as workload conditions change.

Trust, security, and recoverability

Another recurring theme in my work is making AI infrastructure more dependable. That includes secure multi-tenant serving, integrity checks or attestation for model execution, and recoverability-aware planning when agents interact with stateful tools.

How I work

Systems research with builder instincts

  • I prefer mechanism-level explanations over vague stories.
  • I care about real workloads more than wins on toy benchmarks.
  • I like evaluating how an idea changes tail behavior, not only averages.
  • I bring production-systems instincts from Baidu to academic problem selection and evaluation.