Latest

What I am actively pushing right now

These projects and directions best represent my current momentum. This page shows what is active now, not only what is already settled.

Serving systems Current

Shape-aware serving for long-context diffusion LLMs

I am refining the systems case for long-context diffusion language model serving, built around shape-aware batching, fairness controls, and strong same-model online baselines.

Retrieval Current

Workload-aware coordination for SQL plus vector search

My recent hybrid retrieval work studies when a system should switch among SQL-first, vector-first, cooperative, and exact-fallback execution, based on filter cost, selectivity, and interference.

Agent systems Emerging

Recoverability-aware planning for stateful agents

I am exploring how tool contracts, reversibility, checkpoint value, and repair cost can become first-class planning signals for more dependable stateful agents.

Recent papers

Examples of where the current work is heading

  • MarginGate studies sparse verification for batch-invariant LLM inference.
  • Latency-Quality Routing explores routing across functionally equivalent tool providers in LLM agents.
  • TrustWeave focuses on integrity measurement and attestation for multi-cloud LLMs.
  • Dynamic Expert Quantization pushes runtime-aware MoE serving under memory constraints.