Latest

What I am actively pushing right now

These projects and directions best represent my current momentum. This page shows what is active now, not only what is already settled.

Serving systems (Current)

Shape-aware serving for long-context diffusion LLMs

I am refining a serving design for long-context diffusion language models built around shape-aware batching and fairness controls, and evaluating it against strong online baselines that serve the same model.

Retrieval (Current)

Workload-aware coordination for SQL plus vector search

My recent hybrid retrieval work studies when systems should switch among SQL-first, vector-first, cooperative execution, and exact fallback based on filter cost, selectivity, and interference.
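
As a rough illustration only, and not the policy from the work itself, a switching rule of this kind could be sketched as follows. The signal names, their scaling to [0, 1], the rule ordering, and every threshold are hypothetical placeholders chosen to make the idea concrete.

```python
from dataclasses import dataclass
from enum import Enum


class Strategy(Enum):
    SQL_FIRST = "sql_first"
    VECTOR_FIRST = "vector_first"
    COOPERATIVE = "cooperative"
    EXACT_FALLBACK = "exact_fallback"


@dataclass(frozen=True)
class WorkloadSignals:
    # All three fields are hypothetical, normalized to [0, 1].
    selectivity: float   # fraction of rows that pass the SQL filter
    filter_cost: float   # relative per-row cost of evaluating the filter
    interference: float  # contention from concurrently running queries


def choose_strategy(s: WorkloadSignals) -> Strategy:
    """Pick an execution strategy from workload signals (illustrative thresholds)."""
    # Almost nothing survives the filter: an exact scan over survivors is cheap.
    if s.selectivity < 0.01:
        return Strategy.EXACT_FALLBACK
    # Selective and cheap filter: prune with SQL, then rank the remainder.
    if s.selectivity < 0.2 and s.filter_cost < 0.5:
        return Strategy.SQL_FIRST
    # Heavy contention: avoid driving both engines at the same time.
    if s.interference > 0.7:
        return Strategy.VECTOR_FIRST
    # Expensive filter that keeps most rows: search first, filter few candidates.
    if s.filter_cost >= 0.5 and s.selectivity > 0.5:
        return Strategy.VECTOR_FIRST
    # Otherwise let the SQL and vector sides run cooperatively.
    return Strategy.COOPERATIVE
```

For example, `choose_strategy(WorkloadSignals(selectivity=0.1, filter_cost=0.2, interference=0.1))` selects `Strategy.SQL_FIRST` under these placeholder thresholds. A real policy would presumably calibrate such cutoffs from measurements rather than fix them by hand.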

Agent systems (Emerging)

Recoverability-aware planning for stateful agents

I am exploring how tool contracts, reversibility, checkpoint value, and repair cost can become first-class planning signals for more dependable stateful agents.

Examples of where the current work is heading

MarginGate studies sparse verification for batch-invariant LLM inference.
Latency-Quality Routing explores routing across functionally equivalent tool providers in LLM agents.
TrustWeave focuses on integrity measurement and attestation for multi-cloud LLMs.
Dynamic Expert Quantization pushes runtime-aware MoE serving under memory constraints.