TrustWeave: Integrity Measurement and Attestation For Multi-Cloud LLMs
A system for integrity measurement and attestation in multi-cloud LLM deployments, aimed at making model execution more trustworthy across heterogeneous cloud environments.
This page highlights representative papers across LLM systems, hybrid infrastructure, security, and trustworthy AI. For the complete, up-to-date list, see my Google Scholar profile.
A verification policy that checks only low-margin decode steps, aiming to restore deterministic decoding at much lower overhead than always-on verification.
A routing system for same-function tool providers that treats latency as service capacity and answer quality as the main target under changing load and provider heterogeneity.
A multi-tier KV-cache management system for improving large-model inference efficiency under memory pressure.
A runtime system for MoE inference that jointly optimizes expert scheduling and memory coordination instead of treating them as separate problems.
A runtime-aware approach to dynamic expert precision control for MoE serving, designed to adapt expert bit-widths under strict GPU memory budgets.
A security-focused line of work on reducing timing side-channel leakage in multi-tenant LLM serving without giving up the performance benefits of shared caching.