Research Landscape

Scalable GPU and AI Hardware Systems

Multi-GPU memory systems, GNN acceleration, hybrid DNN parallelism, LLM serving, KV-cache reuse, and scheduling.

Edge AI Systems

Adaptive continual learning, self-supervised training, sparse computation, and real-time inference under tight resource budgets.

AI for Computer Vision

Dynamic patchification, token pruning, visual representation learning, 3D Gaussian splatting, and efficient video generation.

Quantum Computing Systems

Fault-tolerant compilation, photonic graph-state generation, qLDPC decoding, and quantum-classical acceleration.

Research Areas

Scalable GPU and AI Hardware Systems

Active

Building the architecture, memory, and runtime systems for foundation-scale AI. Our work targets the systems barriers that limit multi-GPU platforms and AI workloads: distributed address translation, page placement, inter-GPU data movement, GNN irregularity, hybrid DNN parallelism, attention execution, KV-cache growth, and multi-tenant interference.

GPU Architecture, GNN Acceleration, LLM Systems, KV Cache, RAG Serving, Runtime Systems

Selected Publications

  • arXiv — Chunk-Level KV Cache Reuse for Efficient RAG Serving
  • HPCA 2025 — OASIS: Object-Aware Page Management for Multi-GPU Systems
  • MLSys 2025 — FastTree: Optimizing Attention Kernel and Runtime for Tree-Structured LLM Inference
  • ASPLOS 2025 — Cascade: A Dependency-Aware Efficient Training Framework for Temporal GNNs
  • HPCA 2024 — GRIT: Enhancing Multi-GPU Performance with Fine-Grained Dynamic Page Placement
  • MICRO 2024 — STAR: Sub-Entry Sharing-Aware TLB for Multi-Instance GPU

Edge AI Systems

Active

Enabling AI models to adapt continuously on mobile GPUs and IoT-class devices without exceeding strict memory, latency, energy, and thermal budgets.

Edge Computing, On-Device Learning, Self-Supervised Learning, Hardware-Software Co-design, Efficient Inference

Selected Publications

  • arXiv — EdgeOL: Efficient In-situ Online Learning on Edge Devices
  • ICLR 2025 — Mutual Effort for Efficiency: Similarity-Based Token Pruning for Vision Transformers
  • ICLR 2024 — Waxing-and-Waning: Efficient Self-Supervised Learning
  • ICLR 2023 Spotlight — SmartFRZ: Attention-Based Layer Freezing for Efficient Training

AI for Computer Vision

Active

Developing systems techniques for high-quality visual intelligence and generation. A central direction is adaptive video generation, where patchification and pruning co-evolve with the denoising process so computation follows the regions that matter most.

Computer Vision, Video Diffusion, Dynamic Patchification, 3D Gaussian Splatting, Efficient Training

Selected Publications

  • CVPR 2026 — Content-Aware Dynamic Patchification for Efficient Video Diffusion
  • arXiv — Accelerating 3D Gaussian Splatting with Tensor Cores
  • ICLR 2025 — Mutual Effort for Efficiency: Similarity-Based Token Pruning for Vision Transformers
  • ICLR 2024 — Waxing-and-Waning: Efficient Self-Supervised Learning

Quantum Computing Systems

Active

Building compiler, architecture, and classical-acceleration support for scalable and fault-tolerant quantum computing.

Quantum Compilers, Photonic Computing, Fault Tolerance, qLDPC Decoding, Quantum-Classical Systems

Selected Publications

  • ICCAD 2025 — STMC: Small-Tile Multiple-Copy Compilation for Reliable Measurement-Based Quantum Computing
  • ISCA 2025 — Reinforcement Learning-Guided Graph State Generation in Photonic Quantum Computers
  • ASPLOS 2024 — FMCC: Flexible Measurement-Based Quantum Computation over Cluster State
  • ASPLOS 2024 — QRCC: Evaluating Large Quantum Circuits on Small Quantum Computers