About
The TANG Lab at the University of Pittsburgh builds efficient computing foundations, systems, and infrastructure for modern AI. As AI models grow more capable and computationally demanding, the challenge shifts to whether the underlying systems can support training, serving, adaptation, and reasoning at the necessary scale, speed, and efficiency.
Our goal is to make AI infrastructure fundamentally more scalable, efficient, and accessible by taking a full-stack view of AI computing. Our research connects computer architecture, compilers, runtime systems, and machine learning systems to address key bottlenecks: memory management and data movement in multi-GPU LLM training and inference, scheduling and kernel optimization for foundation-model workloads, on-device AI under tight latency and energy budgets, and compiler/runtime techniques that automate performance tuning for AI workloads. By co-designing across these layers, we aim to turn hardware complexity into usable performance.
Our long-term goal is to help define what AI computing systems and infrastructure should look like over the next decade. We envision platforms where large models are trained and served with far less waste, where intelligence moves fluidly between cloud clusters and personal devices, and where quantum-classical systems become practical accelerators. Through this work, we hope to make the future AI computing stack not only faster, but also more scalable and adaptive.