Research
Computer architecture & systems — software–hardware co-design for emerging applications
Current Research Focus
My research interests span compilers and computer architecture. More specifically, my group works on software–hardware co-design for emerging applications, exploring both algorithm-level innovations and advanced architecture designs.
Multi-GPU & LLM Infrastructure
Accelerating machine learning and deep learning workloads at scale on single- and multi-GPU systems. We explore DNN workload characteristics, compiler optimizations, runtime management, and next-generation GPU architecture features.
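A first step in characterizing a DNN workload is deciding whether a kernel is compute- or memory-bound. The sketch below applies a simple roofline-style test to a GEMM layer; the GPU peak throughput and bandwidth numbers are illustrative assumptions, not measurements of any particular device.

```python
# Minimal sketch: arithmetic intensity of a dense GEMM layer and a
# roofline-style compute-vs-memory-bound test. Hardware numbers below
# are hypothetical placeholders.

def gemm_arithmetic_intensity(m, n, k, bytes_per_elem=4):
    """FLOPs per byte moved for an M x K by K x N matrix multiply."""
    flops = 2 * m * n * k                                # one multiply + one add per MAC
    traffic = (m * k + k * n + m * n) * bytes_per_elem   # read A and B, write C
    return flops / traffic

def is_compute_bound(intensity, peak_flops, mem_bandwidth):
    """Roofline test: compute-bound if intensity exceeds machine balance."""
    return intensity > peak_flops / mem_bandwidth

ai = gemm_arithmetic_intensity(1024, 1024, 1024)
# Hypothetical GPU: 10 TFLOP/s peak, 1 TB/s memory bandwidth.
print(ai, is_compute_bound(ai, peak_flops=10e12, mem_bandwidth=1e12))
```

A large square GEMM lands well above the machine balance point, which is why GEMM-heavy layers tend to be compute-bound while elementwise and memory-reshuffling operators are bandwidth-bound.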
Quantum Computing Systems
Building efficient quantum computing ecosystems. Leveraging system optimizations to simulate large quantum circuits, developing front-end/back-end compiler support, and exploring heterogeneous quantum–classical system designs.
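The core of statevector-based quantum circuit simulation is applying small gate matrices to an exponentially large state, which is what makes system-level optimization necessary. A minimal sketch of the basic operation, using a big-endian qubit-ordering convention that is an assumption of this example:

```python
import numpy as np

# Minimal statevector-simulation sketch: apply a single-qubit gate to an
# n-qubit state by viewing the state as a rank-n tensor and contracting
# the gate against the target qubit's axis. The 2^n memory footprint of
# `state` is what limits how many qubits can be simulated classically.

def apply_gate(state, gate, target, n_qubits):
    """Apply a 2x2 gate to the `target` qubit of an n-qubit statevector."""
    psi = state.reshape([2] * n_qubits)
    psi = np.tensordot(gate, psi, axes=([1], [target]))  # contract target axis
    psi = np.moveaxis(psi, 0, target)                    # restore axis order
    return psi.reshape(-1)

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)  # Hadamard gate

# |00> --H on qubit 0--> (|00> + |10>) / sqrt(2)
state = np.zeros(4)
state[0] = 1.0
out = apply_gate(state, H, target=0, n_qubits=2)
```

Because each gate touches only one tensor axis, production simulators fuse gates, exploit sparsity, and distribute the state across nodes; this sketch shows only the unoptimized baseline they start from.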
Edge AI
Software–hardware co-design to address performance bottlenecks on edge platforms. Developing compiler-assisted paging, zero-copy remapping, and high-performance persistence support for secure non-volatile memory.
Past Research (Ph.D.)
- Optimizing Dynamic Parallelism for Irregular Applications on GPGPUs. Designed a runtime control system that dynamically decides whether and when to launch child kernels, better interleaving parent and child kernels to hide launch overhead and improve GPU utilization. Analyzed data reuse across kernels and designed locality-aware schedulers.
- Compiler-assisted Optimization on Manycore Platforms. Proposed a loop iteration scheduling strategy that considers both bank-level parallelism (inter-core) and bank reuse (intra-core) for irregular applications. Designed a compiler algorithm that partitions computations into subcomputations scheduled to minimize on-chip network distance to data.
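The bank-aware scheduling idea above can be sketched as follows. This is an illustrative simplification, not the published algorithm: the address-to-bank mapping and the round-robin assignment of bank groups to cores are stand-in assumptions.

```python
from collections import defaultdict

# Illustrative sketch of bank-aware loop iteration scheduling: group
# iterations by the memory bank their data maps to, so each core runs one
# bank's iterations back-to-back (intra-core bank reuse) while different
# cores touch different banks concurrently (inter-core bank-level
# parallelism). Mapping and assignment policies are simplified stand-ins.

NUM_BANKS = 4

def bank_of(addr, line_size=64):
    """Stand-in mapping: interleave cache lines across banks."""
    return (addr // line_size) % NUM_BANKS

def schedule(iterations, addr_of, num_cores):
    """Return a per-core iteration list, grouped by bank."""
    by_bank = defaultdict(list)
    for it in iterations:
        by_bank[bank_of(addr_of(it))].append(it)
    # Assign whole bank groups to cores round-robin.
    plan = [[] for _ in range(num_cores)]
    for i, (bank, its) in enumerate(sorted(by_bank.items())):
        plan[i % num_cores].extend(its)
    return plan

# Example: 8 iterations whose accesses fall on consecutive cache lines.
plan = schedule(range(8), lambda i: i * 64, num_cores=2)
```

Here each core ends up with two full bank groups, so consecutive iterations on a core hit the same bank while the two cores stream from disjoint banks; for genuinely irregular applications the grouping would be driven by compiler analysis of the indirect access pattern rather than a fixed affine mapping.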
Experience
- Internship at AMD Research, Aug–Dec 2017
- Internship at Samsung Research America, Summer 2015
- Internship at Institute of Computing Technology, Chinese Academy of Sciences, 2011–2013