TANG Lab

Publications

  • CVPR 2026 Content-Aware Dynamic Patchification for Efficient Video Diffusion [paper]
    Sheng Li, Connelly Barnes, Mamshad Nayeem Rizve, Hongwu Peng, Zhengang Li, Ohi Dibua, Alireza Ganjdanesh, Xulong Tang, Yan Kang, Yifan Gong
  • arXiv Chunk-Level KV Cache Reuse for Efficient RAG Serving [paper]
    Yueqi Wang, Bingyao Li, Mohamed Tarek Ibn Ziad, Lieven Eeckhout, Jun Yang, Aamer Jaleel, Xulong Tang
  • arXiv Accelerating 3D Gaussian Splatting with Tensor Cores [paper]
    Sheng Li, Yang Sui, Yue Wu, Zhuoran Song, Bo Yuan, Xulong Tang, Yue Dai
  • arXiv Swap-Free Quantum LDPC Code Mapping on Near-Term Local Architecture [paper]
    Aditya Pawar, Yingheng Li, Xulong Tang, Youtao Zhang, Jun Yang
  • arXiv EdgeOL: Efficient in-situ Online Learning on Edge Devices [paper]
    Sheng Li, Geng Yuan, Yue Dai, Tianyu Wang, Yawen Wu, Alex K. Jones, Jingtong Hu, Yanzhi Wang, Bo Yuan, Yufei Ding, Xulong Tang
  • arXiv Non-Clifford Fusion: T-Gate Optimization for Quantum Simulation [paper]
    Yingheng Li, Xulong Tang, Paul Hovland, Ji Liu
  • arXiv Improving GPU Multi-Tenancy Through Dynamic Multi-Instance GPU Reconfiguration [paper]
    Tianyu Wang, Sheng Li, Bingyao Li, Yue Dai, Ao Li, Geng Yuan, Yufei Ding, Youtao Zhang, Xulong Tang
  • ICCAD 2025 STMC: Small-Tile Multiple-Copy Compilation for Reliable Measurement-Based Quantum Computing [paper]
    Rongchao Dong, Zewei Mo, Yingheng Li, Aditya Pawar, Jun Yang, Youtao Zhang, Xulong Tang
  • ICML 2025 MemFreezing: A Novel Adversarial Attack on Temporal Graph Neural Networks under Limited Future Knowledge [paper]
    Yue Dai, Liang Liu, Xulong Tang, Youtao Zhang, Jun Yang
  • ICS 2025 CIExplorer: Microarchitecture-Aware Exploration for Tightly Integrated Custom Instruction [paper]
    Xiaoyu Hao, Sen Zhang, Liang Qiao, Qingcai Jiang, Jun Shi, Junshi Chen, Hong An, Xulong Tang, Hao Shu, Honghui Yuan
  • ISCA 2025 Reinforcement Learning-Guided Graph State Generation in Photonic Quantum Computers [paper]
    Yingheng Li, Yue Dai, Aditya Pawar, Rongchao Dong, Jun Yang, Youtao Zhang, Xulong Tang
  • MLSys 2025 FastTree: Optimizing Attention Kernel and Runtime for Tree-Structured LLM Inference [paper]
    Zaifeng Pan, Yitong Ding, Yue Guan, Zheng Wang, Zhongkai Yu, Xulong Tang, Yida Wang, Yufei Ding
  • ASPLOS 2025 Cascade: A Dependency-aware Efficient Training Framework for Temporal Graph Neural Network [paper]
    Yue Dai, Xulong Tang, Youtao Zhang
  • ASPLOS 2025 Pruner: A Draft-then-Verify Exploration Mechanism to Accelerate Tensor Program Tuning [paper]
    Liang Qiao, Jun Shi, Xiaoyu Hao, Xi Fang, Sen Zhang, Minfan Zhao, Ziqi Zhu, Junshi Chen, Hong An, Xulong Tang, Bing Li, Honghui Yuan, Xinyang Wang
  • ICLR 2025 Mutual Effort for Efficiency: A Similarity-based Token Pruning for Vision Transformers in Self-Supervised Learning [paper]
    Sheng Li, Qitao Tan, Yue Dai, Zhenglun Kong, Tianyu Wang, Jun Liu, Ao Li, Ninghao Liu, Yufei Ding, Xulong Tang, Geng Yuan
  • HPCA 2025 OASIS: Object-Aware Page Management for Multi-GPU Systems [paper]
    Yueqi Wang, Bingyao Li, Mohamed Tarek Ibn Ziad, Lieven Eeckhout, Jun Yang, Aamer Jaleel, Xulong Tang
  • MICRO 2024 STAR: Sub-Entry Sharing-Aware TLB for Multi-Instance GPU [paper]
    Bingyao Li, Yueqi Wang, Tianyu Wang, Lieven Eeckhout, Jun Yang, Aamer Jaleel, Xulong Tang
  • ASPLOS 2024 FMCC: Flexible Measurement-based Quantum Computation over Cluster State [paper]
    Yingheng Li, Aditya Pawar, Zewei Mo, Youtao Zhang, Jun Yang, Xulong Tang
  • ASPLOS 2024 QRCC: Evaluating Large Quantum Circuits on Small Quantum Computers through Integrated Qubit Reuse and Circuit Cutting [paper]
    Aditya Pawar, Yingheng Li, Zewei Mo, Yanan Guo, Xulong Tang, Youtao Zhang, Jun Yang
  • DAC 2024 FCM: Wire Cutting For Fusion Reduction in Measurement-based Quantum Computing [paper]
    Zewei Mo, Yingheng Li, Aditya Pawar, Xulong Tang, Jun Yang, Youtao Zhang
  • DAC 2024 LOTUS: learning-based online thermal and latency variation management for two-stage detectors on edge devices [paper]
    Yifan Gong, Yushu Wu, Pu Zhao, Zheng Zhan, Liangkai Liu, Chao Wu, Xulong Tang, Yanzhi Wang
  • ICLR 2024 Waxing-and-Waning: a Generic Similarity-based Framework for Efficient Self-Supervised Learning [paper]
    Sheng Li, Chao Wu, Ao Li, Yanzhi Wang, Xulong Tang, Geng Yuan
  • HPCA 2024 GRIT: Enhancing Multi-GPU Performance with Fine-Grained Dynamic Page Placement [paper]
    Yueqi Wang*, Bingyao Li*, Aamer Jaleel, Jun Yang, Xulong Tang
  • ICCD 2023 FlexGM: An Adaptive Runtime System to Accelerate Graph Matching Networks on GPUs [paper]
    Yue Dai, Xulong Tang, Youtao Zhang
  • MICRO 2023 IDYLL: Enhancing Page Translation in Multi-GPUs via Light Weight PTE Invalidations [paper]
    Bingyao Li, Yanan Guo, Yueqi Wang, Aamer Jaleel, Jun Yang, Xulong Tang
  • MICRO 2023 SupeRBNN: Randomized Binary Neural Network Using Adiabatic Superconductor Josephson Devices [paper]
    Zhengang Li, Geng Yuan, Tomoharu Yamauchi, Zabihi Masoud, Yanyue Xie, Peiyan Dong, Xulong Tang, Nobuyuki Yoshikawa, Devesh Tiwari, Yanzhi Wang, Olivia Chen
  • DAC 2023 Orchestrated Scheduling and Partitioning for Improved Address Translation in GPUs [paper]
    Bingyao Li, Yueqi Wang, Xulong Tang
  • DAC 2023 Orchestrating Measurement-Based Quantum Computation over Photonic Quantum Processors [paper]
    Yingheng Li, Aditya Pawar, Mohadeseh Azari, Yanan Guo, Youtao Zhang, Jun Yang, Kaushik Parasuram Seshadreesan, Xulong Tang
  • DAC 2023 EP-ORAM: Efficient NVM-Friendly Path Eviction for Ring ORAM in Hybrid Memory [paper]
    Mehrnoosh Raoufi, Jun Yang, Xulong Tang, Youtao Zhang
  • ICLR 2023 Spotlight SmartFRZ: An Efficient Training Framework using Attention-Based Layer Freezing [paper]
    Sheng Li*, Geng Yuan*, Yue Dai, Youtao Zhang, Yanzhi Wang, Xulong Tang
  • HPCA 2023 Trans-FW: Short Circuiting Page Table Walk in Multi-GPU Systems via Remote Forwarding [paper]
    Bingyao Li, Jieming Yin, Anup Holey, Youtao Zhang, Jun Yang, Xulong Tang
  • HPCA 2023 CEGMA: Coordinated Elastic Graph Matching Acceleration for Graph Matching Networks [paper]
    Yue Dai, Youtao Zhang, Xulong Tang
  • HPCA 2023 AB-ORAM: Constructing Adjustable Buckets for Space Reduction in Ring ORAM [paper]
    Mehrnoosh Raoufi, Jun Yang, Xulong Tang, Youtao Zhang
  • HPCA 2022 Q-GPU: A Recipe of Optimizations for Quantum Circuit Simulation Using GPUs [paper]
    Yilun Zhao, Yanan Guo, Yuan Yao, Amanda Dumi, Devin M Mulvey, Shiv Upadhyay, Youtao Zhang, Kenneth D Jordan, Jun Yang, Xulong Tang
  • NeurIPS 2022 Layer Freezing & Data Sieving: Missing Pieces of a Generic Framework for Sparse Training [paper]
    Geng Yuan, Yanyu Li, Sheng Li, Zhenglun Kong, Sergey Tulyakov, Xulong Tang, Yanzhi Wang, Jian Ren
  • ECCV 2022 You Already Have It: A Generator-Free Low-Precision DNN Training Framework using Stochastic Rounding [paper]
    Geng Yuan, Sung-En Chang, Qing Jin, Alec Lu, Yanyu Li, Yushu Wu, Zhenglun Kong, Yanyue Xie, Peiyan Dong, Minghai Qin, Xiaolong Ma, Xulong Tang, Zhenman Fang, Yanzhi Wang
  • ICCAD 2022 Fine-Granular Computation and Data Layout Reorganization for Improving Locality [paper]
    Mahmut Taylan Kandemir, Xulong Tang, Jagadish Kotra, Mustafa Karakoy
  • ICCD 2022 Enhancing GPU Performance via Neighboring Directory Table Based Inter-TLB Sharing [paper]
    Yajuan Du, Mingyang Liu, Yuqi Yang, Mingzhe Zhang, Xulong Tang
  • TECS 2022 Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration [paper]
    Yifan Gong, Geng Yuan, Zheng Zhan, Wei Niu, Zhengang Li, Pu Zhao, Yuxuan Cai, Sijia Liu, Bin Ren, Xue Lin, Xulong Tang, Yanzhi Wang
  • IEEE Micro 2022 Sustainable AI Processing at the Edge [paper]
    Sebastien Ollivier, Sheng Li, Yue Tang, Chayanika Chaudhuri, Peipei Zhou, Xulong Tang, Jingtong Hu, Alex K. Jones
  • WWW 2022 Workshop Optimizing Data Layout for Training Deep Neural Networks [paper]
    Bingyao Li*, Qi Xue*, Geng Yuan*, Sheng Li, Xiaolong Ma, Yanzhi Wang, Xulong Tang
  • EuroSys 2022 Poster Rethinking Latency-aware DNN Design with GPU Tail Effect Analysis [paper]
    F. Yu, Z. Xu, T. Shen, D. Stamoulis, L. Shangguan, D. Wang, M. Zhang, X. Tang, R. Madhok, C. Zhao, X. Li, N. Karianakis, D. Lymberopoulos, C. Liu, A. Li, Y. Chen, X. Chen
  • CCF THPC 2022 An Efficient Segmented Quantization for Graph Neural Networks [paper]
    Yue Dai, Xulong Tang, Youtao Zhang
  • arXiv 2022 Demystifying Arch-hints for Model Extraction: An Attack in Unified Memory System [paper]
    Zhendong Wang, Xiaoming Zeng, Xulong Tang, Danfeng Zhang, Xing Hu, Yang Hu
  • MICRO 2021 Improving Address Translation in Multi-GPUs via Sharing and Spilling Aware TLB Design [paper]
    Bingyao Li, Jieming Yin, Youtao Zhang, Xulong Tang
  • ICCAD 2021 ScaleDNN: Data Movement Aware DNN Training on Multi-GPU [paper]
    Weizheng Xu, Ashutosh Pattnaik, Geng Yuan, Yanzhi Wang, Youtao Zhang, Xulong Tang
  • ICCAD 2021 Automated Runtime-Aware Scheduling for Multi-Tenant DNN Inference on GPU [paper]
    Fuxun Yu, Shawn Bray, Di Wang, Longfei Shangguan, Xulong Tang, Chenchen Liu, Xiang Chen
  • SIGMETRICS 2021 Mix and Match: Reorganizing Tasks for Enhancing Data Locality [paper]
    Xulong Tang, Mahmut Taylan Kandemir, Mustafa Karakoy
  • PLDI 2021 Distance-in-Time versus Distance-in-Space [paper]
    Mahmut Taylan Kandemir, Xulong Tang, Hui Zhao, Jihyun Ryoo, Mustafa Karakoy
  • PLDI 2021 Fluid: A Framework for Approximate Concurrency via Controlled Dependency Relaxation [paper]
    Huaipan Jiang, Haibo Zhang, Xulong Tang, Vineetha Govindaraj, Jack Sampson, Mahmut Taylan Kandemir, Danfeng Zhang
  • PPoPP 2021 Compiler Support for Near Data Computing [paper]
    Mahmut Taylan Kandemir, Jihyun Ryoo, Xulong Tang, Mustafa Karakoy
  • AAAI 2021 YOLObile: Real-Time Object Detection on Mobile Devices via Compression-Compilation Co-Design [paper]
    Yuxuan Cai, Hongjia Li, Geng Yuan, Wei Niu, Yanyu Li, Xulong Tang, Bin Ren, Yanzhi Wang
  • AAAI 2021 A Compression-Compilation Co-Design Framework Towards Real-Time Object Detection on Mobile Devices [paper]
    Yuxuan Cai, Geng Yuan, Hongjia Li, Wei Niu, Yanyu Li, Xulong Tang, Bin Ren, Yanzhi Wang
  • CODES+ISSS 2021 Algorithm-Hardware Co-design of Attention Mechanism on FPGA Devices [paper]
    Xinyi Zhang, Yawen Wu, Peipei Zhou, Xulong Tang, Jingtong Hu
  • RTAS 2021 Work in Progress: Mobile or FPGA? A Comprehensive Evaluation on Energy Efficiency and a Unified Optimization Framework [paper]
    Geng Yuan, Peiyan Dong, Mengshu Sun, Wei Niu, Zhengang Li, Yuxuan Cai, Jun Liu, Weiwen Jiang, Xue Lin, Bin Ren, Xulong Tang, Yanzhi Wang
  • WWW 2021 Workshop Parallelizing DNN Training on GPUs: Challenges and Opportunities [paper]
    Weizheng Xu, Youtao Zhang, Xulong Tang
  • ATS 2021 Towards a Secure Integrated Heterogeneous Platform via Cooperative CPU/GPU Encryption [paper]
    Zhendong Wang, Rujia Wang, Zihang Jiang, Xulong Tang, Shouyi Yin, Yang Hu
  • NAS 2021 Characterizing AI Model Inference Applications Running in SGX Environment [paper]
    Shixiong Jing, Qinkun Bao, Pei Wang, Xulong Tang, Dinghao Wu
  • PACT 2020 Enhancing Address Translations in Throughput Processors via Compression [paper]
    Xulong Tang, Ziyu Zhang, Weizheng Xu, Mahmut Taylan Kandemir, Rami Melhem, Jun Yang
  • TCAD 2020 Enabling Latency-aware Data Initialization for Integrated CPU/GPU Heterogeneous Platform [paper]
    Zhendong Wang, Zihang Jiang, Zhen Wang, Xulong Tang, Cong Liu, Yang Hu
  • NeurIPS 2020 Workshop YOLObile: Real-Time Object Detection on Mobile Devices via Compression-Compilation Co-Design [paper]
    Yuxuan Cai, Hongjia Li, Geng Yuan, Wei Niu, Yanyu Li, Xulong Tang, Bin Ren, Yanzhi Wang
  • ISCA 2019 Opportunistic Computing in GPU Architectures [paper]
    Ashutosh Pattnaik, Xulong Tang, Onur Kayiran, Adwait Jog, Asit Mishra, Mahmut T. Kandemir, Anand Sivasubramaniam, Chita R. Das
  • PLDI 2019 Co-Optimizing Memory-Level Parallelism and Cache-Level Parallelism [paper]
    Xulong Tang, Mahmut Taylan Kandemir, Mustafa Karakoy, Meena Arunachalam
  • SIGMETRICS 2019 Quantifying Data Locality in Dynamic Parallelism in GPUs [paper]
    Xulong Tang, Ashutosh Pattnaik, Onur Kayiran, Adwait Jog, Mahmut Taylan Kandemir, Chita Das
  • SIGMETRICS 2019 Computing with Near Data [paper]
    Xulong Tang, Mahmut Taylan Kandemir, Hui Zhao, Myoungsoo Jung, Mustafa Karakoy
  • SIGMETRICS 2019 Architecture-Aware Approximate Computing [paper]
    Mustafa Karakoy, Orhan Kislal, Xulong Tang, Mahmut Taylan Kandemir, Meenakshi Arunachalam
  • HiPC 2019 Architecture-Centric Bottleneck Analysis for Deep Neural Network Applications [paper]
    Jihyun Ryoo, Mengran Fan, Xulong Tang, Huaipan Jiang, Meena Arunachalam, Sharada Naveen, Mahmut T. Kandemir
  • PLDI 2018 Enhancing Computation-to-Core Assignment with Physical Location Information [paper]
    Orhan Kislal, Jagadish B. Kotra, Xulong Tang, Mahmut T. Kandemir, Myoungsoo Jung
  • MASCOTS 2018 Quantifying and Optimizing Data Access Parallelism on Manycores [paper]
    Jihyun Ryoo, Orhan Kislal, Xulong Tang, Mahmut T. Kandemir
  • GPGPU 11 @ PPoPP 2018 Oversubscribed Command Queues in GPUs [paper]
    Sooraj Puthoor, Xulong Tang, Joseph Gross, Bradford M Beckmann
  • MICRO 2017 Data Movement Aware Computation Partitioning [paper]
    Xulong Tang, Orhan Kislal, Mahmut Kandemir, Mustafa Karakoy
  • HPCA 2017 Controlled Kernel Launch for Dynamic Parallelism in GPUs [paper]
    Xulong Tang, Ashutosh Pattnaik, Huaipan Jiang, Onur Kayiran, Adwait Jog, Sreepathi Pai, Mohamed Ibrahim, Mahmut Taylan Kandemir, Chita Das
  • MASCOTS 2017 DEMM: a Dynamic Energy-saving mechanism for Multicore [paper]
    Akbar Sharifi, Wei Ding, Diana Guttman, Hui Zhao, Xulong Tang, Mahmut Kandemir, Chita Das
  • PACT 2017 Poster POSTER: Location-Aware Computation Mapping for Manycore Processors [paper]
    Orhan Kislal, Jagadish Kotra, Xulong Tang, Mahmut Taylan Kandemir, Myoungsoo Jung
  • MICRO 2016 Improving Bank-Level Parallelism for Irregular Applications [paper]
    Xulong Tang, Mahmut Kandemir, Praveen Yedlapalli, Jagadish Kotra
  • PACT 2016 Scheduling Techniques for GPU Architectures with Processing-In-Memory Capabilities [paper]
    Ashutosh Pattnaik, Xulong Tang, Adwait Jog, Onur Kayiran, Asit K. Mishra, Mahmut Taylan Kandemir, Onur Mutlu, Chita R. Das
  • PACT 2016 µC-States: Fine-grained GPU Datapath Power Management [paper]
    Onur Kayiran, Adwait Jog, Ashutosh Pattnaik, Rachata Ausavarungnirun, Xulong Tang, Mahmut Taylan Kandemir, Gabriel H. Loh, Onur Mutlu, Chita R. Das
  • PLDI 2015 Optimizing Off-Chip Accesses in Manycores [paper]
    Wei Ding, Xulong Tang, Mahmut Taylan Kandemir, Yuanrui Zhang, Emre Kultursay
  • SIGMETRICS 2015 Memory Row Reuse Distance and its Role in Optimizing Application Performance [paper]
    Mahmut Taylan Kandemir, Hui Zhao, Xulong Tang, Mustafa Karakoy
  • SNPD 2013 A Video Coding Benchmark Suite for Evaluation of Processor Capability [paper]
    Xulong Tang, Hong An, Gongjin Sun, Dongrui Fan
  • PPoPP 2012 Poster FlexBFS: A Parallelism-aware Implementation of Breadth-First Search on GPU [paper]
    Gu Liu, Hong An, Xiaoqiang Li, Wei Zhou, Xuechao Wei, Xulong Tang
University of Pittsburgh

Department of Computer Science, University of Pittsburgh

210 S. Bouquet Street, SENSQ 6514, Pittsburgh, PA 15213