Pitt Students from My Group
Yueqi Wang, Bingyao Li, Mohamed Tarek Ibn Ziad, Lieven Eeckhout, Jun Yang, Aamer Jaleel, Xulong Tang
OASIS: Object-Aware Page Management for Multi-GPU Systems.
In Proceedings of the 31st IEEE International Symposium on High-Performance Computer Architecture
(HPCA 2025)
Bingyao Li, Yueqi Wang, Tianyu Wang, Lieven Eeckhout, Jun Yang, Aamer Jaleel, Xulong Tang
STAR: Sub-Entry Sharing-Aware TLB for Multi-Instance GPU.
In Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture
(MICRO 2024)
Yingheng Li, Aditya Pawar, Zewei Mo, Youtao Zhang, Jun Yang, Xulong Tang
FMCC: Flexible Measurement-based Quantum Computation over Cluster State.
In Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems
(ASPLOS 2024)
Aditya Pawar, Yingheng Li, Zewei Mo, Yanan Guo, Xulong Tang, Youtao Zhang, Jun Yang
QRCC: Evaluating Large Quantum Circuits on Small Quantum Computers through Integrated Qubit Reuse and Circuit Cutting.
In Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems
(ASPLOS 2024)
Zewei Mo, Yingheng Li, Aditya Pawar, Xulong Tang, Jun Yang, Youtao Zhang
FCM: Wire Cutting For Fusion Reduction in Measurement-based Quantum Computing.
In Proceedings of the 61st Design Automation Conference
(DAC 2024)
Yifan Gong, Yushu Wu, Pu Zhao, Zheng Zhan, Liangkai Liu, Chao Wu, Xulong Tang, Yanzhi Wang
LOTUS: Learning-Based Online Thermal and Latency Variation Management for Two-Stage Detectors on Edge Devices.
In Proceedings of the 61st Design Automation Conference
(DAC 2024)
Sheng Li, Chao Wu, Ao Li, Yanzhi Wang, Xulong Tang, Geng Yuan
Waxing-and-Waning: A Generic Similarity-based Framework for Efficient Self-Supervised Learning.
In Proceedings of the 12th International Conference on Learning Representations
(ICLR 2024)
Yueqi Wang*, Bingyao Li*, Aamer Jaleel, Jun Yang, Xulong Tang
GRIT: Enhancing Multi-GPU Performance with Fine-Grained Dynamic Page Placement.
In Proceedings of the 30th IEEE International Symposium on High-Performance Computer Architecture
(HPCA 2024)
Yue Dai, Xulong Tang, Youtao Zhang
FlexGM: An Adaptive Runtime System to Accelerate Graph Matching Networks on GPUs.
In Proceedings of the 41st IEEE International Conference on Computer Design
(ICCD 2023)
Bingyao Li, Yanan Guo, Yueqi Wang, Aamer Jaleel, Jun Yang, Xulong Tang
IDYLL: Enhancing Page Translation in Multi-GPUs via Light Weight PTE Invalidations.
In Proceedings of the 56th IEEE/ACM International Symposium on Microarchitecture
(MICRO 2023)
Zhengang Li, Geng Yuan, Tomoharu Yamauchi, Zabihi Masoud, Yanyue Xie, Peiyan Dong, Xulong Tang, Nobuyuki Yoshikawa, Devesh Tiwari, Yanzhi Wang, Olivia Chen
SupeRBNN: Randomized Binary Neural Network Using Adiabatic Superconductor Josephson Devices.
In Proceedings of the 56th IEEE/ACM International Symposium on Microarchitecture
(MICRO 2023)
Bingyao Li, Yueqi Wang, Xulong Tang
Orchestrated Scheduling and Partitioning for Improved Address Translation in GPUs.
In Proceedings of the 60th Design Automation Conference
(DAC 2023)
Yingheng Li, Aditya Pawar, Mohadeseh Azari, Yanan Guo, Youtao Zhang, Jun Yang, Kaushik Parasuram Seshadreesan, Xulong Tang
Orchestrating Measurement-Based Quantum Computation over Photonic Quantum Processors.
In Proceedings of the 60th Design Automation Conference
(DAC 2023)
Mehrnoosh Raoufi, Jun Yang, Xulong Tang, Youtao Zhang
EP-ORAM: Efficient NVM-Friendly Path Eviction for Ring ORAM in Hybrid Memory.
In Proceedings of the 60th Design Automation Conference
(DAC 2023)
Sheng Li*, Geng Yuan*, Yue Dai, Youtao Zhang, Yanzhi Wang, Xulong Tang
SmartFRZ: An Efficient Training Framework using Attention-Based Layer Freezing.
In Proceedings of the 11th International Conference on Learning Representations
(ICLR 2023 Spotlight)
Bingyao Li, Jieming Yin, Anup Holey, Youtao Zhang, Jun Yang, Xulong Tang
Trans-FW: Short Circuiting Page Table Walk in Multi-GPU Systems via Remote Forwarding.
In Proceedings of the 29th IEEE International Symposium on High-Performance Computer Architecture
(HPCA 2023)
Yue Dai, Youtao Zhang, Xulong Tang
CEGMA: Coordinated Elastic Graph Matching Acceleration for Graph Matching Networks.
In Proceedings of the 29th IEEE International Symposium on High-Performance Computer Architecture
(HPCA 2023)
Mehrnoosh Raoufi, Jun Yang, Xulong Tang, Youtao Zhang
AB-ORAM: Constructing Adjustable Buckets for Space Reduction in Ring ORAM.
In Proceedings of the 29th IEEE International Symposium on High-Performance Computer Architecture
(HPCA 2023)
Yilun Zhao, Yanan Guo, Yuan Yao, Amanda Dumi, Devin M Mulvey, Shiv Upadhyay, Youtao Zhang, Kenneth D Jordan, Jun Yang, Xulong Tang
Q-GPU: A Recipe of Optimizations for Quantum Circuit Simulation Using GPUs.
In Proceedings of the 28th IEEE International Symposium on High-Performance Computer Architecture
(HPCA 2022)
Bingyao Li*, Qi Xue*, Geng Yuan*, Sheng Li, Xiaolong Ma, Yanzhi Wang, Xulong Tang
Optimizing Data Layout for Training Deep Neural Networks.
In Companion Proceedings of the Web Conference 2022 (WWW '22)
(WWW 2022 workshop)
Yifan Gong, Geng Yuan, Zheng Zhan, Wei Niu, Zhengang Li, Pu Zhao, Yuxuan Cai, Sijia Liu, Bin Ren, Xue Lin, Xulong Tang, Yanzhi Wang
Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration.
In ACM Transactions on Embedded Computing Systems
(TECS 2022)
Geng Yuan, Sung-En Chang, Qing Jin, Alec Lu, Yanyu Li, Yushu Wu, Zhenglun Kong, Yanyue Xie, Peiyan Dong, Minghai Qin, Xiaolong Ma, Xulong Tang, Zhenman Fang, Yanzhi Wang
You Already Have It: A Generator-Free Low-Precision DNN Training Framework using Stochastic Rounding.
In Proceedings of the 17th European Conference on Computer Vision
(ECCV 2022)
Geng Yuan, Yanyu Li, Sheng Li, Zhenglun Kong, Sergey Tulyakov, Xulong Tang, Yanzhi Wang, Jian Ren
Layer Freezing & Data Sieving: Missing Pieces of a Generic Framework for Sparse Training.
In Proceedings of the 36th Conference on Neural Information Processing Systems
(NeurIPS 2022)
Mahmut Taylan Kandemir, Xulong Tang, Jagadish Kotra, Mustafa Karakoy
Fine-Granular Computation and Data Layout Reorganization for Improving Locality.
In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design
(ICCAD 2022)
Yajuan Du, Mingyang Liu, Yuqi Yang, Mingzhe Zhang, Xulong Tang
Enhancing GPU Performance via Neighboring Directory Table Based Inter-TLB Sharing.
In Proceedings of the IEEE International Conference on Computer Design
(ICCD 2022)
Sebastien Ollivier, Sheng Li, Yue Tang, Chayanika Chaudhuri, Peipei Zhou, Xulong Tang, Jingtong Hu, Alex K. Jones
Sustainable AI Processing at the Edge.
In IEEE Micro
(IEEE Micro)
F. Yu, Z. Xu, T. Shen, D. Stamoulis, L. Shangguan, D. Wang, M. Zhang, X. Tang, R. Madhok, C. Zhao, X. Li, N. Karianakis, D. Lymberopoulos, C. Liu, A. Li, Y. Chen, and X. Chen
Rethinking Latency-aware DNN Design with GPU Tail Effect Analysis.
Poster accepted in the 17th European Conference on Computer Systems (EuroSys)
(EuroSys 2022 poster)
Yue Dai, Xulong Tang, Youtao Zhang
An Efficient Segmented Quantization for Graph Neural Networks.
In CCF Transactions on High Performance Computing
Zhendong Wang, Xiaoming Zeng, Xulong Tang, Danfeng Zhang, Xing Hu, Yang Hu
Demystifying Arch-hints for Model Extraction: An Attack in Unified Memory System.
arXiv
Bingyao Li, Jieming Yin, Youtao Zhang, Xulong Tang
Improving Address Translation in Multi-GPUs via Sharing and Spilling Aware TLB Design.
In Proceedings of the 54th IEEE/ACM International Symposium on Microarchitecture
(MICRO 2021)
Weizheng Xu, Ashutosh Pattnaik, Geng Yuan, Yanzhi Wang, Youtao Zhang, Xulong Tang
ScaleDNN: Data Movement Aware DNN Training on Multi-GPU.
In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design
(ICCAD 2021)
Fuxun Yu, Shawn Bray, Di Wang, Longfei Shangguan, Xulong Tang, Chenchen Liu, Xiang Chen
Automated Runtime-Aware Scheduling for Multi-Tenant DNN Inference on GPU.
In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design
(ICCAD 2021)
Xulong Tang, Mahmut Taylan Kandemir, Mustafa Karakoy
Mix and Match: Reorganizing Tasks for Enhancing Data Locality.
In Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS Journal)
(SIGMETRICS 2021)
Mahmut Taylan Kandemir, Xulong Tang, Hui Zhao, Jihyun Ryoo, Mustafa Karakoy
Distance-in-Time versus Distance-in-Space.
In Proceedings of the 42nd Annual ACM SIGPLAN Conference on Programming Language Design and Implementation
(PLDI 2021)
Huaipan Jiang, Haibo Zhang, Xulong Tang, Vineetha Govindaraj, Jack Sampson, Mahmut Taylan Kandemir, Danfeng Zhang
Fluid: A Framework for Approximate Concurrency via Controlled Dependency Relaxation.
In Proceedings of the 42nd Annual ACM SIGPLAN Conference on Programming Language Design and Implementation
(PLDI 2021)
Zhendong Wang, Rujia Wang, Zihang Jiang, Xulong Tang, Shouyi Yin, Yang Hu
Towards a Secure Integrated Heterogeneous Platform via Cooperative CPU/GPU Encryption.
In Proceedings of the Asian Test Symposium 2021
(ATS 2021)
Shixiong Jing, Qinkun Bao, Pei Wang, Xulong Tang, Dinghao Wu
Characterizing AI Model Inference Applications Running in SGX Environment.
In Proceedings of the 15th International Conference on Networking, Architecture, and Storage
(NAS 2021)
Xinyi Zhang, Yawen Wu, Peipei Zhou, Xulong Tang, Jingtong Hu
Algorithm-Hardware Co-design of Attention Mechanism on FPGA Devices.
In Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis
(CODES+ISSS 2021)
Geng Yuan, Peiyan Dong, Mengshu Sun, Wei Niu, Zhengang Li, Yuxuan Cai, Jun Liu, Weiwen Jiang, Xue Lin, Bin Ren, Xulong Tang, Yanzhi Wang
Work in Progress: Mobile or FPGA? A Comprehensive Evaluation on Energy Efficiency and a Unified Optimization Framework.
In Proceedings of the 27th IEEE Real-Time and Embedded Technology and Applications Symposium
(RTAS 2021)
Weizheng Xu, Youtao Zhang, Xulong Tang
Parallelizing DNN Training on GPUs: Challenges and Opportunities.
In Companion Proceedings of the Web Conference 2021 (WWW '21)
(WWW 2021 workshop)
Mahmut Taylan Kandemir, Jihyun Ryoo, Xulong Tang, Mustafa Karakoy
Compiler Support for Near Data Computing.
In Proceedings of the 26th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming
(PPoPP 2021)
Yuxuan Cai, Hongjia Li, Geng Yuan, Wei Niu, Yanyu Li, Xulong Tang, Bin Ren, Yanzhi Wang
YOLObile: Real-Time Object Detection on Mobile Devices via Compression-Compilation Co-Design.
In Proceedings of the 35th AAAI Conference on Artificial Intelligence
(AAAI 2021)
Yuxuan Cai, Geng Yuan, Hongjia Li, Wei Niu, Yanyu Li, Xulong Tang, Bin Ren, Yanzhi Wang
A Compression-Compilation Co-Design Framework Towards Real-Time Object Detection on Mobile Devices.
In Proceedings of the 35th AAAI Conference on Artificial Intelligence
(AAAI 2021)
Yuxuan Cai, Hongjia Li, Geng Yuan, Wei Niu, Yanyu Li, Xulong Tang, Bin Ren, Yanzhi Wang
YOLObile: Real-Time Object Detection on Mobile Devices via Compression-Compilation Co-Design.
In NeurIPS 2020 Workshop on Machine Learning for Autonomous Driving
(NeurIPS 2020 workshop)
Xulong Tang, Ziyu Zhang, Weizheng Xu, Mahmut Taylan Kandemir, Rami Melhem, Jun Yang
Enhancing Address Translations in Throughput Processors via Compression.
In Proceedings of the 29th International Conference on Parallel Architectures and Compilation Techniques
(PACT 2020)
Zhendong Wang, Zihang Jiang, Zhen Wang, Xulong Tang, Cong Liu, Yang Hu
Enabling Latency-aware Data Initialization for Integrated CPU/GPU Heterogeneous Platform.
In IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
(TCAD 2020)
Ashutosh Pattnaik, Xulong Tang, Onur Kayiran, Adwait Jog, Asit Mishra, Mahmut T. Kandemir, Anand Sivasubramaniam, Chita R. Das
Opportunistic Computing in GPU Architectures.
In Proceedings of the 46th International Symposium on Computer Architecture
(ISCA 2019)
Xulong Tang, Mahmut Taylan Kandemir, Mustafa Karakoy, Meena Arunachalam
Co-Optimizing Memory-Level Parallelism and Cache-Level Parallelism.
In Proceedings of the 40th Annual ACM SIGPLAN Conference on Programming Language Design and Implementation
(PLDI 2019)
Xulong Tang, Ashutosh Pattnaik, Onur Kayiran, Adwait Jog, Mahmut Taylan Kandemir, Chita Das
Quantifying Data Locality in Dynamic Parallelism in GPUs.
In Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS Journal)
(SIGMETRICS 2019)
Xulong Tang, Mahmut Taylan Kandemir, Hui Zhao, Myoungsoo Jung, Mustafa Karakoy
Computing with Near Data.
In Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS Journal)
(SIGMETRICS 2019)
Mustafa Karakoy, Orhan Kislal, Xulong Tang, Mahmut Taylan Kandemir, Meenakshi Arunachalam
Architecture-Aware Approximate Computing.
In Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS Journal)
(SIGMETRICS 2019)
Jihyun Ryoo, Mengran Fan, Xulong Tang, Huaipan Jiang, Meena Arunachalam, Sharada Naveen, Mahmut T. Kandemir
Architecture-Centric Bottleneck Analysis for Deep Neural Network Applications.
In Proceedings of IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC)
(HiPC 2019)
Jihyun Ryoo, Orhan Kislal, Xulong Tang, Mahmut T. Kandemir
Quantifying and Optimizing Data Access Parallelism on Manycores.
In Proceedings of the 26th IEEE International Symposium on the Modeling, Analysis, and Simulation of Computer and Telecommunication Systems
(MASCOTS 2018)
Orhan Kislal, Jagadish B. Kotra, Xulong Tang, Mahmut T. Kandemir, Myoungsoo Jung
Enhancing Computation-to-Core Assignment with Physical Location Information.
In Proceedings of the 39th Annual ACM SIGPLAN Conference on Programming Language Design and Implementation
(PLDI 2018)
Sooraj Puthoor, Xulong Tang, Joseph Gross, Bradford M Beckmann
Oversubscribed Command Queues in GPUs.
In Proceedings of the 11th Annual General Purpose Computing with Graphics Processing Units
(GPGPU 11 @ PPoPP 2018)
Xulong Tang, Orhan Kislal, Mahmut Kandemir, Mustafa Karakoy
Data Movement Aware Computation Partitioning.
In Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture
(MICRO 2017)
Akbar Sharifi, Wei Ding, Diana Guttman, Hui Zhao, Xulong Tang, Mahmut Kandemir, Chita Das
DEMM: A Dynamic Energy-saving Mechanism for Multicore.
In Proceedings of the 25th IEEE International Symposium on the Modeling, Analysis, and Simulation of Computer and Telecommunication Systems
(MASCOTS 2017)
Orhan Kislal, Jagadish Kotra, Xulong Tang, Mahmut Taylan Kandemir, Myoungsoo Jung
POSTER: Location-Aware Computation Mapping for Manycore Processors.
In Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques
(PACT 2017 Poster)
Xulong Tang, Ashutosh Pattnaik, Huaipan Jiang, Onur Kayiran, Adwait Jog, Sreepathi Pai, Mohamed Ibrahim, Mahmut Taylan Kandemir, Chita Das
Controlled Kernel Launch for Dynamic Parallelism in GPUs.
In Proceedings of the 23rd International Symposium on High-Performance Computer Architecture
(HPCA 2017)
Xulong Tang, Mahmut Kandemir, Praveen Yedlapalli, Jagadish Kotra
Improving Bank-Level Parallelism for Irregular Applications.
In Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture
(MICRO 2016, Best Paper Nomination (6/61))
Ashutosh Pattnaik, Xulong Tang, Adwait Jog, Onur Kayiran, Asit K. Mishra, Mahmut Taylan Kandemir, Onur Mutlu, Chita R. Das
Scheduling Techniques for GPU Architectures with Processing-In-Memory Capabilities.
In Proceedings of the 25th International Conference on Parallel Architectures and Compilation Techniques
(PACT 2016)
Onur Kayiran, Adwait Jog, Ashutosh Pattnaik, Rachata Ausavarungnirun, Xulong Tang, Mahmut Taylan Kandemir, Gabriel H. Loh, Onur Mutlu, Chita R. Das
µC-States: Fine-grained GPU Datapath Power Management.
In Proceedings of the 25th International Conference on Parallel Architectures and Compilation Techniques
(PACT 2016)
Wei Ding, Xulong Tang, Mahmut Taylan Kandemir, Yuanrui Zhang, Emre Kultursay
Optimizing Off-Chip Accesses in Manycores.
In Proceedings of the 36th Annual ACM SIGPLAN Conference on Programming Language Design and Implementation
(PLDI 2015)
Mahmut Taylan Kandemir, Hui Zhao, Xulong Tang, Mustafa Karakoy
Memory Row Reuse Distance and its Role in Optimizing Application Performance.
In Proceedings of the ACM International Conference on Measurement and Modeling of Computer Systems
(SIGMETRICS 2015)
Xulong Tang, Hong An, Gongjin Sun, Dongrui Fan
A Video Coding Benchmark Suite for Evaluation of Processor Capability.
In Proceedings of the 14th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing
(SNPD 2013)
Gu Liu, Hong An, Xiaoqiang Li, Wei Zhou, Xuechao Wei, Xulong Tang
FlexBFS: A Parallelism-aware Implementation of Breadth-First Search on GPU.
Accepted as a poster at the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
(PPoPP 2012 poster)