Publications

The materials on this page are presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.

Underlined names are students advised by me.

  2024 (1)
[ISPASS'24] Characterizing In-Kernel Observability of Latency-Sensitive Request-level Metrics with eBPF.
Mohammadreza Rezvani, Ali Jahanshahi, & Daniel Wong.
In Proceedings of the 2024 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2024. (Best Paper Nominee) To appear.

  2023 (4)
[HPCA'23] KRISP: Enabling Kernel-wise Right-sizing for Spatial Partitioned GPU Inference Servers.
Marcus Chow, Ali Jahanshahi, & Daniel Wong.
In Proceedings of the 29th IEEE International Symposium on High Performance Computer Architecture (HPCA), 2023. (Acceptance Rate: 25.0%)

[IGSC'23] CoFRIS: Coordinated Frequency and Resource Scaling for GPU Inference Servers.
Marcus Chow, & Daniel Wong.
In Proceedings of the 14th International Green and Sustainable Computing Conference (IGSC), 2023.
[IGSC'23] WattWiser: Power Resource-Efficient Scheduling for Multi-Model Multi-GPU Inference Servers.
Ali Jahanshahi, Mohammadreza Rezvani, & Daniel Wong.
In Proceedings of the 14th International Green and Sustainable Computing Conference (IGSC), 2023.
[AI4Dev'23] VSCuda: LLM based CUDA extension for Visual Studio Code.
Brian Chen, Nafis Mustakin, Alvin Hoang, Sakib Fuad, & Daniel Wong.
In First Workshop on AI Assisted Software Development for HPC (AI4Dev), 2023.
  2022 (3)
[ACM TACO'22] PowerMorph: QoS-Aware Server Power Reshaping For Data Center Regulation Service.
Ali Jahanshahi, Nanpeng Yu, & Daniel Wong.
ACM Transactions on Architecture and Code Optimization (TACO), Volume 19(Issue 3): 1–27. September 2022.

[ISPASS'22] GPUCalorie: Floorplan Estimation for GPU Thermal Evaluation.
Marcus Chow, Ali Jahanshahi, Ana Cardenas Beltran, Sheldon Tan, & Daniel Wong.
In Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2022. (Poster)

[GPGPU'22] Scaleserve: A Scalable Multi-GPU Machine Learning Inference System And Benchmarking Suite.
Ali Jahanshahi, Marcus Chow, & Daniel Wong.
In Proceedings of the 14th Workshop on General Purpose Processing Using GPU (GPGPU), 2022. (Short paper)

  2021 (7)
[ISCA'21] BlockMaestro: Enabling Programmer-Transparent Task-Based Execution In GPU Systems.
AmirAli Abdolrashidi, Hodjat Asghari Esfeden, Ali Jahanshahi, Kaustubh Singh, Nael Abu-Ghazaleh, & Daniel Wong.
In Proceedings of the 48th ACM/IEEE International Symposium on Computer Architecture (ISCA), 2021. (Acceptance Rate: 18.7%)

[SC'21] MAPA: Multi-Accelerator Pattern Allocation Policy For Multi-Tenant GPU Servers.
Kiran Ranganath, Joshua D Suetterlein, Joseph B Manzano, Shuaiwen Leon Song, & Daniel Wong.
In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2021. (Acceptance Rate: 26.8%)

[ACM TACO'21] PAVER: Locality Graph-Based Thread Block Scheduling For GPUs.
Devashree Tripathy, Amirali Abdolrashidi, Laxmi Narayan Bhuyan, Liang Zhou, & Daniel Wong.
ACM Transactions on Architecture and Code Optimization (TACO), Volume 18(Issue 3): 1–26. June 2021.

[NAS'21] LocalityGuru: A PTX Analyzer For Extracting Thread Block-Level Locality In GPGPUs.
Devashree Tripathy, Amirali Abdolrashidi, Quan Fan, Daniel Wong, & Manoranjan Satpathy.
In Proceedings of the 15th IEEE International Conference on Networking, Architecture and Storage (NAS), 2021.
[NAS'21] ICAP: Designing Inrush Current Aware Power Gating Switch For GPGPU.
Hadi Zamani, Devashree Tripathy, Ali Jahanshahi, & Daniel Wong.
In Proceedings of the 15th IEEE International Conference on Networking, Architecture and Storage (NAS), 2021.
[LCPC'21] LC-MEMENTO: A Memory Model for Accelerated Architectures.
Kiran Ranganath, Jesun Firoz, Joshua Suetterlein, Joseph Manzano, Andres Marquez, Mark Raugas, & Daniel Wong.
In Languages and Compilers for Parallel Computing (LCPC), 2021.
[RSDHA'21] Energy Efficient Task Graph Execution Using Compute Unit Masking In GPUs.
Marcus Chow, Kiran Ranganath, Robert Lerias, Mika Shanela Carodan, & Daniel Wong.
In Workshop on Redefining Scalability for Diversely Heterogeneous Architectures (RSDHA), 2021.
  2020 (3)
[MICRO'20] BOW: Breathing Operand Windows To Exploit Bypassing In GPUs.
Hodjat Asghari Esfeden, Amirali Abdolrashidi, Shafiur Rahman, Daniel Wong, & Nael Abu-Ghazaleh.
In Proceedings of the 53rd IEEE/ACM International Symposium on Microarchitecture (MICRO), 2020. (Acceptance Rate: 19.4%)

[IEEE CAL'20] GPU-NEST: Characterizing Energy Efficiency Of Multi-GPU Inference Servers.
Ali Jahanshahi, Hadi Zamani Sabzi, Chester Lau, & Daniel Wong.
IEEE Computer Architecture Letters, Volume 19(Issue 2): 139–142. 2020.
[FCCM'20] High-Performance Parallel Radix Sort On FPGA.
Bashar Romanous, Mohammadreza Rezvani, Junjie Huang, Daniel Wong, Evangelos E Papalexakis, Vassilis J Tsotras, & Walid Najjar.
In Proceedings of the 28th IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2020. (Poster)

  2019 (6)
[ASPLOS'19] CORF: Coalescing Operand Register File For GPUs.
Hodjat Asghari Esfeden, Farzad Khorasani, Hyeran Jeon, Daniel Wong, & Nael Abu-Ghazaleh.
In Proceedings of the 24th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2019. (Acceptance Rate: 21.1%)

[HPCA'19] μDPM: Dynamic Power Management For The Microsecond Era.
Chih-Hsun Chou, Laxmi N Bhuyan, & Daniel Wong.
In Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture (HPCA), 2019. (Acceptance Rate: 19.7%)

[IEEE CAL'19] Speeding Up Collective Communications Through Inter-GPU Re-Routing.
Kiran Ranganath, AmirAli Abdolrashidi, Shuaiwen Leon Song, & Daniel Wong.
IEEE Computer Architecture Letters, Volume 18(Issue 2): 128–131. 2019.
[IEEE CAL'19] Locality-Aware GPU Register File.
Hyeran Jeon, Hodjat Asghari Esfeden, Nael Abu-Ghazaleh, Daniel Wong, & Sindhuja Elango.
IEEE Computer Architecture Letters, Volume 18(Issue 2): 153–156. 2019.
[Applied Energy'19] Frequency Regulation Service Provision In Data Center With Computational Flexibility.
Wei Wang, Amirali Abdolrashidi, Nanpeng Yu, & Daniel Wong.
Applied Energy, Volume 251. October 2019. (IF: 8.4)

[SMACD'19] Long-Term Reliability Management For Multitasking GPGPUs.
Zeyu Sun, Taeyoung Kim, Marcus Chow, Shaoyi Peng, Han Zhou, Hyoseung Kim, Daniel Wong, & Sheldon X-D Tan.
In Proceedings of the 16th International Conference on Synthesis, Modeling, Analysis and Simulation Methods and Applications to Circuit Design (SMACD), 2019.
  2018 (2)
[IPDPS'18] Joint Server And Network Energy Saving In Data Centers For Latency-Sensitive Applications.
Liang Zhou, Chih-Hsun Chou, Laxmi N Bhuyan, KK Ramakrishnan, & Daniel Wong.
In Proceedings of the 32nd IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2018. (Acceptance Rate: 24.5%)

[ISLPED'18] Load-Triggered Warp Approximation On GPU.
Zhenhong Liu, Daniel Wong, & Nam Sung Kim.
In Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED), 2018. (Acceptance Rate: 23.3%)

  2017 (1)
[MICRO'17] Wireframe: Supporting Data-Dependent Parallelism Through Dependency Graph Execution In GPUs.
AmirAli Abdolrashidi, Devashree Tripathy, Mehmet Esat Belviranli, Laxmi Narayan Bhuyan, & Daniel Wong.
In Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2017. (Acceptance Rate: 18.6%)

  2016 (6)
[ISCA'16] Peak Efficiency Aware Scheduling for Highly Energy Proportional Servers.
Daniel Wong.
In Proceedings of the 43rd ACM/IEEE International Symposium on Computer Architecture (ISCA), 2016. (Acceptance Rate: 19.5%)

[HPCA'16] Approximating Warps with Intra-Warp Operand Value Similarity.
Daniel Wong, Nam Sung Kim, & Murali Annavaram.
In Proceedings of the 22nd IEEE International Symposium on High Performance Computer Architecture (HPCA), 2016. (Acceptance Rate: 22%)

[ICS'16] Origami: Folding Warps For Energy Efficient GPUs.
Mohammad Abdel-Majeed, Daniel Wong, Justin Kuang, & Murali Annavaram.
In Proceedings of the ACM International Conference on Supercomputing (ICS), 2016. (Acceptance Rate: 24%)

[ISLPED'16] DynSleep: Fine-Grained Power Management For A Latency-Critical Data Center Application.
Chih-Hsun Chou, Daniel Wong, & Laxmi N Bhuyan.
In Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), 2016. (Acceptance Rate: 23%)

[DAC'16] Invited - Cross-Layer Modeling And Optimization For Electromigration Induced Reliability.
Taeyoung Kim, Zeyu Sun, Chase Cook, Hengyang Zhao, Ruiwen Li, Daniel Wong, & Sheldon X-D Tan.
In Proceedings of the 53rd Annual Design Automation Conference (DAC), 2016.
[SBAC-PAD'16] STOMP: Statistical Techniques For Optimizing And Modeling Performance Of Blocked Sparse Matrix Vector Multiplication.
Steena Monteiro, Forrest Iandola, & Daniel Wong.
In Proceedings of the 28th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), 2016.
  2015 (1)
[IISWC'15] A Retrospective Look Back On The Road Towards Energy Proportionality.
Daniel Wong, Julia Chen, & Murali Annavaram.
In Proceedings of the 2015 IEEE International Symposium on Workload Characterization (IISWC), 2015. (Short paper with presentation)

  2014 (1)
[HPCA'14] Implications of High Energy Proportional Servers on Cluster-Wide Energy Proportionality.
Daniel Wong, & Murali Annavaram.
In Proceedings of the 20th IEEE International Symposium on High Performance Computer Architecture (HPCA), 2014. (Acceptance Rate: 25.6%)

  2013 (2)
[Top Picks'13] Scaling The Energy Proportionality Wall With KnightShift.
Daniel Wong, & Murali Annavaram.
IEEE Micro's "Top Picks from the Computer Architecture Conferences of 2012", Volume 33(Issue 3): 28–37. 2013.
[MICRO'13] Warped Gates: Gating Aware Scheduling and Power Gating For GPGPUs.
Mohammad Abdel-Majeed*, Daniel Wong*, & Murali Annavaram.
In Proceedings of the 46th IEEE/ACM International Symposium on Microarchitecture (MICRO), 2013. (Acceptance Rate: 16.3%)
* Authors contributed equally

  2012 (2)
[MICRO'12] KnightShift: Scaling the Energy Proportionality Wall through Server-Level Heterogeneity.
Daniel Wong, & Murali Annavaram.
In Proceedings of the 45th IEEE/ACM International Symposium on Microarchitecture (MICRO), 2012. (Acceptance Rate: 17.5%)
Selected as one of 11 IEEE Micro Top Picks in Computer Architecture, 2013.

[WEED'12] Evaluating A Prototype KnightShift-enabled server.
Daniel Wong, & Murali Annavaram.
In Workshop on Energy-Efficient Design (WEED), 2012.
  2010 (4)
[MICRO'10] Adaptive and Speculative Slack Simulations of CMPs on CMPs.
Jianwei Chen, Lakshmi Kumar Dabbiru, Daniel Wong, Murali Annavaram, & Michel Dubois.
In Proceedings of the 43rd IEEE/ACM International Symposium on Microarchitecture (MICRO), 2010. (Acceptance Rate: 17.4%)

[FDG'10] Implementing Games On Pinball Machines.
Daniel Wong, Darren Earl, Fred Zyda, Ryan Zink, Sven Koenig, Allen Pan, Selby Shlosberg, Jaspreet Singh, & Nathan Sturtevant.
In Proceedings of the Fifth International Conference on the Foundations of Digital Games (FDG), 2010. (Acceptance Rate: 34%)

[AAAI Spring'10] Teaching Robotics And Computer Science With Pinball Machines.
Daniel Wong, Darren Earl, Fred Zyda, & Sven Koenig.
In Papers of the 2010 AAAI Spring Symposium Series, 2010.
[EAAI'10] Teaching Artificial Intelligence and Robotics Via Games.
Daniel Wong, Ryan Zink, & Sven Koenig.
In Proceedings of the First AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI), 2010.

Non-Refereed Publications

Joseph Bungo, Daniel Wong, Bringing GPU Accelerated Computing and Deep Learning to the Classroom, Journal of Computational Science Education (JOCSE), Volume 12, Issue 2. Presented at the Seventh SC Workshop on Best Practices for HPC Training and Education (BPHTE), 2020.

Daniel Wong, S. Lloyd, M. Gokhale, A Memory-mapped Approach to Checkpointing. Technical Report LLNL-TR-635611, Lawrence Livermore National Laboratory (LLNL), Livermore, CA, 2013.

I. Karlin, A. Bhatele, B. Chamberlain, J. Cohen, Z. Devito, M. Gokhale, R. Haque, R. Hornung, J. Keasler, D. Laney, E. Luke, S. Lloyd, J. McGraw, R. Neely, D. Richards, M. Schulz, C.H. Still, F. Wang, Daniel Wong, LULESH Programming Model and Performance Ports Overview. Technical Report LLNL-TR-608824, Lawrence Livermore National Laboratory (LLNL), Livermore, CA, 2012.

Daniel Wong, Murali Annavaram, Scalable System-level Active Low Power Mode with Bounded Latency. Technical Report CENG-2012-5, Department of Electrical Engineering, University of Southern California, Los Angeles (California), 2012.

Daniel Wong, Murali Annavaram, Enhancing Server Energy Efficiency by Shifting Light Burden to an Assistant. 2nd Annual Ming Hsieh Department of Electrical Engineering Research Festival, 2012. Honorable Mention Poster Award. Also presented at the Sixth USC-Tsinghua Symposium on Green Technology and Energy Informatics.

Daniel Wong, R. Zink and S. Koenig, Teaching Artificial Intelligence and Robotics via Games [Poster Abstract], Proceedings of the AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI), 2010.

Daniel Wong, M. Gokhale, Real-World Performance of Document-Similarity Web Attack Classifier In Embedded Hardware. LLNL Summer Intern Poster Symposium, 2010.

John O'Hollaren, Vairavan Laxman, Noah Olsman, Michael Benzimra, Daniel Wong, and Nielson Bernardo. SeaBee III. Technical report, University of Southern California Competition Robotics (USCR), University of Southern California, 2010.

Daniel Wong, D. Earl, F. Zyda and S. Koenig. Programming Pinball Machines for Fun and Education. Technical Report 08-901, Department of Computer Science, University of Southern California, Los Angeles (California), 2008.

Press

GPU Computing 101: Why University Educators Are Pulling NVIDIA Teaching Kits into Their Classrooms, Nvidia, https://blogs.nvidia.com/blog/2019/05/23/nvidia-teaching-kits/, 2019

Interview, Nvidia's Turing Chip Opens Door to New Virtual Reality Realm, ECT News Network, https://www.ectnews.com/story/85506.html, 2018

Daniel Wong, S. Koenig, PinHorse: Teaching Old Pinball Machines New Tricks, https://www.pinballnews.com/learn/pinhorse/, 2009