Heterogeneous Systems Seminar

Overview:

The seminar covers heterogeneous systems: those that make use of different types of computing devices (GPUs, FPGAs, ASICs, etc.) and/or memory (NVM/SCM). Our focus will be the systems and architectures that use these devices. The objective of this course is to familiarize students with important topics in heterogeneous systems, past, present, and future: the devices, the architectures, and their uses.

Format:

The seminar consists of student presentations of papers selected from a provided list. Depending on the number of students enrolled, presentations will be given individually or in teams of two. Each presentation is allotted a 45-minute time slot: 30 minutes for the presentation and 15 minutes for questions.

Grading:

Grading is based upon the quality of the presentation, the coverage of the paper including necessary background and follow-on work, and the ability to understand and critique the paper and technology. Because discussion is an integral part of the seminar format, students are allowed only one unexcused absence during the course of the semester.

Hours:

The Spring 2024 seminar takes place on Tuesdays from 16:15 to 18:00.

It is currently scheduled to take place in LFW C4.

Papers:

Nonvolatile Memory

Moinuddin K. Qureshi, Vijayalakshmi Srinivasan, and Jude A. Rivers. "Scalable High Performance Main Memory System Using Phase-Change Memory Technology". In: SIGARCH Comput. Archit. News 37.3 (June 2009). url: https://doi.org/10.1145/1555815.1555760
Joy Arulraj and Andrew Pavlo. "How to Build a Non-Volatile Memory Database Management System". In: Proceedings of the 2017 ACM International Conference on Management of Data. SIGMOD '17. Chicago, Illinois, USA: Association for Computing Machinery, 2017. url: https://doi.org/10.1145/3035918.3054780
Assaf Eisenman et al. "Reducing DRAM Footprint with NVM in Facebook". In: Proceedings of the Thirteenth EuroSys Conference. EuroSys '18. Porto, Portugal: Association for Computing Machinery, 2018. url: https://doi.org/10.1145/3190508.3190524
Amanda Raybuck et al. "HeMem: Scalable Tiered Memory Management for Big Data Applications and Real NVM". In: Proceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles. SOSP '21. Virtual Event, Germany: Association for Computing Machinery, 2021. url: https://doi.org/10.1145/3477132.3483550
Wang et al. "SEPH: Scalable, Efficient, and Predictable Hashing on Persistent Memory". OSDI '23. url: https://www.usenix.org/conference/osdi23/presentation/wang-chao

Disaggregated Memory

Huaicheng Li et al. "Pond: CXL-Based Memory Pooling Systems for Cloud Platforms". In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2 (ASPLOS 2023). Association for Computing Machinery, New York, NY, USA, 574–587. https://doi.org/10.1145/3575693.3578835
Zhiyuan Guo et al. "Clio: a hardware-software co-designed disaggregated memory system". In Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '22). Association for Computing Machinery, New York, NY, USA, 417–433. https://doi.org/10.1145/3503222.3507762
Hasan Al Maruf et al. "TPP: Transparent Page Placement for CXL-Enabled Tiered-Memory". In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3 (ASPLOS 2023). Association for Computing Machinery, New York, NY, USA, 742–755. https://doi.org/10.1145/3582016.3582063

In-Memory Processing/Near-Memory Processing

Pati et al. "T3: Transparent Tracking & Triggering for Fine-grained Overlap of Compute & Collectives". To appear in ASPLOS 2024. https://arxiv.org/abs/2401.16677

GPUs:

John Nickolls et al. "Scalable Parallel Programming with CUDA: Is CUDA the Parallel Programming Model That Application Developers Have Been Waiting For?" In: Queue 6.2 (Mar. 2008), pp. 40–53. issn: 1542-7730. doi: 10.1145/1365490.1365500. url: https://doi.org/10.1145/1365490.1365500
Victor W. Lee et al. "Debunking the 100X GPU vs. CPU Myth: An Evaluation of Throughput Computing on CPU and GPU". In: Proceedings of the 37th Annual International Symposium on Computer Architecture. ISCA '10. Saint-Malo, France: Association for Computing Machinery, 2010. url: https://doi.org/10.1145/1815961.1816021
Lin Shi et al. "vCUDA: GPU-Accelerated High-Performance Computing in Virtual Machines". In: IEEE Transactions on Computers 61.6 (2012). url: https://ieeexplore.ieee.org/document/5928326
Anil Shanbhag, Samuel Madden, and Xiangyao Yu. "A Study of the Fundamental Performance Characteristics of GPUs and CPUs for Database Analytics". In: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. SIGMOD '20. Portland, OR, USA: Association for Computing Machinery, 2020. url: https://doi.org/10.1145/3318464.3380595

FPGAs:

Adrian M. Caulfield et al. "A cloud-scale acceleration architecture". In: 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 2016. url: https://www.doi.org/10.1109/MICRO.2016.7783710
David Sidler et al. "DoppioDB: A Hardware Accelerated Database". In: Proceedings of the 2017 ACM International Conference on Management of Data. SIGMOD '17. Chicago, Illinois, USA: Association for Computing Machinery, 2017. url: https://doi.org/10.1145/3035918.3058746
Young-Kyu Choi et al. "In-Depth Analysis on Microarchitectures of Modern Heterogeneous CPU-FPGA Platforms". In: ACM Trans. Reconfigurable Technol. Syst. 12.1 (Feb. 2019). url: https://doi.org/10.1145/3294054
Jiacheng Ma et al. "A Hypervisor for Shared-Memory FPGA Platforms". In: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems. New York, NY, USA: Association for Computing Machinery, 2020. url: https://doi.org/10.1145/3373376.3378482

Analog Computing:

G.E.R. Cowan, R.C. Melville, and Y.P. Tsividis. "A VLSI analog computer/digital computer accelerator". In: IEEE Journal of Solid-State Circuits 41.1 (2006), pp. 42-53. url: https://www.doi.org/10.1109/JSSC.2005.858618
Suma George et al. "A Programmable and Configurable Mixed-Mode FPAA SoC". In: IEEE Transactions on Very Large Scale Integration (VLSI) Systems 24.6 (2016). url: https://www.doi.org/10.1109/TVLSI.2015.2504119
Wilfried Haensch, Tayfun Gokmen, and Ruchir Puri. "The Next Generation of Deep Learning Hardware: Analog Computing". In: Proceedings of the IEEE 107.1 (2019). url: https://www.doi.org/10.1109/JPROC.2018.2871057

In-Network Computing:

Firestone, Daniel, et al. "Azure accelerated networking: Smartnics in the public cloud." 15th USENIX Symposium on Networked Systems Design and Implementation (NSDI 18). 2018. https://www.usenix.org/conference/nsdi18/presentation/firestone
Jongyul Kim et al. "LineFS: Efficient SmartNIC Offload of a Distributed File System with Pipeline Parallelism". In Proceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles (SOSP '21). https://doi.org/10.1145/3477132.3483565
Lao, ChonLam, et al. "ATP: In-network Aggregation for Multi-tenant Learning." NSDI. Vol. 21. 2021. https://www.usenix.org/conference/nsdi21/presentation/lao
Yongchao He et al. "A Generic Service to Provide In-Network Aggregation for Key-Value Streams". In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2 (ASPLOS 2023). Association for Computing Machinery, New York, NY, USA, 33–47. https://doi.org/10.1145/3575693.3575708

Custom Accelerators:

Song Han et al. "EIE: Efficient Inference Engine on Compressed Deep Neural Network". In: 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA). 2016, pp. 243-254. url: https://ieeexplore.ieee.org/document/7551397
Norman P. Jouppi et al. "In-Datacenter Performance Analysis of a Tensor Processing Unit". In: SIGARCH Comput. Archit. News 45.2 (June 2017). url: https://doi.org/10.1145/3140659.3080246
Norm Jouppi et al. "TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings". In Proceedings of the 50th Annual International Symposium on Computer Architecture (ISCA '23). Association for Computing Machinery, New York, NY, USA, Article 82, 1–14. https://doi.org/10.1145/3579371.3589350
Yatish Turakhia, Gill Bejerano, and William J. Dally. "Darwin: A Genomics Co-Processor Provides up to 15,000X Acceleration on Long Read Assembly". In: SIGPLAN Not. 53.2 (Mar. 2018). url: https://doi.org/10.1145/3296957.3173193
Parthasarathy Ranganathan et al. "Warehouse-Scale Video Acceleration: Co-Design and Deployment in the Wild". In: Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems. ASPLOS 2021. Virtual, USA: Association for Computing Machinery, 2021. url: https://doi.org/10.1145/3445814.3446723
Yakun Sophia Shao et al. "Simba: Scaling Deep-Learning Inference with Multi-Chip-Module-Based Architecture". In Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO '52). Association for Computing Machinery, New York, NY, USA, 14–27. https://doi.org/10.1145/3352460.3358302

Wild Card

Zaruba, Florian, Fabian Schuiki, and Luca Benini. "Manticore: A 4096-core RISC-V chiplet architecture for ultraefficient floating-point computing." IEEE Micro 41.2 (2020). url: https://ieeexplore.ieee.org/abstract/document/9296802
Jennifer Switzer et al. "Junkyard Computing: Repurposing Discarded Smartphones to Minimize Carbon". In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2 (ASPLOS 2023). Association for Computing Machinery, New York, NY, USA, 400–412. https://doi.org/10.1145/3575693.3575710

Contact

Dr. Michael Joseph Giardino

Lecturer at the Department of Computer Science

Institut für Computing Platforms
Stampfenbachstrasse 114
8092 Zürich
Switzerland