The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, Second edition
July 2013, 154 pages (doi:10.2200/S00516ED2V01Y201306CAC024)
Luiz André Barroso, Google, Inc.
Jimmy Clidaras, Google, Inc.
Urs Hölzle, Google, Inc.

Abstract

As computation continues to move into the cloud, the computing platform of interest no longer resembles a pizza box or a refrigerator, but a warehouse full of computers. These new large datacenters are quite different from traditional hosting facilities of earlier times and cannot be viewed simply as a collection of co-located servers. Large portions of the hardware and software resources in these facilities must work in concert to efficiently deliver good levels of Internet service performance, something that can only be achieved by a holistic approach to their design and deployment. In other words, we must treat the datacenter itself as one massive warehouse-scale computer (WSC). We describe the architecture of WSCs, the main factors influencing their design, operation, and cost structure, and the characteristics of their software base. We hope it will be useful to architects and programmers of today’s WSCs, as well as those of future many-core platforms which may one day implement the equivalent of today’s WSCs on a single board.

Notes for the Second Edition

After nearly four years of substantial academic and industrial developments in warehouse-scale computing, we are delighted to present our first major update to this lecture. The increased popularity of public clouds has made WSC software techniques relevant to a larger pool of programmers since our first edition. Therefore, we expanded Chapter 2 to reflect our better understanding of WSC software systems and the toolbox of software techniques for WSC programming. In Chapter 3, we added to our coverage of the evolving landscape of wimpy vs. brawny server trade-offs, and we now present an overview of WSC interconnects and storage systems that was promised but lacking in the original edition. Thanks largely to the help of our new co-author, Google Distinguished Engineer Jimmy Clidaras, the material on facility mechanical and power distribution design has been updated and greatly extended (see Chapters 4 and 5). Chapters 6 and 7 have also been revamped significantly. We hope this revised edition continues to meet the needs of educators and professionals in this area.

Table of Contents: Acknowledgments / Note to the Reader / Introduction / Workloads and Software Infrastructure / Hardware Building Blocks / Datacenter Basics / Energy and Power Efficiency / Modeling Costs / Dealing with Failures / Closing Remarks / Bibliography / Author Biographies