Building Big Data Processing Systems under Scale-Out Computing Model

Keynote I
Building Big Data Processing Systems under Scale-Out Computing Model
Xiaodong Zhang
Robert M. Critchfield Professor in Engineering
Department of Computer Science and Engineering
The Ohio State University, USA
zhang@cse.ohio-state.edu

We have entered an era of data-driven decision making in almost every application area of society. From a system perspective, an increasingly high volume of data has the following implications: (1) Conventional database systems, including parallel database systems, are not designed to handle such a large volume of data, so new system infrastructure is needed. (2) Big data users from many application fields require cost-effective solutions for their analytics because conventional data processing solutions are neither scalable nor affordable. (3) System designers and practitioners strongly demand new software tools for big data processing and analytics. (4) The computing paradigm for data processing has shifted from a scale-up model for high performance to a scale-out model for high throughput, as serving data centers becomes the main role of computers. I will discuss how the systems community addresses the above-mentioned issues by presenting a case study on major technical advancements in Apache Hive, which has been widely adopted by many organizations for various big data analytics applications. Working closely with many users and organizations, we have identified several shortcomings of the early version of Hive in its data storage structure, query planning, and query execution. I will present a community-based effort and show how academic research lays a foundation for Hive to improve its daily operations in production systems.

Xiaodong Zhang is the Robert M. Critchfield Professor in Engineering and Chair of the Computer Science and Engineering Department at The Ohio State University. His research interests focus on data management in computer, networking, and distributed systems. He has made strong efforts to transfer his academic research into advanced technology to impact production systems. He received his Ph.D. in Computer Science from the University of Colorado at Boulder, where he received the Distinguished Engineering Alumni Award in [...]. He is a Fellow of the ACM and a Fellow of the IEEE.

Keynote II
Cloud-Based Systems of Insight
Hui Lei
Senior Manager
Cloud Platform Technologies Department
IBM Thomas J. Watson Research Center, USA

Cloud computing has evolved rapidly in terms of both enabling technology and enterprise uptake. The initial focus for the cloud was on the reduction of capital expense and operational cost, leveraging the virtualization of infrastructure resources. More recently, the advent of techniques like DevOps and software-defined systems has offered significant improvement in business agility and time to value. The latest trend in cloud computing involves the cloud emerging as the platform for business innovation and industry transformation, by way of Systems of Insight. Systems of Insight are high-value cloud workloads that integrate born-on-the-cloud systems of engagement and traditional enterprise systems of record, in order to generate key business insight and drive competitive differentiation. In this talk, I will discuss several examples of Systems of Insight developed at IBM Research, including novel business solutions that make use of genomic analytics, social analytics, and telematics analytics. I will also describe a common platform for enabling Systems-of-Insight solutions and automating solution management throughout the lifecycle of development, deployment, and delivery. Finally, I will outline the research challenges that need to be addressed to realize the full potential of Systems of Insight.

Hui Lei is Senior Manager of the Cloud Platform Technologies Department at the IBM T. J. Watson Research Center, and leads IBM's worldwide research strategy in cloud infrastructure services. He is a member of the IBM Academy of Technology, and has assumed various management, strategy, and technical leadership roles at IBM Research. His broad research interest spans cloud computing, mobile computing, service-oriented computing, and business process management. His work has impacted numerous IBM software and services products, and has received extensive media coverage. Dr. Lei is Chair of the IEEE Computer Society's Technical Committee on Business Informatics and Systems. He has participated in many international conferences as a General Chair, Program Chair, or keynote speaker. He received a PhD in Computer Science from Columbia University.

Keynote III
Scalable HPC System Design and Application
Yutong Lu
Professor
School of Computer Science
National University of Defense Technology, China

Since scalability is one of the major challenges for advanced HPC systems in the post-petascale and exascale era, innovative integrated technology designs are needed for new architectures as well as their associated software stacks. We need to explore the capabilities of the CPU, accelerator, interconnect, and I/O storage system, up to the whole system. This talk will discuss scalability-centric HPC system design as it relates to computation, communication, data processing, and fault tolerance. Experiences with extending HPC systems to big data, and with the design and application of the Tianhe-2 system, will also be presented. In general, a co-design approach should be followed throughout research and development activities to deliver a whole system for scalable computing that supports large-scale domain applications efficiently.

Professor Yutong Lu is the Director of the System Software Laboratory, School of Computer Science, National University of Defense Technology (NUDT), Changsha, China. She is also a professor in the State Key Laboratory of High Performance Computing, China. She received her B.S., M.S., and Ph.D. degrees from NUDT. Her extensive research and development experience has spanned several generations of domestic supercomputers in China. During this period, Prof. Lu was the Director Designer for the Tianhe-1A and Tianhe-2 systems, both of which have been internationally recognized as the top-ranked supercomputing system worldwide, in November of 2010 and June of [...], respectively. Her continuing research interests include parallel operating systems (OS), high-speed communications, global file systems, and advanced programming environments.

Keynote IV
Axiomatic, Economic, and Strategic Models of Cloud Computing
Joe Weinman
Chairman
IEEE Intercloud Testbed Executive Committee

There are numerous ways to view the cloud. Joe Weinman reviews three major perspectives he has developed over the past few years. The first is a highly formal model of the cloud or Intercloud based on set theory, graph theory, measure spaces, metric spaces, and function spaces as an abstract distributed computing architecture with a commercial pricing model incorporated. As such, it can be viewed as a new model of computing, supplanting and extending the Turing Machine and Petri Net models. The second perspective is economic, where effects such as statistical multiplexing, reductions in penalty costs due to over- or under-capacity, inverse square root relationships between service node investments and latency reduction, and game-theoretic dominance of pay-per-use pricing models may be explored. A third perspective is strategic: businesses can use the cloud and related technologies to evolve operational excellence to include information excellence; product leadership to cloud-connected solutions; customer intimacy to collective intimacy; and accelerate innovation.

Joe Weinman is the author of Cloudonomics: The Business Value of Cloud Computing, available in English and Chinese (Wiley, 2012 and PTPress, 2014); a contributor to a new book on cloud computing available from the MIT Press in 2015; and the author of the forthcoming book Digital Disciplines (Wiley CIO, 2015). He is the chairman of the IEEE Intercloud Testbed executive committee, a frequent global keynote speaker, and a GigaOm Research Analyst. He has been awarded 21 U.S. and international patents. He has a B.S. and M.S. in Computer Science from Cornell University and the University of Wisconsin-Madison, respectively, and has completed executive education at the International Institute for Management Development in Lausanne.

Keynote V
Big Data and Cloud Technologies
Marcel Kunze
Professor
Steinbuch Centre for Computing
Karlsruhe Institute of Technology, Germany
marcel.kunze@kit.edu

This talk addresses the technical foundations and non-technical framework of Big Data. A new era of data analytics promises tremendous value on the basis of cloud computing technology. Can we perform predictive analytics in real time? Can our models scale to rapidly growing data? The "Smart Data Innovation Lab" at KIT addresses these challenges by supporting R&D projects carried out in close cooperation between industry and science. Some practical examples as well as open research questions are discussed.

As head of the cloud computing research group at Karlsruhe Institute of Technology (KIT), Dr. Kunze performs systems research in the area of dynamic and scalable web-based services. He is committed to R&D in the field of Big Data analytics and system development for cloud computing. Dr. Kunze received a PhD in Physics at Karlsruhe University and finished a habilitation thesis on artificial neural systems at Bochum University. He subsequently joined SLAC / Stanford University to investigate and further develop the Grid Computing paradigm for distributed processing of massive particle physics data. As an associate professor he taught particle physics, distributed and parallel systems, and software design. In 2002 Dr. Kunze joined KIT as a department head to foster national and international projects like the LHC Computing Grid and the German D-Grid initiative.