Grids and High Performance Distributed Computing. Andrew Chien March 31, 2004 CSE225, Spring Course Information

1 Grids and High Performance Distributed Computing
Andrew Chien
March 31, 2004
CSE225, Spring 2004

Course Information
Course Instructor: Andrew Chien,
Course Meetings: WF pm in HSS2305B
» Except for next week: Monday at a location TBD
Course web site:
Handouts (see the web)
» Course Information
» Course Schedule
» Course reading list (initial) & the reading materials!
» Course Project

2 CSE225 Course Work
Read and Discuss Assigned Papers
» Will be on the course web site (limited release)
» Attend lectures and contribute to the discussion
Homework Assignments (~4) which will delve deeper into the model problems
Do a Great Course Project
» Plan a great course project
» Do a great course project, and present it to the class

Project Planning
1) Formation of Groups and Initial Project Definition (4/2 & 4/5)
2) Initial Project Plans, 5 pages (due 4/12)
3) Review Project Plans w/ Professor Chien (4/14, 4/16)
4) Final Project Plans, 15 pages (4/23), should include in some detail
5) Midterm Project Checkup
6) Project Presentations
7) Project Final Reports
=> Begin discussions today!

3 Topics I
Resource Description, Selection, and Binding: Using realistic resource data and several synthetic distributed application models, perform simulation experiments to explore the efficacy of several approaches.
Open Resource Sharing: Compare the four basic models for resource sharing: cycle stealing, batch scheduling, and slicing. Perform an in-depth literature study of the merits and capabilities of each of these approaches locally and in federated systems. Build an experimental framework which allows experiments with these approaches.

Topics II
Application-driven Evaluation of Grid infrastructures: From the perspective of an important computational science application (e.g. Climate modeling, Protein Folding, Toxic Chemical diffusion, etc.), analyze the capabilities of current and future grid hardware infrastructures and technologies. Working with application experts who are well versed in the computational issues (we have several such volunteers), develop a performance model and simulation which includes a distributed application architecture, a resource description used to acquire resources, performance models for each element, and scaling characteristics. Use this simulation infrastructure to evaluate achievable performance of Grid deployments of these applications.

4 Topics III
Data-intensive applications: Traditional models of distributed filesystems and databases presume low-bandwidth networks and federation and sharing at a high level of the system (and semantics). Presuming high speed networks (dedicated 10Gig or more), consider architectures which share data at a low level (partitions of disks). This project is VERY CHALLENGING.
Security: many things are possible, but hard to do an experimental project.

Beyond the Technology: On Demand Computing
Irving Wladawsky-Berger
Vice President, Technology & Strategy
IBM Server Group

5 Integration of Technology Into Society
[Figure: Evolution of Technology curve. Stage labels: Lab, Early Adopters, Public Recognition, Mass Adoption]
[Figure: the same curve for Electricity. Stage labels: Lab, Early Adopters, Public Recognition, Mass Adoption]

6 Integration of Technology Into Society
[Figure: Evolution of Technology curve. Labels: Technology Development Phase, Post Technology Phase, Lab, Early Adopters, Public Recognition, Mass Adoption]
[Figure: the same curve for Information Technology. Labels: Mainframes "The Glass House", Client/Server PC's/LAN's, Network Computing The Internet]

7 Key IT Requirements
Technology Issues Still Dominate
Technology Advances
Low Costs; High Performance
Standards & Integration
Hiding Complexity
Organizational Productivity
Quality of Service
Flexibility of Deployment

Technology Continues to Advance

8 Technology Continues to Advance
[Figure: Integrated Circuit Performance Trends. Series: # Transistors per Chip, Logic Density. Axis ticks: 1MHz, 10MHz, 100MHz, 1GHz, 10GHz]

e-business Infrastructure
[Figure: Middleware diagram. Labels: Customers, Business Partners, Suppliers, Employees, Network, Web Presentation Servers, Directory and Security Servers, Web Application Servers, Transaction Servers, Data Servers, Storage, Quality of Service]

9 Culture of Standards
[Figure labels: Timely, Reliable, Sophisticated Technologies; Huge Talent Pool; Developing Standards; Driving Innovation; WSDL, Linux, SOAP, Globus, XML]

Standards and Integration
Web Services
» Interface: WSDL. Defines how to use the service
» Directory: UDDI. "Yellow pages" for service location
» Transport: SOAP. Connecting with applications and data

10 Hiding Complexity: Grid Computing
Accessing and Sharing Resources over the Internet, or Private Intranets, based on Open Protocols
Productivity on an Internet Scale
Virtual Organizations: Accessing and Sharing Information, Applications and Expertise

11 On Demand Grids
IT Delivered as a Utility
[Figure: utility service stack. Labels: Network-Delivered Applications, Business Process Outsourcing, "Intelligent" Services, Middleware, Hosting/Bandwidth, e-commerce and B2B, Transaction Services, Storage, Utility Services]

IT in a Post-Technology World
[Figure: Evolution of Technology curve. Labels: Mainframes "The Glass House", Client/Server PC's/LAN's, Network Computing The Internet, Mass Adoption, On Demand Computing]

12 Beyond the Technology: On Demand Computing

What are Grids?
Flexible shared infrastructures that can be automatically configured and adapted to use
» Utility, Shared, Plug in and Use, Dependable
» Efficient, flexible, low-cost use of resources
Open infrastructures that enable federation at high levels of access and functionality
» Computation Sharing
» Data Sharing
» Standards, Self-describing presentations, Security
» Enable composition of resources, services, semantics, all the way up!
Things that weren't designed to work together
An evolving, emergent organic infrastructure

13 Discussion
Sounds good, why haven't we always done this?
» Network bandwidth
» Organizations didn't trust each other; Control and property
» No need or benefit; No networking
» Divergence from the beginning
» Extremely efficient use of resources was required (resources not plentiful)
» Vendor proprietary lock-in dynamic; differentiation
» Standards didn't exist, couldn't keep up
» Couldn't agree on right solution, or even the needs
» Didn't have the right technologies
» Commodity and standardization leverage
» Now huge positive benefit because of large-scale network services available and interoperation
What are the costs and pitfalls of this approach?
» Give up some autonomy, and sharing/openness always leaves you more vulnerable (security a challenge)
» Wasteful/less efficient
» Gives up proprietary/lock-in, may lose control or differentiation
» With all standard and commodity, how do you sustain investment to advance the technology?
» Distraction from core focus down activity
» Might not reap the anticipated benefits, there's a cost of investing
» Not all applications are suitable; not all resources are equal
» Resource Management is a challenge
» Convergence may produce a monoculture more susceptible to attack or critical defect
» Cost of Using the grid (discovery, scheduling, etc.) may dominate the benefit
» Virtualization can make it more difficult to diagnose and fix problems
» Outsourcing reduces your control, accountability

14 Do all things wind up being on the grid? And what does that mean?
» Some things need to be isolated or decoupled; notion of independent sources may be needed
» Specialized functions may have marginal benefit to be on the grid
» Lots of things don't need to be networked but will they be anyway?
What are the key research challenges?

Summary
Globus architecture
» Four layers (fabric, connectivity, resource, collective)
Key services
» Connectivity: communication and security
» Resource: Resource Allocation, Data movement, GRIS, others
» Collective: Index servers, Resource Brokers, Replica Catalogs, Co-reservation and co-allocation
Information Services
» GRIS: local info provider
» GIIS / MDS: aggregating information server
» All LDAP based; Multi-level filtering and resource selection
» Most access is thru other resource managers and brokers
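
The information-services hierarchy in the summary (local providers feeding an aggregating server, with filtering at more than one level, then resource selection) can be pictured with a small Python sketch. Everything here is an assumption made for illustration: the class names `LocalInfoProvider` and `AggregateIndex`, the attribute names, and the predicate-style filter are hypothetical and are not the Globus MDS API, which is LDAP based.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


class LocalInfoProvider:
    """Hypothetical stand-in for a GRIS: describes one local resource."""

    def __init__(self, record: Record):
        self.record = record

    def query(self, keep: Callable[[Record], bool]) -> List[Record]:
        # First-level filtering happens at the resource itself.
        return [dict(self.record)] if keep(self.record) else []


class AggregateIndex:
    """Hypothetical stand-in for a GIIS: aggregates registered members."""

    def __init__(self):
        self.members = []  # local providers or nested indexes

    def register(self, member) -> None:
        self.members.append(member)

    def query(self, keep: Callable[[Record], bool]) -> List[Record]:
        # Fan the filter out to every member and merge the answers.
        results: List[Record] = []
        for member in self.members:
            results.extend(member.query(keep))
        return results


def select(index, keep, rank, count):
    """Resource selection: filter candidates, rank them, keep the best."""
    return sorted(index.query(keep), key=rank, reverse=True)[:count]


# Two sites, each with its own index, registered under one top-level index.
site_a = AggregateIndex()
site_a.register(LocalInfoProvider({"host": "a1", "cpus": 4, "free_mem_gb": 2}))
site_a.register(LocalInfoProvider({"host": "a2", "cpus": 16, "free_mem_gb": 8}))
site_b = AggregateIndex()
site_b.register(LocalInfoProvider({"host": "b1", "cpus": 8, "free_mem_gb": 16}))
top = AggregateIndex()
top.register(site_a)
top.register(site_b)

# Ask for hosts with at least 8 CPUs, preferring the most free memory.
chosen = select(top, lambda r: r["cpus"] >= 8, lambda r: r["free_mem_gb"], 1)
print([r["host"] for r in chosen])
```

A broker in this picture would sit on top of `select` and never talk to the local providers directly, which mirrors the last bullet of the summary.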

15 Discussion
What is the idea behind Grids?
Does Globus address these issues?
» Which ones?
» All of them?
What functionalities are actually provided?
The drive to standards and research
» Analogy to the internet
» Many questions are unanswered

Project Discussions I
Resource Description, Selection, and Binding: Using realistic resource data and several synthetic distributed application models, perform simulation experiments to explore the efficacy of several approaches.
» How well do different selection and binding strategies work?
» How well can we do if system utilization is the primary goal? Application performance? Turnaround?
» How does quality of service vary as a function of resource utilization, application resource specification, selection and binding algorithms?
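
As a starting point for the selection and binding questions above, here is a minimal Python sketch of the kind of simulation the project calls for. It is an illustration under stated assumptions, not a prescribed design: the job model (arrival time plus abstract work units), the two strategies `earliest_free` and `earliest_finish`, and the definition of utilization (busy time divided by makespan times number of resources) are all hypothetical choices.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Job:
    arrival: float  # time the job is submitted
    work: float     # abstract work units


Strategy = Callable[[Job, List[float], List[float]], int]


def simulate(jobs: List[Job], speeds: List[float],
             choose: Strategy) -> Tuple[float, float]:
    """Bind each job to one resource; return (mean turnaround, utilization)."""
    free_at = [0.0] * len(speeds)  # when each resource next becomes idle
    busy = [0.0] * len(speeds)     # total time each resource spent running
    turnaround = []
    for job in sorted(jobs, key=lambda j: j.arrival):
        i = choose(job, speeds, free_at)
        start = max(job.arrival, free_at[i])
        run = job.work / speeds[i]
        free_at[i] = start + run
        busy[i] += run
        turnaround.append(free_at[i] - job.arrival)
    makespan = max(free_at)
    utilization = sum(busy) / (makespan * len(speeds))
    return sum(turnaround) / len(turnaround), utilization


def earliest_free(job: Job, speeds: List[float], free_at: List[float]) -> int:
    # Keep resources busy: bind to whichever resource is idle soonest.
    return min(range(len(speeds)), key=lambda i: free_at[i])


def earliest_finish(job: Job, speeds: List[float], free_at: List[float]) -> int:
    # Favor the application: bind to whichever resource finishes the job first.
    return min(range(len(speeds)),
               key=lambda i: max(job.arrival, free_at[i]) + job.work / speeds[i])


jobs = [Job(0.0, 8.0), Job(0.0, 8.0), Job(0.0, 8.0)]
speeds = [1.0, 4.0]  # one slow and one fast resource (illustrative values)

free_turn, free_util = simulate(jobs, speeds, earliest_free)
fin_turn, fin_util = simulate(jobs, speeds, earliest_finish)
print(f"earliest_free:   turnaround={free_turn:.2f} utilization={free_util:.2f}")
print(f"earliest_finish: turnaround={fin_turn:.2f} utilization={fin_util:.2f}")
```

On this toy workload the two strategies pull in opposite directions: binding to the earliest-finishing resource lowers mean turnaround but leaves the slow resource idle, which is the utilization versus application performance tension the questions point at.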

16 Project Discussions II
Open Resource Sharing: Compare the four basic models for resource sharing: cycle stealing, batch scheduling, and slicing. Perform an in-depth literature study of the merits and capabilities of each of these approaches locally and in federated systems. Build an experimental framework which allows experiments with these approaches.
» How do these approaches for single resources affect the properties achievable on collections of resources (e.g. in Grids)?
» How does application/user behavior affect the properties achievable for collections of resources?
» What are the worst cases, and can you demonstrate them?
» How does the scale of resources affect the capabilities?
» What real stability and compositional claims can be made (these are important for life-critical applications)?
» How do allocation mechanisms such as markets and trade affect the ability to make such claims?

Project Discussions III
Application-driven Evaluation of Grid infrastructures: From the perspective of an important computational science application (e.g. Climate modeling, Protein Folding, Toxic Chemical diffusion, etc.), analyze the capabilities of current and future grid hardware infrastructures and technologies. Working with application experts who are well versed in the computational issues (we have several such volunteers), develop a performance model and simulation which includes a distributed application architecture, a resource description used to acquire resources, performance models for each element, and scaling characteristics. Use this simulation infrastructure to evaluate achievable performance of Grid deployments of these applications.

17 Project Discussions IV
Data-intensive applications: Traditional models of distributed filesystems and databases presume low-bandwidth networks and federation and sharing at a high level of the system (and semantics). Presuming high speed networks (dedicated 10Gig or more), consider architectures which share data at a low level (partitions of disks).
» At what network speeds and latencies are these architectures competitive with local-disk systems?
» At what granularity of data are traditional database systems relocatable?
» What are the benefits that accrue in terms of fault-tolerance, resource efficiency, performance, new capability?
» How do problems of federated data change with the real prospect of direct access and data mobility?
This project is VERY CHALLENGING.
Security: many things are possible, but hard to do an experimental project.
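
One way to begin on the "competitive with local disk" question is a back-of-the-envelope cost model. The Python sketch below assumes a simple latency-plus-transfer model, with the remote path paying one network round trip and streaming at the slower of disk and network bandwidth. The model, the function names, and every number in it (5 ms disk latency, 50 MB/s disk, 1250 MB/s for a dedicated 10 Gb/s link, the round-trip times) are illustrative assumptions, not measurements from the course.

```python
def local_access_time(size_mb: float, disk_latency_s: float,
                      disk_bw_mb_s: float) -> float:
    """Time to read size_mb from a local disk: positioning plus transfer."""
    return disk_latency_s + size_mb / disk_bw_mb_s


def remote_access_time(size_mb: float, disk_latency_s: float,
                       disk_bw_mb_s: float, net_rtt_s: float,
                       net_bw_mb_s: float) -> float:
    """Same read from a remote disk partition over the network.

    Assumes one round trip per request and that the transfer streams
    at the slower of the disk and the network.
    """
    rate = min(disk_bw_mb_s, net_bw_mb_s)
    return disk_latency_s + net_rtt_s + size_mb / rate


def slowdown(size_mb: float, net_rtt_s: float,
             disk_latency_s: float = 0.005, disk_bw_mb_s: float = 50.0,
             net_bw_mb_s: float = 1250.0) -> float:
    """Ratio of remote to local access time (1.0 means equally fast)."""
    remote = remote_access_time(size_mb, disk_latency_s, disk_bw_mb_s,
                                net_rtt_s, net_bw_mb_s)
    local = local_access_time(size_mb, disk_latency_s, disk_bw_mb_s)
    return remote / local


# Vary request granularity and round-trip time (illustrative values).
for size_mb in (1.0, 100.0):
    for rtt_s in (0.001, 0.050):
        ratio = slowdown(size_mb, rtt_s)
        print(f"size={size_mb:6.1f} MB  rtt={rtt_s * 1000:5.1f} ms  "
              f"slowdown={ratio:.3f}")
```

Under these assumptions the round trip dominates small requests over long distances and becomes negligible for large transfers, which is why the granularity question and the latency question have to be studied together.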