HPC Clusters as Code in the [almost]* Infinite Cloud
Brendan Bouffler, Global Scientific Computing, @boofla #scico
2016-06-30
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Science is one of the greatest areas of computation. For Amazon it's a huge, really disruptive impact, and it's what we are about.
Research means Collaboration. Map of scientific collaboration between researchers (Olivier H. Beauchesne, http://bit.ly/e9ekp2)
AWS Regions and Availability Zones.
Existing regions: 1. Oregon 2. California 3. Virginia 4. Dublin 5. Frankfurt 6. Singapore 7. Sydney 8. Seoul 9. Tokyo 10. São Paulo 11. Beijing 12. US GovCloud
Coming: 1. Ohio 2. India 3. UK 4. Canada 5. China+1
Regions are sovereign: your data never leaves.
More time spent computing the data than moving the data.
Peak: 58K cores Valley: 12K cores
In the first year: over 400,000 scenes available; over 1 billion hits globally; used for new product development.
Colin Reilly, Senior Director GIS, NYC Department of IT & Telecom
ESA, Planet Labs, Copernicus data from ESA's Sentinels & Zooniverse: the first live public test for an effort dubbed the Planetary Response Network (PRN). Within 2 hours of the Ecuador test project going live with a first set of 1,300 images, each photo had been checked at least 20 times. "It was one of the fastest responses I've seen," says Brooke Simmons, an astronomer at the University of California, San Diego, who leads the image processing. Steven Reece, who heads the Oxford team's machine-learning effort, says that results (a heat map of damage with possible road blockages) were ready in another two hours.
Cray Supercomputer
A top500 supercomputer 2013-style
Introducing Alces Flight: self-scaling HPC clusters, instantly ready to compute, billed by the hour and using the AWS Spot market by default to achieve supercomputing for ~1c per core per hour. 750+ popular scientific applications, available immediately from the AWS Marketplace. http://boofla.io/u/alcesflight
A Virtual Private Cloud containing a head node and an Auto Scaling group of compute nodes, with /shared exported from the head node over a 10G network.
Head instance: 2 or more cores (as needed); CentOS 7.x; OpenMPI, gcc etc. Choice of scheduler: SGE (now), PBS Pro & SLURM (coming soon).
Compute instances: 2 or more cores (as needed); CentOS 7.x or Amazon Linux.
The Auto Scaling group is driven by scheduler queue length. The cluster can start with 0 (zero) compute nodes and only scale up when there are jobs.
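The scale-from-zero behaviour can be sketched as a simple queue-length-to-node-count calculation. This is a minimal sketch, not Alces Flight's actual scaler; `desired_nodes` and all its parameters are hypothetical names for illustration:

```python
import math

def desired_nodes(pending_jobs, cores_per_job, cores_per_node,
                  min_nodes=0, max_nodes=8):
    """Translate scheduler queue length into an Auto Scaling target.

    Hypothetical helper: the real scaler inspects the SGE queue; this
    just shows the shape of the calculation, clamped to group limits.
    """
    cores_needed = pending_jobs * cores_per_job
    nodes = math.ceil(cores_needed / cores_per_node)
    return max(min_nodes, min(nodes, max_nodes))

# 10 pending 2-core jobs on 8-core nodes -> 3 nodes
print(desired_nodes(10, 2, 8))
# Empty queue -> scale back to zero
print(desired_nodes(0, 2, 8))
```

With an empty queue the target drops back to `min_nodes`, which is how a cluster can sit at zero compute nodes between runs.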
Wall clock time: ~1 hour Wall clock time: ~1 week Cost: the same
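The "same cost" claim rests on simple elasticity arithmetic: N nodes for one hour costs the same as one node for N hours. The price below is illustrative, not an actual AWS rate:

```python
# Elasticity arithmetic behind the slide. The node-hour price is a
# hypothetical figure for illustration only.
price_per_node_hour = 0.10  # USD, assumed

serial  = 1   * 168 * price_per_node_hour  # one node for a week (168 h)
elastic = 168 * 1   * price_per_node_hour  # 168 nodes for one hour

print(f"serial ${serial:.2f} == elastic ${elastic:.2f}")
```

The bill is identical either way; only the wall-clock time changes.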
AWS building blocks (platform overview slide): infrastructure services (compute; storage: objects, blocks, files; databases: SQL, NoSQL, caching; networking & CDN; Regions and Availability Zones), platform services (big data & HPC, analytics, app & mobile services, development & deployment tools), security & management (identity & access, encryption keys, monitoring, configuration), hybrid cloud management, the AWS Marketplace, and technical & business support (support, professional services, solutions architects, training & certification, partner ecosystem).
C4: Intel Xeon E5-2666 v3, custom built for AWS. Intel Haswell, 16 FLOPS/tick, 2.9 GHz, turbo to 3.5 GHz.

Feature                                                   Specification
Processor Number                                          E5-2666 v3
Intel Smart Cache                                         25 MiB
Instruction Set                                           Intel 64 (64-bit)
Instruction Set Extensions                                AVX 2.0
Lithography                                               22 nm
Processor Base Frequency                                  2.9 GHz
Max All Core Turbo Frequency                              3.2 GHz
Max Turbo Frequency                                       3.5 GHz (available on c4.2xlarge)
Intel Turbo Boost Technology 2.0                          Yes
Intel vPro Technology                                     Yes
Intel Hyper-Threading Technology                          Yes
Intel Virtualization Technology (VT-x)                    Yes
Intel Virtualization Technology for Directed I/O (VT-d)   Yes
Intel VT-x with Extended Page Tables (EPT)                Yes

http://docs.aws.amazon.com/awsec2/latest/userguide/c4-instances.html
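The slide's "16 FLOPS/tick" figure gives a quick back-of-envelope peak throughput per core. A minimal sketch using only the numbers on the slide:

```python
# Peak per-core throughput from the slide's figures:
# 16 FLOPs per clock cycle at the 2.9 GHz base frequency.
flops_per_cycle = 16
base_clock_ghz = 2.9

peak_gflops_per_core = flops_per_cycle * base_clock_ghz  # ~46.4 GFLOPS
print(f"~{peak_gflops_per_core:.1f} GFLOPS peak per core at base clock")
```

Turbo clocks raise this further, but sustained all-core workloads are usually quoted against the base frequency.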
All the traditional command-line tools will be familiar, but you can also create an Alces session and immediately launch a desktop view of your cluster to run graphical apps. Command Line (ssh) Graphical Console
Share the graphical console with your collaborators and work together in the visual console in real time. Make more discoveries, faster.
There are cluster filesystem options, too for when you need extreme I/O scaling.
Disrupting research, wherever it's happening.
39 years of computational chemistry in 9 hours. Novartis ran a project that involved virtually screening 10 million compounds against a common cancer target in less than a week. They calculated that it would take 50,000 cores and close to a $40 million investment to run the experiment internally. Partnering with Cycle Computing and Amazon Web Services (AWS), Novartis built a platform that ran across 10,600 Spot Instances (~87,000 cores) and allowed Novartis to conduct 39 years of computational chemistry in 9 hours for a cost of $4,232. Out of the 10 million compounds screened, three were successfully identified.
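The headline numbers can be sanity-checked with a little arithmetic. All inputs below come from the slide; the per-core-hour price and speedup are derived figures, not quoted ones:

```python
# Sanity-checking the Novartis figures from the slide.
cores = 87_000          # ~87,000 cores across 10,600 Spot Instances
wall_hours = 9
total_cost = 4_232      # USD

core_hours = cores * wall_hours          # 783,000 core-hours
price = total_cost / core_hours          # ~$0.0054 per core-hour
speedup = (39 * 365 * 24) / wall_hours   # ~38,000x vs. a single core

print(f"{core_hours:,} core-hours at ~${price:.4f}/core-hour, "
      f"~{speedup:,.0f}x speedup")
```

The derived ~0.54c per core-hour is consistent with the "~1c per core per hour" figure quoted for Spot-backed clusters earlier in the deck.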
CHILES will produce the first HI deep field, to be carried out with the VLA in B array and covering a redshift range from z=0 to z=0.45. The field is centered on the COSMOS field. It will produce neutral hydrogen images of at least 300 galaxies spread over the entire redshift range. The team at ICRAR in Australia has been able to implement the entire processing pipeline in the cloud for around $2,000 per month by exploiting the Spot market, which means the $1.75M they would otherwise have needed to spend on an HPC cluster can be spent on way cooler things that impact their research, like astronomers.
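The trade-off on this slide can be framed as a rough opportunity-cost calculation. Both dollar figures come from the slide; the comparison deliberately ignores power, staff, and depreciation on the owned cluster, so it understates the cloud's advantage:

```python
# Rough opportunity-cost comparison from the CHILES slide.
monthly_cloud_cost = 2_000     # USD/month on the Spot market, per slide
cluster_capex = 1_750_000      # USD up-front for an HPC cluster, per slide

months_equivalent = cluster_capex / monthly_cloud_cost  # 875 months
years_equivalent = months_equivalent / 12               # ~73 years

print(f"${cluster_capex:,} buys {months_equivalent:.0f} months "
      f"(~{years_equivalent:.0f} years) of the cloud pipeline")
```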
1. AWS Account
3. A problem to solve