MathWorks: A Case Study in NAS Ira Cooper The MathWorks Inc.

Similar documents
Leveraging the Benefits of SDS, Software Defined Storage Technology Off-the-Shelf Solutions with Proprietary System Performance

2010 Oracle Corporation Lustre 2.0 Customer Presentation 1

Dealing with Data: Choosing a Good Storage Technology for Your Application. Rick Wagner HPC Systems Manager July 1st, 2014"

Customer-Oriented Storage Performance Management Dany Felzenszwalbe Intel

Data is growing exponentially by the second, we help make sense of it... SERVERS AND STORAGE HCL ENGINEERING AND R&D SERVICES

The Queen s Journal Advisory Board Meeting November 11 th, 2016

IBM Spectrum Scale. Advanced storage management of unstructured data for cloud, big data, analytics, objects and more. Highlights

WorkloadWisdom Storage performance analytics for comprehensive workload insight

No more storage adventures

AWS vs. Google Cloud Pricing

Ecommerce Readiness - Is your B2C Ready for the Buying Season?

Developing and Deploying Analytics for IoT Systems

Oracle SCM Cloud Solutions

CASE STUDY: STANDARD CALIBRATIONS, INC (SCI)

Cost of Changing the Activities in SDLC. Minimum of Cost at this level. code debuging unit test integration. Activity

POINTS OF DEFECT CREATION

Computing with the CRI: Storage, HPC, and Virtual Servers. November 2 nd, 2017

Turning Feedback Into Change

Dynamics NAV Upgrades: Best Practices

Managing Innovation and Entrepreneurship Spring 2008

SAS ANALYTICS AND OPEN SOURCE

CS 5220: VMs, containers, and clouds. David Bindel

CHAPTER 3 IDENTIFICATION OF THE MOST PREFERRED ATTRIBUTE USING CONJOINT ANALYSIS

2012 Global Customer Service Barometer

Leading a Successful DevOps Transition Lessons from the Trenches. Randy Shoup Consulting CTO

"Web Age Speaks!" Webinar Series. Introduction to DevOps

Strategies for MATLAB and Simulink Upgrades

Performance, scalability and availability in AWS Managing the end to end lifecycle of cloud assets.

MY SPRING 2014 CO-OP AT CHALLIS REGAN MAJOR: SOCIOLOGY MINOR: JAPANESE

Containers and

2018 INVENTORY BOOTCAMP

Simulink as Your Enterprise Simulation Platform

Reliability Engineering. RIT Software Engineering

Simulink as Your Enterprise Simulation Platform

MicroScope storage roundtable: Watch out for DAS and SSD Part One

Oracle Communications Billing and Revenue Management Elastic Charging Engine Performance. Oracle VM Server for SPARC

CASE STUDY ANSWERIQ SUPPORT ENABLES 100% AUTOMATED TICKET CLASSIFICATION AND 50% TIME-TO-RESPONSE REDUCTION FOR

10 Things You Should Do Before You Validate Your Next Package

Here are the snapshots of the changes recorded in the three-month period

PRODUCT PRESENTATION

Magillem. X-Spec. For embedded Software and Software-driven verification teams

On the Path to ISO Accreditation

No more excuses: VDI is ready!

OUTCOME-BASED BUSINESS MODELS IN THE INTERNET OF THINGS

Medical Device Product Development for Startups

Smart GSM NMS Smart Network Management System

Copyright 2012, Oracle and/or its affiliates. All rights reserved.

Metalogix Replicator for SharePoint

Comparison of Component Frameworks for Real-time Embedded Systems

WORKING WITH TEST DOCUMENTATION

Novedades de las últimas versiones de MATLAB y Simulink

Introduction to MATLAB. EL1150, Lecture 1, HT 2014

How IT Governance Could Help FeneTech, Inc. Emily Baumgartner Muskingum University. IT Governance and FeneTech 1

SYMPOSIUM March 22-23, 2018

Storage Workload Analysis

It is therefore clear beyond any doubt that the KDC breached the requirements of the LGRA.

Verification and virtual commissioning of configurable handling systems

Cut Your PPC Campaigns Down To Size

R ANAND S VIEWS ON ORGANISATIONAL COMMUNICATIONS

Sourcing In China. Beyond low cost possibility and realizing the hidden factors. Jonathan Zhang

Scope Creep: need strong change control to prevent or mitigate. Must establish the change control system early in the project.

Model-Based Design with MATLAB and Simulink to shorten the design of a new infusion pump

Lustre Beyond HPC. Toward Novel Use Cases for Lustre? Presented at LUG /03/31. Robert Triendl, DataDirect Networks, Inc.

Automating Your Way to Simplified Application Management

CASE STUDY: 1ST AVE MACHINE

AMAZON EC2 SPOT MARKET

SOLUTION OVERVIEW. Storage Performance Analytics

Case Study. Independent Verification and Validation of an aftermarket-support product built on the J2EE platform

Load DynamiX Enterprise 5.2

Fontana IMS Inventory Management System

Zynstra Software for the Retail Edge Datasheet

Press Release. ETAS Rolls Out New Solutions for Simulink Users

Chapter 1 The Manufacturing Plant as a Business System

Agile Engineering. for Managers. Introducing agile engineering principles for non-coders

CS 147: Computer Systems Performance Analysis

Oracle Financial Services Revenue Management and Billing V2.3 Performance Stress Test on Exalogic X3-2 & Exadata X3-2

ICTCM 28th International Conference on Technology in Collegiate Mathematics

Communicate and Collaborate with Visual Studio Team System 2008

Software Testing Prof. Rajib Mall Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

EPISODE 463 [0:00:00.6] JM:

Got Hadoop? Whitepaper: Hadoop and EXASOL - a perfect combination for processing, storing and analyzing big data volumes

Integration and Testing

QPR ScoreCard. White Paper. QPR ScoreCard - Balanced Scorecard with Commitment. Copyright 2002 QPR Software Oyj Plc All Rights Reserved

Cliché on young high-tech employees glamourous workplace. When talking about hazardous working conditions, we usually think about smoggy

Things we can learn from simulating DC networks

Everything you ever wanted to know about deployment...but were afraid to ask. Laura

General Comments. J Fass UCD Genome Center Bioinformatics Core Tuesday December 16, 2014

IT principles are defined as the firm s operating model and any other directives clarifying the role of IT

Maintenance is Dead! A path to a New Paradigm

Channel Marketing Obsession with Appointment Setting Campaigns and What to Do About It

The Incident Management process consists of seven procedures for handling support requests.

JANUARY 2017 $ State of DevOps

Testing Phases Conference Room Pilot

ABA English: An elearning Leader Learns from Its Users E B C D A

Real-time Shopfloor Integration with Oracle Manufacturing Operations Center using Kepware

Tough Math for Desktop TCO

Advanced Recognition

White Paper. Small, Incremental Adjustments Make for Significant Gains

2015 IBM Corporation

Headline. Body. Traffic Leaf Your Institute Old Marketing 2 Behind The 6 Automotive Marketing Tactics You Need To Quit Cold Turkey

Transcription:

MathWorks: A Case Study in NAS Ira Cooper The MathWorks Inc.

Who? MathWorks develops MATLAB and Simulink. Including ~80 Toolboxes and Blocksets together! We are a company of ~2500 people (~1000 developers) across many sites: United States HQ France Germany Japan India 2

Who 2? MATLAB and Simulink offer products supporting: Model Based Simulation, Design & Verification Finance Statistics Embedded Code Generation Symbolic Mathematics And many more areas 3

What? Extremely large codebase Builds can take 24hrs+ for a complete sterile build. Linear test time for all our tests is in the thousands of hours. MATLAB integrates and tests with many 3 rd party products. MATLAB uses the file system as its namespace! 4

How? Build and Test Lab! (BaT) Helped debug SMB2 implementations. Batch system, runs 24/7/365. Thousands of cores of compute power: Mac OS X Windows 32/64 bit Linux 32/64 bit 5

Why Storage? Storage was holding up the company s progress. Storage costs equaled compute node costs. Performance: We need 100k+ ops for hours from a fileserver. The more ops, the faster the jobs run, the more code goes into the product. Stability: Over 24 hour build time 6

Why Storage 2? Bug Fixing and Verification: Have to confirm the bugs are not product issues. 90% of the fight is identifying the bug. The other 90% is convincing the vendor to fix it. Now we can work these issues as needed. If not, we can at least give very precise bug reports. 7

Design Factors High reliability: To support the twenty four hour builds Performance: 100k+ mixed ops per fileserver is not unheard of. Cost: NASCAR for the price of a Corolla. 8

Design Factors 2 Huge numbers of metadata operations. So we can bias our fileservers to our needs: Large amounts of memory to cache metadata More servers, because they are cheaper! Don t have to run them as hard Close to the limits == Problems! 9

Design Choices High Availability: There is none! Deal with the Devil : Each server will go down one day a year, and we can t say which. The system will go twice as fast and cost much less. For a batch system, this can be a great deal! 10

Design Choices 2 No HA means simple servers. For multi-protocol fileservers It also allowed us to go from POC to production in about 6-8 months. 11

Design Choices 3 Timely support is critical: It can be provided in house. We know and understand the priority of our own problems better than any vendor can. 12

What solution? Nexenta Core + Samba 3.6-GIT SuperMicro servers To say more 72gb+ RAM 3 L2ARC SSDs 2 ZIL SSDs 19 10k drives all in one 2U server. 13

Why Nexenta? OpenSolaris is a very solid OS. Nexenta Core also acted more like Linux. ZFS is a very solid filesystem: Raid RaidZ, RaidZ2, Mirror Snapshots SSD Read Caching L2ARC Not quite tiering, but cost-wise that s fine! SSD Write Caching ZIL Required due to heavy NFS traffic 14

Why Nexenta 2? Nexenta is doing what we are doing! ZFS NFS On SMB we disagree: Kernel CIFS vs. Samba Always better to work with people whose interests align with yours! 15

Why Samba? Kernel CIFS had issues in our AD environment. Also, it did not have SMB2. Likewise is missing a key feature: Notify. Samba has a very mature codebase. Samba also has a very strong community. This leads to more features and bug fixes. 16

Results! Put some neat graphs here! 17

Results 2! Management is more understanding of the effort involved because we are more hands-on. They want the problem to be in our servers. They are confident we can fix it! 18

Example Problem: C/N SMB2.002. Compounded Create + Notify. Every so often, servers wouldn t reply correctly. Causing disconnects Causing our builds to fall over We were able to diagnose the issue. However, vendors wouldn t take notice. 19

Example Problem: C/N 2 What can we do? Even when we got a solution from a vendor, it might just fix the bug sometimes. We want to roll out SMB2 badly to speed up builds. A large speed up in our builds is critical to our business. 20

Example Problem: C/N 3 Eventually, we decided to do it ourselves. We worked in house to develop an awful prototype patch. The Samba team rewrote that patch. We worked with the Samba team to QA the final version that went into Samba. 21

Example Problem: C/N 4 Samba s fix led to the issue being recognized throughout the industry. Internally, we realized Hey, just like in MATLAB, finding and isolating the bug is often over 90% of the fight! So why not just do it! 22

Example Problem: Groups Solaris only allowed a user to be in only 32 groups. If setgroups was called with more, the process was killed! Samba thought this was a security issue: people should be in the groups AD says. The security issue was not a concern for us. We patched our own version and went about life. 23

Example Problem: Groups 2 A key thing about open source: Nobody can stop you from doing what you need to do to make it work! Work together as you can. Agree to disagree as you must. 24

Example Problem: ZFS Panic After one of the updates to Nexenta Core we started getting a kernel panic. Clearly we couldn t ship with it in place. We were able to take crash dumps. Used the source + Mercurial to find out what happened. Patched it locally. We felt confident, but ZFS is hairy. 25

Example Problem: ZFS Panic We sent Nexenta a patch, and asked them to verify it. It went into their shipping products! They also asked us to QA a better fix for this issue. We did and now use their fix. Work with people; make it worth their time. 26

Issues No HA does mean that from time to time things do go wrong, and we have more to clean up. No direct vendor support: You broke it, you own both halves. Learning how to build a NAS device in 6 months was a challenge. 27

Final Thoughts The customer is always right: Especially when he is right down the hall from me and paying my check directly! Working with the open source community has been a wonderful thing. The servers have been a great success. They have enabled the growth of the BaT lab and the company. 28

Questions?? 29

Thank You for Attending!! 30