Leading a Successful DevOps Transition: Lessons from the Trenches. Randy Shoup, Consulting CTO

Transcription

2 Leading a Successful DevOps Transition: Lessons from the Trenches. Randy Shoup, Consulting CTO

3 What Is DevOps? Continuous Delivery: rapid cycle times; automated testing and continuous integration; deployment automation and version control. Lean Management Practices: limiting work-in-progress via small batch sizes; rapid feedback via visual displays and monitoring. Collaborative approach to Development and Operations: act as one team across different disciplines; solve problems instead of pointing fingers. Organizational and cultural factors are most important.

4 Taking the DevOps Journey. Traditional Enterprises Adopting DevOps: financial services (Capital One, ING, Bank of America, Nationwide); manufacturers (General Electric, General Motors, Raytheon, Intel, Cisco, HP); retailers (Target, Nordstrom, Macy's). Higher Throughput and Stability: high-performing IT organizations have 60x fewer failures and recover 168x faster; high-performing IT organizations deploy 30x more frequently with 200x shorter lead times. Improved Business Results: public companies with high-performing IT organizations had 50% higher growth in market capitalization over 3 years vs. low-performing IT organizations.

5 Using Conway's Law. Organization determines architecture: the design of a system will be a reflection of the communication paths within the organization. An agile, modular system requires an agile, modular organization: small, independent teams lead to more flexible, composable systems; larger, interdependent teams lead to more monolithic systems. We can engineer the system we want by engineering the organization (!)

6 Small Service Teams. Team develops a single set of applications or services: clear, well-defined area of responsibility; minimal, well-defined interface. Amazon "Two Pizza Team": no team should be larger than can be fed by 2 large pizzas; typically 3-5 people; mix of junior and senior people.

7 Small Service Teams. End-to-End Ownership: cross-functional team owns the application / service from design to deployment to retirement; able to move very rapidly and independently. Self-Sufficiency: team has inside it all the skill sets to do the job; depends on other teams for supporting services. "You Build It, You Run It": the same team that builds the software operates the software; no separate maintenance or sustaining engineering team.

8 Lose the Ticket Culture. Ticket Culture: do what is asked for; one-way communication; goal is to close the ticket; reactive approach; reinforces silos; prioritizes process. Ownership Culture: do what is needed; two-way collaboration; goal is product success; proactive approach; reinforces collaboration; prioritizes results.

9 Enforce a Service Mentality. Vendor-Customer Discipline: the service team is a vendor and the applications are its customers; a service is useful only to the extent it provides value to its customers. Customer can choose to use the service or not (!): the customer team is responsible for deciding what is best for their use case; use the right tool for the right job. Provides powerful incentives: the service must be *strictly better* than the alternatives of build, buy, borrow.

10 Charge for Usage. Charge customers for *usage* of the service: aligns economic incentives of customer and provider; motivates both sides to optimize efficiency. Free usage leads to waste: no incentive to control usage or find more efficient alternatives. E.g., App Engine usage at Google: charging a particularly egregious internal customer led to a 10x reduction in usage.
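
A minimal sketch of usage-based chargeback, to make the incentive concrete. The rate, the team names, and the `monthly_charges` helper are all hypothetical and are not taken from the talk; a real system would meter whatever resource the service actually consumes.

```python
from collections import defaultdict

# Hypothetical price per 1,000 requests; not a figure from the talk.
RATE_PER_1K_REQUESTS = 0.05


def monthly_charges(usage_records):
    """Sum request counts per customer team and convert them to a charge."""
    totals = defaultdict(int)
    for team, requests in usage_records:
        totals[team] += requests
    return {
        team: round(count / 1000 * RATE_PER_1K_REQUESTS, 2)
        for team, count in totals.items()
    }


# Hypothetical metering data: (customer team, requests served).
records = [("checkout", 400_000), ("search", 2_000_000), ("checkout", 600_000)]
charges = monthly_charges(records)
print(charges)
```

Once each customer team sees a number like this on its own budget, it has a direct reason to cache, batch, or drop calls it does not need.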

11 Shared On-Call Duties. All members of the team rotate on-call responsibilities: strong motivator to build in solid monitoring and diagnosis tools; best way to learn the real-world behavior of the system; best way to develop empathy for customers and other team members. Common resistance: unfamiliarity with production systems and tools; fear of making a mistake; "That's not my job".

12 Shared On-Call Duties. On-call apprenticeship: the apprentice starts as secondary on-call with an experienced primary, observing and learning from the primary in action; the apprentice next takes primary on-call with an experienced secondary; the apprentice graduates. Ops at Google: developers are on-call for the first 6+ months of a new service; a service can graduate to Ops coverage only after intensive review of its monitoring, reliability, resilience, etc.

13 Turn Approvals Into Code. Reduce or eliminate approval bodies, e.g., the eBay Architecture Review Board: (-) too late; (-) too slow; (-) too disengaged from details. Package expertise in code: smart, experienced people build their knowledge into code; teams with specialized skills (databases, security, compliance, etc.) provide services, libraries, or tools.

14 Turn Approvals Into Code. E.g., Security at Google: provide secure foundations by maintaining lower-level libraries and services; provide self-service penetration tests, vulnerability assessments, etc. The best way to enforce a standard practice is with working code.
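
One way to picture an approval turned into code is a self-service check that a team runs against its own service configuration instead of waiting for a review board. The sketch below is illustrative only: the rules, the configuration keys, and the `secret://` prefix are assumptions made for this example, not practices described in the talk.

```python
def check_service_config(config):
    """Return a list of violations; an empty list means the config passes.

    The three rules below are hypothetical examples of expertise that a
    security team might package as code.
    """
    violations = []
    if not config.get("tls_enabled", False):
        violations.append("tls_enabled must be true")
    if config.get("admin_port_public", False):
        violations.append("admin port must not be publicly exposed")
    for name, value in config.get("env", {}).items():
        # Assumed convention: secrets are referenced, never inlined.
        if "PASSWORD" in name.upper() and not str(value).startswith("secret://"):
            violations.append(f"{name} must reference the secret store")
    return violations


good = {"tls_enabled": True, "env": {"DB_PASSWORD": "secret://db/main"}}
bad = {"tls_enabled": False, "admin_port_public": True,
       "env": {"DB_PASSWORD": "hunter2"}}

print(check_service_config(good))
print(check_service_config(bad))
```

A check like this can run in every team's build, so the standard is enforced early and continuously rather than late and by committee.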

15 Migrate to Microservices. Single-purpose; simple, well-defined interface; independently testable; independently deployable; easy to understand and reason about; smaller surface area. [Diagram with nodes labeled A, B, C, D, E]

16 Embrace the Cloud. Rapid Provisioning and Deployment: minutes, not weeks. API-Driven Infrastructure: automatable and repeatable; constrained threat surface. Pay For What You Use: no utilization risk from owning / renting; if it's not in use, spin it down. Build on Provider's Scaling and Security Expertise: few organizations have the security resources of Amazon or Google.

17 Embrace the Cloud. The 2010s of computing are like the 1910s of electric power. Soon it will be just as common to run your own computing infrastructure as it is to operate your own electric power generation.

18 Build a Quality Culture. Quality, Performance, and Reliability are "Priority-0 features": stop the line if there is a degradation; equally important to users as product features or engaging user experience. Developers are responsible for: features, quality, performance, reliability, manageability.

19 Build a Quality Culture. Developers write tests and code together: continuous testing of features, performance, load. Tests make better code. Tests have your back: confidence to break things; confidence to refactor. Tests help you move faster: catch bugs earlier, fail faster. Slow down to speed up.
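
As a small illustration of tests written alongside the code they cover, the sketch below pairs a function with its tests in one file. The `apply_discount` function and its rules are hypothetical and chosen only to keep the example short.

```python
def apply_discount(price, percent):
    """Return price reduced by percent; reject percentages outside 0-100."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)


def test_typical_discount():
    assert apply_discount(200.0, 25) == 150.0


def test_zero_and_full_discount():
    assert apply_discount(80.0, 0) == 80.0
    assert apply_discount(80.0, 100) == 0.0


def test_invalid_percent_is_rejected():
    try:
        apply_discount(80.0, 120)
    except ValueError:
        return
    raise AssertionError("expected ValueError")


if __name__ == "__main__":
    for test in (test_typical_discount, test_zero_and_full_discount,
                 test_invalid_percent_is_rejected):
        test()
    print("all tests passed")
```

With tests like these in place, the body of `apply_discount` can be refactored freely: any change that alters its behavior fails immediately instead of surfacing in production.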

20 Build a Quality Culture. E.g., Development Process at Google: code reviews before submission; automated tests for everything; single searchable source code repository. Internal Open Source Model: not "here is a bug report"; instead, "here is the bug; here is the code fix; here is the test that verifies the fix".

21 Actively Manage Technical Debt. Maintain a sustainable and well-understood level of debt: measured by engineering effort to fix; plan for how and when you will pay it off; track feature work vs. accrued debt over time. "Don't have time to do it right"? WRONG -- don't have time to do it twice (!). The more constrained you are on time and resources, the more important it is to do a solid job the first time.

22 Vicious Cycle of Technical Debt. [Cycle diagram with stages: Quick-and-dirty; Technical Debt; No time to do it right]

23 Virtuous Cycle of Investment. [Cycle diagram with stages: Invest in Quality; Solid Foundation; Faster and Better; Confidence]

24 Blameless Post-Mortems. Post-mortem After Every Incident: document exactly what happened; what went right; what went wrong. Open and Honest Discussion: What contributed to the incident? What could we have done better?

25 Blameless Post-Mortems. Take fear and personalization out of it: engineers will compete to take personal responsibility (!); "Finally we can fix that broken system". Focus on Learning and Improvement: How should we change process, technology, documentation, etc.? How could we have automated the problems away? How could we have diagnosed more quickly? How could we have restored service more rapidly?

26 DevOps in Action. eBay Search Ranking Improvements: which item should appear 1st, 10th, 100th, 1000th? Before: small number of hand-tuned factors. Goal: thousands of machine-learned factors. Rapid experimentation and feedback: deployed hundreds of parallel A/B tests every day; full year of steady, incremental improvements; $120M in incremental eBay revenue.
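
Running many A/B tests in parallel usually depends on assigning each user to a variant deterministically and independently per experiment. The sketch below shows one common way to do that with a hash of the experiment name and user ID. It is an assumption-laden illustration, not a description of how eBay implemented its experiments; `assign_variant` and the experiment names are hypothetical.

```python
import hashlib


def assign_variant(experiment, user_id, variants=("A", "B")):
    """Deterministically map a user to a variant for one experiment.

    Hashing the experiment name together with the user ID keeps the
    assignment stable across requests while making assignments in
    different experiments effectively independent of each other.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


# Hypothetical experiment names, one per ranking factor under test.
experiments = ("ranking-factor-17", "ranking-factor-18")
assignments = {exp: assign_variant(exp, "user-42") for exp in experiments}
print(assignments)
```

Because no assignment state is stored, any server can compute the same answer for the same user, which keeps the cost of adding another parallel experiment low.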

27 Not Just for Unicorns. DevOps practices have become mainstream. High performance is achievable by any IT organization. Organizational and cultural change requires a significant investment of time and effort, but the benefits are well worth it.
