Offering Overview Heptio Kubernetes Subscription. Revised September 2018


Introduction

Heptio Kubernetes Subscription (HKS) enables you to adopt upstream Kubernetes in production through validated designs for deployment, operations runbooks, Heptio tools that improve usability, and 24x7 break-fix support. HKS delivers the assurance that comes with a distribution, but with the added benefits of control, portability, flexibility, and cost effectiveness.

HKS includes:
- Validated designs for Kubernetes deployment, for on-premises and cloud environments
- 24x7 support (advisory and break-fix) of production-grade Kubernetes deployment from Heptio's Customer Reliability Engineering (CRE) team
- Advisory support for related cloud-native technologies (logging, monitoring, container engine, etc.)
- Hot fixes and custom builds (when necessary), with best effort to merge upstream
- Heptio tools to improve experience and supportability

This document provides an overview of the HKS offering, outlines which technologies are supported along with the support classes and SLAs, and describes Heptio's support process.

Support Overview

Customer Reliability Engineering (CRE), a model originally developed at Google, focuses on systematizing and automating solutions to customers' issues to solve problems for everyone. CRE is based on a close partnership between the customer and Heptio, where Heptio strives to provide advisory support on a proactive basis, rather than break-fix support on a reactive basis.

Heptio's CRE team strives to provide durable and repeatable solutions in response to customer requests. Our service puts an emphasis on learning from every interaction, and taking sustainable action to prevent the same problem from recurring. We gather constant feedback from our customers on their experience, and capture the resolutions in a manner that can help the next person avoid the same problem.

Kubernetes is evolving quickly, and an enterprise that seeks to adopt Kubernetes to achieve greater application portability, efficiency, and developer velocity will be well served by staying more closely connected to the upstream Kubernetes community. Heptio Kubernetes Subscription is structured to scale and change with the upstream open source community. This gives your organization the flexibility to make configuration and technology choices that are best for your business, while remaining within known supported guard rails. Heptio's validated designs outline the desired state of a Kubernetes cluster including the related cloud-native technology components.

Support Availability and SLAs

Heptio offers four ticket severity levels based on the type of issue you are experiencing. Support hours differ based on the support level purchased: Business Critical (24x7) versus Business Day (6 am-6 pm PST).

Table columns: Severity Level, Description, Business Critical Hours, Business Day Hours, Initial Response

Urgent (Severity 1)
Description: Whole production system outage, complete inability to provide service
Hours and initial response: 24 x 7 x hour

High (Severity 2)
Description: Operations can continue in your production or staging environments where productivity is affected.
Initial response: 4 hours

Normal (Severity 3)
Description: Partial non-critical loss of functionality; these can be technical questions, configuration issues, and issues that affect a small number of users.
Initial response: 1 Business Day

Low (Severity 4)
Description: Non-severe issues that include questions and feedback
Initial response: No SLA

Support Classes

Break-Fix: Heptio will provide hotfix patches whenever possible for Severity 1 and security issues and work to navigate upstream merges. Severity 1 through Severity 4 tickets allowed.

Advisory: Heptio will work on a best effort basis to debug and identify issues as they pertain to Kubernetes. Heptio will NOT offer hotfix patches. Only Severity 3 and 4 tickets allowed.

Unsupported: Heptio will not support these components unless a special support agreement is defined.
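The support class rules above can be read as a small lookup from class to permitted ticket severities. The following is a minimal sketch only; the names `ALLOWED_SEVERITIES` and `can_file` are hypothetical and not part of any Heptio tooling, and the Unsupported class is modeled as allowing no tickets, which ignores the special support agreements mentioned above.

```python
# Hypothetical illustration of the Support Classes rules; not Heptio tooling.
ALLOWED_SEVERITIES = {
    "break-fix": {1, 2, 3, 4},   # Severity 1 through Severity 4 tickets allowed
    "advisory": {3, 4},          # only Severity 3 and 4 tickets allowed
    "unsupported": set(),        # assumption: no tickets without a special agreement
}


def can_file(support_class: str, severity: int) -> bool:
    """Return True if a ticket of this severity is allowed for the support class."""
    return severity in ALLOWED_SEVERITIES[support_class.lower()]
```

For example, `can_file("Advisory", 2)` is false because Advisory components accept only Severity 3 and 4 tickets.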

Support Coverage

Technology Components

Heptio Kubernetes Subscription is based on a set of validated designs centered on upstream Kubernetes. Our field engineering team delivers a strong baseline solution that works out of the gate, built on open source parts, with the flexibility to adapt or tune those parts to meet unique enterprise operating requirements. We balance your specific needs with our ability to ensure you will be able to successfully deploy and operate your Kubernetes environment.

The specific technologies we support are based on experience working with a broad set of customers. We guide customers towards proven solutions rather than favoring proprietary suboptimal products. Heptio offers Break-Fix support for Kubernetes, Heptio Ark (disaster recovery), and Heptio Sonobuoy (cluster configuration). We provide Advisory support for select technologies for container engine, container network, container registry, monitoring, logging, and ingress.

For a complete list of the specific technologies and versions that Heptio supports as part of HKS, refer to Appendix A. If a technology you would like to use is not listed, please contact us to determine if we can support it.

Third Party Support

Heptio assumes you will leverage the open source or commercial support available to you for components outside of our scope. Heptio support can assist on a case-by-case basis as an advocate for our customers and the issues they encounter in those components, but we cannot take responsibility for driving resolution of issues encountered with components outside of our scope.

Kubernetes Support

Each major Kubernetes release offers enhanced functionality, features, and bug fixes. Heptio provides Break-Fix support for the two most recent versions of Kubernetes, and Advisory support for earlier versions. Heptio typically does not recommend upgrading on the first point release of a Kubernetes version (e.g ), and the officially supported Kubernetes versions increment with the first stable point release (typically x.y.1 or x.y.2).

Heptio validated designs ensure that you are set up to successfully deploy and operate your Kubernetes environment. Our goal is to keep your cluster in a working state, and as such we have defined a number of (1) technology components and (2) architectural decisions under which our support for Kubernetes will be either Advisory-level or Unsupported. If you are using these technologies or are set up incorrectly, Heptio can help put you on a path to success. For a full list of technologies, refer to Appendix B. For a full list of architectural decisions, refer to Appendix C.

Each Kubernetes release includes features in various stages of production-readiness. These are defined as alpha, beta, and stable features. Heptio provides Break-Fix support for stable features and Advisory support for beta features; alpha features are Unsupported. Refer to Appendix D for more details.

Scope of Support

Heptio's CRE team is here to help with issues as they arise. However, issues which require field engineering services are out of scope for support.
Examples include:
- Custom feature requests
- Performing an upgrade
- Deploying new technology
- Fixing a significant degradation of a cluster (e.g. due to newly introduced technologies)

As performing upgrades is out of scope for HKS support, customers have two options:
1. Engage Heptio field engineering for a managed upgrade engagement
2. Perform upgrades on their own, with Heptio advisory:
a. Heptio participates in planning
b. Heptio reviews plan
c. Heptio is on-call during the upgrade

Support Qualification

Heptio Sonobuoy Scanner

One result of the wide-scale adoption of Kubernetes by a diverse community is a plethora of ways to set up a Kubernetes cluster. A potential risk of an open source model is that your cluster was not set up properly in the first place, causing issues further down the road. Heptio mitigates this risk through cluster pre-qualification and Heptio Sonobuoy, an open source tool that makes it easier to understand the state of a Kubernetes cluster by running a set of Kubernetes conformance tests. Sonobuoy is the underlying technology used to qualify CNCF Certified Kubernetes distributions, including those provided by vendors such as Red Hat and CoreOS.

Sonobuoy enables Heptio to ensure your cluster is set up properly, so you don't run into issues. When issues do arise, output from Sonobuoy allows Heptio to quickly diagnose and address them. Learn more about Heptio Sonobuoy at heptio.com/products/#heptio-sonobuoy.

Support Prequalification

For customers that have existing Kubernetes deployments, Heptio needs to ensure the deployment is supportable. Heptio will request that customers run Heptio Sonobuoy across all clusters to be supported. Support will revert to an Advisory level for any environment that does not pass a Sonobuoy scan. Heptio will also request that customers accept a dummy patch to ensure patches can be delivered.

The Sonobuoy scan and additional information will be reviewed in a prequalification meeting. The objective of the prequalification meeting is to work collaboratively to verify the cluster for supportability. Following the prequalification meeting, Heptio provides a report to the customer with details from Heptio Sonobuoy and the prequalification review. This report highlights the variance between the customer's configuration and Heptio's preferred configuration and best practices.
Heptio will recommend a path forward that may require a Heptio Field Engineering engagement for the environment to be supported.

Quarterly Review Process and Ongoing Qualification

Heptio conducts a quarterly review process with HKS customers to ensure you continue to be set up for success, discuss any ongoing issues you may be experiencing, and collect feedback on Heptio support interactions so that we can improve. As part of the quarterly review, Heptio requires that customers run Sonobuoy across all clusters once per quarter.

If you have either changed something in your environment or failed to upgrade required technologies, and the environment no longer meets HKS qualification criteria, we will notify you of any changes required and work together on a path forward. If you fail to make the required changes, you may fall back to Advisory support or Unsupported status, based on the requirements outlined in Appendix A, Appendix B, and Appendix C.

Support Onboarding

Prior to the start of the HKS contract period, the Heptio CRE team will set up an onboarding meeting and request a list of users on your team. The onboarding meeting ensures you are successfully set up for ongoing support and have the information you need. It will cover the terms of support based on what you purchased, how to submit tickets, an overview of priority levels, and the patch process.

Ticket Process

Communication Channels

Tickets can be submitted via our ticketing system at support.heptio.com or by directly emailing support@heptio.com. Tickets submitted via email at support@heptio.com will automatically default to Severity 3 (Normal).

During the course of troubleshooting any ticket, the assigned Heptio Support Engineer and their customer counterpart may determine that an alternative interaction model (Google Hangout or Zoom session, Slack, phone call) would be a more effective communication mechanism. We encourage the use of any and all of those channels as long as the latest status and next steps for the ticket are maintained in the ticketing system. Every single issue for which we are contacted is logged as a ticket.

Severity 1 issues

1. You report an issue via the ticketing interface or email. This creates a ticket and notifies Heptio CREs.
2. The responding engineer will respond with a conference bridge invitation for real-time collaboration.
3. If an engineer needs to ask you a question while no one is in the conference bridge, they will add a comment to your ticket and set the ticket to a Pending state while they await your response.
4. When you respond via web browser or email, your response sets the ticket to Open while you await an answer from Heptio Support.
5. Once the issue has been mitigated, the priority of the ticket will be lowered to Severity 2 to finish the ticket lifecycle.

All other severity level issues

1. You report an issue via the ticketing interface or email. This creates a ticket and notifies Heptio CREs.
2. If an engineer needs to ask you a question, they will add a comment to your ticket and set the ticket to a Pending state while they await your response.
3. When you respond via web browser or email, your response sets the ticket to Open while you await an answer from Heptio Support.
4. If the issue requires a Heptio-provided bug fix, the ticket will be put into an On-Hold state while Heptio's internal engineering staff works on it. During the On-Hold process, Heptio will continue to provide updates as the issue progresses.
a. Identify that a code fix is needed and confirm that the customer wants us to implement it
b. Create a task for the Engineering team and set a projected completion date
c. Work with the Engineering team to identify the resource(s) that will work on the fix
5. This flow will continue until your issue is resolved.
a. If you do not respond within 3 business days, we will update the ticket asking if everything is working as intended.
b. After 5 business days, we will update the ticket to Solved, on the assumption that the issue is no longer a priority. If you respond to the ticket after this, it will automatically reopen in our queue.
6. After a ticket has been Solved for 30 days, it can no longer be reopened, but it can still be referenced within your organization's ticket history.

During issue triage, Heptio Support will request a Sonobuoy run to verify cluster telemetry at the current time. Heptio Support will use this information as the baseline cluster state for its investigation.

Software Patch Process

One potential risk of using upstream Kubernetes arises when you face an issue and need an immediate fix. To address this challenge, we can deliver hot patches to make sure your Kubernetes cluster is production hardened, and work with the community to ensure fixes are included in the next Kubernetes release. We will work with you to determine if we can provide a patch based on the issue encountered. Heptio's patch process is designed to provide you immediate bug fixes for upstream Kubernetes, and is not intended to provide additional features or functionality.

Heptio releases will typically include backports of patches that have merged to the kubernetes/kubernetes master branch, so that customers running a previous release can acquire bug fixes in a timely fashion. Only a small subset of bug fixes are backported from master to previous versions. The majority, instead, are delivered in whatever the next release is. For example, if a bug is fixed after v1.9.x is released, and it is not a critical issue, then the first release containing that bug fix will be v

Continuing with this example, many customers may not be willing to wait for v to obtain a particular bug fix. They may also be hesitant to upgrade to a .0 patch level, opting instead to wait for .1 or .2. Heptio will potentially backport bug fixes that the upstream community chooses not to, which means that customers can get the fixes they need without having to perform a full upgrade to a newer version.

These releases may contain patches that have not yet merged to kubernetes/kubernetes, depending on their importance to an individual customer. Heptio will make every effort to ensure these patches are merged upstream, but in rare circumstances, this may not happen.
Supported versions

We will support providing bug fixes to the current (n) and previous (n - 1) versions of Kubernetes. In other words, if v1.9.x is the current version, we will support both v1.9.x and v1.8.x.
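The n and n - 1 rule above can be expressed as a short check. This is a minimal sketch under stated assumptions: the function names `parse_minor` and `receives_bug_fixes` are hypothetical, versions are assumed to look like `v1.9.x` or `1.8.7`, and "previous version" is assumed to mean the preceding minor release within the same major version.

```python
from typing import Tuple


def parse_minor(version: str) -> Tuple[int, int]:
    """Parse a string such as 'v1.9.x' or '1.8.7' into (major, minor)."""
    parts = version.lstrip("v").split(".")
    return int(parts[0]), int(parts[1])


def receives_bug_fixes(version: str, current: str) -> bool:
    """True if `version` is on the current (n) or previous (n - 1) minor release."""
    major, minor = parse_minor(version)
    cur_major, cur_minor = parse_minor(current)
    # Assumption: n - 1 is the preceding minor release of the same major version.
    return major == cur_major and (cur_minor - minor) in (0, 1)
```

With `v1.9.x` as the current version, this check accepts `v1.9.2` and `v1.8.7` and rejects `v1.7.0`.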

Release artifacts

Below is a list of artifacts that make up a release. Any component not listed below is out of scope for a Heptio Kubernetes release. We may, at our discretion, choose to build additional components as needed (e.g. kube-dashboard).

We will at least build an executable file for every component. For server-side artifacts, we will initially build for the linux/amd64 architecture. For client-side artifacts, we will initially build for the linux/amd64, darwin/amd64, and windows/amd64 architectures.

Core artifacts

These are artifacts that are required to have a functional Kubernetes cluster.

Component   Role(s)   RPM/DEB   Docker image   Arch
kubelet   Node, Bootstrap   X   linux
kubeadm   Bootstrap   X   linux
kubectl   Client, Bootstrap   X   Linux, Windows, mac
kube-proxy   Node   X   linux
CNI plugins   Node   X   linux
kube-aggregator   Control plane   X   linux
kube-apiserver   Control plane   X   linux
kube-controller-manager   Control plane   X   linux
kube-scheduler   Control plane   X   linux

Add-on artifacts

These are additional components that enhance the Kubernetes experience. We will only build and publish these as needed. They will not be part of a core release.

Component   RPM/DEB   Docker image   Arch
kube-dns   X   linux
coredns   X   linux

Carrying backport branches forward

When Kubernetes releases a new patch version, Heptio will determine which backports are still relevant for the new patch version. A backport is still relevant if the new patch version does not contain the fix.

End of life

When Kubernetes releases a new patch version (e.g ), and we ship a release based on that new version, any and all previous Heptio patch versions for the same major.minor release are immediately end-of-life (e.g , 1.9.1, 1.9.2).
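The carry-forward and end-of-life rules above can be sketched as two small functions. This is an illustration only: the function names, the fix identifiers, and the plain `major.minor.patch` version format are all assumptions, since the document does not specify how Heptio patch versions or backports are labeled.

```python
from typing import List, Set, Tuple


def parse(version: str) -> Tuple[int, ...]:
    """Parse 'v1.9.2' or '1.9.2' into a tuple of integers (assumed format)."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def relevant_backports(carried: Set[str], fixed_in_new_patch: Set[str]) -> Set[str]:
    """A backport stays relevant if the new patch version does not contain the fix."""
    return carried - fixed_in_new_patch


def end_of_life(new_version: str, previous: List[str]) -> List[str]:
    """Previous patch versions on the same major.minor as the newly shipped release."""
    new = parse(new_version)
    return [v for v in previous
            if parse(v)[:2] == new[:2] and parse(v) < new]
```

For instance, under these assumptions, shipping a release based on 1.9.3 would mark 1.9.1 and 1.9.2 as end-of-life while leaving a 1.8.x release untouched.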