Service Assurance for the Virtualizing and Software-Defined Networks
Cisco Knowledge Network Presentation
Product Management, Cloud and Virtualization
October 7, 2015
Paola Arosio, Deepak Bhargava
Agenda
- NFV Adoption Trends
- Current OSS Limitations and Challenges
- The New Approach to Service Assurance
- Required Attributes
- Summary, Contact and Resources
NFV Adoption Trends
Source: Heavy Reading NFV and Service Assurance Report, June 2015
Current OSS Limitations and Challenges
OSS Limitations and Challenges for Service Assurance
- 73% of problems are reported by end users (customer-reported issues: 73%; provider-detected issues: 27%). Source: Forrester study
- Yet only 2% of users with a bad experience complain (customers who complain: 2%; customers who don't complain: 98%). Source: Gartner, how to approach customer experience management
- By inference: existing systems detect <1% of performance issues, and ~97% of issues are neither detected nor reported
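The inference on this slide can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: it assumes one issue per affected user and treats the two quoted percentages as if they describe the same population, which the slide does not state. The variable names are hypothetical.

```python
# Figures quoted on the slide.
customer_reported_share = 0.73  # share of known problems reported by end users
complaint_rate = 0.02           # share of users with a bad experience who complain

# Normalize so that 1.0 = all bad experiences (assumption: one issue per user).
reported_by_customers = complaint_rate
known_issues = reported_by_customers / customer_reported_share
detected_by_provider = known_issues * (1 - customer_reported_share)
neither = 1 - known_issues

print(f"detected by provider: {detected_by_provider:.2%}")
print(f"neither detected nor reported: {neither:.2%}")
```

Under these assumptions the provider-detected share comes out below 1% and the unseen share at roughly 97%, matching the figures on the slide.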
NFV and SDN Introduce Further Complexities
- "This stuff is complex. I don't know if the fault is me or them."
- "Everyone in Support is working in isolation of each other."
- "We need to detect issues before the end users call."
- Disassociated people silos
- Disassociated apps and infrastructure
- Disassociated monitoring and aggregation tools: Alert Management / Log Management / APM / Performance Analytics
Diagram labels: Service; NFV Application; NFV Transmission and Mobile; Virtualized Networks and Compute; Cloud, Elastic Compute, SDN and NFV Application; Owner Operated; DevOps; Apps; 3P Service; Outsourcer Operated; NFV
NFV and SDN Require a New Approach to Service Assurance

CSP business requirement: End-to-end service and customer experience focus
  Traditional assurance approach: Resource centric; service state calculated as an afterthought based on low-level KPIs
  Cisco's approach (aided by orchestration): Service model based SLA decomposition and cross-domain data aggregation

CSP business requirement: Rapid service creation
  Traditional assurance approach: An afterthought
  Cisco's approach (aided by orchestration): Coupling of orchestration and assurance; service model-driven assurance

CSP business requirement: Visibility into dynamic infrastructure
  Traditional assurance approach: Bottom-up modeling; rules-based approach
  Cisco's approach (aided by orchestration): Subscription to VI/VNF changes and dependency mapping; self-healing and optimization enabled by feedback loop

CSP business requirement: Managing both virtual and physical resources
  Traditional assurance approach: Static model for dedicated resources
  Cisco's approach (aided by orchestration): Hybrid analytics and model-based approach for service impact and root cause analysis

CSP business requirement: Seamless integration with the current OSS/BSS environment
  Traditional assurance approach: Segmented and specialized tools and operations
  Cisco's approach (aided by orchestration): Modular and layered architecture delivering deployment flexibility; open, horizontally scalable data platform; decoupling of publishing from the consumption layer to break silos without disrupting operations
Poll Question: What are your top three requirements for next-generation service assurance? (Choose top 3)
A New Approach to Service Assurance
Service Assurance: Key Tenets
Deliver reliable services and a consistent user experience
- Cross-domain and multi-vendor: End-to-end visibility across multiple domains (e.g. CPE to WAN to NFV and Cloud)
- Multi-layer: Correlated views across the service, virtual, and physical layers
- Orchestration integration: Provision service assurance at the time of service instantiation
- Self-healing: Policy-based automation that combines visibility and analytics to control and optimize
- Out-of-box content: Pre-defined content for supporting use cases
- External integration: Based on open APIs
Example: Cloud VPN
Diagram labels: L2/L3 CPE; Branch A (ISR, NID); Branch B (x86 CPE); VPN (IPSec or IPv6 L2TPv3); WAN; CSR1kV; NFV (NAT, vWAAS, vFW, vWSA); Hosted DC (Compute, Storage)
Modular Architecture
- Breaks silos and enables vertical specialization
- Out-of-box content to support specific use cases
- Uses a common YANG service model to define service and assurance descriptors
- Loosely coupled yet tightly integrated with Service Orchestration
Modular and layered architecture allows for:
- Reuse of existing collection mechanisms
- Integration with existing customer and third-party OSS applications
- Analysis layer covering specific capabilities: RCA, SIA, SLA
- Modular approach to allow analytics plug-ins
- Horizontally scalable platform to collect data from different sources and publish data to consumers
Diagram labels: Cross-Domain Orchestrator; Fulfillment; Cross-Domain; Domain Specific; Collection; Analysis; Presentation; VMS; VPC; GiLAN; Other; Customer Portal; SLA Dashboard; Problem/Incident Management; Service Health; Impact Analysis; Fault and Cause Analysis; Event Analysis; Service Quality Assessment; Service Management; Events; Logs; Metrics; Network, Compute, Storage (Physical, Virtual); Operator Portal; Service Health Dashboard; Capacity Planning & Forecasting; Log Analysis; Operations; Executive Dashboard; Routing & Reservation Console; Distribution Bus; Billing Mediation; Legacy OSS GWs; Service/Workload Placement Optimization; Metric Analysis; Business; BSS/OSS Mediation
Poll Question: Which Service Assurance functions are gating your NFV deployment?
On a scale from 1 (not important) to 5 (critically important), please rank the importance of having the following service assurance analytics functions in place when FIRST deploying NFV.
Service Assurance Architecture: Required Attributes
1. Open and modular, aligned with Big Data framework
2. Service model-driven assurance
3. Analytics-based OSS functions applied across physical and virtual infrastructure
Open and Modular Architecture
Open and Modular Architecture
Enables flexible deployment
- Deployment options: Greenfield; mixed brownfield and greenfield; Brownfield
- Diagram band labels (as extracted): SERVICE ASSURANCE NETWORK SERVICE MANAGEMENT INSTRUMENTATION
Diagram labels: Cross-Domain; Domain Specific; Collection; Analysis; Presentation; Customer Portal; SLA Dashboard; Problem/Incident Management; Service Health; Impact Analysis; Fault and Cause Analysis; Event Analysis; Service Quality Assessment; Service Management; Events; Logs; Metrics; Network, Compute, Storage (Physical, Virtual); Operator Portal; Service Health Dashboard; Capacity Planning & Forecasting; Log Analysis; Operations; Executive Dashboard; Routing & Reservation Console; Distribution Bus; Billing Mediation; Legacy OSS GWs; Service/Workload Placement Optimization; Metric Analysis; Business; BSS/OSS Mediation
How Service Assurance is Realized Today
Pipeline stages: Data sources > Data aggregation > Data store > Data analysis > Outputs
- SNMP stats > Polling > Stats > Perf Analysis > Dashboard & Reporting
- SNMP traps > Event aggregation > Events > Fault Analysis > Dashboard & Reporting
- Logs > Log Aggregation > Logs > Log Index/Search > Dashboard & Reporting
Observations:
- Tight coupling of data aggregation, store and analysis pipelines, realised in products
- Multi-stage processing, both at aggregation and analysis
- Architecture is a function of product choices
- Filtering on aggregation to reduce data volume
- Analysis functions largely based on programmatic rules, derived topology models and static thresholds
Big Data based Service Assurance
Analytics enables self-healing and optimization
OSS functions can be expressed as operations against the entire OSS data set:
- Fault management = ƒ(event data, metric data)
- Performance management = ƒ(metric data)
- Billing mediation = ƒ(event data, metric data)
- Capacity management = ƒ(metric data)
Analytics in the loop(s): self-healing and optimization enabled by feedback loop
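One way to read "OSS functions as operations against the data set" is as plain functions over shared event and metric collections. The sketch below is a minimal illustration under that reading; the record types, the `cpu_pct` metric name and the 90% limit are all hypothetical and not taken from the presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    source: str
    severity: str  # hypothetical values: "critical", "minor"


@dataclass(frozen=True)
class Metric:
    source: str
    name: str
    value: float


def fault_management(events, metrics, cpu_limit=90.0):
    """f(event data, metric data): sources with a critical event or CPU above cpu_limit."""
    from_events = {e.source for e in events if e.severity == "critical"}
    from_metrics = {
        m.source for m in metrics if m.name == "cpu_pct" and m.value > cpu_limit
    }
    return sorted(from_events | from_metrics)


def performance_management(metrics):
    """f(metric data): mean value per metric name."""
    values = {}
    for m in metrics:
        values.setdefault(m.name, []).append(m.value)
    return {name: sum(v) / len(v) for name, v in values.items()}


events = [Event("vfw-1", "critical"), Event("csr-1", "minor")]
metrics = [Metric("csr-1", "cpu_pct", 95.0), Metric("vfw-1", "cpu_pct", 40.0)]

faulty = fault_management(events, metrics)
averages = performance_management(metrics)
print(faulty, averages)
```

Both functions read the same collections, so adding a further function (capacity management, billing mediation) would not require a new collection pipeline.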
Big Data Architecture
- Publishers (data aggregation): customer-level data (Customer QoE Monitoring); infrastructure and service-level data (Applications, Orchestration, Controllers, Devices); Metric aggregation, Event aggregation, Log aggregation, Network Telemetry
- Open Data Platform: Data Distribution (live stream); Data Store & Processing (stream processing, Real Time Data Store, Master Data Store, batch processing); Real Time Query and Deep Historical Query
- Consumers (data analysis applications): Fault Analysis; Incident & Problem Management; Performance Analytics; SLA Reporting; Log Search; Security and Threat Analysis; Capacity Analytics; Billing (Mediation); Business Intelligence
Benefits:
- An open system architecture with no dependency on any specific vendor or product
- Allows any analytics application to mine any data source, leveraging the full value of the OSS dataset
- Extensible: add new OSS analytics functions quickly and seamlessly with minimum development cost
- Minimizes duplicate polling: collect data once, use many times
- Removes cross-system integration
- Leverages rapid innovation in the Big Data analytics space
- Platform is extensible beyond OSS
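The "collect data once, use many times" benefit follows from decoupling publishers from consumers. The sketch below shows the idea with an in-memory publish/subscribe bus; it is an assumption-laden toy, not the platform described on the slide. `DataBus`, the `metrics` topic and the record fields are hypothetical names.

```python
from collections import defaultdict


class DataBus:
    """Toy in-memory publish/subscribe bus (hypothetical, for illustration only)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, record):
        # Every subscriber of the topic receives the same record.
        for handler in self._subscribers[topic]:
            handler(record)


bus = DataBus()
fault_view = []      # consumer 1: keeps only high-CPU samples
capacity_view = []   # consumer 2: keeps every sample for trending


def fault_consumer(record):
    if record["value"] > 90:
        fault_view.append(record)


bus.subscribe("metrics", fault_consumer)
bus.subscribe("metrics", capacity_view.append)

# The device is polled once; both consumers see the result.
bus.publish("metrics", {"device": "csr-1", "name": "cpu_pct", "value": 95})
bus.publish("metrics", {"device": "csr-1", "name": "cpu_pct", "value": 40})
print(len(fault_view), len(capacity_view))
```

A new analytics consumer only needs another `subscribe` call; the publishing side is unchanged.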
Service Assurance Functional Baseline
Diagram labels:
- Data sources: customer-level data (Customer QoE Monitoring); infrastructure and service-level data (Applications, Orchestration, Controllers, Devices)
- Data Platform: real-time inventory, event data, metric data
- Analytics: event analytics; time-series analytics; Fault Analytics; Service Health; Performance Analytics
- Applications and views: Incident & Problem Management; Service Status Dashboard; Incident UI; Event Console Views; Real-Time Service Health; SLA Reporting
Service Model Driven Assurance
Service Assurance is a Service Lifecycle Problem
Day-0: Before service provisioning
- SLA Definition: What SLA is required?
- Service Level Definition: service availability; loss, latency, jitter
Day-1: During service provisioning
- Service Placement: Where can it be supported?
- Orchestration: Put it there
- Functions: Service Provisioning; Admission Control; Workload Placement
Day-2: After service provisioning
- Service Assurance: Verify the service is available and how it is performing
- Scale up or down based upon load
- Local recovery actions if the VNF is unavailable or underperforming
- Identify underlying causes and fix them as soon as possible
- Functions: Service Monitoring; Service Availability Monitoring; Reporting; Service Elasticity and Availability; Service Management & Operations; Performance Mgmt; Service Level Monitoring; Fault management {cause analysis, impact analysis}; Incident / problem mgmt.; Remediation
SLA Definition: Services are different
- No generic KPI measurement component for all service types
- Smart components (i.e. probes) per service type are often needed
- Focus on service-type KPI definition and direct service KPI monitoring
Service types: VoIP; Video on Demand; Video Streaming; Exchange; SharePoint; WiFi; IaaS; IWAN; Cloud VPN
- Service-type specific KPIs, e.g. MOS, App Response Time, MDI, vMOS
- Generic KPIs: Latency, Availability, Uptime
- f(service-type specific KPIs, generic KPIs) gives the SLA status: Violated or Jeopardized
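The f() on this slide maps measured KPIs to an SLA status. A minimal sketch of such a function follows, assuming each KPI has a hard limit and an earlier warning level and that the overall status is the worst per-KPI status. The `OK` state, the threshold values and all names are assumptions for illustration; the slide names only the Violated and Jeopardized states.

```python
from dataclasses import dataclass
from enum import IntEnum


class SlaStatus(IntEnum):
    OK = 0            # assumed state; not named on the slide
    JEOPARDIZED = 1
    VIOLATED = 2


@dataclass(frozen=True)
class KpiTarget:
    limit: float            # crossing this violates the SLA
    warn: float             # crossing this puts the SLA in jeopardy
    higher_is_better: bool  # True for MOS or availability, False for latency


def kpi_status(value, target):
    if target.higher_is_better:
        if value < target.limit:
            return SlaStatus.VIOLATED
        if value < target.warn:
            return SlaStatus.JEOPARDIZED
    else:
        if value > target.limit:
            return SlaStatus.VIOLATED
        if value > target.warn:
            return SlaStatus.JEOPARDIZED
    return SlaStatus.OK


def sla_status(measurements, targets):
    """Overall status is the worst status across all KPIs that have a target."""
    return max(
        (kpi_status(measurements[k], t) for k, t in targets.items()),
        default=SlaStatus.OK,
    )


# Hypothetical VoIP targets: one service-type KPI (MOS) and one generic KPI (latency).
voip_targets = {
    "mos": KpiTarget(limit=3.5, warn=4.0, higher_is_better=True),
    "latency_ms": KpiTarget(limit=100.0, warn=80.0, higher_is_better=False),
}
status = sla_status({"mos": 4.2, "latency_ms": 90.0}, voip_targets)
print(status.name)
```

Keeping the targets per service type is what lets the same function serve VoIP, video or Cloud VPN with different KPI sets.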
End to End Service Assurance
- SLA attributes (examples): Availability: 99.99%; Response Time: < 100 ms; Throughput: > 1 Gbps
- Flow labels: e2e Service Orchestration; Service Model Creation / Change; Service Provisioning; Assurance; Service, SLA & Policy Definition; Monitoring Policy Definition; Domain Specific SLA Decomposition (RFS); Instrumentation Provisioning; Control/Configuration changes; Cross-Domain, Multi-Vendor Collection; Network/Compute/Storage; Analytics (..); Dashboard/Reports; Feedback Loop; Visibility-Policy-Control Feedback Loop
- Service hierarchy labels: End to End Service; CPE Service, WAN Service, IAAS Service, VNF Service; Managed CPEs; WAN; VNF Service #1, VNF Service #2, VNF #3, VNF #4 (Virtual Network Functions); IaaS #1, IaaS #2, IaaS #3 (Virtualized Data Center Infrastructure)
Service Assurance Auto Enablement
Leverage YANG and Orchestration Engine for Service Assurance Provisioning
- Extend the service model to describe service intent and SLA descriptors
- Leverage the full power of the YANG model to support the service assurance parameters
- Inventory models to facilitate SIA and RCA
- Auto-provisioning of instrumentation, test capabilities and probes
- Configure collection systems
Diagram labels: Service Provisioning; Service Model Extension; Service, SLA & Policy Definition; Monitoring Policy Definition; Domain Specific SLA Decomposition (RFS); Instrumentation Provisioning; Cross-Domain, Multi-Vendor Collection; Network/Compute/Storage; Analytics (..); Dashboard/Reports; Remediation feedback; Visibility-Policy-Control Feedback Loop
Service Assurance Orchestration
Service model (YANG structure shown on the slide): Container { Leaf { List { } } }
- Defines the VNF and associated SLA
- Activation test: validates that the service works
- Defines scale and availability parameters for the VNF, which are managed locally by the VNFM; determines VNF monitoring by the VNFM
- Configures required instrumentation on that VNF, e.g. syslog, SNMP, etc.
- Configures required monitoring on the monitoring system for that VNF, e.g. what to poll for and when
- Configures the reporting system so that it knows how to interpret monitoring data, e.g. rollup calculations, alert thresholds, etc.
- Shares the service definition and infrastructure context for analytics
Diagram labels: Service Provisioning; Service Model; VNFD; NSO; VNFM; (V)NF; Monitoring Systems; Reporting; Real-time inventory
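The steps on this slide amount to deriving assurance configuration from the same model that drives provisioning. The sketch below illustrates that derivation with a plain dictionary standing in for the service model. Every field name, the polling interval and the VNF names are hypothetical; this is not the NSO data model or API.

```python
# Hypothetical service model instance (a dict standing in for YANG-modelled data).
service_model = {
    "service": "cloud-vpn-example",
    "sla": {"availability_pct": 99.99, "response_time_ms": 100},
    "vnfs": [
        {"name": "vfw-1", "type": "firewall"},
        {"name": "nat-1", "type": "nat"},
    ],
}


def derive_assurance_config(model, poll_interval_s=60):
    """Derive instrumentation, monitoring and reporting settings from one model."""
    # What to enable on each VNF (the slide gives syslog and SNMP as examples).
    instrumentation = [
        {"vnf": v["name"], "enable": ["syslog", "snmp"]} for v in model["vnfs"]
    ]
    # What the monitoring system polls, and how often.
    monitoring = [
        {"target": v["name"], "poll": sorted(model["sla"]), "interval_s": poll_interval_s}
        for v in model["vnfs"]
    ]
    # How the reporting system interprets the data: thresholds copied from the SLA.
    reporting = {"service": model["service"], "thresholds": dict(model["sla"])}
    return {
        "instrumentation": instrumentation,
        "monitoring": monitoring,
        "reporting": reporting,
    }


config = derive_assurance_config(service_model)
print(config["monitoring"][0])
```

Because all three outputs come from one input, a change to the service model changes provisioning and assurance together, which is the coupling the slide argues for.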
Analytics based OSS Functions
Hybrid Analytics and Model-based Approach
Expedite service impact and root cause analysis
Diagram labels: Cross-Domain Orchestrator; Service Model; Data Enrichment; Model Based; Analytics Based; Collection; Analysis; Distribution Bus (Events, Logs, Metrics); Resource/Domain Manager (three instances); e2e Service; CPE Service; WAN Service; IAAS Service; VNF Service; VNF Service #1; VNF Service #2; VNF #3; VNF #4; IaaS #1; IaaS #2; IaaS #3
Analytics-based Fault and Situation Analysis
Contextualized alert clusters reduce time to detect, diagnose and resolve
- Clean: real-time, automatic data categorization and noise filtering (unsupervised machine learning)
- Contextualize: real-time, automatic anomaly detection; alert groups (unsupervised machine learning)
- Close: real-time, automatic situation awareness; dynamic teaming (unsupervised machine learning)
Flow: Events > Filtered Alerts > Situations > Notifications, Situation Rooms
Inputs: Network, Compute, Apps, Sentiment
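The Events > Filtered Alerts > Situations flow can be illustrated with a deliberately simple stand-in. The slide attributes these stages to unsupervised machine learning; the sketch below swaps in exact-match de-duplication and time-window grouping instead, purely to show the shape of the pipeline. Field names and the 60-second window are assumptions.

```python
def deduplicate(events):
    """Clean: keep the first occurrence of each (source, message) pair."""
    seen = set()
    kept = []
    for event in events:
        key = (event["source"], event["message"])
        if key not in seen:
            seen.add(key)
            kept.append(event)
    return kept


def group_into_situations(alerts, window_s=60):
    """Contextualize: start a new group when the gap to the previous alert exceeds window_s."""
    situations = []
    for alert in sorted(alerts, key=lambda a: a["ts"]):
        if situations and alert["ts"] - situations[-1][-1]["ts"] <= window_s:
            situations[-1].append(alert)
        else:
            situations.append([alert])
    return situations


events = [
    {"ts": 0, "source": "csr-1", "message": "link down"},
    {"ts": 5, "source": "csr-1", "message": "link down"},   # repeat, filtered out
    {"ts": 20, "source": "vfw-1", "message": "tunnel down"},
    {"ts": 500, "source": "nat-1", "message": "high cpu"},
]

alerts = deduplicate(events)
situations = group_into_situations(alerts)
print(len(events), len(alerts), len(situations))
```

Here four raw events reduce to three alerts and two situations; a learned clustering would replace both helper functions without changing the surrounding flow.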
Analytics-based Fault and Situation Analysis
Contextualized alert clusters reduce time to detect, diagnose and resolve
Stages: Clean; Contextualize; Close
Diagram labels: SNMP Translation; Syslog Translation; NSO; Ticketing System; Automation; Event Enrichment; Situation Enrichment; Stakeholder Notification; Events; Event Filtering; Situations; Orchestration; Knowledge; Significance Ranking; Situation Prioritization; Collaboration
Analytics-based Fault and Situation Analysis
Enhance operational efficiency with collaboration and a knowledge base
- Historical: an algorithm-generated similarity factor identifies similar past problems
- Knowledge base: user comments entered during troubleshooting and resolution of past problems help resolve the current instance of a recurring problem
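The slide does not say how the similarity factor is computed. As one possible illustration, the sketch below uses Jaccard similarity over the words in a problem summary to find the closest past problem and surface its resolution comment. The technique, the record fields and the sample data are all assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity of the word sets of two strings (0.0 to 1.0)."""
    words_a = set(a.lower().split())
    words_b = set(b.lower().split())
    if not words_a and not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def most_similar(summary, history):
    """Return the past problem whose summary is most similar to the current one."""
    return max(history, key=lambda past: jaccard(summary, past["summary"]))


# Hypothetical knowledge-base entries with comments left by earlier responders.
history = [
    {"summary": "link down on csr-1 wan interface", "comment": "Replaced faulty optic"},
    {"summary": "high cpu on vfw-1", "comment": "Scaled out firewall VNF"},
]

current = "wan interface link down on csr-2"
match = most_similar(current, history)
print(match["comment"], round(jaccard(current, match["summary"]), 2))
```

Word overlap is a crude measure; it is used here only because it is easy to verify by hand.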
Summary
Service Assurance: attributes for effective operationalization of virtual infrastructures
An ideal solution must:
- Leverage YANG service models as a bridge between orchestration and assurance
- Provide a horizontally scalable big data platform and real-time data collection based on "publish/subscribe" principles
- Incorporate analytics functions across both physical and virtual elements
- Enable self-healing through closed-loop feedback
Poll Question: Service Assurance in Hybrid Physical and Virtual Infrastructures
Contacts and Resources
For More Information
Cisco Contacts
- Americas: Moti Beharav: meharav@cisco.com
- EMEAR: Brett Holmes: breholme@cisco.com
- APJC: Andrew Eaton: aneaton@cisco.com
Resources
- Heavy Reading White Paper: The Role of Service Assurance in the Virtualizing Network
- Cisco Evolved Services Platform: www.cisco.com/go/esp
- Cisco Service Management and Orchestration Software Portfolio: www.cisco.com/go/servicemano l