Continuous Validation: An Approach to Production Assurance in DevOps


Shailja Sehgal, Sr Principal Quality Engineer
Karanam Ramadevi, Sr Principal Software Engineer
Manhattan Associates Inc.

EXECUTIVE SUMMARY

The IT industry is going through an inflection point. Most companies are looking for ways to bring innovation to their customers faster and to give them a competitive advantage. One of the most popular emerging trends is re-architecting large monolithic enterprise applications (especially in retail and e-commerce) into a microservices architecture. This enables continuous delivery and faster time to market. Combined with a cloud/SaaS-based offering, this architecture helps achieve high availability, easy upgradability, and easy deployability.

One recent study shows that 66% of these companies are adopting the DevOps model. This brings a fundamental change in the testing paradigm: developers and SDETs join the testing cycle from the very beginning. They implement practices such as shift-left testing, test-driven development (TDD), and early, continuous automation. Testing practices evolving in this space include:
- Unit level testing
- Contract-based testing
- Integration testing within containers

This direction has many benefits, but it also poses some basic challenges:
- The end-to-end product view can be diminished
- Balancing test as specified vis-à-vis test as expected
- Balancing speed of delivery vis-à-vis quality

Organizations are also grappling with another important aspect: issues that leak to production become visible fairly quickly. They therefore affect the overall positioning and brand value of the organization. A faster time to market rules out the traditional mammoth system testing cycle for stabilizing a product. Hence, in addition to continuous integration and continuous delivery practices, organizations need to adopt a leaner and more flexible continuous validation approach. This approach must also bring in the customer perspective early in the cycle.
In the US, 40% of the top retailers are Manhattan Associates customers. Our supply chain solutions form the heart of their revenue-generation stream and a key differentiator in serving their customers. Hence it is imperative that our continuous validation approach puts our customers at the center.

This paper describes how we simulate a live, automated model company. It runs 24x7 in a production-like environment, and the target is to use the product the way our customers do. The focus is on mimicking real end-to-end key customer operations, covering customer data, profiles, volumes, geographies, demographics, etc. In parallel, a live monitoring system watches for outages, leakages, system anomalies, deadlocks, etc. The clear goal is to move from a reactive to a proactive mode of response, so that our customers' uptime, scalability, and availability requirements are met.

BACKGROUND AND OBJECTIVE/CHALLENGE

Below is a snapshot of the product for which we are building this tool.

Manhattan Active Omni is offered so that enterprises can compete better in a world that prioritizes price, product availability, and speed.
- Manhattan Active HQ: everyone has the same access to and visibility of customer transactions across all touchpoints. It provides a real-time view of perpetual inventory across fulfillment locations, including in-transit, on-order, and third-party owned/fulfilled inventory. Modules include Omnichannel Customer Service, Available to Commerce, Enterprise Inventory, Distributed Selling, and Adaptive Network Fulfillment.
- Manhattan Active Store: helps associates sell better, provide cross-channel customer service, manage inventory, and serve as fulfillment specialists. Point-of-sale (POS) and robust clienteling capabilities make the shopping experience more memorable and convenient for customers. Modules include Point of Sale, Clienteling, Store Fulfillment, and Store Inventory.

Adopting a microservices-based architecture together with DevOps poses the following challenges:
- The focus is on component development rather than on the holistic product.
- The Agile test pyramid puts 90% of the focus on unit and integration testing and only 10% on end-to-end business scenarios. As a result, the horizontal view of the product is lost.
- Longer system testing cycles cannot be carried out, because time to market needs to be faster.
- Component uptime needs to be 99%.
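A 99% uptime target still leaves a sizable downtime budget: 1% of a 30-day month is 7.2 hours. The short sketch below computes that budget for a few periods. The helper name `allowed_downtime` is hypothetical, for illustration only, and not part of any Manhattan tool.

```python
from datetime import timedelta


def allowed_downtime(uptime_target: float, period: timedelta) -> timedelta:
    """Return the maximum downtime permitted in `period` for an uptime fraction.

    Hypothetical helper: uptime_target is a fraction, e.g. 0.99 for 99%.
    """
    if not 0.0 <= uptime_target <= 1.0:
        raise ValueError("uptime_target must be between 0.0 and 1.0")
    # timedelta * float is rounded to the nearest microsecond
    return period * (1.0 - uptime_target)


if __name__ == "__main__":
    periods = {
        "day": timedelta(days=1),
        "30-day month": timedelta(days=30),
        "365-day year": timedelta(days=365),
    }
    for label, period in periods.items():
        print(f"99% uptime over a {label}: {allowed_downtime(0.99, period)} of downtime allowed")
```

At 99%, this works out to about 14.4 minutes per day, 7.2 hours per 30-day month, and 3.65 days per year.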

TARGET

As mentioned above, 40% of the key retailers in the US are Manhattan customers. Our target was therefore to design a 24x7 live model company that simulates real customer business scenarios. The key aim was to become the first customer of our own product, and thereby to:
- Use the product(s) as a customer does
- Adapt testing strategies to a continuous delivery model
- Automate configuration(s) with robust, production-like monitoring

Following is the model company that we have designed to perform 24x7 production-like operations. Key components of the model company design are:
- Active Omni solutions
- Simulated host systems (similar to standard ERP systems)
- A simulated order capturing system (e-commerce systems)
- Third-party integrations (tax verification, address verification, and payment transactions)
- Out-of-the-box integration with other products in the Manhattan suite (such as warehouse management, inventory, and forecasting systems)

For this model company, here is the approach we took to automate retail operations across various Omni channels, such as:
- E-commerce
- Call center
- Point of sale
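The model company depends on simulated host systems and third-party integrations rather than live external services. One common way to build such a simulator is a stub with a controllable failure rate, so outages can be injected on demand. The sketch below illustrates that technique only; it is not the authors' implementation. `SimulatedTaxService`, `capture_order`, the region code, and the rate are all hypothetical.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, field


class ServiceUnavailable(Exception):
    """Raised when the simulated dependency is in an injected outage."""


@dataclass
class SimulatedTaxService:
    """Hypothetical stand-in for a third-party tax verification system."""

    tax_rates: dict[str, float]  # region code -> tax rate (illustrative values only)
    failure_rate: float = 0.0  # probability that any single call fails
    rng: random.Random = field(default_factory=lambda: random.Random(42))
    calls: int = 0
    failures: int = 0

    def calculate_tax(self, region: str, amount: float) -> float:
        self.calls += 1
        # Inject an outage with probability `failure_rate`
        if self.rng.random() < self.failure_rate:
            self.failures += 1
            raise ServiceUnavailable(f"tax service unavailable for {region}")
        if region not in self.tax_rates:
            raise KeyError(f"unknown region {region!r}")
        return round(amount * self.tax_rates[region], 2)


def capture_order(service: SimulatedTaxService, region: str, amount: float, retries: int = 2) -> float:
    """Hypothetical order capture: price the order, retrying on injected outages."""
    if retries < 0:
        raise ValueError("retries must be non-negative")
    for attempt in range(retries + 1):
        try:
            return amount + service.calculate_tax(region, amount)
        except ServiceUnavailable:
            if attempt == retries:
                raise  # outage outlasted the retry budget; surface it to monitoring
```

Setting `failure_rate=1.0` for a window of time simulates a full outage of the dependency. The `calls` and `failures` counters give a monitoring layer something to alert on.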

The solution completely automates Day Begin, Day In, and Day End operations (as shown below), similar to what our customers do. Given below is a simplified representation of the customer landscape.

Along with automation, we have also orchestrated our customer demographics and data volumes for these business scenarios, covering a day in the life of Omni. A regular business day was orchestrated with the following volumes:
- 42,000 unique items
- 3,500 orders across channels such as e-commerce, call center, and point of sale
- 100 stores across geographies and time zones
- 1 distribution center and 1 return center for fulfilling the orders

For retailers, in addition to regular sale days, there are events where significant revenue is generated, such as Cyber Sale and the Thanksgiving weekend. During these events, system availability and scalability are non-negotiable. Hence we needed to subject our solution to these mock events, and our tool is highly configurable to set them up within a short time. To highlight the difference, here are some numbers to put things into perspective:
- An order volume of 50K per hour
- 1,200 stores across geographies and time zones
- A large inventory volume: 60MM* for 60K items across stores
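The volume profiles above lend themselves to configuration. The sketch below encodes the regular-day and peak-event figures as data. It then generates one randomized hour of orders as a Poisson process, with exponentially distributed gaps between orders. This is one simple way to add the unpredictability the methodology calls for. `LoadProfile` and `simulate_hour` are hypothetical names. Spreading the 3,500 daily orders evenly over 24 hours is an assumption; real traffic peaks at certain hours.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadProfile:
    """Hypothetical volume profile for one simulated business scenario."""

    name: str
    orders_per_hour: float
    stores: int
    items: int


# Figures from the text; the regular day assumes orders are spread evenly over 24 hours.
REGULAR_DAY = LoadProfile("regular day", orders_per_hour=3_500 / 24, stores=100, items=42_000)
PEAK_EVENT = LoadProfile("peak event", orders_per_hour=50_000, stores=1_200, items=60_000)


def simulate_hour(profile: LoadProfile, rng: random.Random) -> list[tuple[float, int, int]]:
    """Generate one hour of orders as (seconds_into_hour, store_id, item_id) tuples.

    Arrivals follow a Poisson process: gaps between orders are exponentially
    distributed with mean 1 / rate, so the hourly count varies from run to run.
    """
    rate_per_second = profile.orders_per_hour / 3600.0
    orders: list[tuple[float, int, int]] = []
    t = rng.expovariate(rate_per_second)
    while t < 3600.0:
        # Pick a random store and item so repeated runs do not replay the same data pattern
        orders.append((t, rng.randrange(profile.stores), rng.randrange(profile.items)))
        t += rng.expovariate(rate_per_second)
    return orders


if __name__ == "__main__":
    rng = random.Random(7)  # fixed seed for a reproducible run
    for profile in (REGULAR_DAY, PEAK_EVENT):
        batch = simulate_hour(profile, rng)
        print(f"{profile.name}: {len(batch)} orders in one simulated hour "
              f"(expected about {profile.orders_per_hour:.0f})")
```

With this design, switching between a regular day and a mock Cyber Sale just means passing a different profile. That fits the goal of setting up mock events within a short time.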

Testing Methodology

The testing methodology we adopted was multi-fold, as briefly described below:
- Understanding the customer's business and primary flows: we selected our top-tier retail customers. We then analyzed their business processes, data profiles, demographics, and current implementations in detail.
- As a next step, we designed business scenarios that cover 80% of the daily business operations of our top 20% of customers.
- We built a robust automated system covering the end-to-end business scenarios in a day in the life of a retail customer, with a focus on:
  o Running continuously 24x7 and simulating a production-like environment
  o Simulating and/or integrating third-party and Manhattan systems into the model
  o Injecting incremental load into the model periodically
  o Adding randomization logic to build in unpredictability
- These business operations were then time-sliced into different scheduled jobs. Together they simulate the operations that happen in one business day, from inventory operations through order capture to shipping (as shown below).
- We designed and executed a peak-day scenario to cover events like Cyber Sale and Big Billion Days. Here the system was subjected to a very high volume of 50K orders in a day, against a regular load of 3,500 orders in a day.

Live Monitoring Systems

A SaaS model requires monitoring the systems at different levels while business operations are in progress. In addition to running the live system, our solution focuses heavily on this aspect. Some critical monitors are:
- Infrastructure-level monitoring to ensure optimal usage of system resources and act as an early alert system, e.g. CPU usage, I/O, network, disk usage, communication queues, etc.

- Business-critical KPIs to ensure the business is in good health, or else trigger corrective action, e.g. order inflow vis-à-vis outflow, unexpected situations at fulfillment centers, etc.

We also set up mock stores to mimic real store operations, with key focus on:
- Nontraditional validation of mobile applications under varying Wi-Fi and low-connectivity conditions
- Role-based validation of various customer personas, covering operations done by the store manager and the store associate

Once all validation in the DevOps pipeline is done, the new components are periodically upgraded into this environment, as would happen in real production. This ensures there are no breaks in business operations before the components are upgraded to the live production system. Some of the critical decision factors for an upgrade to production are:
- Seamless execution of business operations in this environment for a defined period of time
- Backward compatibility with new features
- No impact on implicit system behavior, e.g. performance, scalability, security, etc.

The findings from this system become the critical factor in the upgrade decision. Development teams are expected to keep this system running 24x7, as a real production setup requires.

MEASURABLE IMPACT

Following are some of the key impacts our tool has brought in:
- The 24x7 simulated systems acted as constant quality enablers, helping us achieve:
  o Continuous uptime validation of business operations, covering 70% of daily business operations
  o Simulated outage scenarios, which helped us ensure 99% uptime
- Improvement in upgradability from 50% to 90%
- Actionable analytics: through constant monitoring, we started measuring the dollar-value impact internally. For example, suppose the order inflow is 2,000 orders per hour and the average order value for our retailer is $50. Then there is a direct impact of 2,000 × $50 = $100,000 per retailer per hour.
- Enhanced customer experience: helped us assess our systems in terms of production readiness
- Helped achieve scalability by designing scenarios and volume data for the peak season, covering Cyber Sale
- Helped identify issues such as performance, failure tolerance, user experience, and new and unexpected usage scenarios

- Key release decision making: these automated scenarios must run continuously for at least 5 days before a build is promoted to production

Author Biography

Shailja Sehgal is a Sr Principal QE in the R&D organization at Manhattan Associates India. She has 14+ years of strong and proven expertise in mobility, supply chain retail, Omni channel, and automation (JMeter, API integration, SeeTest for mobility). She is an active member of the Mobile CoE and partners with other organizations to share best practices on mobility. She has expertise in QA testing for various supply chain products. She has a good understanding of and exposure to JMeter, REST APIs, Selenium, SeeTest, TestNG, and RDBMS, and has worked on databases such as Oracle, DB2, and SQL Server. She has worked on integration, functional, user interface, regression, and smoke testing, and has expertise in defect tracking and bug reporting tools. She actively contributes to external forums by presenting white papers and formal presentations.

Karanam Ramadevi is a Sr Principal QE in the R&D organization at Manhattan Associates India. She has 13+ years of strong and proven expertise in manual and automation testing on supply chain retail and Omni channel (cloud platform). Her automation work spans Selenium, Java, JMeter, and API integration. She has expertise in QA testing for various supply chain products. She has a good understanding of and exposure to Java, JMeter, REST APIs, Selenium, TestNG, Jenkins, and RDBMS, and has worked on databases such as Oracle, MySQL, and SQL Server. She has worked on integration, functional, user interface, regression, and smoke testing, and has expertise in defect tracking and bug reporting tools. She actively contributes to external forums by presenting white papers. She has strong expertise in product quality assurance, product testing, and process engineering.

THANK YOU!