Service Assurance in Hybrid Cloud at an Affordable TCO

During the spring 2017 ONUG general meeting, at the conclusion of the Monitoring and Analytics (M&A) panel, a member of the audience commented, “We are already collecting a lot of data; and, you are telling us to collect even more?” His point was well understood by the panel members and the audience: there is a total cost of ownership associated with all that collection, and it is spiraling out of control.

The M&A working group’s guidance was and remains simple at its core: if you are responsible for a service (e.g., web transactions, VoIP, Virtual Desktop Infrastructure (VDI), streaming media, SaaS, or batch processing), you must have a way of verifying its availability and performance, such as through synthetic testing or wire data. You can further benefit from monitoring the Key Performance Indicators (KPIs) for each component of the service chain that comes together to deliver that service.

The KPIs differ not only for each component of the service chain but also for different classes of technology. In the legacy world, the KPIs for infrastructure might have started with the “golden signals” of utilization, saturation, and error rates for CPU and memory, while for the network and storage they may also have included throughput and latency, as well as a breakdown of the nodes and applications behind the utilization and errors. For Unified Communications and Collaboration (UC&C) applications, VDI, and streaming media, jitter and latency become important. Application containers such as J2EE, databases, and hypervisors have their own KPIs, and so on. The outcome is a monitoring solution made up of a myriad of domain- or silo-specific tools.

A day earlier, Paul Silverstein of Cowen and Company, responding to another question during the “On Premises vs. Off Premises IT: Where Should Investment Dollars” panel, had reminded the audience that capital expense is only about 20% of the Total Cost of Ownership (TCO). The remaining 80% goes to architectural planning and alignment of tools, hardware and storage, implementation and periodic upgrades, and, most prominently, skilled staff in multiple domains to operate the tools and analyze their data. The largest components can be characterized as domain-specific tools and domain-specific skilled staffing. The number of technologies one operates has a multiplier effect here.
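To make the 20/80 split concrete, here is a minimal back-of-the-envelope sketch. It assumes only the roughly 20% capital-expense share quoted above; the function name `estimate_tco` and the sample purchase price are hypothetical and chosen purely for illustration.

```python
def estimate_tco(capex: float, capex_share: float = 0.20) -> dict:
    """Back out total cost of ownership from capital expense.

    capex_share is the fraction of TCO that is capital expense
    (about 0.20 per the panel remark; treat it as a rough assumption).
    """
    if not 0 < capex_share <= 1:
        raise ValueError("capex_share must be in (0, 1]")
    total = capex / capex_share
    return {"capex": capex, "opex": total - capex, "total": total}


# Hypothetical example: a tool purchase of 100,000 (any currency).
estimate = estimate_tco(100_000)
print(f"capex: {estimate['capex']:,.0f}")
print(f"opex:  {estimate['opex']:,.0f}")
print(f"total: {estimate['total']:,.0f}")
```

Under that assumption, every unit of purchase price implies roughly four more units of operating cost over the life of the tool.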

When it comes to the cloud, private or public, the number of environments you operate also has a multiplier effect. Monitoring on-premises is different from monitoring in the public cloud, and what works in one public cloud often does not work in others. For example, AWS prescribes monitoring through a combination of six of its own services: CloudWatch, CloudWatch Logs, CloudTrail, Config, X-Ray, and Trusted Advisor, while denying any access to the hypervisor and network underlay. Azure’s monitoring strategy is completely different and based on Microsoft products: OMS and SCOM. Google Cloud has yet another management scheme, and so on.

According to IDC and 451 Research, a majority of companies with assets in the cloud pursue a multi-cloud strategy to meet their fiduciary responsibility. The lack of consistency in monitoring tools and workflows between cloud providers is the reason for this multiplier effect on monitoring OpEx. Unfortunately, this is happening against a backdrop of a massive shortage of skilled staff in the industry, which further exacerbates the situation and in some cases acts as a barrier to adoption.

The next reason is the rapid adoption of new technologies driven by the DevOps movement and cloud 2.0. These include microservices; container services and containers that can have very short life spans; orchestration services; NoSQL databases; and newer programming languages for which APM vendors have no solutions, to name a few. Each presents many benefits, but no business application can be deployed without monitoring and visibility. The result is a slew of startups or, worse, having to code your own solution.

I believe the foundation of one’s monitoring strategy is laid long before any discussion of tools, with the choice of data sources we leverage, and it should take into account the economic impact of the solution and ultimately its sustainability. Wire data is uniquely positioned to simplify the hybrid, multi-cloud monitoring strategy and, in the process, not only reduce tools sprawl and TCO but also alleviate the shortage of skilled staff needed to monitor these new environments and technologies.

All users, applications, and application service chain components communicate with each other over the network using well-defined protocols. Since these protocols have to account for not only the health and congestion of the network but also the availability and health of end nodes, their headers carry signals about users, servers, and the network. The packet payload offers information about the health and performance of the transactions across the board, including web applications, UC&C, VDI, streaming media, and so on. Wire data can be used to monitor across all your technologies.
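As one illustration of the signals that protocol headers carry, the sketch below decodes the fixed 20-byte TCP header and flags two well-known conditions: a connection reset (RST) and a zero receive window, which usually means the receiving end node cannot keep up. The field layout follows the TCP specification; the function name `tcp_signals` and the sample header bytes are hypothetical, and a real wire data product would of course do far more than this.

```python
import struct

# Flag bits in the low byte of the offset/flags field of the TCP header.
TCP_FLAGS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04, "PSH": 0x08, "ACK": 0x10}


def tcp_signals(header: bytes) -> dict:
    """Extract a few health signals from a raw TCP header."""
    if len(header) < 20:
        raise ValueError("a TCP header is at least 20 bytes")
    # Ports, sequence, acknowledgment, offset/flags, window, checksum, urgent.
    (src_port, dst_port, _seq, _ack,
     offset_flags, window, _checksum, _urgent) = struct.unpack(
        "!HHIIHHHH", header[:20])
    flags = {name for name, bit in TCP_FLAGS.items() if offset_flags & bit}
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "flags": flags,
        # An RST typically means the peer refused or aborted the connection.
        "connection_reset": "RST" in flags,
        # A zero window on a non-RST segment means the receiver is stalled.
        "zero_window": window == 0 and "RST" not in flags,
    }


# Hypothetical segment: server port 443 ACKs with a receive window of zero.
sample = struct.pack("!HHIIHHHH", 443, 51000, 1, 1, (5 << 12) | 0x10, 0, 0, 0)
print(tcp_signals(sample))
```

The same header fields are present whether the endpoints sit in a data center, a public cloud, or a container, which is the property the argument above relies on.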

Users and application components still communicate over the network regardless of which cloud or container service they run in. Wire data has proven to have tremendous staying power. When was the last time TCP/IP changed, compared with development technologies? It is also responsive in the sense that decodes for new applications can be developed fairly quickly. Lastly, one needs no retraining to apply wire data workflows to the cloud or containers. However, for this to work, the wire data solution should embody the following characteristics:

  • Must be ubiquitously available across legacy, private cloud, public multi-cloud, and container services;
  • Must be presented via the same user interface and workflows across all environments;
  • Must be easy to consume by distilling voluminous network data into actionable intelligence;
  • Must do all of the above in an affordable fashion.
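To give a sense of what distilling voluminous data into actionable intelligence might look like, the following sketch rolls individual transaction records up into a handful of per-service KPIs. The record shape (service name, latency in milliseconds, success flag), the function name `summarize`, and the sample values are all assumptions made for illustration, not the output format of any particular product.

```python
import math
from collections import defaultdict


def summarize(records):
    """Roll (service, latency_ms, ok) records up into per-service KPIs."""
    by_service = defaultdict(list)
    for service, latency_ms, ok in records:
        by_service[service].append((latency_ms, ok))

    summary = {}
    for service, rows in by_service.items():
        latencies = sorted(latency for latency, _ in rows)
        # Nearest-rank 95th percentile: smallest value covering 95% of rows.
        rank = math.ceil(0.95 * len(latencies))
        errors = sum(1 for _, ok in rows if not ok)
        summary[service] = {
            "transactions": len(rows),
            "error_rate": errors / len(rows),
            "p95_latency_ms": latencies[rank - 1],
        }
    return summary


# Hypothetical records decoded from the wire.
records = [
    ("web", 100, True), ("web", 200, True),
    ("web", 300, False), ("web", 400, True),
    ("voip", 20, True),
]
for service, kpis in summarize(records).items():
    print(service, kpis)
```

Because the input is just decoded transactions, the same roll-up applies unchanged whether the traffic was observed on-premises or in any cloud.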

When it comes to monitoring the enterprise, there are no silver bullets. Your monitoring strategy will likely also include other components such as Infrastructure Performance Management (IPM) and service availability monitoring tools. Always bear in mind the impact of the right data source on your TCO and skilled staffing requirements.

Author's Bio

Babak Roushanaee

Director, Enterprise Business Operations at NetScout

Babak Roushanaee is Director, Enterprise Technology Operations at NetScout and has more than 25 years of industry and technology experience across enterprise and service provider markets, with a strong background in service delivery, service assurance, performance management, security, and enterprise networking transformation projects. He has consulted for a number of Fortune 500 companies in the area of service assurance and enterprise management and is currently focused on technology strategy at NETSCOUT.