Just over 20 years ago, Ronco Inventions started running infomercials for its Showtime Rotisserie BBQ Oven on shopping channels. The infomercial promoted the oven with a simple tagline, “Set it and forget it”, which became a pop culture reference.
While this has been a very successful approach for the WAN, and we see similar approaches in other technology domains, these approaches work independently of one another and can’t always provide the full picture that network admins require on a daily basis. For example, when a user calls the help desk to complain that their Zoom session is artifacting, the network admin may only be able to see an isolated view of the customer issue. When the issue comes in, a lengthy process of fault isolation starts. Is it a laptop issue? The Wi-Fi? The WAN? A peering issue? An application problem? Troubleshooting this way is a process of elimination that uses up valuable time.
The Old Approach to Visibility and Correlation
In the recent past, the preferred way to bring end-to-end visibility and correlation to pinpointing networking issues was to build a data lake that captured and correlated all available data. This approach had limited success because of the high cost of maintaining such a warehouse and because the machine learning (ML) and artificial intelligence (AI) capabilities available at the time were too limited to extract meaningful recommendations on where to look first.
A New Approach to Data Correlation
VMware is very proud to participate in the ONUG AIOps working group, which aims to advance an industry-standard approach to data correlation. In this approach, multiple vendors contribute technology stacks that allow an AI engine to make several suggestions, each coupled with a confidence interval, on the possible root cause of the misbehaving Zoom call that was reported.
This approach steps away from the use of data lakes and instead uses data available in domain orchestrators, while continuing to correlate it with other vendor data. The idea is not to build a new library with all information, but instead to hire a detective that can synthesize relevant data from the authoritative places where the data resides. It all starts from the questions that arise in the network admin’s mind when the help desk call comes in:
Getting data from all these sources into a data lake would be an enormous task. If we instead leave the data in place and correlate it only when needed to answer a specific question, the task becomes more manageable and can be accomplished without setting up a data warehouse. This can be done rapidly to provide insight into recent events and narrow the scope of the investigation.
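To make the leave-the-data-in-place idea concrete, here is a minimal Python sketch. It is an illustration only: the `Event` record, the `Source` callable, and the in-memory stand-ins for the Wi-Fi and SD-WAN orchestrators are all hypothetical and do not reflect any vendor's actual API or the working group's implementation. Each domain is queried only for the user and time window in question, and the results are merged into a single timeline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Event:
    domain: str      # hypothetical label, e.g. "wifi" or "sdwan"
    user: str
    timestamp: int   # seconds since epoch
    metric: str
    value: float


# A source is any callable that queries one domain orchestrator in place.
Source = Callable[[str, int, int], List[Event]]


def make_source(records: List[Event]) -> Source:
    """Build an in-memory stand-in for a real domain orchestrator query."""
    def fetch(user: str, start: int, end: int) -> List[Event]:
        return [e for e in records
                if e.user == user and start <= e.timestamp <= end]
    return fetch


def correlate_on_demand(sources: Dict[str, Source], user: str,
                        start: int, end: int) -> List[Event]:
    """Ask each domain only about this user and window, then merge by time."""
    events: List[Event] = []
    for fetch in sources.values():
        events.extend(fetch(user, start, end))
    return sorted(events, key=lambda e: e.timestamp)


# Made-up sample data for two domains.
wifi = make_source([
    Event("wifi", "alice", 100, "rssi_dbm", -82.0),
    Event("wifi", "bob", 105, "rssi_dbm", -55.0),
])
sdwan = make_source([
    Event("sdwan", "alice", 90, "loss_pct", 0.1),
    Event("sdwan", "alice", 400, "loss_pct", 4.0),
])

timeline = correlate_on_demand({"wifi": wifi, "sdwan": sdwan}, "alice", 60, 120)
for event in timeline:
    print(event)
```

Nothing is copied into a central store here; the only data that moves is the slice needed to answer the one question being asked.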
Even when simply recommending areas for further investigation, this approach greatly reduces the time network and application administrators spend diagnosing reported issues. It also allows on-demand data gathering when key performance metrics deviate from the norm and proactive troubleshooting is warranted.
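One simple way to decide that a key performance metric has deviated from the norm is a z-score check against recent history. The sketch below assumes this technique for illustration; the function name, the threshold of 3 standard deviations, and the sample latency values are all hypothetical choices, not something prescribed by the working group.

```python
from statistics import mean, stdev
from typing import Sequence


def deviates_from_norm(history: Sequence[float], latest: float,
                       threshold: float = 3.0) -> bool:
    """Return True when `latest` is more than `threshold` standard
    deviations away from the mean of `history`."""
    if len(history) < 2:
        return False  # not enough data to define a norm
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return latest != mu  # flat history: any change is a deviation
    return abs(latest - mu) / sigma > threshold


# Made-up latency samples in milliseconds.
latency_history = [20.0, 22.0, 19.0, 21.0, 20.0, 23.0, 21.0]

spike_detected = deviates_from_norm(latency_history, 95.0)
normal_detected = deviates_from_norm(latency_history, 22.0)
print(spike_detected, normal_detected)
```

A check like this could act as the trigger for on-demand data gathering, so that correlation runs only when something looks wrong.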
The ONUG AIOps working group is combining data from Mist access points and VMware SD-WAN through an AtScale semantic layer that can easily correlate data on demand from the existing domain orchestrators. The group is aiming to standardize the way data is represented, stored and accessed to facilitate rapid correlation. The correlated data becomes a source from which an AI engine can suggest the likely root cause of detected performance degradations or outages in the network, reducing the time needed to restore service.
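As a rough illustration of what ranked suggestions might look like, the toy sketch below turns per-domain anomaly scores into a ranked list with a normalized share for each domain. This is a simple heuristic standing in for a real AI engine, not the working group's method; the function name, the domain labels, and the scores are hypothetical, and the scores are assumed to be non-negative.

```python
from typing import Dict, List, Tuple


def rank_root_causes(anomaly_scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Normalize non-negative anomaly scores so they sum to 1 and
    return the domains ordered from most to least suspicious."""
    total = sum(anomaly_scores.values())
    if total <= 0:
        return []  # nothing anomalous, so nothing to suggest
    ranked = [(domain, score / total)
              for domain, score in anomaly_scores.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Made-up scores, e.g. produced by per-domain deviation checks.
suggestions = rank_root_causes({"wifi": 6.0, "sdwan": 1.5, "application": 0.5})
for domain, share in suggestions:
    print(f"{domain}: {share:.2%}")
```

The output is only a hint about where to look first; the admin still confirms the cause in the authoritative system for that domain.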
As the engines become more accurate, a closed-loop system can emerge that self-heals the network based on past experience, truly getting back to the “set it and forget it” concept.