by Karlo Zatylny, SolarWinds
Below are my notes from the meeting. They are somewhat scattered because the conversation jumped around:
Feb 15 2017 Notes (Karlo Zatylny – SolarWinds)
- Open Traffic Monitoring Format
- Where to focus traffic analytics?
- WAN
- SDWAN is becoming more popular but should not dominate our methodology
- LAN
- Data center
- Enterprise
- WAN
- How do we use different technologies to provide insight into the application?
- Flow, interface utilization, etc. need to tie back to an overall metric that focuses on the performance of the target application infrastructure
- How do the statistics allow the person monitoring the network to be proactive in debugging problems?
- Should know the problem before the user calls
- Should be able to describe the impact zone
- Define a framework that focuses on the application and is agnostic to network infrastructure (LAN, WAN, Hybrid, data center)
- Analysis should apply to each layer from carrier to enterprise to SMB
- How do different layers of networks interact with monitoring and analytics?
- How does the enterprise speak a common analytics language with the carrier, using common analytics terms and methods to accurately describe issues that cross from one network into another?
- Is there a way for carriers and customers to communicate without disclosing information that is not public?
- Does analytics play a role in defining a common language that facilitates troubleshooting problems across networks?
- What is the output of the ONUG M&A group?
- Use cases?
- Framework?
- What are the common threads and technologies used today that can be used for analytics?
- Use the term “service centric” as opposed to “application” so that we include technologies like LDAP and other services that are not necessarily tied to a single end-user application
- If I am troubleshooting:
- What is the information needed?
- What analysis could be done on that information to give better insight and drive action?
- Are there historical issues that share similar traits that could be used for analysis?
- Output should be in the format:
- Problem Statement
- Use Cases
- Data available
- Analytics possible
- Desired plausible actions
- Cause – Resolution description
- How do we progress from “my application is slow/broken” to finding root cause, to identifying a resolution?
- Understand application/service topology and dependencies
- Understand what metrics describe specific applications and their behavior
- What analytics can point to anomalous behavior?
- No need to distinguish hardware vs. software metrics, but their relationships need to be understood
- What are the symptoms of common configuration issues?
- What does BGP flapping often manifest as?
- What symptoms do route configuration issues often have?
- How can we be predictive, knowing when green is about to go to yellow?
- Can predictive analytics forecast when a green status is about to move to yellow/red?
- How do different levels of depth play a role?
- Response time → NetFlow → DPI
- Can we create tools and protocols to examine real-time tracing of an application?
- This would segment the problem
- With a view to the future, how do we help shape future tools with the right monitoring and analytics, so that they give the user a method for identifying, troubleshooting, and fixing service issues?
- TODO Action before March 1st: Come up with a skeleton document that can be used to collaborate on and unify the working set of ideas
- Put something out there for everyone to comment on
- What is the format that we need to use?
- TODO Action before March 1st: Write use cases. Volunteers are needed for writing use cases:
- Please fill in your name and specific area. I heard one person on the call but don’t know who it was
- Format: use case, specific problem statement, associated man-hours
- Areas of focus
- Datacenter
- SDWAN
- Well documented use cases for presentation in April
- How to collaborate moving forward
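
Illustrative sketch (not discussed on the call): the predictive question above, knowing when green is about to go to yellow, could be approached in many ways. The Python below is only one minimal possibility, assuming a single "higher is worse" metric (for example, response time) sampled at equal intervals and fixed yellow/red thresholds. The function names, the thresholds, and the choice of a simple least-squares line are all assumptions made for illustration, not something the group agreed on.

```python
from typing import Optional, Sequence


def status(value: float, yellow: float, red: float) -> str:
    """Map a metric value to a traffic-light status (higher is worse)."""
    if value >= red:
        return "red"
    if value >= yellow:
        return "yellow"
    return "green"


def samples_until_threshold(values: Sequence[float],
                            threshold: float) -> Optional[float]:
    """Estimate how many sample intervals remain before crossing `threshold`.

    Fits a least-squares line through the samples (x = 0, 1, 2, ...) and
    extrapolates it forward. Returns 0.0 if the latest sample is already at
    or above the threshold, and None if there are fewer than two samples or
    the trend is flat or improving.
    """
    n = len(values)
    if n < 2:
        return None
    if values[-1] >= threshold:
        return 0.0

    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    if slope <= 0:
        return None  # not trending toward the threshold

    intercept = mean_y - slope * mean_x
    crossing_x = (threshold - intercept) / slope
    # Express the crossing point relative to the most recent sample.
    return max(crossing_x - (n - 1), 0.0)


if __name__ == "__main__":
    # Hypothetical response times in ms, one sample per interval.
    response_ms = [100, 110, 120, 130, 140]
    print(status(response_ms[-1], yellow=200, red=300))
    print(samples_until_threshold(response_ms, threshold=200))
```

A straight-line fit is deliberately naive. Real traffic is often seasonal and bursty, so a working tool would likely need a more robust model, but the shape of the question (current status plus estimated time to the next status) stays the same.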