Wikis > Traffic Monitoring/Visibility Use Case Working Group Wiki

Notice: The views and opinions expressed here are collective derived from the members of this ONUG working group and are not the express opinion of any individual or company. 

SPRING 2016

ONUG Use Case Working Groups Schedule: Traffic Monitoring/Visibility
Date Time
December 14, 2015 3:30-4:30pm ET Conference Call  
January 11, 2016 3:30-4:30pm ET Conference Call  
February 8, 2016 3:30-4:30pm ET Conference Call  
March 14, 2016 3:30-4:30pm ET Conference Call  
April 11, 2016 3:30-4:30pm ET Conference Call  
May 9-11, 2016 TBD ONUG Spring 2016 – Face-to-Face
Working Group Chairman
Aryo Kresnadi (FedEx)
Members
Howard Wang Bank of America
Amiri Gonzalez Express Scripts
Travis Griffin FedEx
James Noble Gaikai
Mike Gemza GE Oil & Gas
Jongwon Kim GIST
Silke Lohser Independent Consultant
Jem Pagan JNK Securities
Pete Kwon Legacy Texas Bank
Bob Natale MITRE
Hailu Meng Monsanto
Tony Chau Morgan Stanley
Brian Ong Navy Federal Credit Union
Kenneth Miller Penn State
Nick Feamster Princeton
Steve Lafrentz Principal Financial Group
Azharul Mannan Standard Chartered Bank
Mahesh Desai Tesla Motors
Sumanth Bopanna Thomson Reuters
Sean Wang UBC
Chris Parchinski Unilever
Mehrdad Moradi University of Michigan
Rahul Godha Unifersity of Wisconsin-Madison
Zakir Mohideen Visa
Fred Lima Visa
Phil Hattwick Wellington Management
Aziz Abdul Cisco
Ravi Rajamani Cisco
Raj Gulani Citrix
Mukhtiar Shaikh CloudGenix
Matt Gwyther FatPipe
Sanch Datta FatPipe
Craig Mills HP
Dave Hawley HP
Ted Qian HP
Sunay Tripathi Pluribus
Marco Pessi Pluribus
Charles Kaplan Riverbed
Hansang Bae Riverbed
Charles Kaplan Riverbed
Matthew Williamson vArmour

Meeting Notes:

Traffic Monitoring/Visibility White Paper

Traffic Monitoring/Visibility Top Ten Requirements

Traffic Monitoring/Visibility Test Plan

ONUG Fall 2015 Traffic Monitoring/Visibility Presentation

December 14, 2015 – Meeting Presentation

February 2, 2016 – Meeting Presentation

February 22, 2016 – Meeting Presentation

March 8, 2016 – Meeting Presentation

*Format Requirement Slides


FALL 2015

ONUG Use Case Working Groups Schedule: Traffic Monitoring/Visibility
Date Time
July 9, 2015 4:00-5:00pm ET Conference Call  
August 11, 2015 4:00-5:00pm ET Conference Call  
September 9, 2015 4:00-5:00pm ET Conference Call  
September 21, 2015 4:00-5:00pm ET Conference Call  
October 6, 2015 4:00-5:00pm ET Conference Call  
October 22, 2015 4:00-5:00pm ET Conference Call  
November 3-5, 2015 TBD ONUG Fall 2015 – Face-to-Face
Working Group Chairman
Aryo Kresnadi
Conference Bridge # ONUG Webex
Members
Howard Wang Bank of America
Travis Griffin FedEx
Silke Lohser Independent Consultant
David White Institute of Technology Tallaght Dublin
Jem Pagan JNK Securities
Bob Natale MITRE
Kenneth Miller Penn State
Nick Feamster Princeton
Sumanth Bopanna Thomson Reuters
Sean Wang UBC
Zakir Mohideen Visa
Phil Hattwick Wellington Management
Mukhtiar Shaikh CloudGenix
Matt Gwyther FatPipe
Sanch Datta FatPipe
Dominic Cafarelli Gigamon
Shehzad Merchant Gigamon
Cary Wright Ixia
Dario David Ixia
Glenn Chagnot Ixia
Michael Githens Ixia
Pierre Lynch Ixia
Sunay Tripathi Pluribus
Marco Pessi Pluribus
Hansang Bae Riverbed

ONUG Traffic Monitoring/Visibility Use Case DRAFT

Two side bars:

Definition of Open Networking  Open networking is a suite of interoperable software and/or hardware that delivers choice and design options to IT business leaders, service and cloud providers. At its core, open networking is the separation or decoupling of specialized network hardware and software – all in an effort to give IT architects options in the way in which they choose to design, provision, and manage their networks. These technologies must be based on industry standards. The standards can be de facto as adopted by a large consortium of the vendor community, open in the sense that they are community based, or defined as standards by the prevailing standards bodies. Open networking hopes to deliver on two promises: 1) Decoupling of network hardware and software which mitigates vendor lock-in and shifts network architecture structure options to users, 2) Significant reduction of the total cost of ownership model, especially operational expense.

Open Networking User Group (ONUG)  ONUG is one of the largest industry user groups in the networking and storage sectors. Its board is made up exclusively of IT business leaders, with representation from Fidelity Investments, FedEx, Bank of America, UBS, Cigna, Pfizer, JPMorgan Chase, Citigroup, Credit Suisse, Gap and Symantec. The ONUG mission is to guide and accelerate the adoption of open networking solutions that meet user requirements as defined through use cases, proof of concepts, hackathons, and deployment examples to ensure open networking promises are kept. The ONUG community is led by IT business leaders and aims to drive industry dialogue to set the technology direction and agenda with vendors. To that end, ONUG hosts two major conferences per year where use cases are defined and members vote to establish a prioritized list of early adopter, open networking projects that communicate propensity to buy and budget development. The vendor community stages proof of concepts based upon ONUG Use Cases, while standards and open source organizations prioritize their initiatives and investments based upon them. ONUG also hosts user summits and smaller, regional user-focused Fireside Chat Meet-Ups through the year. ONUG defines six architectural areas that will open the networking industry and deliver choice and design options. To enable an open networking ecosystem, a common multivendor approach is necessary for the following six architecture components: 1) Device discovery, provisioning, and asset registration for physical and virtual devices, 2) Automated “no hands on keyboards” configuration and change management tools that align DevOps and NetOps, 3) A common controller and control protocol for both physical and virtual devices, 4) A baseline policy manager that communicates to the common controller for enforcement, 5) A mechanism for sharing (communicating or consuming) network state and a unified network state database that collects, at a minimum, MAC and IP address forwarding tables automatically, 6) Integrated monitoring of overlays and underlays. (To be formatted after editorial review – see descriptions in ONUG white paper)

 

ONUG Traffic Monitoring/Visibility Use Working Group

 

Network Traffic Monitoring/Visibility Product/RFI Requirements

Version 1.1

 

Scope:

The scope of this document is to provide a set of tactical and strategic requirements aimed at guiding enterprise organizations in their design and selection criteria for traffic monitoring and traffic visibility solutions. Targeting a complete open interaction between varying solutions, the requirements listed include ranges from baseline architecture through data collection and capturing capabilities to data output from collector interchanges.

In detail, this white paper will address the following:

  1.  Identify the problems for today’s enterprise network manager.
  2.  List current implementations and limitations.
  3.  Define the product requirement and expected baseline for collection and capturing of any network traffic.
  4. Outlook on requirements for a product, which allows for flexibility and growth to mature in parallel with industry requirements.

In addition to the above, the following document will highlight existing limitations of currently available solutions, including their implications, which should be considered in future enhancements and new solutions.

To define a limit of this white paper, the user group will only evaluate an Ethernet protocol-based solution. Additional network traffic types may be evaluated in future revisions.

Not included in scope are end network design nor considerations regarding:

  • Network security aspects,
  • Network monitoring appliances for storing/processing traffic itself, and
  • Network performance, trending and analysis.

Executive Summary

The expected outcome for the Traffic Monitoring/Visibility Use Case Working Group adoption and usage in the enterprise market can be summarized, but is not limited to meeting this set of ten product requirements for:

1. Commodity hardware based upon merchant silicon with either an open or propriety Switch Operating System (OS).

2. Granular filtering based on 5-tuple and/or even more, including application signatures, and Quality of Service (QoS) marking capability.

3. Capability to work with both underlay and overlay protocols, providing independent filtering on either, and/or correlate both traffic.

4. Process data without impact to production flow/processing flow (CPU/Memory/Bandwidth).

5. Horizontal scalability with the capability of resource management feedback.

6. The ability to locally process data and create traffic signaling/alerting, while executing defined traffic-based actions.

7. Interoperability between vendors: integration and output that will support data collection integration and Open Application Programming Interface (API) for access and management (in/out).

8. Capability of Packet De-duplication/Packet Slicing/Data Masking and Application Recognition, including Packet Organization.

9. The ability  to be Security and Compliance-aware.

10. Multilayer visibility between underlay and overlay protocol use for management/Service-Level Agreement (SLA), monitoring/alerting, troubleshooting and reporting capabilities.

Problem Statement:

Today, traffic usage of network infrastructure exponentially increases due to many factors. They include:

  • Implementation of more cost-effective unified IP-based storage.
  • Cloud orchestration capability, instantly creating computer resources.
  • The movement of virtual infrastructure freely without fear of layer 2 fault domain due to the implementation of new technology, such as, Ethernet fabric, or any software-defined networking overlay and underlay capability.
  • Big data implementation.

In short, the collection and capture of network infrastructure traffic data focusing on data network traffic analysis remains complex and costly.

In addition to the above challenge, information security has an increased need to collect the correct network traffic for security analysis, ranging from security threat analysis for any intrusion prevention mechanism, as well as real-time capability to assess any spread of unwanted traffic. This too, adds complexity.

Despite these factors, traffic visibility remains highly sought due to needs ranging from timely troubleshooting of performance issues, collecting applications analysis for capacity planning and forecasting capability that can be used for capital and operational expenditure planning. All of this relates to a requirement for better understanding of cost-effective network and infrastructure usage.

In order to provide the type of visibility above, most enterprise network management face many challenges, such as:

  1. Proliferation of network packet inspection traffic-capturing tools, including the support infrastructure that creates another reliability challenge, manageability issue, additional costs, including capital and operating cost.
  2. Sampling of traffic can lead to less accurate information.
  3. Data loss due to oversubscription (SPAN Port – Switched Port Analyzer – limitation).
  4. Technology limitations – SPAN port limitation, (Network) Taps.
  5. Bandwidth contention between production traffic and monitoring-related traffic.
  6. Lack of flexibility around and beyond 5-tuple/marking-based filtering.
  7. Security, Compliance and Control (PCI -Peripheral Component Interconnect, HIPAA – Health Insurance Portability and Accountability Act, PII – personally identifiable information, PAI – Privacy Act information) challenges due to requirements of capturing raw traffic data.
  8. Challenges around capturing Encrypted Data.

The limitations described above traditionally had direct impact on root cause identification and remediation during network incidents or network security related incidents and other capability mentioned above. Furthermore, the tooling lacked proactive management of network availability and performance forecasting methods.

In addition, as software-defined network (SDN) implementation grow, for successful implementations with overlay and underlay networks, any of the outlined limitations need to be addressed and resolved to varying degrees.

 

Current Environment:

Currently, at the enterprise level and most of the time, beyond, we summarize three typical implementations (with the hybrid explanation on #4) to capture/gain visibility of the traffic in the enterprise network:

1. Traffic capturing devices are installed at select key network points and mirror the port/Virtual Local Area Network (VLAN) on the production switches. The production traffic flow is copied to the collector port and, then, analyzed separately. This implementation usually introduces security concern, due to the distributed nature of the tooling, including the overhead of the individual mirror port network devices. The proliferation of the tools itself is creating a burden of cost, scalability and manageability for the network manager. In addition, oversubscription of the traffic will be still a challenge on this implementation.

 

Figure 1: Current Traffic Capture Approach

 

2. To answer the limitation and challenge of point #1, usually, the network manager will try to centralize the tools and then, create a separate physical network to centralize the mirror data capture by the user of local or remote mirroring capability on the network devices. While this answers the proliferation of the tools itself, the limitation of this approach includes the challenge of data loss due to oversubscription, and the lack of flexibility of the tools usage, due to the limitation of the current 5-tuple filtering capability and mirror/span port limitation.

 

Figure 2: Centralized Tools & Separate Network

 

3. To reduce the limitation of point #2, especially related to data loss and oversubscription, the network manager will implement point #3, where, at each junction of the needed traffic visibility, network/mirror taps will be inserted to fully capture the entire flow of traffic without the challenge of oversubscription. While this implementation answers the challenge of full flow capture, this, however, does not solve the challenge of taps monitoring and the decrease of network reliability. Often this implementation also introduces proprietary encapsulation methods, and/or creating the overhead of forcing encapsulation awareness.

 

Figure 3: Parallel Taps Installed

4. The last implementation is the hybrid of the above three implementations, where to work around the above limitation and challenges, multiple hybrid ways are implemented, including the usage of taps, span/mirror and distributed tooling. This implementation allows the reduction of oversubscription of the traffic captured, still keeping network stability and reliability in place, reducing the usage of proprietary encapsulation, and, without  proliferating tools deployment. This implementation, however, has a high complexity and support challenge, including the higher cost of both capital and operation.

 

Desired/Targeted Environment:

The desired target solution could  have a topology shown in the diagram below, or it could be a topology which is a combination of topologies shown in current environment diagrams 2 and 3.

Screenshot 2015-04-28 20.21.09

Figure 4: Desired ONUG Target Environment

 

Conventions

The following conventions are used throughout this document. The requirements that apply to the functionality of this document are specified using the following convention. Items that are REQUIRED (contain the words MUST or MUST NOT) will be labeled as [Rx]. Items that are RECOMMENDED (contain the words SHOULD or SHOULD NOT) will be labeled as [Dy]. Items that are OPTIONAL (contain the words MAY or OPTIONAL) will be labeled as [Oz].  In addition, a priority value of High (H), Medium (M) or Low (L) may be assigned to each item.

The priority will be labeled as [RHx], [DHy] or [OHz] for High priority, [RMx], [DMy] or [OMz] for Medium priority or [RLx], [DLy] or [OLz] for Low priority.  The integer values {x, y, z} shall be unique across the document but are not required to be unique across the 3-tuple set {x, y, z}.  For example, RM10 and DM10 are allowed whereas RM10 and RL10 are prohibited. Requirements in this document are numbered using increments of 10.

Where needed, related sub-requirements are numbered using increments of 1.  The priority assignments are defined as follows.

High (H):  Functionality that must be supported on day one and is critical for baseline deployment.

Medium (M):   Functionality that must be supported, but is not mandatory for initial baseline deployment.

Low (L): Desired functionality, which should be supported, but can be phased in as part of a longer-term solution evolution.

 

Product Requirements:

 

Baseline:

[RH10] Commodity hardware, 

[RH20] Granular filtering based on 5-tuple and/or even more., including application signatures, and QoS marking capability,

[RH30] Capability to work with both underlay and overlay protocols, providing independent filtering on either, and/or correlate both traffic,

[RH40] Process data without impact to production flow/processing flow (CPU/Memory/Bandwidth),

[RH50] Horizontal scalability with the capability of resource management feedback,

[RM60] Allow to run in-band and out-of-band (NetFlow vs. Deep Packet Inspection (DPI), Full DPI vs. Filter DPI, Green Field and Brownfield implementation),

[RM70] Integrate with current protocol such as NetFlow/sflow or Internet Protocol Flow Information Export (IPFIX),

[RM80] Provide flexibility to integrate with controller deployment,

[RM90] Role-based access control and Identify and Access Management (IAM) integration,

[RL100] Scalable across multiple infrastructure models (data center, distributed/branch, enterprise), and

[RL110] Both Virtual and Physical Form Factor.

 

Collection:

 

[RH120] Need to be able to locally process the data and create traffic signaling/alerting, while executing defined traffic-based actions, 

[RH130] Interoperability between vendors: integration and output that will support data collection integration,

[RH140] Capability of Packet De-duplication/Packet Slicing/Data Masking and Application Recognition, including Packet Organization,

[RH150] Needs to be Security and Compliance aware,

[RM160] Capability to integrate with the virtual compute information, such as software span on the virtual switch,

[RM170] Capability to prioritize highly important monitoring traffic, like applying QoS on the Span (– controlled oversubscription), and

[RL180] Capability to handle SPAN port/monitor port traffic separate from production traffic.

 

Processing/Presentation:

[RH190] Open API for access and management (in/out),

[RH200] Multilayer visibility between underlay and overlay protocol use for management/SLA, monitoring/alerting, troubleshooting and reporting capabilities,

[RM210] Common output to assure interoperability between vendors, and

[RH220] Provide Application visibility through Application Signature capability, including custom application signatures for homegrown applications.