by Sunay Tripathi
If you are building, operating, or enhancing a network in 2015, you are in a good position: several industry trends are coalescing to give the network buyer an opportunity to do more for less.
Several factors, including the rise of merchant silicon, hardware commoditization, disaggregation of hardware and software in the network, and emerging software-defined networking (SDN) solutions, enable everyone from the hyper scale datacenter to the forward-looking enterprise to get more out of their IT and network investment.
Merchant Silicon and Fixed Port Leaf/Spine Economics
A vast majority of the server industry has standardized on merchant silicon x86 CPUs from Intel and AMD (with a few notable exceptions, such as the IBM z13 mainframe CPU). A similar trend occurred with the desktop as well: mostly, you run some sort of x86 with an Apple, Microsoft, or in some limited cases Linux OS. While the commoditization of the desktop and server was hugely disruptive to the marketplace, it benefitted the IT buyer and the industry at large.
Similar trends have led network switching to turn to merchant silicon as well, with switching silicon from Broadcom (Trident/Tomahawk) and Intel (Bali/Alta). Much of the driving force behind this migration to merchant silicon has been complexity and time to market: with 5.57 billion transistors in a Xeon E5-2600, spinning a new CPU is a massive undertaking. In fact, this has become an effort of such magnitude that few in the industry can do it at all, much less do it cost effectively and on a competitive timeline. Given that switching silicon has surpassed the server CPU with 7 billion transistors (Broadcom Trident), it is easy to see how few could effectively keep up.
For most, it has become faster (and better) to roll with merchant silicon solutions. Even basic, modern switch chips like the Trident or Alta come in port densities of around 64x10G ports (the most common form factor is 48x10G + 4x40G), adequate to drive a rack that typically has 20-40 servers, leaving 4x40G for uplink. Newer chips from Broadcom like the Trident2+ come in 32x40G form factors, with the Tomahawk clocking in at 32x100G, configurations that work very well as spine switches.
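The port budget of the common 48x10G + 4x40G form factor can be checked with a little arithmetic. The sketch below is only an illustration: it assumes each server uses a single 10G port (the article does not say how servers are attached), and the function name is made up for this example.

```python
# Port budget for a fixed-port leaf switch (48x10G down, 4x40G up).
# Assumption (not from the article): each server uses one 10G port.

def oversubscription(servers, server_gbps, uplinks, uplink_gbps):
    """Return (downlink Gb/s, uplink Gb/s, downlink-to-uplink ratio)."""
    down = servers * server_gbps
    up = uplinks * uplink_gbps
    return down, up, down / up

# 20 and 40 servers are the rack sizes mentioned in the text;
# 48 is the case where every 10G port is in use.
for servers in (20, 40, 48):
    down, up, ratio = oversubscription(servers, 10, 4, 40)
    print(f"{servers} servers: {down}G down / {up}G up = {ratio:.2f}:1")
```

With all 48 ports in use, the rack offers 480G toward the servers against 160G of uplink, a 3:1 ratio; a rack of 20 servers comes in at 1.25:1.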
From a network buyer’s perspective, this means that many recent switch offerings, particularly those for Top of Rack or Leaf/Spine deployments, ship on relatively similar hardware with differentiation in areas like software and customer support. Pricing is easing downward, while overall capability is moving upward. Keep in mind that just because the hardware is largely the same, it does not mean that the end user experience need be. Consider how the Apple, Windows, and Linux laptops, all powered by the same CPU, have vastly different UI, support and out-of-the-box experiences.
Hardware Commoditization, Software Disaggregation, and White Box vs. Brite Box
Hardware commoditization can be said to be a large-scale extension of the rise of merchant silicon. In a way, the evolution of the network can be seen as a recapitulation of the move from mainframes to x86. In the mainframe era, you got all your hardware, accessories, software, applications, training, and support from the same vendor. While that may have kept things simpler under the “one throat to choke” doctrine, it also kept things relatively static and inflexible.
Network operating systems used to be shipped almost exclusively with dedicated hardware. Now, there is a small but growing trend where network hardware, such as OCP switching, is available at commodity pricing while the choice of operating system is left to the network buyer, an approach sometimes called “white box networking.” While hyper scale/web scale companies with large and highly skilled networking groups may be comfortable with such approaches, few standard enterprises would be.
We’ve already seen something like this play out with Linux on the server front. One of the barriers to adoption for widespread commercial Linux deployment in the enterprise was the lack of enterprise-class support. Once such support was available, either via server makers or Linux distribution providers, CIOs and other enterprise IT decision makers grew much more comfortable buying Linux-powered solutions. Now, the once revolutionary concept of using an open source operating system on an enterprise server is pretty standard.
Similarly, a model that includes white box-type hardware, a network operating system, and enterprise class support is emerging, called brite box, or branded white box switching, by some. For some, this may be a useful compromise, as it allows the network buyer to enjoy the competitive pricing of white box switching with a choice of network operating system, plus the warranties and support typically needed for more widespread enterprise adoption. Another approach is for a vendor to create an OS based on open source that is packaged with off-the-shelf server and switch components and support options. Regardless of the variations, increasingly capable hardware is coming to market at highly competitive pricing, with either open source solutions or solutions built at least partially on open source. The bottom line is that these new solutions bring new capabilities, such as true server-type programmability, to the network.
Scaling Beyond TCAM
Moving from the world of servers to the world of switches opens up new opportunities, such as TCAM. TCAM, ternary content addressable memory, is both very fast and expensive. However, its speed makes it ideal for use in switches.
In the past, the amount of TCAM/CAM in a switch directly dictated the number of MAC addresses that the switch could handle, as well as the size of other tables that affected scalability. Performance was impaired once TCAM ran out.
Building a switch with a very fast control plane, and mapping TCAM/CAM entries into the kernel of the network OS, is one way around this problem. This lets TCAM be used as a fast cache, while inexpensive RAM in the system determines table sizes and can address millions of MACs or more.
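The idea of TCAM as a cache in front of a much larger RAM table can be sketched in a few lines. This is a toy model, not how any particular network OS does it: the class name, the least-recently-used eviction policy, and the hit/miss counters are all assumptions made for illustration.

```python
from collections import OrderedDict

class MacTable:
    """Full MAC table in RAM with a small LRU 'TCAM' cache in front of it."""

    def __init__(self, tcam_slots):
        self.tcam_slots = tcam_slots
        self.tcam = OrderedDict()  # stands in for the scarce hardware entries
        self.ram = {}              # stands in for the OS-side table in RAM
        self.hits = 0
        self.misses = 0

    def learn(self, mac, port):
        """Record a MAC in RAM; refresh the cached copy if one exists."""
        self.ram[mac] = port
        if mac in self.tcam:
            self.tcam[mac] = port

    def lookup(self, mac):
        """Fast path through the cache, slow path through RAM."""
        if mac in self.tcam:
            self.hits += 1
            self.tcam.move_to_end(mac)  # mark as most recently used
            return self.tcam[mac]
        self.misses += 1
        port = self.ram.get(mac)
        if port is not None:
            self._install(mac, port)
        return port

    def _install(self, mac, port):
        """Place an entry in the cache, evicting the LRU entry if full."""
        if len(self.tcam) >= self.tcam_slots:
            self.tcam.popitem(last=False)
        self.tcam[mac] = port

table = MacTable(tcam_slots=2)
for i in range(3):
    table.learn(f"00:00:00:00:00:0{i}", port=i)
table.lookup("00:00:00:00:00:00")  # miss, then installed in the cache
table.lookup("00:00:00:00:00:00")  # hit
print(len(table.ram), "MACs known,", len(table.tcam), "cached")
```

The table size is bounded by RAM rather than by the number of cache slots; the cost is a slower first lookup for entries that are not currently cached.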
Another opportunity presents itself in the form of the VTEP, or virtual tunnel end point. With the advent of virtualized overlay networks, it is increasingly common for tunnel endpoints, typically servers, to take on the compute burden created by virtual tunnels. While the calculations required by network tunnels are relatively expensive to run on generalized x86 compute, they are relatively easy for networking hardware. This makes it possible to improve overall system performance by off-loading tunneling to the network and relieving end points of this burden. Network buyers, particularly those for whom L2 scalability is a concern, may want to ask potential vendors about how they handle TCAM.
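To give a sense of the per-packet work a tunnel endpoint performs, here is a minimal sketch of overlay encapsulation. It assumes the overlay is VXLAN (the article does not name a specific overlay protocol) and builds only the 8-byte VXLAN header defined in RFC 7348; the outer Ethernet, IP, and UDP headers that a real VTEP also adds are left out, and the function names are made up for this example.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag in the first header byte

def vxlan_encap(vni, inner_frame):
    """Prepend an 8-byte VXLAN header carrying a 24-bit network identifier."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 0: flags (8 bits) + reserved (24 bits)
    # Word 1: VNI (24 bits) + reserved (8 bits)
    header = struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)
    return header + inner_frame

def vxlan_decap(packet):
    """Strip the VXLAN header and return (vni, inner_frame)."""
    if len(packet) < 8:
        raise ValueError("packet shorter than a VXLAN header")
    word0, word1 = struct.unpack("!II", packet[:8])
    if not (word0 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag not set")
    return word1 >> 8, packet[8:]

packet = vxlan_encap(5000, b"inner ethernet frame")
vni, frame = vxlan_decap(packet)
print(vni, frame)
```

A server doing this in software repeats the work for every packet in every tunnel, which is the burden the text suggests moving into switching hardware.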
SDN, OpenFlow and a Distributed Architecture
SDN, software defined networking, seems to mean different things to different people, and it is still fairly common for people to believe that the only way to implement SDN is OpenFlow. SDN implies separation of the control and data planes; some will highlight the need for a centralized control plane as well.
OpenFlow is a protocol that describes how to implement a network where the control and the data planes are separate. A centralized controller, however, can become a single point of failure, and a fundamental architectural tenet of reliable IT systems is to avoid a central point of failure.
Implementing a distributed network OS is one way around it. Have an instance of the OS running on each and every switch, and have those switches connected together in a fabric-cluster, sharing visibility into the network. Don’t put all your command and control in a single place, but distribute it across the network to manage it as a single logical entity.
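The fabric-cluster idea can be illustrated with a toy, in-process model in which every switch keeps its own copy of the fabric-wide state. Everything here is a simplification invented for illustration (the class, the method names, and direct peer-to-peer updates); a real distributed network OS would also need failure detection, ordering, and recovery, none of which is modeled.

```python
class Switch:
    """One fabric member holding its own replica of fabric-wide state."""

    def __init__(self, name):
        self.name = name
        self.peers = []
        self.view = {}      # fabric-wide table: MAC -> (switch name, port)
        self.alive = True

    def learn_local(self, mac, port):
        """Record a locally learned MAC and share it with all live peers."""
        self._apply(mac, self.name, port)
        for peer in self.peers:
            if peer.alive:
                peer._apply(mac, self.name, port)

    def _apply(self, mac, switch_name, port):
        self.view[mac] = (switch_name, port)

def build_fabric(names):
    """Create switches and make each one a peer of all the others."""
    switches = [Switch(n) for n in names]
    for s in switches:
        s.peers = [p for p in switches if p is not s]
    return switches

leaf1, leaf2, spine1 = build_fabric(["leaf1", "leaf2", "spine1"])
leaf1.learn_local("00:11:22:33:44:55", port=3)
leaf1.alive = False  # lose one switch

# The surviving switches still hold the full view; there is no controller to lose.
print(leaf2.view == spine1.view, spine1.view)
```

Because no single node owns the state, any surviving member can answer for the whole fabric, which is the property the distributed approach is after.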
With a distributed OS running on server class compute, distributed across the entire network, you gain fairly sophisticated abilities to see into the traffic that is on the network. When building this OS, you would want to include open APIs (RESTful APIs, Python, Java), and while you are at it, you may as well make it play well with popular orchestration and management tools like Puppet and Chef and, of course, OpenStack.
When you combine deep visibility into network traffic with open APIs and the ability to run server class applications on the network itself, you then have the ability to do interesting things with the network. You could run sophisticated analytics, where applications can now not only be network-aware, but provision themselves if needed. You potentially have visibility into the health and performance of applications on the network in addition to visibility into the performance between the user and the network itself – whether it is underlay or overlay. Application performance troubleshooting now becomes more of a science powered by data, rather than guessing and intuition.
Network buyers should look for open APIs and, if OpenStack is part of their ongoing plans, for Neutron plug-in support, which enables the network to be managed by OpenStack. We would also recommend that buyers avoid any single points of failure, including centralized controllers.
Enterprise vs Mega Scale Data Centers – Similarities and Differences
Both enterprises and mega scale data centers (MSDCs) want the economics of SDN on merchant silicon, but how they utilize it is very different. Let’s look at the needs of both:
- Enterprise data centers typically have complex environments with software from many vendors and limited network programming expertise. These customers tend to need broad feature/protocol support and high availability.
- Enterprises are also heavily virtualized, which means they also need heavy Layer 2 protocol support to deal with server-side virtual machines.
- The enterprise data center is generally divided into PODs based on application tiers or business boundaries, where a POD is in general 10 racks (which at 40 servers per rack yields 400 servers). When multiplied by a VM density of 10 VMs per server, this works out to about 4,000 VMs per POD.
- The MSDCs at very large companies such as Google and Facebook have much bigger scale, and a relatively small set of very complex home grown applications.
- The MSDCs have large and highly skilled teams of network programmers and engineers who are used to building their own custom solutions to meet their scaling needs.
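The POD sizing in the list above is simple multiplication, and the result is very sensitive to the assumed VM density. A minimal sketch, with the function name and the set of densities chosen here purely for illustration:

```python
def pod_size(racks, servers_per_rack, vms_per_server):
    """Return (servers, VMs) for one POD."""
    servers = racks * servers_per_rack
    return servers, servers * vms_per_server

# 10 racks of 40 servers, as in the text; the densities are illustrative.
for density in (10, 100):
    servers, vms = pod_size(10, 40, density)
    print(f"{density} VMs/server: {servers} servers, {vms} VMs per POD")
```

At 10 VMs per server, a 400-server POD holds 4,000 VMs; at 100 VMs per server, the same POD would hold 40,000. Since each VM typically brings at least one MAC address, this number feeds directly into the Layer 2 table sizes the POD's switches must support.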
For the normal data center, a 2 spine, 2-24 leaf Layer 2 POD, which connects into a Layer 3 core network, is generally a very economical design (as shown above). Meanwhile, large MSDCs typically need a much heavier fan-out, and will need to cover many more racks (typically on the order of hundreds to a few thousand). Since Layer 2 high availability or fan-out is generally limited to 2 or 4 switches, the MSDCs have to go to Layer 3 at the top of the rack and use BGP/ECMP to create a much larger protocol fabric.
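The ECMP part of that Layer 3 design can be sketched as a hash over each flow's 5-tuple that selects one of several equal-cost next hops. This is an illustration only: real switches use hardware hash functions, and CRC32, the field layout, and the names below are stand-ins chosen for this example.

```python
import zlib

def ecmp_next_hop(flow, next_hops):
    """Pick a next hop by hashing the flow's 5-tuple.

    Packets of the same flow always take the same path, so they are
    not reordered, while different flows spread across all paths.
    """
    key = "|".join(str(field) for field in flow).encode()
    return next_hops[zlib.crc32(key) % len(next_hops)]

spines = ["spine1", "spine2", "spine3", "spine4"]

# 5-tuple: source IP, destination IP, protocol, source port, destination port
flows = [("10.0.1.5", "10.0.9.7", 6, 49152 + i, 443) for i in range(1000)]

counts = {s: 0 for s in spines}
for flow in flows:
    counts[ecmp_next_hop(flow, spines)] += 1
print(counts)
```

Adding spines widens the fan-out without any Layer 2 limit on the number of active paths, which is why this approach scales to the rack counts MSDCs need.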
Network buyers should consider not only factors like the architecture and scale of their networks, but also the size and nature of their staff.
What about Brownfield Networks?
An encouraging feature in some SDN rollouts is the flexibility to preserve the significant investment in an existing network, particularly the expensive gear at the core. One such approach involves upgrading the ToR or Leaf/Spine switches. Use of an L2 fabric in smaller deployments and an IP fabric in larger implementations enables IT organizations to manage the network from a single point of management while extending powerful segmentation, analytics, and other capabilities. While the network refresh cycle is a matter of some debate, what is clear for network buyers is that the useful lifespan of a network in place can be extended by a relatively inexpensive upgrade of leaf/spine switching, which can in turn bring many of the advantages of SDN to even legacy brownfield networks. On a larger scale, the network buyer should look to an approach where network hardware refresh cycles become longer, with functional upgrades via software at one or more points during those life cycles.
I am looking forward to a lively discussion of these and other related topics at my ONUG tutorial.
Founder and CTO, Pluribus Networks Inc.
Sunay is the Founder and CTO of Pluribus Networks Inc and was the Sr. Distinguished Engineer and Chief Architect for Kernel/Network Virtualization in Core Solaris OS. Sunay has 21+ years of software experience, and was one of the top code contributors to OpenSolaris. He also holds over 100 patents encompassing Network and Server virtualization, including Virtual Switching and H/W based Virtual NICs in the server. At Pluribus Networks, Sunay is creating the Distributed Network Operating System, Netvisor, to run on switches and bring server economics, programmability and innovation to the switching layer.