Next-Gen Data Centre Networking – Built for AI, Powered by AI

Artificial Intelligence/Machine Learning (AI/ML) is creating new opportunities for businesses and consumers. Indeed, PwC, a major consultancy, expects AI to contribute over $15.7 trillion to the global economy by 2030. At the same time, within the data centre, the move to microservices-based software architectures, distributed storage, and AI/ML workloads is pushing east-west traffic to previously unseen heights.

Research predicts that the AI adoption rate will reach 86% by 2025. This shift will also have a major impact on the data centre network, as AI/ML applications drive the data centre from the cloud era into the AI era.

Application Workload Trends with AI/ML

Consumer and business applications continue to evolve, driving change in the data centre. Video and other media-rich content make up the bulk of Internet traffic today, accounting for over three-quarters of all IP traffic. Streaming services, along with social and mobile applications that embed more interactive content, will ensure the continued growth of media over north-south paths in the data centre.

Meanwhile, the rise of 5G, IoT, and web-based application consumption will drive a significant amount of API and transactional traffic over those same paths. These trends create new challenges for data centre networks, compounding the high volumes of east-west traffic already generated by media and streaming, data analytics, and distributed storage.

New Requirements for Data Centre Networking

Data centre networks (DCNs) must reach new levels of performance to meet these workloads and challenges.

First and foremost, to adequately support the scale and demands of these new workloads, DCNs are adopting new technologies, including higher-speed connectivity (400GbE switch deployments) and hardware NIC accelerator offloads. DCNs are also improving scale and manageability through software-defined networking (SDN), automation, and intent-based management and orchestration systems. At the same time, we are seeing early, limited adoption of highly programmable merchant silicon that supports programming constructs such as P4. In addition, automated tools for fault detection and troubleshooting are beginning to mature in the DCN space.
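As a minimal sketch of what such automated fault detection can look like, the hypothetical Python check below polls per-port error counters and flags ports whose error rate crosses a threshold. The function name, data shape, and threshold are all illustrative assumptions, not any specific vendor's tool:

```python
# Illustrative sketch of a simple automated fault-detection check.
# All names, shapes, and thresholds here are assumptions for
# demonstration, not part of any real DCN management product.

def detect_faulty_ports(samples, error_rate_threshold=0.001):
    """Flag ports whose CRC-error rate exceeds a threshold.

    samples: dict mapping port name -> (rx_packets, crc_errors),
    counters taken over the same polling interval.
    """
    faulty = []
    for port, (rx_packets, crc_errors) in samples.items():
        if rx_packets == 0:
            continue  # no traffic this interval; nothing to judge
        if crc_errors / rx_packets > error_rate_threshold:
            faulty.append(port)
    return sorted(faulty)

# Example: port eth2 shows an elevated CRC-error rate.
counters = {
    "eth1": (1_000_000, 2),      # ~0.0002% errors: healthy
    "eth2": (1_000_000, 5_000),  # 0.5% errors: flag it
}
print(detect_faulty_ports(counters))  # ['eth2']
```

A real tool would pull these counters from switch telemetry (e.g. via SNMP or gNMI) and apply smarter baselining, but the core loop, comparing observed error rates against an intent, is the same.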

AI Fabric: Intelligent and Lossless Data Centre Network Requirements

An AI Fabric intelligent and lossless data centre network should deliver zero packet loss, low latency, and high throughput, improving performance for AI and machine-learning workloads as well as distributed storage solutions.

AI Fabric utilizes remote direct memory access (RDMA) over Converged Ethernet (RoCE). By combining a high-speed fabric with more intelligent switch management, Huawei believes that Ethernet will be the foundation of future data centres powering AI/ML applications.
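To make the encapsulation concrete, the back-of-the-envelope sketch below tallies the standard minimum header sizes of a RoCEv2 packet (InfiniBand transport carried inside UDP, destination port 4791, over IP over Ethernet) and computes wire efficiency for a given payload. The helper names are illustrative; the header sizes are the standard minimums, and VLAN tags or IP options would add more:

```python
# Back-of-the-envelope RoCEv2 encapsulation overhead.
# RoCEv2 carries InfiniBand transport packets inside UDP
# (destination port 4791) over IP over Ethernet.

ETH_HDR = 14   # Ethernet II header
IPV4_HDR = 20  # IPv4 header, no options
UDP_HDR = 8    # UDP header (dst port 4791 for RoCEv2)
BTH = 12       # InfiniBand Base Transport Header
ICRC = 4       # invariant CRC appended to the payload
ETH_FCS = 4    # Ethernet frame check sequence

def roce_v2_wire_bytes(payload):
    """Total on-wire frame bytes for one RoCEv2 packet payload."""
    return ETH_HDR + IPV4_HDR + UDP_HDR + BTH + payload + ICRC + ETH_FCS

def efficiency(payload):
    """Fraction of the frame that is RDMA payload."""
    return payload / roce_v2_wire_bytes(payload)

# A 4096-byte RDMA payload per packet:
print(roce_v2_wire_bytes(4096))    # 4158
print(round(efficiency(4096), 4))  # 0.9851
```

The high payload efficiency is one reason RDMA transfers over Ethernet can approach line rate, provided the fabric keeps the network lossless, since RoCE performance degrades sharply under packet drops.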

The AI Fabric solution has been validated by EANTC, an internationally recognized test centre in Europe. The EANTC tests validated the network performance of the switches using RoCEv2. AI Fabric demonstrated a reduction in inter-node communication latency of up to 40% in the HPC test scenario, and in the distributed file system (DFS) tests the fabric showed a 25% improvement in overall IOPS, all achieved with zero packet loss and high throughput.
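To put those percentages in perspective, the quick calculation below applies them to illustrative baseline numbers. Only the 40% and 25% figures come from the EANTC results above; the baselines are assumptions chosen for the arithmetic:

```python
# Applying the reported gains to hypothetical baselines.
# The percentages are from the EANTC results; the baseline
# latency and IOPS values are illustrative assumptions.

baseline_latency_us = 10.0   # hypothetical inter-node latency (us)
baseline_iops = 200_000      # hypothetical DFS IOPS

latency_after_us = baseline_latency_us * (1 - 0.40)  # up to 40% lower
iops_after = int(baseline_iops * (1 + 0.25))         # 25% higher

print(latency_after_us)  # 6.0
print(iops_after)        # 250000
```

For communication-bound HPC jobs and IOPS-bound storage clusters, improvements of this magnitude translate directly into shorter job completion times and higher storage throughput per node.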


Author's Bio

George Zhao

Director, OSS & Ecosystem, America Research Center, Huawei Technologies Co., Ltd.