OCP Summit 2025: SRv6 for AI Backend

At the OCP Global Summit 2025 (opencompute.org), Changrong Wu and Abhishek Dosi from Microsoft presented “SRv6 for AI Backend.”

In Ethernet-based AI backend networks, routing has become a significant challenge because traditional BGP+ECMP schemes can no longer meet the unprecedented and stringent communication requirements of AI training jobs. We use Segment Routing over IPv6 (SRv6), originally designed for traffic engineering across wide-area networks, in our AI backend networks to provide fine-grained network path control, maximizing network utilization and delivering excellent fabric resiliency. In this presentation, we showcase our methodology for using SRv6 in AI backend networks, focusing on how SRv6 can enable continental-scale GPU AI clusters.
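To make "fine-grained network path control" concrete, here is a minimal sketch of how a source can pin a packet to an explicit path with an SRv6 segment list, following the generic Segment Routing Header behavior in RFC 8754. It is not the presenters' implementation. The SID values (taken from the 2001:db8::/32 documentation prefix), the `SRv6Packet` class, and the function names are all hypothetical and exist only for illustration.

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass
class SRv6Packet:
    """Toy model of the fields that matter for SRH forwarding (illustrative only)."""
    dst: ipaddress.IPv6Address                 # current IPv6 destination address
    segment_list: List[ipaddress.IPv6Address]  # stored in reverse order, as in RFC 8754
    segments_left: int                         # index of the active segment


def encapsulate(path: List[str]) -> SRv6Packet:
    """Build a packet that must visit `path` (SIDs listed in visiting order)."""
    sids = [ipaddress.IPv6Address(s) for s in path]
    return SRv6Packet(
        dst=sids[0],                        # first segment becomes the destination
        segment_list=list(reversed(sids)),  # Segment List[0] is the last segment
        segments_left=len(sids) - 1,
    )


def process_at_endpoint(pkt: SRv6Packet) -> None:
    """Segment endpoint behavior: advance to the next segment, if any."""
    if pkt.segments_left > 0:
        pkt.segments_left -= 1
        pkt.dst = pkt.segment_list[pkt.segments_left]


def trace(path: List[str]) -> List[str]:
    """Return the sequence of destination addresses the packet carries."""
    pkt = encapsulate(path)
    hops = [str(pkt.dst)]
    while pkt.segments_left > 0:
        process_at_endpoint(pkt)
        hops.append(str(pkt.dst))
    return hops


if __name__ == "__main__":
    # Hypothetical SIDs for a leaf, a spine, and the destination leaf.
    print(trace(["2001:db8:0:1::", "2001:db8:0:2::", "2001:db8:0:3::"]))
```

Because the sender chooses the segment list, the path is decided at the source and does not depend on per-hop ECMP hashing, which is the property the abstract contrasts with BGP+ECMP.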

 

Slides

 

Video