Arista announces technology demonstration of AI data centres

Arista Networks, in collaboration with NVIDIA, has announced a technology demonstration for AI data centres that aligns compute and network domains into a single managed AI entity.

To build generative AI networks with lower job completion times, customers can configure, manage, and monitor AI clusters uniformly across the key building blocks, including networks, NICs, and servers. The demonstration marks a first step towards a multi-vendor, interoperable ecosystem that enables control and coordination between AI networking and AI compute infrastructure.

As AI clusters and large language models (LLMs) grow, so do the complexity and sheer number of disparate parts in the puzzle. GPUs, NICs, switches, optics, and cables must all work together to form a holistic network, and customers need uniform controls between the AI servers hosting NICs and GPUs and the AI network switches at each tier. All of these elements depend on one another for proper AI job completion, yet they operate independently.

This independence can lead to misconfiguration or misalignment between parts of the ecosystem, such as between the NICs and the switch network, which can dramatically lengthen job completion times because network issues are difficult to diagnose. Large AI clusters also require coordinated congestion management to avoid packet drops and under-utilisation of GPUs, as well as coordinated management and monitoring to optimise compute and network resources in tandem.
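To make the misalignment problem concrete, the following is a minimal sketch of the kind of NIC-versus-switch consistency check such coordination implies. The setting names, values, and comparison logic are illustrative assumptions, not Arista's or NVIDIA's actual API.

```python
# Hypothetical QoS settings as seen from a switch port and its attached NIC.
# A mismatch in ECN, PFC, or MTU settings is exactly the kind of silent
# misalignment that can degrade job completion time.
SWITCH_PORT_QOS = {"ecn_enabled": True, "pfc_priorities": [3], "mtu": 9216}
NIC_QOS = {"ecn_enabled": True, "pfc_priorities": [3, 4], "mtu": 9000}

def find_mismatches(switch_cfg: dict, nic_cfg: dict) -> list[str]:
    """Return the QoS keys on which the switch port and the attached NIC disagree."""
    return [key for key in switch_cfg if switch_cfg[key] != nic_cfg.get(key)]

mismatches = find_mismatches(SWITCH_PORT_QOS, NIC_QOS)
print(mismatches)  # ['pfc_priorities', 'mtu']
```

Without a shared point of control, each of these settings is typically configured and verified separately on the server and on the switch, which is where drift creeps in.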

At the heart of this solution is an Arista EOS-based agent enabling the network and the host to communicate with each other and coordinate configurations to optimise AI clusters. Using a remote AI agent, EOS running on Arista switches can be extended to directly-attached NICs and servers to allow a single point of control and visibility across an AI data centre as a holistic solution.

This remote AI agent, hosted directly on an NVIDIA BlueField-3 SuperNIC or running on the server and collecting telemetry from the SuperNIC, allows EOS on the network switch to configure, monitor, and debug network problems on the server, ensuring end-to-end network configuration and QoS consistency. AI clusters can now be managed and optimised as a single homogeneous solution.
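The remote-agent pattern described above can be sketched roughly as follows: an agent on the server (or SuperNIC) reports NIC telemetry, and switch-side logic flags hosts whose counters suggest trouble. The counter names, thresholds, and canned data are hypothetical, used only to illustrate the shape of the coordination.

```python
from dataclasses import dataclass

@dataclass
class NicTelemetry:
    host: str
    pause_frames_rx: int    # PFC pause frames received since the last report
    out_of_order_pkts: int

def collect_report(host: str) -> NicTelemetry:
    # A real agent would read SuperNIC counters; here we return canned samples.
    samples = {
        "gpu-node-01": NicTelemetry("gpu-node-01", pause_frames_rx=12, out_of_order_pkts=0),
        "gpu-node-02": NicTelemetry("gpu-node-02", pause_frames_rx=50_000, out_of_order_pkts=310),
    }
    return samples[host]

def flag_degraded(report: NicTelemetry, pause_threshold: int = 10_000) -> bool:
    """Switch-side check: heavy inbound pausing suggests congestion towards this host."""
    return report.pause_frames_rx > pause_threshold

degraded = [h for h in ("gpu-node-01", "gpu-node-02") if flag_degraded(collect_report(h))]
print(degraded)  # ['gpu-node-02']
```

The point of the sketch is the direction of visibility: the switch-side software sees host-side counters it could not otherwise observe, which is what lets the cluster be treated as one managed entity.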

John McCool, Chief Platform Officer for Arista Networks, comments, “Arista aims to improve efficiency of communication between the discovered network and GPU topology to improve job completion times through coordinated orchestration, configuration, validation, and monitoring of NVIDIA accelerated compute, NVIDIA SuperNICs, and Arista network infrastructure.”

This new technology demonstration highlights how an Arista EOS-based remote AI agent allows the combined, interdependent AI cluster to be managed as a single solution. EOS running in the network can now be extended to servers or SuperNICs via remote AI agents to enable instantaneous tracking and reporting of performance degradation or failures between hosts and networks, so they can be rapidly isolated and the impact minimised.
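Rapid fault isolation between hosts and networks, as described above, amounts to comparing both ends of each host-to-switch link. The sketch below assumes hypothetical CRC error counters and a simple decision rule (errors on both ends implicate the cable or optic; errors on one end implicate that device); none of it reflects Arista's actual implementation.

```python
# Hypothetical per-link error counters gathered from the NIC (via the remote
# agent) and from the switch port (via EOS).
LINKS = {
    ("gpu-node-01", "Ethernet1/1"): {"nic_crc_errors": 0, "switch_crc_errors": 0},
    ("gpu-node-02", "Ethernet1/2"): {"nic_crc_errors": 4821, "switch_crc_errors": 4795},
    ("gpu-node-03", "Ethernet1/3"): {"nic_crc_errors": 980, "switch_crc_errors": 0},
}

def isolate(counters: dict) -> str:
    """Attribute a fault based on which end(s) of the link report errors."""
    nic, sw = counters["nic_crc_errors"], counters["switch_crc_errors"]
    if nic and sw:
        return "suspect cable/optic"
    if nic or sw:
        return "suspect NIC" if nic else "suspect switch port"
    return "healthy"

verdicts = {link: isolate(c) for link, c in LINKS.items()}
```

Having both ends of every link in one view is what turns a lengthy manual diagnosis into a lookup like this.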

Since EOS-based network switches are constantly aware of accurate network topology, extending EOS down to SuperNICs and servers with the remote AI agent further enables coordinated optimisation of end-to-end QoS between all elements in the AI data centre to reduce job completion time.

Zeus Kerravala, Principal Analyst at ZK Research, adds, “Best-of-breed Arista networking platforms with NVIDIA’s compute platforms and SuperNICs enable coordinated AI data centres. The new ability to extend Arista’s EOS operating system with remote AI agents on hosts promises to solve a critical customer problem of AI clusters at scale, by delivering a single point of control and visibility to manage AI availability and performance as a holistic solution.”

Arista will demonstrate the AI agent technology at its IPO tenth-anniversary celebration at the NYSE on 5 June, with customer trials expected in the second half of 2024.

The post Arista announces technology demonstration of AI data centres appeared first on Data Centre & Network News.
