AI inference systems are changing continuously at every level: the model, the accelerator, the interconnect, and the system, including its software. At the system level, these changes are significant. We focus on the technology direction at each level, including the accelerator, the scale-up/scale-out network, and the rack implementation. We discuss accelerator design choices, especially opportunities in co-design; interconnect technologies, such as the intercept at 224G copper and future 448G, as well as optical interconnects; and rack infrastructure for LLM deployments.

Ashwin Gumaste
Ashwin Gumaste is an AI Architect at Microsoft. He specializes in building AI systems, scaling networks, and model estimation. Before joining Microsoft, Ashwin held key roles at Infinera, Cisco, and Fujitsu, as well as academic positions at IIT Bombay and MIT. Earlier in his career at IIT Bombay, he designed, developed, and successfully commercialized carrier Ethernet switch routers, earning India's highest science and technology honor, the Bhatnagar Prize. A distinguished fellow of the Indian National Academy of Engineering and the National Academy of Sciences in India, Ashwin has made significant contributions to AI and network engineering. He has authored three books, published 215 peer-reviewed papers, and holds 28 granted patents.