Arm Neoverse is designed to meet these evolving needs, offering high compute density, exceptional energy efficiency, and a strong total cost of ownership (TCO). As host processors, Neoverse-based CPUs integrate seamlessly with GPUs and AI accelerators to enable flexible, power-efficient, and high-performance deployments across heterogeneous AI platforms capable of managing the complexity and coordination required by agentic AI systems.

In this session, we’ll demo an agentic AI application running on an AI server powered by Arm Neoverse as the host node. The application coordinates multiple agents to accelerate decision-making and streamline workload execution. We’ll also highlight the advantages of running agentic AI on heterogeneous infrastructure, explain why Arm CPUs are ideal as host processors, and demonstrate how Arm provides a scalable, efficient foundation for real-world enterprise and cloud environments.
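
As a rough illustration of the coordination pattern described above (not the demoed application itself), the hypothetical Python sketch below shows a coordinator splitting a task into subtasks, dispatching them to worker agents that run concurrently on the host CPU, and merging the results; the agent names and logic are placeholders.

from concurrent.futures import ThreadPoolExecutor

def research_agent(query: str) -> str:
    # Placeholder worker agent; in practice this would call a model or tool
    # served from a GPU or AI accelerator attached to the host.
    return f"findings for: {query}"

def summarize_agent(notes: list[str]) -> str:
    # Placeholder aggregator agent that merges worker results.
    return " | ".join(notes)

def coordinator(task: str) -> str:
    # Fan the task out to several worker agents running concurrently on the
    # host CPU, then hand the collected notes to a summarizing agent.
    subtasks = [f"{task} (angle {i})" for i in range(3)]
    with ThreadPoolExecutor(max_workers=3) as pool:
        notes = list(pool.map(research_agent, subtasks))
    return summarize_agent(notes)

if __name__ == "__main__":
    print(coordinator("evaluate supplier risk"))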

Author:

Na Li

Principal Solution Architect
arm

Na Li is Principal AI Solution Architect for the Infrastructure Line of Business (LOB) at Arm, where she is responsible for creating AI solutions that showcase the value of Arm-based platforms. She has around 10 years of experience developing AI applications across various industries. Originally trained as a computational neuroscientist, she received her PhD from the University of Texas at Austin.

AI inference costs are high and workloads are growing, especially when low latency is required. We demonstrate NorthPole's energy efficiency and high throughput for low-latency edge and datacenter inference tasks.

Author:

John Arthur

Principal Research Scientist
IBM

John Arthur is a principal research scientist and hardware manager in the brain-inspired computing group at IBM Research - Almaden. He has been building efficient, high-performance brain-inspired neural network chips and systems for the last 25 years, including Neurogrid at Stanford and both TrueNorth and NorthPole at IBM. John holds a PhD in bioengineering from the University of Pennsylvania and a BS in electrical engineering from Arizona State University.

This hands-on session is designed for developers and architects building and scaling generative AI services. We will provide a practical look at Google Kubernetes Engine (GKE) as the foundation for high-performance large language model (LLM) inference. The session will feature a live demo of the GKE Inference Gateway, highlighting its model-aware routing and serving priority features. We will then delve into the open-source llm-d project, showcasing its vLLM-aware scheduling and disaggregated serving capabilities. To cap it off, we'll explore the impressive performance gains of running vLLM on Cloud TPUs for maximum throughput and efficiency. You will leave with actionable insights and code examples to optimize your LLM serving stack.
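
As a point of reference for the vLLM portion of the stack (the GKE Inference Gateway and llm-d configuration live at the Kubernetes layer and are not shown), the short Python sketch below uses vLLM's offline generation API; the model identifier and sampling parameters are illustrative assumptions, not the session's code.

from vllm import LLM, SamplingParams

# Load a model with vLLM's offline API; the model identifier is illustrative.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain disaggregated LLM serving in one paragraph."], params)

for out in outputs:
    print(out.outputs[0].text)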

Author:

Nathan Beach

Director, Product Management
Google Cloud

Nathan Beach is Director of Product Management for Google Kubernetes Engine (GKE). He leads the product team working to make GKE a great platform on which to run AI workloads. He received his MBA from Harvard Business School and, prior to Google, led his own startup. He is a builder and creator passionate about making products that superbly meet user needs. He enjoys career coaching and mentoring, and he is eager to help others transition into product management and excel in their careers.

Innovation happens where AI meets the edge. In this interactive session, we’ll demonstrate how the Metis® platform enables breakthrough applications across industries. Discover how scalable, efficient inference unlocks new possibilities in manufacturing, logistics, energy, and beyond.

Author:

Manuel Botija

VP, Product Management
Axelera AI

Manuel Botija is an engineer with degrees from Telecom Paris and Universidad Politécnica de Madrid. Over the past 17 years, he has led product innovation in semiconductor startups across Silicon Valley and Europe. Before joining Axelera, Manuel served as Head of Product at GrAI Matter Labs, which was acquired by Snap Inc.

Author:

David Marks

Field Application Engineer
Axelera AI

Outdated x86 CPU/NIC architectures bottleneck AI compute, limiting the true potential of Generative AI. NeuReality's NR1® chip combines entirely new categories of AI-CPU and AI-NIC into a single chip, fundamentally redefining AI data center inference. It removes these bottlenecks, boosting Generative AI token output by up to 6.5x at the same cost and power as x86 CPU systems, making AI widely affordable and accessible for businesses and governments. It works in harmony with any AI accelerator or GPU, maximizing GPU utilization, performance, and system energy efficiency. The NR1® Inference Appliance, with built-in software, an intuitive SDK, and APIs, comes preloaded with out-of-the-box LLMs such as Llama 3, Mistral, DeepSeek, Granite, and Qwen for rapid, seamless deployment with significantly reduced complexity, cost, and power consumption at scale.
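
For a sense of what deployment can look like from the client side, the hypothetical Python sketch below sends a prompt to an LLM served over HTTP; the endpoint URL, route, and payload schema are assumptions for illustration only, not NeuReality's documented SDK or APIs.

import json
import urllib.request

# Hypothetical endpoint and schema; substitute the appliance's real API.
URL = "http://appliance.local:8000/v1/completions"

def generate(prompt: str, max_tokens: int = 128) -> str:
    payload = {"model": "llama-3", "prompt": prompt, "max_tokens": max_tokens}
    req = urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
    # Assumes an OpenAI-style response layout; adjust to the actual schema.
    return body["choices"][0]["text"]

if __name__ == "__main__":
    print(generate("Summarize the benefits of maximizing GPU utilization."))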

Author:

Moshe Tanach

Co-Founder & CEO
NeuReality

Moshe Tanach is Founder and CEO at NeuReality.

Before founding NeuReality, he served as Director of Engineering at Marvell and Intel, leading complex wireless and networking products to mass production.

He also served as Vice President of R&D at DesignArt-Networks (later acquired by Qualcomm), developing 4G base station products.

He holds a Bachelor of Science in Electrical Engineering (BSEE), cum laude, from the Technion in Israel.

Five Forces Reshaping AI Infrastructure in 2025

Over the last six months, we held two dozen closed-door interviews with the people who pour the concrete, sign the power-purchase agreements, and deploy the GPUs that drive today's AI boom. They ranged from Fortune-100 cloud operators and traditional utilities to private-equity financiers and immersion-cooling specialists. Taken together, the conversations reveal a market in hyper-growth mode but constrained by physics (power density, transmission capacity, and thermal limits) and by a brutally tight equipment supply chain. Five forces rise above the noise and will shape every capital-allocation decision in AI infrastructure during 2025.