Ironwood: Computer Chip

Ironwood is Google’s seventh-generation Tensor Processing Unit (TPU), designed to accelerate AI applications with a focus on inference. It is engineered to handle the intensive computational demands of “thinking models” such as large language models (LLMs) and Mixture-of-Experts (MoE) models.

About the Ironwood chip:

  • Performance and Efficiency: Ironwood delivers twice the performance per watt of Google’s previous-generation Trillium TPU and is nearly 30 times more power-efficient than Google’s first Cloud TPU from 2018.
  • Scalability: Ironwood scales up to 9,216 chips per pod, delivering 42.5 exaflops of compute, more than 24 times that of El Capitan, the world’s largest supercomputer. For Google Cloud customers, Ironwood is available in 256-chip and 9,216-chip configurations (see the back-of-the-envelope check after this list).
  • Key Features:
    • High Bandwidth Memory (HBM): 192 GB of HBM per chip, six times Trillium’s capacity, enabling the processing of larger models and datasets.
    • HBM Bandwidth: 7.37 TB/s of HBM bandwidth per chip, 4.5 times that of Trillium, ensuring the rapid data access that memory-intensive workloads demand.
    • Inter-Chip Interconnect (ICI): Bidirectional ICI bandwidth rises to 1.2 TBps, 1.5 times that of Trillium, speeding chip-to-chip communication for efficient distributed training and inference.
    • SparseCore: Ironwood includes an enhanced SparseCore, a specialized accelerator for the ultra-large embeddings common in advanced ranking and recommendation workloads.
    • Pathways: Pathways, Google’s machine-learning runtime, coordinates efficient distributed computing across many TPU chips (a JAX sketch of this programming model follows this list).
  • Target Use: Ironwood is designed to minimize on-chip data movement and latency while processing massive amounts of data, making it well suited to agentic AI workloads and the heavy computation and communication demands of “thinking models”.
  • Benefits: The higher HBM bandwidth supports more intensive AI workloads, while faster chip-to-chip communication lets work be distributed efficiently during LLM training and inference. Its performance per dollar is considered competitive with GPUs from Nvidia and AMD.
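As a quick sanity check on the headline figures, the Python sketch below works the numbers back-of-the-envelope style: the per-chip peak of roughly 4,614 TFLOPs (FP8) comes from Google’s announcement, while everything derived from it here is illustrative only. It also estimates how long a single chip needs to stream its entire HBM once, a useful floor for memory-bound inference steps.

```python
# Back-of-the-envelope checks on Ironwood's published specs.
# The per-chip FP8 peak (~4,614 TFLOPs) is Google's published figure;
# everything derived below is illustrative, not an official number.

chips_per_pod = 9_216
peak_flops_per_chip = 4.614e15          # ~4,614 TFLOPs at FP8

pod_flops = chips_per_pod * peak_flops_per_chip
print(f"Pod peak: {pod_flops / 1e18:.1f} EFLOPs")    # ~42.5 EFLOPs

hbm_bytes = 192e9                       # 192 GB of HBM per chip
hbm_bandwidth = 7.37e12                 # 7.37 TB/s per chip

# Minimum time for one full sweep of HBM: a lower bound for any
# memory-bound inference step that must touch every weight once.
sweep_ms = hbm_bytes / hbm_bandwidth * 1e3
print(f"Full HBM sweep: {sweep_ms:.1f} ms")          # ~26 ms
```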
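Pathways itself is Google-internal infrastructure, but the programming model it backs on Cloud TPU is visible through JAX: a computation is written once, and the compiler and runtime partition it across every chip in a slice. The sketch below is a minimal, generic illustration of that single-program, multi-chip style, not Ironwood-specific code; the mesh shape and array sizes are arbitrary assumptions.

```python
# Minimal sketch of single-program, multi-chip execution in JAX, the
# style of distributed computing that Pathways enables on TPU pods.
# Mesh shape and array sizes are arbitrary illustrative choices.
import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

devices = np.array(jax.devices())           # every chip the runtime can see
mesh = Mesh(devices, axis_names=("data",))  # 1-D mesh for illustration

# Shard a batch across chips along its leading axis.
x = jax.device_put(
    jnp.ones((len(devices) * 128, 1024)),
    NamedSharding(mesh, P("data", None)),
)

@jax.jit
def step(x):
    # Compiled once; the runtime splits the work (and any needed
    # cross-chip communication) over all devices in the mesh.
    return jnp.tanh(x @ x.T).sum()

print(step(x))
```

The same program runs unchanged whether the mesh holds one chip or thousands; only the mesh definition changes, which is the point of this execution model.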
