I have often wondered what makes a system “Google strong”, and how Google Assistant understands precisely what my two-year-old wants when she mumbles “Ok Google”, followed by something that sounds like Latin to me. The first question was at least addressed at the OpenPOWER Summit earlier this year, when Marie Mahoney described Zaius, a dual-socket POWER9-based system, in exactly those terms.

So, what’s special about Zaius? It all begins with its heart and soul, the POWER9™ processor. POWER9 is a family of chips architected to address a wide spectrum of applications, ranging from power- and cost-effective scale-out systems to high-end enterprise systems that need a large memory footprint, extreme reliability and robustness, and phenomenal core throughput. The family comes with two variants of the IO subsystem: a direct-attached memory version with 8 DDR4 ports, and a version with a high-bandwidth Differential Memory Interface (DMI) with 8 buffered memory channels.

Each of these can be coupled with one of two variants of the boosted core: a 24-core, 4-thread variant, primarily optimized for the Linux environment, and a 12-core, 8-thread variant, optimized for larger partitions in the PowerVM ecosystem. The large thread count is one of the many features that make POWER9 attractive for search-like workloads.
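To put a number on that thread count, here is a minimal sketch of the arithmetic. It assumes a chip with every core enabled and every hardware thread in use; the class and variable names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreVariant:
    """One of the two POWER9 core variants described above (hypothetical helper)."""
    name: str
    cores: int
    threads_per_core: int

    @property
    def hardware_threads(self) -> int:
        # Total hardware threads per chip, assuming all cores are enabled.
        return self.cores * self.threads_per_core


SMT4 = CoreVariant("SMT4", cores=24, threads_per_core=4)
SMT8 = CoreVariant("SMT8", cores=12, threads_per_core=8)

for variant in (SMT4, SMT8):
    print(f"{variant.name}: {variant.cores} cores x "
          f"{variant.threads_per_core} threads = "
          f"{variant.hardware_threads} hardware threads")
```

Both variants land on the same total per chip; what differs is how those threads are grouped into cores.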

The newly architected POWER9 core, with its execution slices, reduced pipeline depth, and merged fixed- and floating-point units, is optimized for cognitive workloads. The fundamental Lego block is a 64-bit slice, two of which are coupled together to form a 128-bit super-slice[1]. The SMT4 core is built from 2 super-slices, while the SMT8 variant is composed of 4 super-slices. The split instruction and data caches, 32kB each, are backed by a larger L2 and a shared 120MB on-chip L3 made of embedded DRAM cells.
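The Lego-block composition can be tallied in a few lines. This is only a sketch of the counts stated above; the function name and the "total_bits" figure (a plain sum of slice widths, not a performance claim) are my own illustrative choices.

```python
SLICE_BITS = 64                 # width of one execution slice
SLICES_PER_SUPER_SLICE = 2      # two slices form a 128-bit super-slice
SUPER_SLICES_PER_CORE = {"SMT4": 2, "SMT8": 4}


def core_layout(variant: str) -> dict:
    """Return slice counts for a core variant (hypothetical helper)."""
    super_slices = SUPER_SLICES_PER_CORE[variant]
    slices = super_slices * SLICES_PER_SUPER_SLICE
    return {
        "super_slices": super_slices,
        "slices": slices,
        # Sum of slice widths; an illustrative tally only.
        "total_bits": slices * SLICE_BITS,
    }


for name in ("SMT4", "SMT8"):
    print(name, core_layout(name))
```

By this count an SMT4 core has 4 slices and an SMT8 core has 8, so the slice count per core matches its thread count.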

What makes this processor truly impressive is the bandwidth built to feed the compute-loaded cores. In addition to 230GB/s of memory bandwidth and 256 to 384GB/s of SMP interconnect bandwidth, POWER9 is the first processor to support PCI-Express 4.0 (48 lanes) and NVLink 2.0, a high-bandwidth highway to Nvidia’s latest GPUs. The PCIe interface is also used for the next-generation CAPI 2.0.
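For a rough sense of what 48 lanes of PCI-Express 4.0 add to that picture, here is a back-of-the-envelope sketch. It uses the PCIe 4.0 signaling rate of 16 GT/s per lane and its 128b/130b line encoding, and it ignores packet and protocol overhead, so treat the result as an upper bound per direction rather than a measured figure. The function name is made up for illustration.

```python
PCIE4_TRANSFERS_PER_S = 16e9      # 16 GT/s per lane (PCIe 4.0)
ENCODING_EFFICIENCY = 128 / 130   # 128b/130b line encoding
LANES = 48                        # lane count quoted above


def pcie4_bandwidth_gb_per_s(lanes: int) -> float:
    """Raw one-direction bandwidth in GB/s, before protocol overhead."""
    bits_per_s = PCIE4_TRANSFERS_PER_S * ENCODING_EFFICIENCY * lanes
    return bits_per_s / 8 / 1e9


print(f"{LANES} lanes: about {pcie4_bandwidth_gb_per_s(LANES):.1f} GB/s per direction")
```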

The processor houses nearly 8 billion transistors (i.e., more than 1 transistor per living person on earth) in less than 700 mm² using FinFET technology. While three thin-oxide logic transistor threshold voltages (VTHs) are employed, usage of the lowest VTH is kept to a low single-digit percentage to balance power with performance. The 10 different input voltages and 17 PLLs control the 58 independent clock meshes, some of which are resonant, that adorn the chip. In addition, a centrally placed on-chip PowerPC controller (so, really, there are 25 cores on-chip!) monitors information from over 63 thermal sensors and 30 droop monitors to manage overall energy efficiency and reliability under operating system directives.

With all its bells and whistles to go with a highly beefed-up compute core, it is no wonder that POWER9 excels not just at search workloads, but also in the Summit and Sierra systems, dubbed “AI supercomputers”. Now, if I could only put all that compute to work to help me understand what my two-year-old wants!

For details on the POWER9 processor, check out the following:

The 24-core POWER9 processor

The POWER9 advantage

Summit: Oak Ridge Leadership Computing Facility