There has never been a better time than now to see a vision from four years ago become reality with the launch of POWER9. Every new trend and tide in the technology industry is addressed in an innovative manner, with a design aimed at the future infrastructure of the IT world and its computing paradigms. With Power9, data is never seen as a problem but as a differentiating edge over the competition. Let me dive back into history a bit and walk you through the evolutionary journey of the new-generation processor, Power9.
The Vision of Power9 Architecture
It is now common knowledge that Moore’s law has come to an end in practical terms; in fact, chip speeds can even go backwards. IT innovation can therefore no longer come from the processor alone, but needs full-system-stack innovation for a performance advantage.
The new era in computing, as IBM sees it, is the cognitive era, where AI (Augmented Intelligence), machine learning, and deep learning over big data form the crux of most workloads. Traditional HPC and enterprise workloads are vastly different from these cognitive workloads. With the emergence of this new era come new IT consumption models such as custom HSDCs (Hyper-Scale Data Centers) and hybrid clouds, and, more importantly, open solutions. However, existing processor and system designs are insufficient to deliver the optimum price/performance advantage. Hence the need for a disruption in architectural design to pave the way for an “information superhighway” that can break these barriers and maximize data bandwidth across the system stack.
Power Acceleration with Power9
Power9, the first commercial platform to combine an advanced CPU, exceptional floating-point performance, and new-generation graphics support with high-speed tunnels for moving modern data, is IBM’s answer to this disruption in enterprise AI. Check out the article from Rahul Rao on Power9 capabilities. Power9 accelerates modern workloads with the support of accelerators such as on-chip GZIP, and GPUs through on-chip support for NVIDIA’s next-generation NVLink, OpenCAPI 3.0, and PCI-Express 4.0, all in one system.
For a quick comparison of how this has changed the horizon of acceleration for new-generation workloads: Power9 delivers a minimum of 5x acceleration compared to the existing state of the art.
Power9 acceleration is thus designed for seamless CPU/accelerator interaction with low latency and increased performance. Coherent memory sharing, enhanced virtual address translation, and a design for efficient programming models with reduced SW/HW overhead are the key differentiators.
OpenCAPI for FPGA/New Gen Memories/ASICs
- A coherent, virtual-address-enabled, low-latency, high-bandwidth bus interface to FPGAs and ASICs that allows accelerators to behave as if they were integrated into custom microprocessors. To learn what CAPI is good for, please read Anand Haridas's blog article here.
- Provides the bandwidth needed to support rapidly increasing network speeds. Network controllers based on virtual addressing can eliminate software overhead without added complexity.
- Allows system designers to take full advantage of emerging memory technologies for cost-efficient data centers.
- Coherent storage controller can eliminate kernel software overhead and enable extreme IOPS with minimal CPU intervention.
- Architecture-agnostic and open, enabling portability across processor architectures.
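To see why a coherent, user-level access path matters, here is a minimal back-of-envelope model (my own sketch, not an IBM figure) of how fixed per-operation software overhead throttles small transfers. The latency numbers are assumed placeholders for a traditional driver/DMA path versus a coherent user-level path, not measured POWER9/OpenCAPI values.

```python
# Illustrative model: effective throughput of small transfers when each
# operation pays a fixed software overhead. All latency figures below are
# assumptions for illustration, not measured OpenCAPI numbers.

def effective_throughput_gbps(transfer_bytes, link_gbps, per_op_overhead_us):
    """Bytes actually moved per second, accounting for fixed per-op overhead."""
    wire_time_us = transfer_bytes / (link_gbps * 1e9) * 1e6  # time on the link
    total_us = wire_time_us + per_op_overhead_us
    return transfer_bytes / (total_us * 1e-6) / 1e9

small = 4 * 1024  # a 4 KiB transfer over a 25 GB/s link
via_driver   = effective_throughput_gbps(small, 25, per_op_overhead_us=5.0)  # assumed syscall/DMA-setup cost
via_coherent = effective_throughput_gbps(small, 25, per_op_overhead_us=0.5)  # assumed user-level access cost

print(f"driver path:   {via_driver:.2f} GB/s")
print(f"coherent path: {via_coherent:.2f} GB/s")
```

With these assumed overheads, the driver path sustains well under 1 GB/s on 4 KiB transfers while the coherent path reaches several GB/s, which is the intuition behind the "eliminate kernel software overhead" bullets above.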
A few announcements around OpenCAPI:
- Hybrid Memory System Architecture
- Cavium will leverage OpenCAPI to accelerate data center workloads
The OpenCAPI specification and detailed information can be obtained from www.opencapi.org
GPU Connect for Deep learning/ HPC
- Next generation of CPU/GPU bandwidth and integration with NVLink 2.0
- 25 GB/s links connecting CPU and GPU, with up to 150 GB/s of bandwidth between components
- 4-6 NVIDIA Volta GPUs with NVLink 2.0 interconnect
- SXM2 Form Factor.
- Coherent programming model for reduced SW complexity
- Virtual addressing for user level direct access and interoperability between OpenCAPI devices and CPUs
- Transfers data up to 5.6 times faster than the CUDA host-device bandwidth of tested x86 platforms [1].
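As a quick sanity check on the bandwidth bullets above: NVLink 2.0 runs each link at 25 GB/s per direction, and one plausible reading of the 150 GB/s figure (an assumption on my part; the exact link topology varies by system configuration) is three links per CPU–GPU pair, counted bidirectionally.

```python
# Arithmetic behind the NVLink 2.0 figures quoted above. The three-links-
# per-GPU topology is assumed for illustration; actual configurations vary.

PER_LINK_GBPS = 25   # NVLink 2.0 per-link, per-direction bandwidth (from the list above)
links_per_gpu = 3    # assumed link count per CPU-GPU pair

one_way = PER_LINK_GBPS * links_per_gpu   # GB/s in each direction
bidirectional = one_way * 2               # GB/s counted both ways

print(one_way, bidirectional)  # 75 150
```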
I will also write in detail in future blogs about how to decide between FPGAs and GPUs, and how these advanced interfaces change the paradigm of the accelerator programming model. We no longer need to be limited by data-transfer bandwidth when offloading work to accelerators. This will give rise to new thinking, where a mix of compute-intensive as well as data-intensive workloads can benefit from accelerators.
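The point about bandwidth limits can be made concrete with a simple cost model (my own sketch, with assumed figures): offloading only pays off when the time saved by the accelerator exceeds the time spent moving data over the host–device link.

```python
# Back-of-envelope offload model: transfer cost plus accelerated compute
# versus staying on the host. All workload figures below are assumptions
# chosen for illustration.

def offload_wins(data_bytes, host_secs, accel_speedup, link_gbps):
    """True if transferring the data and computing on the accelerator beats the host."""
    transfer_secs = data_bytes / (link_gbps * 1e9)   # one-way data movement cost
    return transfer_secs + host_secs / accel_speedup < host_secs

data = 8 << 30  # ~8.6 GB of input data (assumed)
# 0.5 s of host compute, a 10x-faster accelerator (both assumed):
print(offload_wins(data, 0.5, 10, link_gbps=16))   # False: PCIe 3.0-class link, transfer dominates
print(offload_wins(data, 0.5, 10, link_gbps=150))  # True: NVLink 2.0-class aggregate bandwidth
```

On the slower link the transfer alone outweighs the compute savings, while the higher-bandwidth interconnect makes the same data-heavy offload worthwhile.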
Stay tuned, and register to learn more about what IBM Research is doing with the CORAL benchmarks, the main challenges that IBM and NVIDIA have solved in the past few months, and the software tools IBM is developing internally to ride the 2020-2023 exascale wave.
#Power9inIndia #POWER9 #IBMISDL #IBM