Traditional system cost/performance improvements driven by generational advances in processor technology have been stagnant for the last decade. We are increasingly seeing heterogeneous ‘compute’ elements, aka accelerators (GPUs, FPGAs or custom silicon), come together to help improve this performance.
This ‘heterogeneous’ approach of using an I/O-attached accelerator works very well when the workload is ‘partitionable’ or uses ‘disjoint memory’ to hand off the compute from the processor to the accelerator and vice versa. However, the traditional I/O-attached approach tends to fall apart when the processor and accelerator need to share a lot of data: applications that communicate heavily with accelerator devices typically incur very high processor overhead.
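To make the data-sharing penalty concrete, here is a minimal back-of-the-envelope sketch, not a measurement. It assumes a simple model in which every hand-off between the processor and an I/O-attached accelerator pays a fixed software setup cost plus a copy at link bandwidth. The function name, its parameters and the example values are all hypothetical and chosen only for illustration.

```python
def data_movement_fraction(exchanges, bytes_per_exchange, setup_s,
                           bandwidth_bytes_per_s, compute_s):
    """Fraction of total run time spent moving data instead of computing.

    Hypothetical model: each hand-off costs a fixed setup time plus the
    time to copy its payload across the I/O link.
    """
    transfer_s = exchanges * (setup_s + bytes_per_exchange / bandwidth_bytes_per_s)
    total_s = transfer_s + compute_s
    return transfer_s / total_s if total_s > 0 else 0.0


# Illustrative, made-up numbers: 10 ms of accelerator compute, 1 MiB per
# hand-off, 10 microseconds of setup per hand-off, 10 GB/s copy bandwidth.
for exchanges in (1, 10, 100, 1000):
    frac = data_movement_fraction(exchanges, 1 << 20, 10e-6, 10e9, 10e-3)
    print(f"{exchanges:5d} hand-offs -> {frac:.1%} of time spent moving data")
```

With the compute time held fixed, the fraction climbs toward 1 as the number of hand-offs grows, which is the regime where the partitioned offload model stops paying off.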
The other needs of a ‘data centric’ system architecture are the ability to:
- Seamlessly integrate multiple emerging hybrid memory/storage technologies (ReRAM, STTRAM, MRAM, NRAM etc.) that have different access methods, coherency and performance attributes.
- Support very high network bandwidth.
The OpenCAPI (Open Coherent Accelerator Processor Interface) Consortium was envisioned to address these very issues. Some of its key attributes are:
- Open - Allows broad industry participation without vendor lock-in, enabling ecosystem development and the volume/pricing advantages that this will drive
- High Performance - High bandwidth and low latency with ‘limited’ overhead
- Coherent - Attached devices operate natively within the application’s user space and coherently with the host processor
- Architecture Agnostic - Interoperability between multiple CPU architectures
The OpenCAPI specification:
- Supports a wide range of use cases and access semantics
  - Hardware accelerators
  - High-performance I/O devices
  - Advanced memories and classic memory
- Reduces the complexity of design implementation
  - The complexities of coherence and virtual addressing are integrated onto the host processor
  - This simplifies attached devices and facilitates interoperability across architectures
Multiple vendors beyond the OpenCAPI Consortium are looking at these problems as well. In 2016, we also witnessed the formation of two other new coherent interconnect standards, Gen-Z and CCIX, to address this space. Each consortium (with a lot of cross-linked participation) takes a slightly different approach. Gen-Z focuses on a rack-level interconnect that gives access to large pools of accelerator, memory and storage resources. CCIX, like OpenCAPI, is focused inside the system node, but it builds on top of a pre-existing standard (PCIe), whereas OpenCAPI is being built from the ground up.
Brad Benton has an excellent comparison of the different standards in his OpenFabrics Alliance talk [Slides] from 2017.
IBM’s POWER9 [1,2,3,4], the latest flagship POWER processor, is the first processor to implement the OpenCAPI interface, running on the 25 Gbps BlueLink. The signaling and protocol of this interface, based on the OIF CEI 28G SR spec, were built to enable very low latency and high bandwidth between POWER9 and the attached device. This allows high-performance attachment of memory, network adapters and accelerators to the platform. Also, unlike the prior CAPI implementation on POWER8 [5,6], virtual-to-physical address translation and coherence are managed by the POWER9 processor, which simplifies the accelerator design. Some additional details are summarized here.
Manoj Dusanapudi has written an insightful article that walks through some more of the POWER9 accelerator capabilities.
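As a rough illustration of what the 25 Gbps signaling rate mentioned above means for link bandwidth, the sketch below does the unit arithmetic. The lane count of 8 and the optional line-coding efficiency are assumptions made for this example, not figures taken from this article, and the function name is hypothetical.

```python
def link_bandwidth_gbytes_per_s(lane_rate_gbps, lanes, coding_efficiency=1.0):
    """Per-direction link bandwidth in gigabytes per second.

    lane_rate_gbps    -- signaling rate of one lane, in gigabits per second
    lanes             -- number of lanes in the link (assumed, not from the article)
    coding_efficiency -- fraction of raw bits that carry payload (assumed)
    """
    if lanes <= 0 or lane_rate_gbps < 0 or not 0 < coding_efficiency <= 1:
        raise ValueError("invalid link parameters")
    # Multiply out the aggregate bit rate, then convert bits to bytes.
    return lane_rate_gbps * lanes * coding_efficiency / 8.0


# Assumed configuration for illustration: 8 lanes at 25 Gbps each.
raw = link_bandwidth_gbytes_per_s(25, 8)
print(f"raw: {raw} GB/s per direction")

# 64/66 is used here purely as an example coding efficiency.
coded = link_bandwidth_gbytes_per_s(25, 8, coding_efficiency=64 / 66)
print(f"with example coding overhead: {coded:.2f} GB/s per direction")
```

Keeping the lane count and coding efficiency as explicit parameters makes it easy to redo the arithmetic for whatever configuration a given platform actually documents.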
On a personal note, I am very excited by the imminent launch of POWER9-based systems in the India South Asia (ISA) market. It has been deeply satisfying for us here at the India Systems Development Lab (ISDL) to have played a key role in delivering this transformational architecture to the market. Stay tuned as we continue to work to refresh the rest of the IBM POWER portfolio ... more soon.
#POWER9 #Power9InIndia #IBMISDL #IBM #OpenCAPI
References / Further Reading:
[1] IBM POWER9 Processor
[2] POWER9 Processor for the Cognitive Era - HotChips28, 2016
[3] Open CAPI, A New Standard for High Performance Attachment of Memory, Acceleration, and Networks - Switzerland HPC Conference 2017
[4] How you can Boost Acceleration with OpenCAPI, Today! - Myron Slota
[5] Coherent Accelerator Processor Interface (CAPI) for POWER8 Systems - White Paper
[6] POWER9 - CAPI2.0 (PCIe-G4) & OpenCAPI (25Gbps) co-exist
[7] Accelerate the Future of Computing with POWER Acceleration - Manoj Dusanapudi