We just announced cuTile, a tile programming model for CUDA! It's an array-based paradigm where the compiler automates memory movement, pipelining & tensor core utilization, making GPU programming easier & more portable. It's built on top of a powerful new compiler stack and MLIR dialect called Tile IR. I've been lucky enough to work with the amazing team behind this over the last few months. Tile programming is a big deal and will be transformative for CUDA!
Interesting, hopefully it will be faster than CUDA malloc on the fly for dynamic arrays.
Looking forward to checking it out and hoping there are general-purpose applications for this, non-ML and such.
Bryce Adelstein Lelbach, how does cuTile compare to Triton?
When can we expect APL symbols to make an appearance 😅 More seriously, this looks quite elegant, awesome job!
cool
Is it an NVIDIA version of Triton? It seems both of them are at the "CTA Tile" level.
Any chance of these coming to open standards for GPU programming like Vulkan?
Hope the compiler stack is open source, so the cuTile paradigm can eventually be ported to CUDA-like GPU stacks such as AMD HIP and MUSA.
Wow! This is awesome. How can I begin to explore its awesome capabilities?