- The dimension of the input matrix should be a multiple of the block size.
- *****Adjustable work group size*****
- The kernel's work group is square.
- RD_WG_SIZE_0 or RD_WG_SIZE_0_0 sets the size of one dimension.
- The actual work group size = RD_WG_SIZE_0 * RD_WG_SIZE_0
- USAGE:
- make clean
- make KERNEL_DIM="-DRD_WG_SIZE_0=16"
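As a rough illustration of the notes above, the sketch below shows one way a source file could pick up the macro passed through KERNEL_DIM and check the two constraints. It is an assumption-laden sketch, not the benchmark's actual code: the names BLOCK_SIZE, work_group_items and dim_is_valid are hypothetical, and both the fallback value of 16 and the precedence of RD_WG_SIZE_0_0 over RD_WG_SIZE_0 are assumed for illustration.

```c
#include <assert.h>

/* Assumption: RD_WG_SIZE_0_0 wins if both macros are defined, and 16 is
 * used when neither is passed on the compiler command line. */
#if defined(RD_WG_SIZE_0_0)
#define BLOCK_SIZE RD_WG_SIZE_0_0
#elif defined(RD_WG_SIZE_0)
#define BLOCK_SIZE RD_WG_SIZE_0
#else
#define BLOCK_SIZE 16
#endif

/* Reject nonsensical sizes at compile time (C11 static_assert). */
static_assert(BLOCK_SIZE > 0, "work group dimension must be positive");

/* Hypothetical helper: the work group is square, so the total number of
 * work-items is the one-dimensional size multiplied by itself. */
int work_group_items(void)
{
    return BLOCK_SIZE * BLOCK_SIZE;
}

/* Hypothetical helper: returns 1 when the matrix dimension is a positive
 * multiple of the block size, 0 otherwise. */
int dim_is_valid(int matrix_dim)
{
    return matrix_dim > 0 && matrix_dim % BLOCK_SIZE == 0;
}
```

With the build line shown in USAGE (RD_WG_SIZE_0=16), work_group_items() would return 256, and a 1024 x 1024 matrix would pass dim_is_valid while a 1000 x 1000 matrix would not.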