The dimension of the input matrix must be a multiple of the block size.
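A minimal sketch of how this constraint could be checked before launching the kernel. The names BLOCK_SIZE, dim_is_block_multiple and round_up_to_block are hypothetical and are not taken from the source; the default value of 16 is also an assumption.

```c
#include <stddef.h>

/* BLOCK_SIZE is a hypothetical name for the block edge length.
   The real source may use a different identifier or value. */
#ifndef BLOCK_SIZE
#define BLOCK_SIZE 16
#endif

/* Returns 1 when an n x n matrix splits into whole blocks, 0 otherwise. */
static int dim_is_block_multiple(size_t n)
{
    return n > 0 && (n % BLOCK_SIZE) == 0;
}

/* Rounds n up to the next multiple of BLOCK_SIZE,
   e.g. to pick a padded matrix size that satisfies the constraint. */
static size_t round_up_to_block(size_t n)
{
    return ((n + BLOCK_SIZE - 1) / BLOCK_SIZE) * BLOCK_SIZE;
}
```

With a block size of 16, a 64 x 64 matrix passes the check, while a 100 x 100 matrix does not and would have to be padded to 112 x 112.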
******Adjustable work group size*****
The kernel uses a square work group.
RD_WG_SIZE_0 or RD_WG_SIZE_0_0 sets the size of one dimension.
The actual work group size = RD_WG_SIZE_0 * RD_WG_SIZE_0
USAGE:
make clean
make KERNEL_DIM="-DRD_WG_SIZE_0=16"
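The following is a sketch of how a source file might pick up the value passed through KERNEL_DIM. It assumes the -D flag reaches the compiler unchanged. WG_EDGE, work_group_edge and work_group_items are hypothetical names, and both the fallback default of 16 and the precedence of RD_WG_SIZE_0_0 over RD_WG_SIZE_0 are assumptions rather than facts from the source.

```c
/* Fallback chain (assumed order): RD_WG_SIZE_0_0, then RD_WG_SIZE_0,
   then a default. WG_EDGE and the default of 16 are hypothetical. */
#ifdef RD_WG_SIZE_0_0
#define WG_EDGE RD_WG_SIZE_0_0
#elif defined(RD_WG_SIZE_0)
#define WG_EDGE RD_WG_SIZE_0
#else
#define WG_EDGE 16
#endif

/* Size of one dimension of the square work group. */
static int work_group_edge(void)
{
    return WG_EDGE;
}

/* Total work items per group: edge * edge, because the group is square. */
static int work_group_items(void)
{
    return WG_EDGE * WG_EDGE;
}
```

Under these assumptions, building with -DRD_WG_SIZE_0=16 gives a 16 x 16 work group of 256 work items.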