  1. Rodinia Benchmark Suite 3.1
  2. ===========================
  3. I. Overview
  4. The University of Virginia Rodinia Benchmark Suite is a collection of parallel programs which targets
  5. heterogeneous computing platforms with both multicore CPUs and GPUs.
  6. II. Usage
  7. 1. Package Structure
  8. rodinia_2.1/bin : binary executables
  9. rodinia_2.1/common : common configuration file
  10. rodinia_2.1/cuda : source code for the CUDA implementations
  11. rodinia_2.1/data : input files
  12. rodinia_2.1/openmp : source code for the OpenMP implementations
  13. rodinia_2.1/opencl : source code for the OpenCL implementations
  14. 2. Build Rodinia
  15. Install the CUDA/OCL drivers, SDK and toolkit on your machine.
  16. Modify the rodinia_2.1/common/make.config file to change the settings of rodinia home directory and CUDA/OCL library paths.
  17. To compile every program in the Rodinia benchmark suite, use the universal makefile; to build an individual program, run make in that
  18. benchmark's directory.
  19. 3. Run Rodinia
  20. There is a 'run' file specifying the sample command to run each program.
  21. IV. Change Log
  22. Dec. 12, 2015: Rodinia 3.1 is released
  23. ********************************************************
  24. 1. Bug fix
  25. 1). OpenCL version Hotspot (Thanks Shuai Che from AMD)
  26. Delete this parameter "CL_MEM_ALLOC_HOST_PTR" for device-side buffer allocation.
  27. 2). OpenCL version Kmeans (Thanks Jeroen Ketema from Imperial College London, Tzu-Te from National Chiao Tung University, Shuai Che and Michael Boyer from AMD)
  28. Fix data race problem for reduce kernel.
  29. 3). OpenCL version Leukocyte (Thanks Jeroen Ketema from Imperial College London)
  30. Fix data race problem for find_ellipse kernel.
  31. 4). OpenCL version srad (Thanks Jeroen Ketema from Imperial College London)
  32. Fix data race problem for reduce kernel.
  33. 5). OpenCL version dwt2d (Thanks Tzu-Te from National Chiao Tung University)
  34. Fix a bug for buffer size.
  35. 2. New benchmarks (Thanks Linh Nguyen from Hampden-Sydney College)
  36. 1). Hotspot3D (CUDA, OpenMP and OpenCL versions)
  37. 2). Huffman (only CUDA version)
  38. 3. Performance improvement
  39. 1). OpenMP version nn (Thanks Shuai Che from AMD)
  40. 2). OpenCL version nw (Thanks Shuai Che from AMD)
  41. 3). CUDA version cfd (Thanks Ke)
  42. 4. Several OpenMP benchmarks have been improved (Thanks Sergey Vinogradov and Julia Fedorova from Intel)
  43. 1). BFS
  44. 2). LUD
  45. 3). HotSpot
  46. 4). CFD
  47. 5). NW
  48. Mar. 02, 2013: Rodinia 2.3 is released
  49. ***********************************************************************
  50. A. General
  51. Add -lOpenCL in the OPENCL_LIB definition in common/make.config
  52. OPENCL_LIB = $(OPENCL_DIR)/OpenCL/common/lib -lOpenCL (gcc-4.6+ compatible)
  53. B. OpenCL
  54. 1. Particlefilter OpenCL
  55. a) Runtime work group size selection based on device limits
  56. b) Several bugs of kernel fixed
  57. c) Initialize all arrays on host side and device side
  58. d) Fix out-of-bounds access to the objxy_GPU array on the device
  59. objxy_GPU = clCreateBuffer(context, CL_MEM_READ_WRITE, 2*sizeof (int) *countOnes, NULL, &err);
  60. and
  61. err = clEnqueueWriteBuffer(cmd_queue, objxy_GPU, 1, 0, 2*sizeof (int) *countOnes, objxy, 0, 0, 0);
  62. e) #define PI 3.1415926535897932 in ex_particle_OCL_naive_seq.cpp
  63. f) put -lOpenCL just behind -L$(OPENCL_LIB) in Makefile.
  64. g) Delete the unused function tex1Dfetch() from particle_float.cl.
  65. h) Add a single-precision version.
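The work-group size selection in item a) above can be sketched in plain C++. This is a minimal illustration under stated assumptions, not the code from the Particlefilter host program: the function names pick_work_group_size and round_up_global_size are hypothetical, and device_max stands for the value a host program would obtain by querying CL_DEVICE_MAX_WORK_GROUP_SIZE through clGetDeviceInfo.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper: clamp the preferred work-group size to the device limit.
// device_max stands in for the queried CL_DEVICE_MAX_WORK_GROUP_SIZE value.
std::size_t pick_work_group_size(std::size_t preferred, std::size_t device_max) {
    return std::min(preferred, device_max);
}

// Hypothetical helper: round the number of work items up to a multiple of the
// work-group size. OpenCL 1.x requires the global size to be divisible by the
// local size when a local size is given. local_size must be greater than zero.
std::size_t round_up_global_size(std::size_t work_items, std::size_t local_size) {
    return ((work_items + local_size - 1) / local_size) * local_size;
}
```

Because the global size is rounded up, a kernel launched this way would need to check its global index against the real work-item count before touching memory.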
  66. 2. B+Tree OpenCL
  67. a) Replace CUDA function __syncthreads() with OpenCL barrier(CLK_LOCAL_MEM_FENCE) in kernel file
  68. 3. Heartwall OpenCL
  69. a) Lower work item size from 512 to 256 (Better compatibility with AMD GPU)
  70. b) Several bugs fixed on kernel codes
  71. c) Several bugs fixed on host codes
  72. 4. BFS OpenCL
  73. a). Replace all bool with char since bool is NOT a valid type for OpenCL arguments.
  74. b). -lOpenCL just behind -L$(OPENCL_LIB) in Makefile. (gcc-4.6+ compatible)
  75. c). Remove NVIDIA-specific parameters and decrease the thread block size for better compatibility with AMD GPUs
  76. BFS/CLHelper.h:
  77. //std::string options= "-cl-nv-verbose"; // doesn't work on AMD machines
  78. resultCL = clBuildProgram(oclHandles.program, deviceListSize, oclHandles.devices, NULL, NULL, NULL);
  79. bfs.cpp:
  80. #define MAX_THREADS_PER_BLOCK 256 // 512 is too big for my AMD Fusion GPU
  81. d) Correct bad mallocs
  82. BFS/CLHelper.h
  83. oclHandles.devices = (cl_device_id *)malloc(deviceListSize * sizeof(cl_device_id));
  84. d_mem = clCreateBuffer(oclHandles.context, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR, size, h_mem_ptr, &oclHandles.cl_status);
  85. d_mem = clCreateBuffer(oclHandles.context, CL_MEM_WRITE_ONLY | CL_MEM_COPY_HOST_PTR, size, h_mem_ptr, &oclHandles.cl_status);
  86. h_mem_pinned = (cl_float *)clEnqueueMapBuffer(oclHandles.queue, d_mem_pinned, CL_TRUE, \
  87. CL_MAP_WRITE, 0, size, 0, NULL, \
  88. bfs.cpp
  89. d_graph_mask = _clMallocRW(no_of_nodes*sizeof(bool), h_graph_mask);
  90. d_updating_graph_mask = _clMallocRW(no_of_nodes*sizeof(bool), h_updating_graph_mask);
  91. d_graph_visited = _clMallocRW(no_of_nodes*sizeof(bool), h_graph_visited);
  92. compare_results<int>(h_cost_ref, h_cost, no_of_nodes);
  93. f) Add #include <cstdlib> in bfs.cpp
  94. g) Conditionally include time.h
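The type change in item a) of the BFS notes can be illustrated on the host side. The sketch below is not the BFS source: make_mask and mask_bytes are hypothetical names. It stores each per-node flag in a char, whose size is one byte by definition, instead of a bool, whose size is implementation-defined and which the note above says is not a valid type for OpenCL arguments.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: per-node flags stored as char (0 = false, 1 = true).
std::vector<char> make_mask(std::size_t no_of_nodes, std::size_t source) {
    std::vector<char> mask(no_of_nodes, 0);
    if (source < no_of_nodes) {
        mask[source] = 1;  // mark the BFS source node
    }
    return mask;
}

// Hypothetical helper: byte count that a host program would pass to its
// device-buffer allocation for this mask.
std::size_t mask_bytes(const std::vector<char>& mask) {
    return mask.size() * sizeof(char);
}
```

With char on both sides, the host buffer and the kernel argument agree on one byte per node.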
  95. 5. CFD OpenCL
  96. a) Comment out two unnecessary clWaitForEvents calls in CLHelper.h. This gives a 1.5x speedup on some GPUs.
  97. b) -lOpenCL just behind -L$(OPENCL_LIB) in Makefile. (gcc-4.6+ compatible)
  98. c) cfd/CLHelper.h
  99. oclHandles.devices = (cl_device_id *)malloc(sizeof(cl_device_id) * deviceListSize);
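The allocation in item c) sizes the buffer as element size times element count. The sketch below shows that pattern in isolation. It is an illustration only: device_handle is a hypothetical stand-in for cl_device_id so that the example compiles without the OpenCL headers, and alloc_device_list is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for cl_device_id, which is an opaque pointer type.
using device_handle = void*;

// Allocate room for `count` handles: bytes = element size * element count.
// Returns nullptr when count is zero, when the byte count would overflow,
// or when malloc fails. The caller must release the result with std::free.
device_handle* alloc_device_list(std::size_t count) {
    if (count == 0 || count > static_cast<std::size_t>(-1) / sizeof(device_handle)) {
        return nullptr;
    }
    return static_cast<device_handle*>(std::malloc(sizeof(device_handle) * count));
}
```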
  100. 6. Backprop OpenCL.
  101. a) OpenCL does not support integer log2 and pow
  102. Change backprop_kernel.cl lines 40 and 42 to:
  103. for ( int i = 1 ; i <= HEIGHT ; i=i*2){
  104. int power_two = i;
  105. b) Change if( device_list ) delete device_list; to
  106. if( device_list ) delete[] device_list;
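The loop in item a) of the Backprop notes steps through powers of two using integer multiplication alone, so no log2 or pow call is needed. A minimal host-side C++ sketch of the same iteration pattern follows. The function name powers_of_two_up_to is hypothetical, the kernel body is omitted, and the overflow guard is an addition that is not in the original loop.

```cpp
#include <vector>

// Hypothetical helper: collect every value the loop variable takes when
// iterating i = 1, 2, 4, ... while i <= height.
std::vector<int> powers_of_two_up_to(int height) {
    std::vector<int> values;
    for (int i = 1; i <= height; i = i * 2) {
        int power_two = i;  // same assignment as in the changelog entry
        values.push_back(power_two);
        if (i > height / 2) {
            break;  // the next doubling would exceed height (and could overflow int)
        }
    }
    return values;
}
```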
  107. 7. gaussianElim OpenCL
  108. a) Add code to release the device buffer at the end of the ForwardSub() function (gaussianElim.cpp)
  109. b) gaussian/gaussianElim.cpp
  110. Add cl_cleanup(); after free(finalVec);
  111. 8. Lavamd OpenCL: In lavaMD/kernel/kernel_gpu_opencl_wrapper.c
  112. add : #include <string.h>
  113. 9. pathfinder OpenCL
  114. a) OpenCL.cpp: add #include <cstdlib>
  115. b) Makefile: Changed the place of -lOpenCL for better compatibility with gcc-4.6+.
  116. 10. streamcluster OpenCL: In CLHelper.h
  117. oclHandles.devices = (cl_device_id *)malloc(sizeof(cl_device_id)*deviceListSize);
  118. 11. Hotspot OpenCL: In hotspot.c, add clReleaseContext(context);
  119. before the main function returns.
  120. 12. kmeans OpenCL: Add shutdown() in the main function to release OpenCL resources before exiting.
  121. C. CUDA
  122. 1. CFD CUDA: Solve compatibility problem with CUDA 5.0.
  123. 2. Backprop CUDA: Correct include command in backprop_cuda.cu
  124. 3. BFS CUDA: Correct include command in backprop_cuda.cu
  125. 4. kmeans CUDA: Add “-lm” in link command.
  126. 5. nn CUDA: Fix makefile bugs
  127. 6. mummergpu CUDA
  128. a) add #include <stdint.h> to
  129. mummergpu_gold.cpp
  130. mummergpu_main.cpp
  131. suffix-tree.cpp
  132. b) mummergpu.cu: Correct the parameter types of the void boardMemory function.
  133. c) Rename getRef function to getRefGold in mummergpu_gold.cpp to avoid multiple definition
  134. D. OpenMP
  135. 1. Kmeans OpenMP
  136. Rename variable max_dist to min_dist in kmeans_clustering.c in kmeans_openmp/ and kmeans_serial/ folders to avoid misunderstanding.
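The rename reflects what the variable holds: the smallest distance found so far during a nearest-point search. A minimal sketch of such a search follows, assuming squared Euclidean distance. The function nearest_point and its parameters are hypothetical, and this is not the code from kmeans_clustering.c.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: index of the centre closest to `point`, or -1 if
// there are no centres. Each centre is assumed to have at least as many
// features as `point`.
int nearest_point(const std::vector<float>& point,
                  const std::vector<std::vector<float>>& centres) {
    int index = -1;
    float min_dist = std::numeric_limits<float>::max();  // smallest distance seen so far
    for (std::size_t c = 0; c < centres.size(); ++c) {
        float dist = 0.0f;
        for (std::size_t f = 0; f < point.size(); ++f) {
            const float diff = point[f] - centres[c][f];
            dist += diff * diff;  // squared Euclidean distance
        }
        if (dist < min_dist) {
            min_dist = dist;
            index = static_cast<int>(c);
        }
    }
    return index;
}
```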
  137. ***********************************************************************
  138. For bug reports and fixes:
  139. Thanks Alexey Kravets, Georgia Kouveli and Elena Stohr from CARP project. Thanks Maxim Perminov from Intel. Thanks Daniel Lustig from Princeton. Thanks John Andrew Stratton from UIUC. Thanks Mona Jalal from University of Wisconsin.
  140. Oct. 09, 2012: Rodinia 2.2 is released
  141. - BFS: Delete invalid flag CL_MEM_USE_HOST_PTR from _clMallocRW and _clMalloc functions in the OpenCL version. Thanks Alexey Kravets (CARP European research project).
  142. - Hotspot: hotspot_kernel.cl:61 correct the index calculation as grid_cols *loadYidx + loadXidx. Correct the same problem in hotspot.cu:152. Thanks Alexey Kravets.
  143. - Pathfinder: Added two __syncthreads calls in the dynproc_kernel function of the CUDA version to avoid a data race. Thanks Ronny Krashinsky (Nvidia) and Jiayuan Meng (Argonne National Laboratory). Alexey Kravets found and corrected the same problem in the OpenCL version.
  144. - SRAD: Replace CUDA function __syncthreads() in srad OpenCL kernel with OpenCL barrier(CLK_LOCAL_MEM_FENCE).
  145. - NN: Fixed a bug in the CUDA version that occurred for certain input sizes. The new version detects when the x-dimension size limit of a CUDA block grid is exceeded and launches a two-dimensional grid if needed. (Only the CUDA version had this problem.)
  146. - Promote B+Tree to main distribution (with output)
  147. - Promote Myocyte to main distribution (with output)
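The NN fix in the list above can be sketched as a grid-shape calculation. This is an illustration under stated assumptions, not the code from the NN benchmark: GridDim, choose_grid, and the max_grid_x parameter are hypothetical names. A caller targeting older CUDA devices would pass 65535, the x-dimension grid limit on compute capability below 3.0.

```cpp
#include <cstddef>

// Hypothetical grid shape: x * y blocks in total.
struct GridDim {
    std::size_t x;
    std::size_t y;
};

// Use a one-dimensional grid when the block count fits within the x limit;
// otherwise spread the blocks over two dimensions. max_grid_x must be
// greater than zero. The resulting grid may hold more blocks than requested,
// so a kernel launched with it must bounds-check its global index.
GridDim choose_grid(std::size_t num_blocks, std::size_t max_grid_x) {
    if (num_blocks <= max_grid_x) {
        return {num_blocks, 1};
    }
    const std::size_t y = (num_blocks + max_grid_x - 1) / max_grid_x;  // rows needed
    const std::size_t x = (num_blocks + y - 1) / y;                    // blocks per row
    return {x, y};
}
```

For example, 100000 blocks with a limit of 65535 give a 50000 x 2 grid, which stays within the limit in both dimensions.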
  148. June 27, 2012: Rodinia 2.1 is released
  149. - Include fixes for SRAD, Heartwall, Particle Filter and Streamcluster
  150. Nov 23, 2011: Rodinia 2.0.1 is released
  151. - Include a CUDA version of NN comparable to the OCL version.
  152. - Use a new version of clutils that is BSD, not GPL.
  153. Nov 11, 2011: Rodinia 2.0 is released
  154. - Include several applications into the main suite:
  155. lavaMD, Gaussian Elimination, Pathfinder, k-Nearest Neighbor and Particle Filter.
  156. Detailed application information can also be found at http://lava.cs.virginia.edu/wiki/rodinia
  157. - Merge new OpenCL implementations into the main tarball.
  158. Mar 01, 2010: Rodinia 1.0 is released
  159. III. Contact
  160. Ke Wang: kw5na@virginia.edu
  161. Shuai Che: sc5nf@cs.virginia.edu
  162. Kevin Skadron: skadron@cs.virginia.edu
  163. Rodinia wiki:
  164. http://lava.cs.virginia.edu/wiki/rodinia