//========================================================================================================================================================================================================200
// INFO
//========================================================================================================================================================================================================200

//======================================================================================================================================================150
// UPDATE
//======================================================================================================================================================150

// 2009.12 Lukasz G. Szafaryn
// -- converted from MATLAB to CUDA
// 2010.01 Lukasz G. Szafaryn
// -- arranged, commented
// 2012.05 Lukasz G. Szafaryn
// -- arranged, commented
// 2012.05 Lukasz G. Szafaryn
// -- converted from CUDA to OpenCL

//======================================================================================================================================================150
// DESCRIPTION
//======================================================================================================================================================150

// The Heart Wall application tracks the movement of a mouse heart over a sequence of 104 ultrasound images (609x590 pixels each) to record its response to a stimulus.
// In its initial stage, the program performs image processing operations on the first image to detect the initial, partial shapes of the inner and outer heart walls.
// These operations include edge detection, SRAD despeckling (also part of the Rodinia suite), morphological transformation, and dilation. To reconstruct
// approximate full shapes of the heart walls, the program generates ellipses that are superimposed over the image and sampled to mark points on the heart walls
// (Hough Search). In its final stage (Heart Wall Tracking, presented here), the program tracks the movement of the wall surfaces by detecting the movement of the
// image areas under the sample points as the shapes of the heart walls change throughout the sequence of images.

// Tracking is the final stage of the Heart Wall application. It starts from the positions of the heart walls in the first ultrasound image of the sequence, as determined by the
// initial detection stage of the application. The tracking code is implemented as multiple nested loops that process batches of 10 frames and 51 points in each
// image. Displacement of the heart walls is detected by comparing the currently processed frame to a template frame, which is updated after each batch of frames is processed.
// There is a sequential dependency between processed frames. The processing of each point consists of a large number of small serial steps with interleaved control
// statements. Each step involves a small amount of computation performed on only a subset of the entire image. This stage of the application accounts for almost
// all of the execution time (the exact ratio depends on the number of ultrasound images).
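
// The loop nest described above can be sketched in plain C as follows. This is only an illustration, not the benchmark's actual code: the names
// run_tracking and tracking_stats are hypothetical, the per-point work is reduced to a counter, and the rule "refresh the template after every
// full batch of 10 frames" is an assumption about how the batch boundary is handled.

```c
#define BATCH_FRAMES 10   /* frames per batch, from the description above */
#define SAMPLE_POINTS 51  /* sample points per frame, from the description above */

/* Counters returned by the sketch; hypothetical, for illustration only. */
typedef struct {
    long point_steps;       /* number of (frame, point) pairs processed */
    long template_updates;  /* number of times the template was refreshed */
} tracking_stats;

/* Walk the frame/point loop nest and count the work it implies. */
tracking_stats run_tracking(int frames)
{
    tracking_stats stats = {0, 0};

    /* Frames are visited in order because each one depends on the previous one. */
    for (int frame = 1; frame <= frames; frame++) {
        for (int point = 0; point < SAMPLE_POINTS; point++) {
            /* Placeholder for the many small serial steps done per point. */
            stats.point_steps++;
        }
        /* Assumed rule: refresh the template after every full batch. */
        if (frame % BATCH_FRAMES == 0)
            stats.template_updates++;
    }
    return stats;
}
```

// Under these assumptions, the full 104-frame sequence visits 104 x 51 = 5304 frame/point pairs and refreshes the template 10 times.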

//======================================================================================================================================================150
// PAPERS
//======================================================================================================================================================150

// L. G. Szafaryn, K. Skadron, and J. J. Saucerman. "Experiences Accelerating MATLAB Systems Biology Applications." In Proceedings of the Workshop on Biomedicine
// in Computing: Systems, Architectures, and Circuits (BiC) 2009, in conjunction with the 36th IEEE/ACM International Symposium on Computer Architecture (ISCA),
// June 2009.

//======================================================================================================================================================150
// DOWNLOAD
//======================================================================================================================================================150

// Rodinia Benchmark Suite page

//======================================================================================================================================================150
// IMPLEMENTATION-SPECIFIC DESCRIPTION (OpenCL)
//======================================================================================================================================================150

// This is the OpenCL version of the Tracking code.

// The OpenCL implementation of this code is a classic example of exploiting braided parallelism. Processing of sample points is assigned to multiprocessors (TLP),
// while processing of individual pixels in each sample image is assigned to the processors inside each multiprocessor. However, each GPU multiprocessor is usually
// underutilized because of the limited amount of computation at each step. The large size of the processed images and the lack of temporal locality did not allow for
// the use of fast shared memory. The GPU overheads (data transfer and kernel launch) are also significant. To provide better speedup, more drastic GPU
// optimization techniques that sacrificed modularity (in order to fit the code into one kernel call) were used. These techniques also combined unrelated functions and
// data transfers in single kernels.
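
// The two-level mapping described above can be emulated serially in plain C. This is a rough sketch, not the actual kernel: the outer loop stands in for
// the work-groups (one per sample point), the middle loop for the work-items inside a group, and the strided pixel loop is an assumed way of sharing a
// point's pixels among work-items. The names pixels_for_item and emulate_braided, and the per-point sum used as the "work", are hypothetical.

```c
#include <stddef.h>

/* Number of pixels one work-item handles when the pixels of a sample area
   are dealt out in a strided fashion (assumed distribution scheme). */
size_t pixels_for_item(size_t n_pixels, size_t local_size, size_t local_id)
{
    size_t count = 0;
    for (size_t i = local_id; i < n_pixels; i += local_size)
        count++;
    return count;
}

/* Serial emulation of the mapping: one "work-group" per sample point,
   local_size "work-items" per group, each covering a strided pixel subset.
   areas holds n_points consecutive sample areas of n_pixels values each;
   out receives one sum per sample point. */
void emulate_braided(const int *areas, size_t n_points, size_t n_pixels,
                     size_t local_size, long *out)
{
    for (size_t point = 0; point < n_points; point++) {        /* work-groups */
        long sum = 0;
        for (size_t item = 0; item < local_size; item++) {      /* work-items */
            for (size_t i = item; i < n_pixels; i += local_size)
                sum += areas[point * n_pixels + i];
        }
        out[point] = sum;
    }
}
```

// With few pixels per step, most work-items receive little or no work, which illustrates the underutilization noted above.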

//======================================================================================================================================================150
// RUNNING THIS CODE
//======================================================================================================================================================150

// The code takes the following input files, which need to be located in the same directory as the source files:
// 1) video file (input.avi)
// 2) text file with parameters (input.txt)

// The following are the command parameters to the application:
// 1) Number of frames to process. Must be an integer less than or equal to the number of frames in the input file.
// Example:
// ./a.out 104
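
// A minimal sketch of how the frame-count argument could be validated is shown below. It is not taken from the benchmark source: the function name
// parse_frame_count is hypothetical, and treating values below 1 as invalid is an assumption (the text above only states the upper bound).

```c
#include <errno.h>
#include <stdlib.h>

/* Parse the frame-count argument and check it against the number of frames
   available in the input file. Returns the parsed count, or -1 if the
   argument is not a valid integer or is out of range. */
long parse_frame_count(const char *arg, long frames_available)
{
    char *end = NULL;

    errno = 0;
    long n = strtol(arg, &end, 10);

    /* Reject overflow, empty input, and trailing non-digit characters. */
    if (errno != 0 || end == arg || *end != '\0')
        return -1;

    /* Assumed lower bound of 1; the upper bound comes from the input file. */
    if (n < 1 || n > frames_available)
        return -1;

    return n;
}
```

// For a 104-frame input, "104" is accepted, while "105", "0" and "abc" are rejected.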

//======================================================================================================================================================150
// End
//======================================================================================================================================================150

//========================================================================================================================================================================================================200
// End
//========================================================================================================================================================================================================200