|
@@ -1,49 +1,61 @@
|
|
|
\section{Summary Of The Work}
|
|
|
|
|
|
+\subsection{Becoming familiar with the OpenCL framework}
|
|
|
+Before starting the project I had never worked with OpenCL, so I first studied the documentation available online.
|
|
|
+In the meantime I compiled and experimented with \textbf{pocl} on my laptop, to understand how to take an OpenCL application and run it on the hardware.
|
|
|
+My main reference was the documentation \cite{poclwebsite} on the pocl project website.\\
|
|
|
+I had some previous experience with the \textbf{LLVM} framework \cite{llvmwebsite}, which pocl uses for compiling the runtime, so this part was not too difficult to manage. The required version of LLVM (3.8) is also the default version shipped by the Ubuntu distribution, although I already had it compiled on my machine.\\
|
|
|
+Once I had the runtime compiled and ready on my laptop, I moved on to becoming familiar with the designated benchmark suite.\\
|
|
|
+My first contact with the benchmark suite was somewhat problematic. For reasons that will become clearer in the section dedicated to the modifications made to the \textbf{Rodinia Benchmark Suite}, the suite is tailored for running on GPUs, and since the pocl runtime on my laptop only exposed a CPU device, I was not able to run a single benchmark. Not having yet developed the skills necessary to debug and work with the C++ OpenCL Wrapper API, I was having some difficulties.\\
|
|
|
+For this reason I decided to begin with something simpler and searched online for other benchmark suites, finding the \textbf{ViennaCL} \cite{viennawebsite} suite.\\
|
|
|
+This time things went better: after some experiments and attempts I managed to run some benchmarks of the suite on my laptop, and by reading the code I began to understand how the initialization and execution of an OpenCL platform work.\\
|
|
|
+During this documentation phase I also became aware of the \textbf{Beignet} project, an open source OpenCL implementation supporting the integrated GPUs on Intel chipsets, so I had the opportunity to experiment a little with a GPU device even before working on the board.\\
|
|
|
+At this point I felt that I had the prerequisites to start working with the \textbf{ODROID}, so I began the work on the board.
|
|
|
|
|
|
\subsection{Build of the runtime}
|
|
|
-The first challenge to tackle was the retrieval and compilation of the OpenCL runtimes.
|
|
|
-The runtime for the Mali GPU is already provided in the Hardkernel repository, so a simple \lstinline{sudo apt-get install mali-fbdev} does the trick.
|
|
|
-For what concenrs the Pocl runtime instead we need to start from scratch.
|
|
|
-The first thing to do is to retrieve the last version of the OpenCL runtime (curenntly version 0.14) from the \href{http://portablecl.org/downloads/pocl-0.14.tar.gz}{website}.
|
|
|
+The first challenge to tackle was the retrieval and compilation of the OpenCL runtimes.\\
|
|
|
+The runtime for the \textbf{Mali GPU} is already provided in the Hardkernel repository, so a simple \lstinline{sudo apt-get install mali-fbdev} does the trick.
|
|
|
+As for the pocl runtime, instead, we need to start from scratch.\\
|
|
|
+The first thing to do is to retrieve the latest version of the pocl runtime (currently version 0.14) from the \href{http://portablecl.org/downloads/pocl-0.14.tar.gz}{website}.
|
|
|
The next thing to do is to decompress the archive with a simple \lstinline{tar xvfz pocl-0.14.tar.gz}.\\
|
|
|
-Pocl take adavante of \textbf{LLVM} to build itself, so we need to install a few dependencies from the package manager before being able to compile it. We can find at the \href{http://portablecl.org/docs/html/install.html}{dedicated page} on the official wiki a list of all the packages needed for the build. Basically we need LLVM and a bunch of development package of it, CMake to build the Makefiles, the standard utilities for compiòing (gcc, lex, bison), and some packages to have an Installable client driver (ICD) to be able to load the appropriate OpenCL at runtime.\\
|
|
|
+Pocl takes advantage of \textbf{LLVM} to build itself, so we need to install a few dependencies from the package manager before being able to compile it. The \href{http://portablecl.org/docs/html/install.html}{dedicated page} on the official wiki lists all the packages needed for the build. Basically we need LLVM and some of its development packages, CMake to generate the Makefiles, the standard utilities for compiling (gcc, lex, bison), and some packages providing an Installable Client Driver (\textbf{ICD}) loader, to be able to load the appropriate OpenCL implementation at runtime.\\
|
|
|
What we need to do on our system is basically:
|
|
|
+\bigskip
|
|
|
|
|
|
\begin{lstlisting}
|
|
|
|
|
|
sudo apt-get update && sudo apt-get upgrade -y
|
|
|
-sudo apt-get install -y vim build-essential flex bison libtool\
|
|
|
-libncurses5* git-core htop cmake libhwloc-dev libclang-3.8-dev\
|
|
|
-clang-3.8 and llvm-3.8-dev zlib1g ocl-icd-libopencl1 clinfo\
|
|
|
-libglew-dev time gnuplot clinfo ocl-icd-dev ocl-icd-opencl-dev\
|
|
|
-qt4-qmake libqt4-dev libusb-1.0-0-dev
|
|
|
+sudo apt-get install -y vim build-essential flex bison libtool libncurses5* git-core htop cmake libhwloc-dev libclang-3.8-dev clang-3.8 llvm-3.8-dev zlib1g ocl-icd-libopencl1 clinfo libglew-dev time gnuplot ocl-icd-dev ocl-icd-opencl-dev qt4-qmake libqt4-dev libusb-1.0-0-dev
|
|
|
|
|
|
\end{lstlisting}
|
|
|
+\bigskip
|
|
|
|
|
|
-At this point can proceed and build Pocl. To to that we enter the directory with the sources and create a folder called \textit{build} in which we will have all the compiled stuff. At this point we take advantage of CMake for actually preparing our folder for the build. Usually a \lstinline{cmake ../} should suffice, but on the ODROID we have a little problem. Since our CPU is composed of four cortex a7 and four cortex a15 cores, CMake can't by itself understand what is the target CPU to use for the build. Luckily the two types of cores shares the same ISA, so we can explicitly tell CMake to use the cortex a15 as a target type of cpu. All we have to do is to launch \lstinline{cmake -DLLC\_HOST\_CPU=cortex-a15 ../}.\\
|
|
|
-At this point we are ready for the build, just type \lstinline{make -j8} and we are done. At this point we can run some tests with \lstinline{ctest -j8} just to be sure that everything went smooth, and finally install the runtime in the system with \lstinline{sudo make install}. At this point if everything went fine we will have a \lstinline{pocl.icd} file in \lstinline{/etc/OpenCL/vendors/}, and running \lstinline{clinfo} we should be able to see our brand new OpenCL runtime.\\
|
|
|
+At this point we can proceed and build pocl. To do that we enter the directory with the sources and create a folder called \textit{build}, which will contain all the build output. We then use \textbf{CMake} to prepare the folder for the build. Usually a \lstinline{cmake ../} should suffice, but on the ODROID we have a little problem.\\
|
|
|
+Since our CPU is composed of four Cortex-A7 and four Cortex-A15 cores, CMake cannot determine by itself which target CPU to use for the build. Luckily the two types of cores share the \textbf{same ISA}, so we can explicitly tell CMake to use the Cortex-A15 as the target CPU. All we have to do is to launch \lstinline{cmake -DLLC_HOST_CPU=cortex-a15 ../}.\\
|
|
|
+At this point we are ready for the build: just type \lstinline{make -j8} and we are done. We can also run some tests with \lstinline{ctest -j8}, just to be sure that everything went smoothly, and finally install the runtime in the system with \lstinline{sudo make install}. If everything went fine we will have a \lstinline{pocl.icd} file in \lstinline{/etc/OpenCL/vendors/}, and running \lstinline{clinfo} we should be able to see our brand new OpenCL runtime.\\
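+Putting the steps above together, the whole build can be sketched as the following sequence. The directory name \lstinline{pocl-0.14} is an assumption based on the archive name and may differ on your system.
+\begin{lstlisting}
+# directory name assumed from the archive name
+cd pocl-0.14
+mkdir build && cd build
+# force the Cortex-A15 as the target CPU
+cmake -DLLC_HOST_CPU=cortex-a15 ../
+make -j8
+# optional: run the test suite
+ctest -j8
+sudo make install
+# the pocl platform should now be listed
+clinfo
+\end{lstlisting}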
|
|
|
|
|
|
-Additionally in order to be able to use the runtime for the Mali GPU we additionally need to place a file containing:
|
|
|
+In order to use the runtime for the \textbf{Mali GPU} we additionally need to put the line:
|
|
|
|
|
|
\begin{lstlisting}
|
|
|
/usr/lib/arm-linux-gnueabihf/mali-egl/libOpenCL.so
|
|
|
\end{lstlisting}
|
|
|
|
|
|
in a file named \lstinline{mali.icd} at the path \lstinline{/etc/OpenCL/vendors/}.\\
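+One possible way to create this file from the shell is sketched below. The use of \lstinline{tee} is only one option, and any text editor run with root privileges works equally well.
+\begin{lstlisting}
+echo "/usr/lib/arm-linux-gnueabihf/mali-egl/libOpenCL.so" | sudo tee /etc/OpenCL/vendors/mali.icd
+\end{lstlisting}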
|
|
|
-This should conclude the part regarding the OpenCL runtime deploy, and at this point we should be able to see both the CPU Pocl platform with an eight core device and the Mali GPU platform with two devices of four and 2 cores respectively.
|
|
|
+This concludes the deployment of the OpenCL runtimes. At this point we should be able to see both the pocl CPU platform, with one eight-core device, and the Mali GPU platform, with two devices of four and two cores respectively.
|
|
|
|
|
|
\subsection{Build of the power measurement utility}
|
|
|
At this point we should get and compile the utility for measuring the power consumption of the board. The utility used is a modified version of the official utility provided by Hardkernel, which simply stores the measured consumption in a CSV file that we can later use for result analysis and plotting.
|
|
|
-For building the utility we start from \href{https://bitbucket.org/zanella_michele/odroid_smartpower_bridge}{the repository} containing utility, that has been kindly provided to me by \textit{Michele Zanella}.
|
|
|
-In bash commands:
|
|
|
+For building the utility we start from \href{https://bitbucket.org/zanella_michele/odroid_smartpower_bridge}{this repository}.\\
|
|
|
+The use of the utility was kindly granted to me by \textit{Michele Zanella}, its main maintainer. He also helped me understand how to make the utility work on the board, helped me debug a problem with the setup of the USB interface, and kindly agreed to publish on his repository a dedicated branch where all the unnecessary Qt dependencies have been removed.\\
|
|
|
+As a first step we can retrieve the repository with the following bash command:
|
|
|
|
|
|
\begin{lstlisting}
|
|
|
git clone https://bitbucket.org/zanella_michele/odroid_smartpower_bridge
|
|
|
\end{lstlisting}
|
|
|
|
|
|
-At this point we should switch to the \textbf{no\_qt} branch with a simple \lstinline{git checkout no_qt}. In this branch all the non essential dependencies to Qt libraries have been removed, in order to avoid cluttering the board with the full KDE framework for just storing an integer representing the consumption.\\
|
|
|
-Unfortunately the HIDAPI library provided with the sources of the utility has been already compiled for x86 and stored in the repository, causing an error when trying to link the utility. To avoid this we need to recompile the library, by entering the HIDAPI folder and giving the following commands:
|
|
|
+At this point we should switch to the \textbf{no\_qt} branch with a simple \lstinline{git checkout no_qt}. In this branch all the non-essential dependencies on the Qt libraries have been removed, in order to avoid cluttering the board with the full KDE framework just to store an integer representing the consumption. Of course, if we want the original GUI, we need to compile the version on the \textbf{master} branch.\\
|
|
|
+Unfortunately the HIDAPI library provided with the sources of the utility was stored in the repository already compiled for x86, which causes an error when trying to link the utility.\\
|
|
|
+To avoid this we need to recompile the library, by entering the HIDAPI folder and giving the following commands:
|
|
|
|
|
|
\begin{lstlisting}
|
|
|
qmake
|
|
@@ -113,7 +125,7 @@ The benchmark sources imported a \textbf{timer} utility for debug purposes that
|
|
|
This benchmark did not compile because of problems with the import of the \textit{rand()} function, so we fixed this. In addition, the platform and device selection was not parametrized, so we changed this as well, following the standard parameter convention explained before.
|
|
|
|
|
|
\subsubsection{Dwt2d}
|
|
|
-Implemented the device selection and fixed a bug with a \lstinline{char} variable not compatible with our architecture.
|
|
|
+Implemented the device selection and fixed a bug with a \lstinline{char} variable not compatible with our architecture. Since the \lstinline{-d} flag was already taken in this benchmark to specify the dimension, we used \lstinline{-i} to specify the device id.
|
|
|
|
|
|
\subsubsection{Gaussian}
|
|
|
This benchmark already presented a prototype of platform and device selection. Added the possibility to select also the device type and changed some minor details in the use of the OpenCL primitives.
|
|
@@ -169,3 +181,30 @@ One problem that you can spot as soon as you look at a single commit is that the
|
|
|
To avoid cluttering the commits with a lot of blank-space removals and substitutions of tabs with spaces, I preferred to disable in my editor all the mechanisms that correct these things, leaving the source code with misaligned lines but at least highlighting only the changes really made to the source.\\
|
|
|
I then tried to fix as many of these issues as possible in a later commit, whose only purpose is to bring the formatting of the source code to an acceptable state.\\
|
|
|
I apologize for this inconvenience and I ask you not to look at these problems within the commits: I preferred to keep the commits as small as possible, to give a better chance of spotting the real modifications made and to avoid getting lost in a commit with thousands of lines added and removed to fix a tab.
|
|
|
+
|
|
|
+\subsection{Running the benchmarks}
|
|
|
+At this point we should have a working version of the benchmarks. We can then proceed to run them on our board.
|
|
|
+We can take advantage of the scripts present in the folder of each benchmark to run it on the different devices available.\\
|
|
|
+As the names of the run scripts say:
|
|
|
+\begin{itemize}
|
|
|
+ \item \lstinline{run-cpu} runs the benchmark on the 8 cores of the CPU belonging to the pocl OpenCL platform
|
|
|
+ \item \lstinline{run-gpu-primary} runs the benchmark on the GPU device composed of 4 cores belonging to the Mali OpenCL platform
|
|
|
+ \item \lstinline{run-gpu-secondary} runs the benchmark on the GPU device composed of 2 cores belonging to the Mali OpenCL platform
|
|
|
+\end{itemize}
|
|
|
+We can also use the targets present in the Makefile inside the benchmark directory to conveniently run the sequence of all the benchmarks. We have:
|
|
|
+\begin{itemize}
|
|
|
+ \item \lstinline{OPENCL_BENCHMARK_CPU} to run all the benchmarks on the cpu
|
|
|
+ \item \lstinline{OPENCL_BENCHMARK_GPU_PRIMARY} to run the benchmarks on the GPU device 1
|
|
|
+ \item \lstinline{OPENCL_BENCHMARK_GPU_SECONDARY} to run the benchmarks on the secondary GPU device
|
|
|
+ \item \lstinline{OPENCL_BENCHMARK_ALL} to run the benchmarks on all three previous devices
|
|
|
+ \item \lstinline{OPENCL_BENCHMARK_GPU} to run the benchmarks on the GPU device 1 (kept for compatibility reasons)
|
|
|
+\end{itemize}
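+For example, assuming we are in the benchmark directory that contains the Makefile, the run on the CPU would be launched with:
+\begin{lstlisting}
+make OPENCL_BENCHMARK_CPU
+\end{lstlisting}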
|
|
|
+\bigskip
|
|
|
+
|
|
|
+These targets automatically invoke the \lstinline{time-and-save} bash script, which is responsible for timing the benchmarks and collecting the power measurements using the \lstinline{SmartPower} utility. It then saves the results in separate files, all named \lstinline{total.dat}, each in the appropriate sub-folder of the \lstinline{results} folder inside the benchmark folder.
|
|
|
+The files are basically \textit{csv} files with three columns; each record is composed of, in order:
|
|
|
+\begin{itemize}
|
|
|
+ \item the name of the benchmark
|
|
|
+ \item the execution time of the run, expressed in seconds
|
|
|
+ \item the energy consumption, expressed in watt-hours (Wh)
|
|
|
+\end{itemize}
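+As a sketch of how such a file can be inspected, the following command prints each record in a more readable form. It assumes that the columns are separated by commas and that the file has no header row; \lstinline{SUBFOLDER} is a placeholder for the actual sub-folder name.
+\begin{lstlisting}
+# SUBFOLDER is a placeholder, comma separator is assumed
+awk -F',' '{ printf "%s: time=%s consumption=%s\n", $1, $2, $3 }' results/SUBFOLDER/total.dat
+\end{lstlisting}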
|