\section{Conclusions and Future Works}
\subsection{OpenCL and Heterogeneous Computing}
We have seen that even a small and affordable ARM board has real potential for running computationally intensive applications. At a fraction of the cost of an x86 platform we can achieve surprising results, keeping in mind that the power consumption is also very limited.\\
This, together with applications written for \textbf{heterogeneous computing platforms}, can pave the way for embedded devices running heavy workloads at a fraction of the cost of a standard x86 platform. Linux kernel support for ARM devices has advanced considerably in recent years, and there are now many alternatives for running a Linux-based distribution on an ARM device, so having our favorite platform available is no longer a problem.\\
What I consider really important is having a common programming language or framework that makes it possible to exploit all the computational power provided by a board like the ODROID with the minimum possible porting effort.\\
\smallskip
\\
I had never worked with OpenCL before, but the approach underlying the project is really promising: once we have a well-written OpenCL kernel, we can run it on different types of devices with essentially no additional effort. If we imagine this applied to a dedicated board with many GPU devices and a \textit{small} CPU that only runs the OS and dispatches the tasks, we can easily obtain power-efficient devices able to run heavy workloads. In addition, other types of accelerators (such as FPGAs, co-processors, and cryptographic accelerators) could benefit from this kind of architecture.
\\
There are other platforms and programming paradigms oriented toward parallel and heterogeneous computing (such as CUDA or OpenMP), but in my opinion OpenCL suits the environment of embedded and low-power platforms particularly well.\\
The main challenge will be to exploit as much as possible of the computational power provided by these kinds of boards: we have seen that using the GPU can cut the execution time of graphics-oriented applications by half or more, and leaving part of the hardware available on our device unused is a real waste.\\
Another main goal should be to understand when a task runs more efficiently on a GPU, CPU, or accelerator, and to dispatch it according to this criterion. In the case of our benchmarks, for example, we would like a policy that dispatches the streamcluster task to the GPU and the gaussian one to the CPU, where each runs more efficiently.
\subsection{Possible future extensions}
A natural continuation of this project would be to investigate more closely how much the performance of the benchmarks is affected by the read/write speed of the main storage. Some benchmarks work on very large input files, so this is likely to have a significant impact on the final figures we obtain.\\
We could run the benchmarks on inputs of different sizes, or track how much time is spent reading the files versus doing the actual computation. We could also preallocate the data in main memory, for example using a \textbf{tmpfs} file system, re-execute the benchmarks, and observe the differences.
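As a minimal sketch of the tmpfs idea, the input files can be staged on \texttt{/dev/shm}, a tmpfs mount available on most Linux systems, before launching a benchmark. The file names and the benchmark invocation below are placeholders, not the actual Rodinia inputs:

```shell
# Stage benchmark inputs on a RAM-backed filesystem to factor out storage I/O.
# /dev/shm is a tmpfs mount on most Linux systems, so no root is required.
# "data/input.txt" is a stand-in for the real benchmark input files.
mkdir -p data
printf 'demo input\n' > data/input.txt

RAMDIR=/dev/shm/bench-inputs
mkdir -p "$RAMDIR"
cp -r data/. "$RAMDIR"/

ls -l "$RAMDIR"
# Then point the benchmark at the RAM-backed copy and compare timings, e.g.:
#   time ./streamcluster <args> "$RAMDIR/input.txt"
```

Comparing the timings of a disk-backed run against a RAM-backed one isolates the storage contribution without modifying the benchmark code.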
\medskip
\\
Another possible extension would be to investigate how the execution time and power consumption of the benchmarks are affected by the scheduling policy of the Linux kernel. We could change the scheduling class \cite{linuxscheduling} when executing the benchmarks and observe how the results are affected by these changes.
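One way to change the scheduling class per run is to wrap the benchmark command in \texttt{chrt} from util-linux; this is only a sketch, with \texttt{true} standing in for the actual benchmark command line:

```shell
# Re-run a command under different Linux scheduling classes via chrt(1).
# SCHED_BATCH and SCHED_IDLE can be requested without root privileges;
# the real-time classes (SCHED_FIFO/SCHED_RR) require root.
chrt -b 0 true    # SCHED_BATCH: hints that this is CPU-bound batch work
chrt -i 0 true    # SCHED_IDLE: lowest priority, runs only on otherwise idle CPUs
# sudo chrt -f 50 ./benchmark <args>   # SCHED_FIFO with priority 50 (root)

# Inspect the scheduling attributes of a running process (here: this shell):
chrt -p $$
```

Repeating the same benchmark under each class and recording execution time and power draw would show how sensitive the results are to the scheduler.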
\smallskip
\\
Another interesting experiment would be to compare the power consumption of an x86 machine with that of the ODROID, although of course we would need an appropriate criterion to compare the two measurements fairly.
\smallskip
\\ Another interesting option could be to execute the benchmarks on only the big or only the LITTLE cores provided by the CPU. In this way we can see how much the performance varies when using the low-power cores versus the high-power ones, or, better still, understand whether the LITTLE cores are in some way a bottleneck for the application.
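Restricting a run to one cluster can be sketched with \texttt{taskset}. The core numbering here is an assumption (on the ODROID-XU4, CPUs 0--3 are commonly the LITTLE A7 cores and CPUs 4--7 the big A15 cores) and should be verified with \texttt{lscpu} or \texttt{/proc/cpuinfo}; \texttt{true} again stands in for the benchmark command:

```shell
# Pin a command to a subset of CPU cores with taskset(1).
# Assumed numbering (verify on the actual board): CPUs 0-3 = LITTLE, 4-7 = big.
taskset -c 0 true                  # single core, works on any machine
if [ "$(nproc)" -ge 8 ]; then
    taskset -c 0-3 true            # LITTLE cluster only
    taskset -c 4-7 true            # big cluster only
fi

# The same sets as bitmasks: 0x0f selects cores 0-3, 0xf0 selects cores 4-7.
printf '0x%02x\n' $(( 0xf << 4 ))  # prints 0xf0
```

Running each benchmark once per cluster and comparing execution times would show directly how much the LITTLE cores limit the application.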
\pagebreak