\documentclass{beamer}
\usepackage[utf8]{inputenc}
\usepackage{graphicx}
\usepackage{dirtytalk}
\usepackage{epstopdf}
\usepackage{hyperref}
\graphicspath{ {images/} }
\usetheme{CambridgeUS}
\usecolortheme{beaver}
\AtBeginSection[]
{
\begin{frame}
\frametitle{Table of Contents}
\tableofcontents[currentsection]
\end{frame}
}
\title{Databases 2 - Optional Paper Presentation}
\author{Andrea Gussoni}
\institute{Politecnico di Milano}
\date{July 15, 2016}
\begin{document}
\frame{\titlepage}
\section{Coordination Avoidance}
\begin{frame}
\frametitle{Some information on the paper}
\begin{itemize}
\item \textbf{Title:} Coordination Avoidance in Database Systems.
\item \textbf{Authors:} Peter Bailis, Alan Fekete, Michael J. Franklin, Ali Ghodsi, Joseph M. Hellerstein, Ion Stoica.
\item Presented at \textbf{2015 VLDB}.
\end{itemize}
\end{frame}
\begin{frame}
\frametitle{The Starting Problem}
Distributed database systems are increasingly common, so the task of coordinating the entities involved is becoming more and more important.
\end{frame}
\begin{frame}
\frametitle{The Starting Problem}
\textbf{Concurrency control} protocols are usually necessary because we want to guarantee the consistency of application-level data through a database layer that detects and resolves possible problems and conflicts. One example is two-phase locking (2PL), a serialization technique often used in commercial DBMSs.
\end{frame}
\begin{frame}
\frametitle{The Starting Problem}
In a distributed scenario this requires complex protocols, such as two-phase commit (2PC), to coordinate the entities involved in a transaction, which introduces latency. Coordination also prevents us from exploiting all the parallel resources of a distributed environment, because the coordination phase adds a large overhead.
\end{frame}
\begin{frame}
\frametitle{The Starting Problem}
We usually pay for coordination overhead in terms of:
\begin{itemize}
\item Increased latency.
\item Decreased throughput.
\item Unavailability (in case of failures).
\end{itemize}
\end{frame}
\begin{frame}
\frametitle{Invariant Confluence}
The authors present a new technique (or, better, an analysis framework) that, when applied, considerably reduces the need for coordination between database entities. This lowers the cost in bandwidth and latency and considerably increases the overall throughput of the system.
\end{frame}
\begin{frame}
\frametitle{Invariant Confluence}
The main idea is not to introduce some new exotic way to improve the coordination task. Instead, the authors build on the observation that some workloads do not require coordination and can be executed in parallel. The programmer states the \emph{invariants} explicitly at the application level: these are the correctness conditions on the data (for example, constraints on table attributes). Only concurrent operations that could violate an invariant need coordination.
\end{frame}
\begin{frame}
\frametitle{The Model}
The main concepts introduced:
\begin{itemize}
\item Invariants \pause
\item Transactions \pause
\item Replicas \pause
\item (\emph{I-})Convergence \pause
\item Merging
\end{itemize}
\end{frame}
\begin{frame}
\frametitle{Convergence}
This figure illustrates the main idea behind convergence:
\includegraphics[width=\textwidth]{convergence}
\end{frame}
\begin{frame}
\frametitle{Coordination-Free Execution}
This figure shows the basic evolution of a simple coordination-free execution and the subsequent merge operation:
\includegraphics[width=\textwidth]{coordination-free}
\end{frame}
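\begin{frame}
\frametitle{Invariant Confluence: A Sketch of the Definition}
As a rough sketch of the idea shown in the previous slides (the notation is an assumption of this slide, not taken verbatim from the paper): let $I$ be the invariant, $D_i$ and $D_j$ two replica states reached independently from a common ancestor by transactions that each preserve $I$ locally, and $\sqcup$ the merge operator. The transactions are \emph{I}-confluent when merging two valid states always yields a valid state:
\[
I(D_i) \wedge I(D_j) \;\Longrightarrow\; I(D_i \sqcup D_j)
\]
When this holds for every pair of reachable states, replicas can commit locally and merge later without coordination.
\end{frame}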
\begin{frame}
\frametitle{Invariants}
\begin{itemize}
\item It is important to note that \textbf{coordination can only be avoided if all local commit decisions are globally valid.}\pause
\item So the best approach to guarantee application-level consistency is to apply a convergence analysis and then identify the \emph{true conflicts}. Uncertain situations must be treated conservatively. \pause
\item This means that we rely on the analysis done by the programmer at the application level to guarantee correctness. This is clearly a drawback.
\end{itemize}
\end{frame}
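\begin{frame}
\frametitle{Invariants: A Worked Example}
A small illustration of a true conflict (the account, the amounts, and the merge rule are hypothetical, chosen only for this slide). Take the invariant $\text{balance} \geq 0$, a starting balance of $100$, and a merge that applies the updates of both replicas:
\begin{itemize}
\item Two replicas each accept a withdrawal of $70$. Locally both are valid: $100 - 70 = 30 \geq 0$. After merging: $100 - 70 - 70 = -40 < 0$, so the invariant is violated and withdrawals need coordination.
\item Two replicas each accept a deposit of $70$. After merging: $100 + 70 + 70 = 240 \geq 0$, so deposits can run coordination-free.
\end{itemize}
\end{frame}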
\begin{frame}
\frametitle{Invariants}
Luckily, there are some standard cases in the analysis of invariants that we can reuse as boilerplate when building the set of invariants for our application. This figure summarizes the main cases:
\centering
\includegraphics[width=0.85\textwidth,height=0.7\textheight]{invariants}
\end{frame}
\begin{frame}
\frametitle{Benchmarking}
\begin{itemize}
\item The authors then implemented this new framework and tested it with TPC-C, a standard benchmark that is said to be \say{the gold standard for database concurrency control both in research and industry.}
\item They also used RAMP transactions, which \say{employ limited multi-versioning and metadata to ensure that readers and writers can always proceed concurrently.}
\item The language selected for the prototype is Scala, chosen for the compactness of its code.
\end{itemize}
\end{frame}
\begin{frame}
\frametitle{Benchmarking}
The next few slides show plots of the benchmark results obtained by the authors. The New-Order label refers to the TPC-C New-Order transaction. Whenever a unique ID assignment was needed, the authors assigned a \emph{temp-ID}; only just before the commit did they assign a sequential \emph{real-ID} and create a table mapping each \emph{temp-ID} to its \emph{real-ID}.
\end{frame}
\begin{frame}
\frametitle{Results}
\begin{figure}
\caption{TPC-C New-Order throughput across eight servers.}
\centering
\includegraphics[width=0.55\textwidth,height=0.73\textheight]{results1-1}
\end{figure}
\end{frame}
\begin{frame}
\frametitle{Results}
\begin{figure}
\caption{Coordination-avoiding New-Order scalability.}
\centering
\includegraphics[width=0.70\textwidth,height=0.70\textheight]{results1-2}
\end{figure}
\end{frame}
\begin{frame}
\frametitle{Conclusions}
ACID transactions and the associated strong isolation levels have dominated the field of database concurrency control. They are powerful abstractions that automatically guarantee consistency at the application level. The paper shows that, in a distributed scenario where we want to achieve \textbf{high scalability}, we can sacrifice these abstractions and perform an \textbf{I-Confluence} analysis in order to exploit scalability through \textbf{coordination-free} transactions.
\end{frame}
\section{Trekking Through Siberia}
\begin{frame}
\frametitle{Some information on the paper}
\begin{itemize}
\item \textbf{Title:} Trekking Through Siberia: Managing Cold Data in a Memory-Optimized Database.
\item \textbf{Authors:} Ahmed Eldawy, Justin Levandoski, Per-Ake Larson.
\item Presented at \textbf{2014 VLDB}.
\end{itemize}
\end{frame}
\begin{frame}
\frametitle{Introduction}
With the drop in memory prices, a set of \textbf{main-memory} databases has emerged. This solution is reasonable for most OLTP workloads, but databases often exhibit skewed access patterns that divide records into hot (frequently accessed) and cold (rarely accessed). It is therefore still convenient to keep the hot records in memory and the cold ones on cheaper storage such as flash, which is still much less expensive than memory.
\end{frame}
\begin{frame}
\frametitle{Introduction}
The paper presents \textbf{Project Siberia}, an extension to the \textbf{Hekaton} engine of Microsoft SQL Server that pursues these objectives:
\begin{itemize}
\item Cold data classification.
\item Cold data storage.
\item Cold storage access reduction.
\item Cold data access and migration mechanism (the focus of this paper is on this aspect).
\end{itemize}
\end{frame}
\begin{frame}
\frametitle{Hekaton}
This figure shows how storage and indexing are done in Hekaton:
\centering
\includegraphics[width=0.70\textwidth,height=0.70\textheight]{hekaton-storage}
\end{frame}
\begin{frame}
\frametitle{Hekaton}
Hekaton uses optimistic multi-version concurrency control (MVCC), which relies mainly on these timestamps:
\begin{itemize}
\item Commit/End Time (useful to determine the serialization order).
\item Valid Time.
\item Logical Read Time (start time of the transaction).
\end{itemize}
\end{frame}
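\begin{frame}
\frametitle{Hekaton: Version Visibility (Sketch)}
A sketch of how these timestamps can be combined, assuming each record version $v$ carries a begin and an end timestamp that delimit its valid time (the symbols below are notation chosen for this slide). A transaction $T$ with logical read time $RT(T)$ sees $v$ when its read time falls inside the valid time of $v$:
\[
\text{Begin}(v) \;\leq\; RT(T) \;<\; \text{End}(v)
\]
For example, a version valid over $[20, 50)$ is visible to a transaction reading at time $30$, but not to one reading at time $50$.
\end{frame}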
\begin{frame}
\frametitle{Some important data structures}
\begin{figure}
\caption{Structure of a record in the cold store.}
\centering
\includegraphics[width=0.30\textwidth,height=0.10\textheight]{cold-record}
\end{figure}
\begin{figure}
\caption{Structure of a cached record.}
\centering
\includegraphics[width=0.75\textwidth,height=0.15\textheight]{cached-record}
\end{figure}
\begin{figure}
\caption{Structure of a timestamp notice in the update memo.}
\centering
\includegraphics[width=0.70\textwidth,height=0.15\textheight]{timestamp-notice}
\end{figure}
\end{frame}
\begin{frame}
\frametitle{The main operations on the cold storage}
\begin{itemize}
\item \textbf{Insert:} always done in the hot storage.
\item \textbf{Migration to cold storage:} the only way to move a record from the hot storage to the cold one.
\item \textbf{Delete.}
\item \textbf{Update:} a delete operation on the cold storage followed by an insertion in the hot storage.
\item \textbf{Read.}
\item \textbf{Update Memo and Cold Store Cleaning.}
\end{itemize}
\end{frame}
\begin{frame}
\frametitle{Focus on migration}
\begin{figure}
\caption{Contents of cold store, hot store, and update memo during migration of a record.}
\centering
\includegraphics[width=0.60\textwidth,height=0.60\textheight]{cold-migration}
\end{figure}
\end{frame}
\begin{frame}
\frametitle{Focus on delete}
\begin{figure}
\caption{Effect on the cold store and update memo of a record deletion.}
\centering
\includegraphics[width=0.60\textwidth,height=0.70\textheight]{cold-delete}
\end{figure}
\end{frame}
\begin{frame}
\frametitle{Takeaways}
\begin{itemize}
\item We need a new phase called \textbf{validation}, which checks just before a commit that all the records used during the transaction still exist, are still valid, and have not been modified.
\item There is \textbf{no deletion} in the strict sense of the term. The records to delete have their end timestamps changed, and garbage collection removes the unused records (once all live transactions began after the deletion).
\end{itemize}
\end{frame}
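\begin{frame}
\frametitle{Takeaways: When Is a Version Garbage? (Sketch)}
The garbage collection condition from the previous slide can be written as a simple inequality. This is a sketch under the assumption that deleting a version $v$ sets its end timestamp $\text{End}(v)$ and that every live transaction $T$ has a start time $\text{Start}(T)$ (both names are notation chosen for this slide):
\[
\text{End}(v) \;<\; \min_{T \in \text{Active}} \text{Start}(T)
\]
For example, if a version was deleted at time $40$ and the oldest live transaction started at time $45$, no live transaction can still see it, so it can be removed.
\end{frame}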
\begin{frame}
\frametitle{Benchmarks}
The authors used two types of benchmarks:
\begin{itemize}
\item \textbf{YCSB Benchmark} (50GB single-table database, 1KB records), divided into:
\begin{itemize}
\item Read-heavy: 90\% reads and 10\% updates.
\item Write-heavy: 50\% reads and 50\% writes.
\item Read-only: 100\% reads.
\end{itemize}
\item \textbf{Multi-step read/update workload} (1GB single-table database, 56-byte records), divided into:
\begin{itemize}
\item Read-only.
\item Update-only.
\end{itemize}
\end{itemize}
\end{frame}
\begin{frame}
\frametitle{In-Memory Cold storage}
This analysis isolates the overhead caused by the Siberia framework alone, eliminating the time due to I/O operations.
\begin{figure}
\caption{In-memory overhead of the Siberia framework.}
\centering
\includegraphics[width=0.7\textwidth,height=0.60\textheight]{in-memory-overhead}
\end{figure}
\end{frame}
\begin{frame}
\frametitle{License}
\centering
\includegraphics[width=0.3\textwidth]{license}\\
This work is licensed under a Creative Commons Attribution 4.0 International License. To view a copy of
this license, visit \url{http://creativecommons.org/licenses/by/4.0/}
or send a letter to Creative Commons, 444 Castro Street, Suite
900, Mountain View, California, 94041, USA.
\end{frame}
\end{document}