db2-presentation.tex 14 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348
  1. \documentclass{beamer}
  2. \usepackage[utf8]{inputenc}
  3. \usepackage{graphicx}
  4. \usepackage{dirtytalk}
  5. \usepackage{epstopdf}
  6. \usepackage{hyperref}
  7. \graphicspath{ {images/} }
  8. \usetheme{CambridgeUS}
  9. \usecolortheme{beaver}
  10. \AtBeginSection[]
  11. {
  12. \begin{frame}
  13. \frametitle{Table of Contents}
  14. \tableofcontents[currentsection]
  15. \end{frame}
  16. }
  17. \title{Databases 2 - Optional Presentation}
  18. \author{Andrea Gussoni}
  19. \institute{Politecnico di Milano}
  20. \date{July 15, 2016}
  21. \begin{document}
  22. \frame{\titlepage}
  23. \section{Coordination Avoidance}
  24. \begin{frame}
  25. \frametitle{Some information on the paper}
  26. \begin{itemize}
  27. \item \textbf{Title:} Coordination Avoidance in Database Systems.
  28. \item \textbf{Authors:} Peter Bailis, Alan Fekete, Michael J. Franklin, Ali Ghodsi, Joseph M. Hellerstein, Ion Stoica.
  29. \item Presented at \textbf{2015 VLDB}.
  30. \end{itemize}
  31. \end{frame}
  32. \begin{frame}
  33. \frametitle{The Problem}
  34. At the present time, Database Systems in a distributed scenario are increasingly common. This means that the task of coordinating different entities is assuming a lot of importance.
  35. \end{frame}
  36. \begin{frame}
  37. \frametitle{The Problem}
  38. Usually \textbf{concurrency control} protocols are necessary because we want to guarantee the consistency of the application level data through the use of a database layer that check and solve the possible problems and conflicts. An example can be the use of a 2PL serialization technique that is often used in commercial DBMS.
  39. \end{frame}
  40. \begin{frame}
  41. \frametitle{The Problem}
  42. Mixing this with a distributed scenario means the necessity to introduce \textbf{complex algorithms} (such as 2PC) that coordinate the various entities involved in the transactions, \textbf{introducing latency}. Coordination also means that we cannot exploit all the parallel resources of a distributed environment, because we have a huge overhead introduced by the coordination phase.
  43. \end{frame}
  44. \begin{frame}
  45. \frametitle{The Problem}
  46. Usually we pay coordination overhead in term of:
  47. \begin{itemize}
  48. \item Increased latency.
  49. \item Decreased throughput.
  50. \item Unavailability (in case of failures).
  51. \end{itemize}
  52. \end{frame}
  53. \begin{frame}
  54. \frametitle{The Problem}
  55. \begin{figure}
  56. \caption{Microbenchmark performance of coordinated and coordination-free execution on eight separate multi-core servers.}
  57. \centering
  58. \includegraphics[width=0.85\textwidth,height=0.60\textheight]{2pl-free}
  59. \end{figure}
  60. \end{frame}
  61. \begin{frame}
  62. \frametitle{Invariant Confluence}
  63. The authors of the paper discuss this new technique (or better \textbf{analysis framework}) that if applied, it will reduce in a considerable way the need of coordination between the Database entities, reducing the cost in terms of bandwidth and latency, increasing considerably the overall throughput of the system.
  64. \end{frame}
  65. \begin{frame}
  66. \frametitle{Invariant Confluence}
  67. The main idea here is not to introduce some new exotic way to improve the coordination task, but instead the authors predicate on the fact that there is a set of workloads that do \textbf{not require coordination}, and that can be executed in parallel. The programmer at the application level can then state in an explicit way the \emph{invariants}, special attributes of the tables that need coordination in case of concurrent operations executing on them.
  68. \end{frame}
  69. \begin{frame}
  70. \frametitle{The Model}
  71. The main concepts introduced:
  72. \begin{itemize}
  73. \item Invariants \pause
  74. \item Transactions \pause
  75. \item Replicas \pause
  76. \item (\emph{I-})Convergence \pause
  77. \item Merging
  78. \end{itemize}
  79. \end{frame}
  80. \begin{frame}
  81. \frametitle{Convergence}
  82. This is a figure that explains the main concept behind the idea of convergence:
  83. \includegraphics[width=\textwidth]{convergence}
  84. \end{frame}
  85. \begin{frame}
  86. \frametitle{Coordination-Free Execution}
  87. Here instead we show the basic evolution of a simple coordination free execution and the consequent merging operation:
  88. \includegraphics[width=\textwidth]{coordination-free}
  89. \end{frame}
  90. \begin{frame}
  91. \frametitle{Invariants}
  92. \begin{itemize}
  93. \item It is important to note that \textbf{coordination can only be avoided if all local commit decisions are globally valid.}\pause
  94. \item So the best approach to guarantee the application level consistency is to apply a convergence analysis and then identify the \textbf{true conflicts}. The uncertain situations must be threated in a conservative approach. \pause
  95. \item This means that we rely on the \textbf{analysis} done by the programmer at the application level to guarantee the correctness. This is clearly a drawback.
  96. \end{itemize}
  97. \end{frame}
  98. \begin{frame}
  99. \frametitle{Invariants}
  100. Luckily there are some standard situations for the analysis of invariants that we can use as \textbf{boilerplate} in the building of the set of invariants of our application, this figure summarizes the main cases:
  101. \centering
  102. \includegraphics[width=0.85\textwidth,height=0.7\textheight]{invariants}
  103. \end{frame}
  104. \begin{frame}
  105. \frametitle{Benchmarking}
  106. \begin{itemize}
  107. \item The authors then proceeded to implement this new framework and test it with a standard benchmark, the \textbf{TPC-C} benchmark, that is said to be \say{the gold standard for database concurrency control both in research and industry.}
  108. \item They also used \textbf{RAMP} transactions, that are transactions that \say{employ limited multi-versioning and metadata to ensure that readers and writers can always proceed concurrently.}
  109. \item The selected language for the prototype is \textbf{Scala}, used for reason of compactness of the code.
  110. \end{itemize}
  111. \end{frame}
  112. \begin{frame}
  113. \frametitle{Benchmarking}
  114. In the next few slides there are some plots of the results obtained by the authors. The \textbf{New-Order} label refers to the fact that the authors, when an unique id assignment was needed, decided to assign a \emph{temp-ID}, and only just before the commit, a sequential (as required from the specifications of the benchamrk) \emph{real-ID} is assigned, and a table mapping \emph{tmp-ID} to \emph{real-ID} is created.
  115. \end{frame}
  116. \begin{frame}
  117. \frametitle{Results}
  118. \begin{figure}
  119. \caption{TPC-C New-Order throughput across eight servers.}
  120. \centering
  121. \includegraphics[width=0.55\textwidth,height=0.73\textheight]{results1-1}
  122. \end{figure}
  123. \end{frame}
  124. \begin{frame}
  125. \frametitle{Results}
  126. \begin{figure}
  127. \caption{Coordination-avoiding New-Order scalability.}
  128. \centering
  129. \includegraphics[width=0.70\textwidth,height=0.70\textheight]{results1-2}
  130. \end{figure}
  131. \end{frame}
  132. \begin{frame}
  133. \frametitle{Conclusions}
  134. This paper demonstrates that ACID transactions and associated strong isolation levels dominated the field of database concurrency. This is a \textbf{powerful abstraction} that automatically guarantee consistency at the application level. In a distributed scenario where we want to achieve \textbf{high scalability}, we can sacrifice these abstractions and perform an \textbf{I-Confluence} analysis in order to exploit scalability through \textbf{coordination-free} transactions
  135. \end{frame}
  136. \section{Trekking Through Siberia}
  137. \begin{frame}
  138. \frametitle{Some information on the paper}
  139. \begin{itemize}
  140. \item \textbf{Title:} Trekking Through Siberia: Managing Cold Data in a Memory-Optimized Database.
  141. \item \textbf{Authors:} Ahmed Eldawy, Justin Levandoski, Per-Ake Larson.
  142. \item Presented at \textbf{2014 VLDB}.
  143. \end{itemize}
  144. \end{frame}
  145. \begin{frame}
  146. \frametitle{Introduction}
  147. With the drop in memory prices a set of in \textbf{main memory} database emerged. While for most of OLTP workloads often this solution is reasonable, due to the fact that often databases exhibit a skewed access patterns that divide records in \textbf{hot} (frequently accessed) and \textbf{cold} (rarely accessed) it is still convenient to find a way to maintain the hot records in memory and the cold ones on for example flash storage, that is still a lot less expensive than memory.
  148. \end{frame}
  149. \begin{frame}
  150. \frametitle{Introduction}
  151. In this paper it is presented \textbf{Project Siberia}, an extension to the \textbf{Hekaton} engine of Microsoft SQL Server that aims to pursue these objectives:
  152. \begin{itemize}
  153. \item Cold data classification.
  154. \item Cold data storage.
  155. \item Cold storage access reduction.
  156. \item \textbf{Cold data access and migration mechanism} (the focus of this paper is on this aspect).
  157. \end{itemize}
  158. \end{frame}
  159. \begin{frame}
  160. \frametitle{Hekaton}
  161. This figure shows how the storage and indexing is done in Hekaton:
  162. \centering
  163. \includegraphics[width=0.70\textwidth,height=0.70\textheight]{hekaton-storage}
  164. \end{frame}
  165. \begin{frame}
  166. \frametitle{Hekaton}
  167. Hekaton utilizes optimistic multi-version concurrency control (MVCC), it mainly leverage these features of timestamps to obtain this:
  168. \begin{itemize}
  169. \item Commit/End Time (useful to determine the serialization order).
  170. \item Valid Time.
  171. \item Logical Read Time (start time of the transaction).
  172. \end{itemize}
  173. \end{frame}
  174. \begin{frame}
  175. \frametitle{Some important data structures}
  176. \begin{figure}
  177. \caption{Structure of a record in the cold store.}
  178. \centering
  179. \includegraphics[width=0.30\textwidth,height=0.10\textheight]{cold-record}
  180. \end{figure}
  181. \begin{figure}
  182. \caption{Structure of a cached record.}
  183. \centering
  184. \includegraphics[width=0.75\textwidth,height=0.15\textheight]{cached-record}
  185. \end{figure}
  186. \begin{figure}
  187. \caption{Structure of a timestamp notice in the update memo.}
  188. \centering
  189. \includegraphics[width=0.70\textwidth,height=0.15\textheight]{timestamp-notice}
  190. \end{figure}
  191. \end{frame}
  192. \begin{frame}
  193. \frametitle{The main operations on the cold storage}
  194. \begin{itemize}
  195. \item \textbf{Insert:} done always in the hot storage.
  196. \item \textbf{Migration to cold storage:} is the only way to move a record from the hot storage to the cold one.
  197. \item \textbf{Delete.}
  198. \item \textbf{Updates:} a delete operation on the cold storage followed by an insertion in the hot storage.
  199. \item \textbf{Read.}
  200. \item \textbf{Update Memo and Cold Store Cleaning.}
  201. \end{itemize}
  202. \end{frame}
  203. \begin{frame}
  204. \frametitle{Focus on migration}
  205. \begin{figure}
  206. \caption{Contents of cold store, hot store, and update memo during migration of a record.}
  207. \centering
  208. \includegraphics[width=0.60\textwidth,height=0.60\textheight]{cold-migration}
  209. \end{figure}
  210. \end{frame}
  211. \begin{frame}
  212. \frametitle{Focus on delete}
  213. \begin{figure}
  214. \caption{Effect on the cold store and update memo of a record deletion.}
  215. \centering
  216. \includegraphics[width=0.60\textwidth,height=0.70\textheight]{cold-delete}
  217. \end{figure}
  218. \end{frame}
  219. \begin{frame}
  220. \frametitle{Observations}
  221. \begin{itemize}
  222. \item We need a new phase called \textbf{validation}, that checks just before a commit action that all the records used during the transactions still exist, are valid and have not been modified by another concurrent transaction.
  223. \item There is \textbf{no deletion} in the strict sense of the term. The to-delete records have they end timestamps changed, and the garbage collection remove the unused records (when all the transactions alive begun after the deletion).
  224. \end{itemize}
  225. \end{frame}
  226. \begin{frame}
  227. \frametitle{Benchmarks}
  228. The authors utilized two types of benchmarks:
  229. \begin{itemize}
  230. \item \textbf{YCSB Benchmark} (50GB single-table database, 1KB records) that is divided in:
  231. \begin{itemize}
  232. \item Read-heavy: 90\% reads and \% updates.
  233. \item Write-heavy: 50\% reads and 50\% writes.
  234. \item Read-only: 100\% reads.
  235. \end{itemize}
  236. \item \textbf{Multi-step read/update workload} (1GB single-table database, 56 bytes records) that is divided in:
  237. \begin{itemize}
  238. \item Read-only.
  239. \item Update-only.
  240. \end{itemize}
  241. \end{itemize}
  242. \end{frame}
  243. \begin{frame}
  244. \frametitle{In-Memory Cold storage}
  245. This analysis is done in order to isolate the overhead strictly caused by the Siberia framework, eliminating the latency of the I/O operations:
  246. \begin{figure}
  247. \caption{In-memory overhead of the Siberia framework.}
  248. \centering
  249. \includegraphics[width=0.7\textwidth,height=0.60\textheight]{in-memory-overhead}
  250. \end{figure}
  251. \end{frame}
  252. \begin{frame}
  253. \frametitle{Migration}
  254. This analysis instead focuses on the performance degradation of various types of workload during a \textbf{live migration} to the cold storage of parts of the database:
  255. \begin{figure}
  256. \caption{In-memory overhead of the Siberia framework.}
  257. \centering
  258. \includegraphics[width=0.65\textwidth,height=0.55\textheight]{migration}
  259. \end{figure}
  260. \end{frame}
  261. \begin{frame}
  262. \frametitle{Read-only workload with I/O}
  263. This analysis focuses on the performance degradation of a read-only workload with cold storage on flash (a similar analysis has been done for an update-only workload):
  264. \begin{figure}
  265. \centering
  266. \includegraphics[width=0.8\textwidth,height=0.7\textheight]{io-read-only-a}
  267. \end{figure}
  268. \end{frame}
  269. \begin{frame}
  270. \frametitle{Read-only workload with I/O}
  271. \begin{figure}
  272. \centering
  273. \includegraphics[width=0.9\textwidth,height=0.8\textheight]{io-read-only-b}
  274. \end{figure}
  275. \end{frame}
  276. \begin{frame}
  277. \frametitle{YCSB benchmark}
  278. \begin{figure}
  279. \caption{YCSB write-heavy workload.}
  280. \centering
  281. \includegraphics[width=0.8\textwidth,height=0.7\textheight]{ycsb-write-heavy}
  282. \end{figure}
  283. \end{frame}
  284. \begin{frame}
  285. \frametitle{Conclusions}
  286. There is related research in progress in this direction:
  287. \begin{itemize}
  288. \item \textbf{Buffer pool:} page indirection on disk.
  289. \item \textbf{HyPer:} hybrid between OLTP and OLAP, optimizes data in chunks using different virtual memory pages, finds the cold data and compress for OLAP usage.
  290. \end{itemize}\pause
  291. \begin{block}{}
  292. The approach used in Siberia has the great advantage to have an access at \textbf{record level}, and for databases where the cold storage is between 10\% and 20\% of the whole database it has the great advantage of \textbf{not requiring additional structures} in memory (except the compact bloom-filters) for the cold data.
  293. \end{block}
  294. \end{frame}
  295. \begin{frame}
  296. \frametitle{License}
  297. \centering
  298. \includegraphics[width=0.3\textwidth]{license}\\
  299. This work is licensed under a Creative Commons Attribution 4.0 International License. To view a copy of
  300. this license, visit \url{http://creativecommons.org/licenses/by/4.0/}
  301. or send a letter to Creative Commons, 444 Castro Street, Suite
  302. 900, Mountain View, California, 94041, USA.
  303. \end{frame}
  304. \end{document}