- \documentclass{beamer}
- \usepackage[utf8]{inputenc}
- \usepackage{graphicx}
- \usepackage{dirtytalk}
- \usepackage{epstopdf}
- \usepackage{hyperref}
- \graphicspath{ {images/} }
- \usetheme{CambridgeUS}
- \usecolortheme{beaver}
- \AtBeginSection[]
- {
- \begin{frame}
- \frametitle{Table of Contents}
- \tableofcontents[currentsection]
- \end{frame}
- }
- \title{Databases 2 - Optional Paper Presentation}
- \author{Andrea Gussoni}
- \institute{Politecnico di Milano}
- \date{July 15, 2016}
- \begin{document}
- \frame{\titlepage}
- \section{Coordination Avoidance}
- \begin{frame}
- \frametitle{Some information on the paper}
- \begin{itemize}
- \item \textbf{Title:} Coordination Avoidance in Database Systems.
- \item \textbf{Authors:} Peter Bailis, Alan Fekete, Michael J. Franklin, Ali Ghodsi, Joseph M. Hellerstein, Ion Stoica.
- \item Presented at \textbf{2015 VLDB}.
- \end{itemize}
- \end{frame}
- \begin{frame}
- \frametitle{The Starting Problem}
- Distributed database systems are increasingly common, so the task of coordinating the different entities involved is becoming more and more important.
- \end{frame}
- \begin{frame}
- \frametitle{The Starting Problem}
- \textbf{Concurrency control} protocols are usually necessary because we want to guarantee the consistency of the application-level data through a database layer that checks for and resolves possible problems and conflicts. An example is two-phase locking (2PL), a serialization technique often used in commercial DBMSs.
- \end{frame}
- \begin{frame}
- \frametitle{The Starting Problem}
- Combining this with a distributed scenario requires complex algorithms (such as 2PC) that coordinate the various entities involved in a transaction, which introduces latency. Coordination also means that we cannot exploit all the parallel resources of a distributed environment, because the coordination phase adds a large overhead.
- \end{frame}
- \begin{frame}
- \frametitle{The Starting Problem}
- We usually pay for the coordination overhead in terms of:
- \begin{itemize}
- \item Increased latency.
- \item Decreased throughput.
- \item Unavailability (in case of failures).
- \end{itemize}
- \end{frame}
- \begin{frame}
- \frametitle{Invariant Confluence}
- The authors of the paper discuss a new technique (or, better, an analysis framework) that, if applied, considerably reduces the need for coordination between the database entities. This lowers the cost in terms of bandwidth and latency and considerably increases the overall throughput of the system.
- \end{frame}
- \begin{frame}
- \frametitle{Invariant Confluence}
- The main idea here is not to introduce some new exotic way to improve the coordination task. Instead, the authors build on the fact that there is a set of workloads that do not require coordination and that can be executed in parallel. The programmer at the application level then states the \emph{invariants} explicitly: the properties of the data that must always hold, and that may therefore require coordination when concurrent operations execute on them.
- \end{frame}
- \begin{frame}
- \frametitle{The Model}
- The main concepts introduced:
- \begin{itemize}
- \item Invariants \pause
- \item Transactions \pause
- \item Replicas \pause
- \item (\emph{I-})Convergence \pause
- \item Merging
- \end{itemize}
- \end{frame}
- \begin{frame}
- \frametitle{Convergence}
- This is a figure that explains the main concept behind the idea of convergence:
- \includegraphics[width=\textwidth]{convergence}
- \end{frame}
- \begin{frame}
- \frametitle{Coordination-Free Execution}
- Here, instead, we show the basic evolution of a simple coordination-free execution and the subsequent merge operation:
- \includegraphics[width=\textwidth]{coordination-free}
- \end{frame}
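- \begin{frame}
- \frametitle{Invariant Confluence, More Formally}
- As a sketch of the formal definition, paraphrased from the paper (the symbols $T$, $I$, $D_i$, $D_j$ and the merge operator $\sqcup$ are notation assumed here): a set of transactions $T$ is \emph{I-confluent} with respect to an invariant $I$ if, for all states $D_i$ and $D_j$ that are reachable through $T$ from a common ancestor state,
- \[
- I(D_i) \land I(D_j) \Rightarrow I(D_i \sqcup D_j).
- \]
- Informally: merging two replica states that are each valid must never produce an invalid state. The paper shows that this property is both necessary and sufficient for executing $T$ without coordination while preserving $I$.
- \end{frame}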
- \begin{frame}
- \frametitle{Invariants}
- \begin{itemize}
- \item It is important to note that \textbf{coordination can only be avoided if all local commit decisions are globally valid.}\pause
- \item So the best approach to guarantee application-level consistency is to apply a convergence analysis and then identify the \emph{true conflicts}. Uncertain situations must be treated conservatively. \pause
- \item This means that we rely on the analysis done by the programmer at the application level to guarantee the correctness. This is clearly a drawback.
- \end{itemize}
- \end{frame}
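- \begin{frame}
- \frametitle{Invariants: A Worked Example}
- A small illustrative example (the numbers are hypothetical, and we assume that the merge applies the updates of both replicas to their common ancestor state). Take the invariant $\mathit{balance} \geq 0$ and a common starting state with $\mathit{balance} = 100$.
- \begin{itemize}
- \item \textbf{Withdrawals:} replica 1 commits $100 - 70 = 30$ and replica 2 commits $100 - 60 = 40$. Both states are locally valid, but the merge gives $100 - 70 - 60 = -30 < 0$. The local commit decisions are not globally valid, so this is a true conflict.
- \item \textbf{Deposits:} replica 1 commits $100 + 70 = 170$ and replica 2 commits $100 + 60 = 160$. The merge gives $100 + 70 + 60 = 230 \geq 0$, so no coordination is needed.
- \end{itemize}
- \end{frame}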
- \begin{frame}
- \frametitle{Invariants}
- Luckily, there are some standard cases in the analysis of invariants that we can use as boilerplate when building the set of invariants for our application. This figure summarizes the main ones:
- \centering
- \includegraphics[width=0.85\textwidth,height=0.7\textheight]{invariants}
- \end{frame}
- \begin{frame}
- \frametitle{Benchmarking}
- \begin{itemize}
- \item The authors then implemented this new framework and tested it with a standard benchmark, TPC-C, which is said to be \say{the gold standard for database concurrency control both in research and industry.}
- \item They also used RAMP transactions, which \say{employ limited multi-versioning and metadata to ensure that readers and writers can always proceed concurrently.}
- \item The language selected for the prototype is Scala, chosen for the compactness of the code.
- \end{itemize}
- \end{frame}
- \begin{frame}
- \frametitle{Benchmarking}
- The next few slides show some plots of the results obtained by the authors in the benchmarks. The New-Order label refers to the TPC-C New-Order transaction: whenever a unique ID assignment was needed, the authors assigned a \emph{temp-ID}, and only just before the commit was a sequential \emph{real-ID} assigned and a table mapping \emph{temp-ID} to \emph{real-ID} created.
- \end{frame}
- \begin{frame}
- \frametitle{Results}
- \begin{figure}
- \caption{TPC-C New-Order throughput across eight servers.}
- \centering
- \includegraphics[width=0.55\textwidth,height=0.73\textheight]{results1-1}
- \end{figure}
- \end{frame}
- \begin{frame}
- \frametitle{Results}
- \begin{figure}
- \caption{Coordination-avoiding New-Order scalability.}
- \centering
- \includegraphics[width=0.70\textwidth,height=0.70\textheight]{results1-2}
- \end{figure}
- \end{frame}
- \begin{frame}
- \frametitle{Conclusions}
- The paper observes that ACID transactions and the associated strong isolation levels have long dominated the field of database concurrency control. They are powerful abstractions that automatically guarantee consistency at the application level. In a distributed scenario where we want to achieve \textbf{high scalability}, we can give up these abstractions and perform an \textbf{I-Confluence} analysis in order to gain scalability through \textbf{coordination-free} transactions.
- \end{frame}
- \section{Trekking Through Siberia}
- \begin{frame}
- \frametitle{Some information on the paper}
- \begin{itemize}
- \item \textbf{Title:} Trekking Through Siberia: Managing Cold Data in a Memory-Optimized Database.
- \item \textbf{Authors:} Ahmed Eldawy, Justin Levandoski, Per-\AA{}ke Larson.
- \item Presented at \textbf{2014 VLDB}.
- \end{itemize}
- \end{frame}
- \begin{frame}
- \frametitle{Template}
- \end{frame}
- \begin{frame}
- \frametitle{License}
- \centering
- \includegraphics[width=0.3\textwidth]{license}\\
- This work is licensed under a Creative Commons Attribution 4.0 International License. To view a copy of
- this license, visit \url{http://creativecommons.org/licenses/by/4.0/}
- or send a letter to Creative Commons, 444 Castro Street, Suite
- 900, Mountain View, California, 94041, USA.
- \end{frame}
- \end{document}