|
@@ -48,7 +48,7 @@ Usually \textbf{concurrency control} protocols are necessary because we want to
|
|
|
|
|
|
\begin{frame}
|
|
|
\frametitle{The Problem}
|
|
|
-Mixing this with a distributed scenario means the necessity to introduce complex algorithms (such as 2PC) that coordinate the various entities involved in the transactions, introducing latency. Coordination also means that we cannot exploit all the parallel resources of a distributed environment, because we have a huge overhead introduced by the coordination phase.
|
|
|
+Mixing this with a distributed scenario requires the introduction of \textbf{complex algorithms} (such as 2PC) that coordinate the various entities involved in the transactions, \textbf{introducing latency}. Coordination also means that we cannot fully exploit the parallel resources of a distributed environment, because the coordination phase introduces a huge overhead.
|
|
|
\end{frame}
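The coordination cost described above can be sketched with a toy two-phase commit round (the participant model here is hypothetical, not a real implementation): the coordinator must wait through two full message rounds before anyone commits, and a single negative vote aborts everyone.

```python
# Toy 2PC sketch: two message rounds per transaction, and any single
# "no" vote forces a global abort. Participants are plain strings; one
# named "crashed" stands in for an unavailable node.

def two_phase_commit(participants, messages):
    # Phase 1: prepare -- one message per participant, collect votes.
    votes = []
    for p in participants:
        messages.append(("prepare", p))
        votes.append(p != "crashed")      # unavailable nodes vote no
    decision = all(votes)
    # Phase 2: broadcast the global commit/abort decision.
    for p in participants:
        messages.append(("commit" if decision else "abort", p))
    return decision

msgs = []
ok = two_phase_commit(["n1", "n2", "n3"], msgs)
print(ok, len(msgs))   # True 6 -> two message rounds for three nodes

msgs = []
ok = two_phase_commit(["n1", "crashed"], msgs)
print(ok)              # False -> one "no" vote aborts everyone
```

The message count grows with every participant in both phases, which is the latency and bandwidth overhead the slide refers to.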
|
|
|
|
|
|
\begin{frame}
|
|
@@ -72,12 +72,12 @@ Usually we pay coordination overhead in term of:
|
|
|
|
|
|
\begin{frame}
|
|
|
\frametitle{Invariant Confluence}
|
|
|
-The authors of the paper discuss this new technique (or better analysis framework) that if applied, it will reduce in a considerable way the need of coordination between the Database entities, reducing the cost in terms of bandwidth and latency, increasing considerably the overall throughput of the system.
|
|
|
+The authors of the paper present this new technique (or rather, \textbf{analysis framework}) that, when applied, considerably reduces the need for coordination between the database entities, lowering the cost in terms of bandwidth and latency and considerably increasing the overall throughput of the system.
|
|
|
\end{frame}
|
|
|
|
|
|
\begin{frame}
|
|
|
\frametitle{Invariant Confluence}
|
|
|
-The main idea here is not to introduce some new exotic way to improve the coordination task, but instead the authors predicate on the fact that there is a set of workloads that do not require coordination, and that can be executed in parallel. The programmer at the application level can then state in an explicit way the \emph{invariants}, special attributes of the tables that need coordination in case of concurrent operations executing on them.
|
|
|
+The main idea here is not to introduce some exotic new way to improve the coordination task; instead, the authors build on the fact that there is a set of workloads that do \textbf{not require coordination} and can be executed in parallel. The programmer at the application level can then explicitly state the \emph{invariants}: special attributes of the tables that need coordination when concurrent operations execute on them.
|
|
|
\end{frame}
|
|
|
|
|
|
\begin{frame}
|
|
@@ -109,13 +109,13 @@ Here instead we show the basic evolution of a simple coordination free execution
|
|
|
\begin{itemize}
|
|
|
\item It is important to note that \textbf{coordination can only be avoided if all local commit decisions are globally valid.}\pause
|
|
|
\item So the best approach to guarantee application-level consistency is to apply a convergence analysis and then identify the \emph{true conflicts}; uncertain situations must be treated conservatively. \pause
|
|
|
- \item This means that we rely on the analysis done by the programmer at the application level to guarantee the correctness. This is clearly a drawback.
|
|
|
+ \item This means that we rely on the \textbf{analysis} done by the programmer at the application level to guarantee correctness. This is clearly a drawback.
|
|
|
\end{itemize}
|
|
|
\end{frame}
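The first bullet above can be made concrete with a toy example (my own sketch of the idea, not the paper's formal analysis): under the invariant "balance is non-negative", two concurrent withdrawals are each locally valid, yet the merged state violates the invariant, so they require coordination; deposits, by contrast, merge safely.

```python
# I-confluence toy check: coordination can be avoided only if every
# locally valid commit is still valid after merging replica effects.

def locally_valid(balance, delta, invariant):
    # What a single replica checks before committing its own operation.
    return invariant(balance + delta)

def merged_valid(balance, deltas, invariant):
    # What actually holds once all replicas' effects are merged.
    return invariant(balance + sum(deltas))

non_negative = lambda x: x >= 0

balance = 100
withdrawals = [-70, -70]   # two replicas each withdraw 70 concurrently

# Each replica's local decision looks fine in isolation (100-70 = 30)...
local_ok = all(locally_valid(balance, d, non_negative) for d in withdrawals)
# ...but the merged state is 100-70-70 = -40, violating the invariant.
global_ok = merged_valid(balance, withdrawals, non_negative)
print(local_ok, global_ok)   # True False -> coordination required

# Deposits are safe under the same invariant: merging only increases x.
print(merged_valid(balance, [70, 70], non_negative))  # True
```

This is exactly the "locally valid but not globally valid" case the frame warns about: the conservative response is to classify withdrawals under this invariant as a true conflict.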
|
|
|
|
|
|
\begin{frame}
|
|
|
\frametitle{Invariants}
|
|
|
-Luckily there are some standard situations for the analysis of invariants that we can use as boilerplate in the building of the set of invariants of our application, this figure summarizes the main cases:
|
|
|
+Luckily, there are some standard situations in the analysis of invariants that we can use as \textbf{boilerplate} when building the set of invariants of our application; this figure summarizes the main cases:
|
|
|
\centering
|
|
|
\includegraphics[width=0.85\textwidth,height=0.7\textheight]{invariants}
|
|
|
\end{frame}
|
|
@@ -123,9 +123,9 @@ Luckily there are some standard situations for the analysis of invariants that w
|
|
|
\begin{frame}
|
|
|
\frametitle{Benchmarking}
|
|
|
\begin{itemize}
|
|
|
- \item The authors then proceeded to implement this new framework and test it with a standard benchmark, the TPC-C benchmark, that is said to be \say{the gold standard for database concurrency control both in research and industry.}
|
|
|
- \item They also used RAMP transactions, that are transactions that \say{employ limited multi-versioning and metadata to ensure that readers and writers can always proceed concurrently.}
|
|
|
- \item The selected language for the prototype is Scala, used for reason of compactness of the code.
|
|
|
+ \item The authors then proceeded to implement this new framework and test it with a standard benchmark, the \textbf{TPC-C} benchmark, which is said to be \say{the gold standard for database concurrency control both in research and industry.}
|
|
|
+ \item They also used \textbf{RAMP} transactions, which \say{employ limited multi-versioning and metadata to ensure that readers and writers can always proceed concurrently.}
|
|
|
+ \item The selected language for the prototype is \textbf{Scala}, chosen for the compactness of the code.
|
|
|
\end{itemize}
|
|
|
\end{frame}
|
|
|
|
|
@@ -154,7 +154,7 @@ In the next few slides there are some plots of the results obtained by the autho
|
|
|
|
|
|
\begin{frame}
|
|
|
\frametitle{Conclusions}
|
|
|
-This paper demonstrates that ACID transactions and associated strong isolation levels dominated the field of database concurrency. This is a powerful abstractions that automatically guarantee consistency at the application level. In a distributed scenario where we want to achieve \textbf{high scalability}, we can sacrifice these abstractions and perform an \textbf{I-Confluence} analysis in order to exploit scalability through \textbf{coordination-free} transactions
|
|
|
+This paper observes that ACID transactions and the associated strong isolation levels have dominated the field of database concurrency control. These are \textbf{powerful abstractions} that automatically guarantee consistency at the application level. In a distributed scenario where we want to achieve \textbf{high scalability}, we can sacrifice these abstractions and perform an \textbf{I-Confluence} analysis in order to exploit scalability through \textbf{coordination-free} transactions.
|
|
|
\end{frame}
|
|
|
|
|
|
|
|
@@ -171,7 +171,7 @@ This paper demonstrates that ACID transactions and associated strong isolation l
|
|
|
|
|
|
\begin{frame}
|
|
|
\frametitle{Introduction}
|
|
|
-With the drop in memory prices a set of in \textbf{main memory} database emerged. While for most of OLTP workloads often this solution is reasonable, due to the fact that often databases exhibit a skewed access patterns that divide records in hot (frequently accessed) and cold (rarely accessed) it is still convenient to find a way to maintain the hot records in memory and the cold ones on for example flash storage, that is still a lot less expensive than memory.
|
|
|
+With the drop in memory prices, a set of \textbf{main-memory} databases emerged. While this solution is reasonable for most OLTP workloads, databases often exhibit skewed access patterns that divide records into \textbf{hot} (frequently accessed) and \textbf{cold} (rarely accessed); it is therefore convenient to keep the hot records in memory and the cold ones on, for example, flash storage, which is still far less expensive than memory.
|
|
|
\end{frame}
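The hot/cold split can be sketched as a simple frequency-based classifier (Siberia's actual classification is more sophisticated and works offline from logged accesses; the function and threshold below are hypothetical, purely to illustrate the skew):

```python
# Hypothetical hot/cold classifier: rank records by access count and
# keep only the most-accessed fraction in memory.

from collections import Counter

def classify(access_log, hot_fraction=0.2):
    counts = Counter(access_log)
    ranked = [rec for rec, _ in counts.most_common()]
    cutoff = max(1, int(len(ranked) * hot_fraction))
    return set(ranked[:cutoff]), set(ranked[cutoff:])

# A skewed access pattern: record "a" dominates the log.
log = ["a"] * 90 + ["b"] * 5 + ["c"] * 3 + ["d"] + ["e"]
hot, cold = classify(log)
print(hot)    # {'a'} -> kept in memory
print(cold)   # the rest is a candidate for migration to flash storage
```

With a skew like this, a small in-memory working set serves almost all accesses, which is what makes the hot/cold separation pay off.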
|
|
|
|
|
|
\begin{frame}
|
|
@@ -181,7 +181,7 @@ In this paper it is presented \textbf{Project Siberia}, an extension to the \tex
|
|
|
\item Cold data classification.
|
|
|
\item Cold data storage.
|
|
|
\item Cold storage access reduction.
|
|
|
- \item Cold data access and migration mechanism (the focus of this paper is on this aspect).
|
|
|
+ \item \textbf{Cold data access and migration mechanism} (the focus of this paper is on this aspect).
|
|
|
\end{itemize}
|
|
|
\end{frame}
|
|
|
|
|
@@ -252,7 +252,7 @@ Hekaton utilizes optimistic multi-version concurrency control (MVCC), it mainly
|
|
|
\end{frame}
|
|
|
|
|
|
\begin{frame}
|
|
|
-\frametitle{Takeaways}
|
|
|
+\frametitle{Observations}
|
|
|
\begin{itemize}
|
|
|
\item We need a new phase called \textbf{validation}, that checks just before a commit action that all the records used during the transactions still exist, are valid and have not been modified by another concurrent transaction.
|
|
|
\item There is \textbf{no deletion} in the strict sense of the term. To-delete records have their end timestamps changed, and the garbage collector removes the unused records (once every live transaction began after the deletion).
|
|
@@ -289,7 +289,7 @@ This analysis is done in order to isolate the overhead strictly caused by the Si
|
|
|
|
|
|
\begin{frame}
|
|
|
\frametitle{Migration}
|
|
|
-This analysis instead focuses on the performance degradation of various types of workload during a live migration to the cold storage of parts of the database:
|
|
|
+This analysis instead focuses on the performance degradation of various types of workload during a \textbf{live migration} of parts of the database to cold storage:
|
|
|
\begin{figure}
|
|
|
\caption{In-memory overhead of the Siberia framework.}
|
|
|
\centering
|
|
@@ -331,7 +331,7 @@ There is related research in progress in this direction:
|
|
|
\item \textbf{HyPer:} hybrid between OLTP and OLAP; optimizes data in chunks using different virtual memory pages, finds the cold data and compresses it for OLAP usage.
|
|
|
\end{itemize}\pause
|
|
|
\begin{block}{}
|
|
|
- The approach used in Siberia has the great advantage to have an access at record level, and for databases where the cold storage is between 10\% and 20\% of the whole database it has the great advantage of not requiring additional structures in memory (except the compact bloom-filters) for the cold data.
|
|
|
+ The approach used in Siberia has the great advantage of providing access at \textbf{record level}; moreover, for databases where cold data is between 10\% and 20\% of the whole database, it has the great advantage of \textbf{not requiring additional structures} in memory (except the compact Bloom filters) for the cold data.
|
|
|
\end{block}
|
|
|
\end{frame}
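The role of those compact Bloom filters can be sketched as follows (an illustrative minimal filter, not Siberia's actual access filters): a Bloom filter has no false negatives, so when it answers "definitely not in cold storage" the flash read can be skipped entirely, and only a small false-positive fraction of lookups hits flash unnecessarily.

```python
# Minimal Bloom filter: k hash positions per key over a fixed bit array.
# add() sets the bits; maybe_contains() has no false negatives.

import hashlib

class BloomFilter:
    def __init__(self, size=1024, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = [False] * size

    def _positions(self, key):
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(h, 16) % self.size

    def add(self, key):
        for p in self._positions(key):
            self.bits[p] = True

    def maybe_contains(self, key):
        return all(self.bits[p] for p in self._positions(key))

cold_filter = BloomFilter()
for record_id in ["r17", "r42"]:       # records migrated to cold storage
    cold_filter.add(record_id)

print(cold_filter.maybe_contains("r42"))  # True -> must check flash
# For a key that was never added, the answer is usually False, in which
# case the flash read is skipped; a rare false positive only costs one
# wasted lookup, never a wrong result.
print(cold_filter.maybe_contains("r99"))
```

This is why keeping only the filters (rather than a full cold index) in memory is enough: memory-resident lookups pay almost nothing for the existence of cold storage.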
|
|
|
|