[DB2] Added lesson3, improved lesson2

Federico Amedeo Izzo 9 years ago
Parent
Current commit
7e0489a1bd
3 files changed, with 161 insertions and 37 deletions
  1. 41 35
      Data Bases 2/lesson_02.md
  2. 118 0
      Data Bases 2/lesson_03.md
  3. 2 2
      README.md

+ 41 - 35
Data Bases 2/lesson_02.md

@@ -1,7 +1,7 @@
 # DB2 - lesson 02
 #### Paraboschi
 ##### 12 October 2015
-## Concurrency Control
+## Concurrency Control pt.I
 
 ### Advantages of Concurrency
 
@@ -31,35 +31,39 @@ e(Tx) = end transaction x
 
 ### Problems due to concurrency
 Given these two transactions:
->T1: UPDATE account  
+```
+T1: UPDATE account  
      SET balance = balance + 3  
      WHERE client = 'Smith'  
 
->T2: UPDATE account  
+T2: UPDATE account  
      SET balance = balance + 6  
      WHERE client = 'Smith'  
+```
 
 #### Execution with lost UPDATE
 As the name states, one or more changes to the data are lost.  
 The error is produced by:
 - R1 R2 W1 W2
 - R1 R2 W2 W1  
-
-> D=100  
- T1: R(D,V1)  
- V1 = V1 + 3  
- T2: R(D,V2)  
- V2 = V2 + 6  
- T1: W(V1,D)  
- T2: W(V2,D)  
- D=103  
- D=106!
+```
+D=100  
+T1: R(D,V1)  
+V1 = V1 + 3  
+T2: R(D,V2)  
+V2 = V2 + 6  
+T1: W(V1,D)  
+T2: W(V2,D)  
+D=103  
+D=106!
+```
 
 #### Dirty read
 The read of the second transaction happens before the rollback of T1,  
 therefore a wrong value is used for T2.  
 - R1 W1 R2 abort1 W2
-> D=100  
+```
+D=100  
 T1: R(D,V1)  
 T1: V1 = V1 + 3  
 T1: W(V1,D) D=103  
@@ -67,33 +71,40 @@ T2: R(D,V2)
 T1: ROLLBACK  
 T2: V2 = V2 + 6  
 T2: W(V2,D) D=109!
+```
 
 #### Nonrepeatable read
 The first read (to V1) and the second read (to V3) of the same value D give different results because D is changed in the meantime.
 - R1 R2 W2 R1
->D=100  
+```
+D=100  
 T1: R(D,V1)  
 T2: R(D,V2)  
 T2: V2 = V2 + 6  
 T2: W(V2,D) D=106  
 T1: R(D,V3) V3<>V1!
+```
 
 #### Ghost update
 T1 reads X and Y, T2 writes Y and Z, and T1 still holds the old value of Y.
 - R1 R1 R2 R2 W2 W2 R1
-> X+Y+Z=100, X=50, Y=30, Z=20  
+```
+X+Y+Z=100, X=50, Y=30, Z=20  
 T1: R(X,V1), R(Y,V2)  
 T2: R(Y,V3), R(Z,V4)  
 T2: V3 = V3 + 10, V4=V4-10  
 T2: W(V3,Y), W(V4,Z) (Y=40, Z=10)  
 T1: R(Z,V5) (for T1, V1+V2+V5=90!)
+```
 
 #### Phantom insert
 This anomaly is due to the insertion of a "phantom" tuple that satisfies the conditions of a previous query.
 - R1 W2 (new data) R1
->T1: C=AVG(B:A=1)  
+```
+T1: C=AVG(B:A=1)  
 T2: Insert (A=1,B=2)  
 T1: C=AVG(B: A=1)  
+```
 
 ### Schedule
 Sequence of input/output operations performed by concurrent transactions.
@@ -110,9 +121,9 @@ $r_2,w_2\in T_2$
 -  __Serial schedule__: the actions of each transaction occur in contiguous sequences
 - __Serializable schedule__: Produces the same results as some serial schedule on the same transactions (by *schedule equivalence*)
 - The class of acceptable schedules produced by a scheduler depends on the cost of equivalence checking: scheduling must happen in real time, and the more optimized the scheduling is, the more computational power is needed to obtain it.
-### CSR and VSR
+## CSR and VSR
 $CSR\subset VSR$
-#### View-serializability
+### View-serializability
 ###### NOTE: what is a read-from operation?
 -  $r_i(x)$ *reads-from* $w_j(x)$ in a schedule S when $w_j(x)$ precedes $r_i(x)$ in S and there is no $w_k(x)$ between $r_i(x)$ and $w_j(x)$ in S
 
@@ -132,7 +143,7 @@ But is vast and costly to evaluate
 #### CSR
 It is a subset of VSR, used because it keeps the cost of checking contained.
 
-##### Example of View-serializability
+#### Example of View-serializability
 ```
 S3: w0(x) r2(x) r1(x) w2(x) w2(z)
 S4: w0(X) r1(x) r2(x) w2(x) w2(z)
@@ -150,7 +161,7 @@ Meanwhile S5 and S6 are view serializable because in both schedules:
 - r2(x) reads from w1(x)
 - w1(z) is the final write
 
-##### Another Example
+#### Another Example
 ```
 S7: r1(x) r2(x) w1(x) w2(x)
 S8: r1(X) r2(x) w2(x) r1(x)
@@ -162,22 +173,25 @@ S9: r1(x) r1(y) r2(z) r2(y) w2(y) w2(z) r1(z)
 - S9 corresponds to a ghost update
 - None of them is view-serializable
 
-##### Complexity
+#### Complexity
 Deciding view-equivalence of two given schedules can be done in polynomial time  
 Deciding view-serializability of a generic schedule is an NP-complete problem
 
-#### CSR
+### CSR
 
-An action ai is conflicting with aj (i!=j) if both are operations on common data and at least one of them is a write operation.
+An action ai is __conflicting__ with aj (i!=j) if both are operations on common data and at least one of them is a write operation.
 - read-write conflicts (rw, wr)
 - write-write conflicts (ww)
 
-Two schedules are conflict-equivalent if they contain the same operations and all conflicting operation pair occur in the same order.
-One schedule is conflict-serializable if it is conflict-equivalent to a serial schedule.
+__Conflict-equivalent schedules__ $S_i\approx_c S_j$:
+- $S_i$ and $S_j$ contain the same operations
+- all conflicting operation pairs occur in the same order
+
+One schedule is __conflict-serializable__ if it is *conflict-equivalent* to a serial schedule.
 
 CSR is the set of conflict-serializable schedules.
 
-##### CSR and VSR
+#### CSR and VSR
 
 Every conflict-serializable schedule is also view-serializable, but the converse is not necessarily true
 
@@ -188,11 +202,3 @@ Let S1 and S2 be two conflict-equivalent schedules:
 - They have the same reads-from relations, if they didn't, there would be at least one read-write pair with a different order
 
 So this implies that S1 and S2 are also view-equivalent.
-
-#### Testing conflict-serializability
-
-It is done with a conflict graph that has:
-- One node for each transaction Ti
-- One arc from Ti to Tj if it exists at least one conflict between an action of Ti and an action of Tj such as ai precedes aj.
-
-A schedule is in CSR iff its conflict graph is acyclic.
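
The conflict-graph test removed from lesson_02.md in the hunk above (one node per transaction, an arc from Ti to Tj when a conflicting action of Ti precedes one of Tj, and membership in CSR iff the graph is acyclic) can be illustrated with a minimal Python sketch. The function names (`parse`, `conflict_graph`, `topological_sort`, `serial_order`) and the space-separated `r1(x)` input syntax are assumptions made for this illustration; they do not come from the notes.

```python
from collections import defaultdict


def parse(schedule):
    """Turn 'r1(x) w2(x)' into [('r', 1, 'x'), ('w', 2, 'x')]."""
    ops = []
    for tok in schedule.split():
        kind, rest = tok[0], tok[1:]
        txn, item = rest.rstrip(")").split("(")
        ops.append((kind, int(txn), item.lower()))
    return ops


def conflict_graph(ops):
    """Add an arc Ti -> Tj for every conflicting pair where Ti's action is first."""
    nodes = {txn for _, txn, _ in ops}
    arcs = defaultdict(set)
    for i, (k1, t1, x1) in enumerate(ops):
        for k2, t2, x2 in ops[i + 1:]:
            # Conflict: different transactions, same item, at least one write.
            if t1 != t2 and x1 == x2 and "w" in (k1, k2):
                arcs[t1].add(t2)
    return nodes, arcs


def topological_sort(nodes, arcs):
    """Kahn's algorithm; returns None when the graph contains a cycle."""
    indegree = {n: 0 for n in nodes}
    for src in arcs:
        for dst in arcs[src]:
            indegree[dst] += 1
    ready = sorted(n for n in nodes if indegree[n] == 0)
    order = []
    while ready:
        n = ready.pop(0)
        order.append(n)
        for dst in sorted(arcs.get(n, ())):
            indegree[dst] -= 1
            if indegree[dst] == 0:
                ready.append(dst)
    return order if len(order) == len(nodes) else None


def serial_order(schedule):
    """Return one equivalent serial order, or None if the schedule is not in CSR."""
    return topological_sort(*conflict_graph(parse(schedule)))
```

Under these assumptions, `serial_order("r1(x) r2(x) w1(x) w2(x)")` (schedule S7, the lost update) returns `None` because the graph has the cycle T1 -> T2 -> T1, while `serial_order("r1(x) w1(x) r2(x) w2(x) r3(y) w1(y)")` returns `[3, 1, 2]`, matching the order T3 < T1 < T2 given for the 2PL counterexample in lesson_03.md.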

+ 118 - 0
Data Bases 2/lesson_03.md

@@ -0,0 +1,118 @@
+# DB2 - lesson 03
+#### Paraboschi
+##### 13 October 2015
+## Concurrency Control pt.II
+
+### Conflict Graph
+It is used to test __conflict-serializability__
+
+#### Structure:
+- One node for each transaction $T_i$
+- One arc from $T_i$ to $T_j$ if there exists at least one conflict between an action $a_i$ of $T_i$ and an action $a_j$ of $T_j$ such that $a_i$ precedes $a_j$.
+
+A schedule is in __CSR__ iff its __conflict graph__ is __acyclic__.
+
+#### Properties:
+- If $S$'s graph is *acyclic*, then it has a
+    - __topological sort__: an ordering of the nodes such that the graph only contains arcs (i,j) with i<j
+- The serial schedule whose transactions are ordered according to the *topological sort* is __conflict-equivalent__ to $S$, because every conflicting pair (i,j) satisfies i<j
+- In general there can be many topological sorts, and therefore many serializations, for the same acyclic graph
+
+#### Concurrency control in practice
+The *conflict-graph* technique would be efficient if we knew the graph from the beginning, but we don't.  
+
+A scheduler must work __incrementally__: for each operation, it decides whether to execute it or not.
+
+It is not *feasible* to maintain the graph, update it, and verify its acyclicity at each operation request.
+
+### Locking
+
+It's the most common method in commercial systems
+
+A transaction is well-formed with respect to locking if:
+- __read__ operations are preceded by __r_lock__ (shared lock) and followed by __unlock__
+- __write__ operations are preceded by __w_lock__ (exclusive lock) and followed by __unlock__
+
+When a transaction first reads and then writes an object, it can:
+- Use a __w_lock__
+- Modify a __r_lock__ into a __w_lock__ (lock escalation)
+
+#### Lock primitives
+-  __r_lock__: read lock
+-  __w_lock__: write lock
+- __unlock__
+#### Possible states of an object
+- __free__
+- __r_locked__: locked by a reader
+- __w_locked__: locked by a writer
+
+#### Behaviour of the lock manager
+
+The lock manager receives the primitives from the transactions and grants resources according to the __conflict table__
+- When a __lock__ request is granted, the resource is acquired
+- When an __unlock__ is executed, the resource becomes available.
+
+Request|free|r_locked|w_locked
+---|---|---|---
+__r_lock__|OK - __r_locked__|OK - __r_locked__|NO - __w_locked__
+__w_lock__|OK - __w_locked__|NO - __r_locked__|NO - __w_locked__
+__unlock__|__ERROR__|OK - __DEPENDS__|OK - __FREE__
+
+### Two-Phase Locking
+Requirements:
+- A transaction cannot acquire any other lock after releasing a lock
+- Locks held by a transaction can be released only after commit/abort operations (this stronger rule is what defines strict 2PL)
+
+A scheduler which:
+- Uses well-formed transactions
+- grants locks according to conflicts
+- is Two-Phase
+Produces the schedule class called __2PL__
+
+Schedules in __2PL__ are __serializable__
+
+#### 2PL and CSR
+Every __2PL__ schedule is also *conflict-serializable*, but the converse is not necessarily true.
+
+##### Counter example  
+$r_1(x)w_1(x)r_2(x)w_2(x)r_3(y)w_1(y)$  
+- It violates 2PL  
+
+$r_1(x)w_1(x)$ |T1 rel $r_2(x)w_2(x)r_3(y)$ |T1 acq $w_1(y)$  
+- It is conflict-serializable  
+T3 < T1 < T2
+
+#### 2PL implies CSR
+$$2PL\subset CSR\subset VSR$$
+- Consider for each transaction the moment in which it has all resources and is going to release the first one
+- We sort the transactions by this temporal value and consider the corresponding serial schedule
+- We want to prove that this schedule is conflict-equivalent to $S$
+    - We then consider a conflict between an action of $t_i$ and an action of $t_j$, with $i<j$
+    - Can they occur in the reverse order in $S$?
+    - No, because then $t_j$ would have released the resource in question before $t_i$ acquired it, which contradicts $i<j$.
+
+#### Strict 2PL
+
+So far we were still relying on the commit-projection hypothesis
+
+To remove this hypothesis we need to add a constraint to __2PL__, thus obtaining __strict 2PL__
+
+>Locks held by a transaction can be released only after commit/rollback
+
+This version of 2PL is used in commercial DBMSs
+
+#### Implementation of 2-Phase Locking
+
+Lock tables are in reality __main memory data structures__
+- Resource state can be:
+    - free
+    - read-locked
+    - write-locked
+- Every resource has also a __read counter__
+- Some late-'90s systems only supported exclusive locks (binary info, no counter)
+
+A transaction asking for a lock is either granted the lock or __queued and suspended__.  
+ The queue is FIFO, and there is a danger of:
+ - __Deadlock__: endless wait
+ - __Starvation__: individual transaction waiting forever  
+ Starvation can occur for write transactions waiting for resources which are highly used for reading (e.g. index roots)
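
The lock-manager behaviour added in lesson_03.md (three resource states plus a read counter) can be illustrated with a minimal Python sketch. It assumes the usual rule that a `w_lock` is refused while any reader holds the resource. The class name `LockManager` and its dictionary layout are hypothetical. As a simplification, a refused request returns `False` instead of being queued and suspended, and the sketch does not track which transaction owns a lock.

```python
class LockManager:
    """Toy in-memory lock table: state per resource plus a read counter."""

    def __init__(self):
        self.state = {}    # resource -> "r_locked" | "w_locked"; absent means free
        self.readers = {}  # resource -> number of read locks currently held

    def r_lock(self, res):
        # Shared lock: granted on a free or r_locked resource.
        if self.state.get(res) == "w_locked":
            return False
        self.state[res] = "r_locked"
        self.readers[res] = self.readers.get(res, 0) + 1
        return True

    def w_lock(self, res):
        # Exclusive lock: granted only on a free resource.
        if res in self.state:
            return False
        self.state[res] = "w_locked"
        return True

    def unlock(self, res):
        current = self.state.get(res)
        if current is None:
            raise RuntimeError(f"unlock on free resource {res!r}")
        if current == "r_locked":
            # "Depends": the resource becomes free only when the last reader leaves.
            self.readers[res] -= 1
            if self.readers[res] > 0:
                return
            del self.readers[res]
        del self.state[res]
```

For example, after two `r_lock("x")` calls a `w_lock("x")` returns `False`, and it only succeeds once `unlock("x")` has been called twice and the read counter has dropped to zero.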

+ 2 - 2
README.md

@@ -9,7 +9,7 @@ The idea is the following:
 ### Status
 Subject|completed to|last lesson
 ---|---|---
-Artificial Intelligence | lesson 4|lesson 4
-Data Bases 2|lesson 2|lesson 2
+Artificial Intelligence|lesson 4|lesson 4
+Data Bases 2|lesson 3|lesson 3
 Formal Languages and Compilers| lesson 3 | lesson 5
 Software Engineering 2 | missing content| lesson 4