
[DB2] Added lesson3, improved lesson2

Federico Amedeo Izzo 9 years ago
commit
7e0489a1bd
3 files changed, 161 additions, 37 deletions
  1. 41 35
      Data Bases 2/lesson_02.md
  2. 118 0
      Data Bases 2/lesson_03.md
  3. 2 2
      README.md

+ 41 - 35
Data Bases 2/lesson_02.md

@@ -1,7 +1,7 @@
 # DB2 - lesson 02
 #### Paraboschi
 ##### 12 October 2015
-## Concurrency Control
+## Concurrency Control pt.I
 
 ### Advantages of Concurrency
 
@@ -31,35 +31,39 @@ e(Tx) = end transaction x
 
 
 ### Problems due to concurrency
 Given these two transactions:
->T1: UPDATE account  
+```
+T1: UPDATE account  
      SET balance = balance + 3  
      WHERE client = 'Smith'  
 
->T2: UPDATE account  
+T2: UPDATE account  
      SET balance = balance + 6  
      WHERE client = 'Smith'  
+```
 
 #### Execution with lost UPDATE
 As the name states, one or more changes to the data are lost.  
 The error is produced by:
 - R1 R2 W1 W2
 - R1 R2 W2 W1  
-
-> D=100  
- T1: R(D,V1)  
- V1 = V1 + 3  
- T2: R(D,V2)  
- V2 = V2 + 6  
- T1: W(V1,D)  
- T2: W(V2,D)  
- D=103  
- D=106!
+```
+D=100  
+T1: R(D,V1)  
+V1 = V1 + 3  
+T2: R(D,V2)  
+V2 = V2 + 6  
+T1: W(V1,D)  
+T2: W(V2,D)  
+D=103  
+D=106!
+```
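A minimal Python sketch that replays the trace above deterministically, assuming the increments of +3 (T1) and +6 (T2) used in the trace. The `run` helper and the tuple encoding of the schedule are illustrative assumptions, not part of the notes.

```python
def run(schedule, increments, initial=100):
    """Replay read/write steps; each transaction keeps a private copy of D."""
    d = initial
    local = {}
    for op, tx in schedule:
        if op == "R":
            # Read D and compute the new value in the transaction's own variable.
            local[tx] = d + increments[tx]
        else:  # "W"
            # Write the locally computed value back, overwriting whatever is in D.
            d = local[tx]
    return d


increments = {"T1": 3, "T2": 6}
interleaved = [("R", "T1"), ("R", "T2"), ("W", "T1"), ("W", "T2")]  # R1 R2 W1 W2
serial = [("R", "T1"), ("W", "T1"), ("R", "T2"), ("W", "T2")]       # T1 then T2

lost = run(interleaved, increments)   # T1's update is overwritten by T2
correct = run(serial, increments)     # both updates survive
```

With the interleaved schedule both transactions read D=100, so the last write wins and the other increment disappears; the serial schedule applies both.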
 
 
 #### Dirty read
 The read of the second transaction happens before the rollback of T1,  
 therefore a wrong value is used for T2.  
 - R1 W1 R2 abort1 W2
-> D=100  
+```
+D=100  
 T1: R(D,V1)  
 T1: V1 = V1 + 3  
 T1: W(V1,D) D=103  
@@ -67,33 +71,40 @@ T2: R(D,V2)
 T1: ROLLBACK  
 T2: V2 = V2 + 6  
 T2: W(V2,D) D=109!
+```
 
 #### Nonrepeatable read
 The first read (to V1) and the second read (to V3) of the same value D give different results because D is changed in the meantime.
 - R1 R2 W2 R1
->D=100  
+```
+D=100  
 T1: R(D,V1)  
 T2: R(D,V2)  
 T2: V2 = V2 + 6  
 T2: W(V2,D) D=106  
 T1: R(D,V3) V3<>V1!
+```
 
 
 #### Ghost update
 T1 reads X and Y, T2 then writes Y and Z, and T1 reads the new Z while still holding the old value of Y.
 - R1 R1 R2 R2 W2 W2 R1
-> X+Y+Z=100, X=50, Y=30, Z=20  
+```
+X+Y+Z=100, X=50, Y=30, Z=20  
 T1: R(X,V1), R(Y,V2)  
 T2: R(Y,V3), R(Z,V4)  
 T2: V3 = V3 + 10, V4=V4-10  
 T2: W(V3,Y), W(V4,Z) (Y=40, Z=10)  
 T1: R(Z,V5) (for T1, V1+V2+V5=90!)
+```
 
 
 #### Phantom insert
 This anomaly is due to the insertion of a "phantom" tuple that satisfies the conditions of a previous query.
 - R1 W2 (new data) R1
->T1: C=AVG(B:A=1)  
+```
+T1: C=AVG(B:A=1)  
 T2: Insert (A=1,B=2)  
 T1: C=AVG(B: A=1)  
+```
 
 
 ### Schedule
 Sequence of input/output operations performed by concurrent transactions.
@@ -110,9 +121,9 @@ $r_2,w_2\in T_2$
 -  __Serial schedule__: the actions of each transaction occur in contiguous sequences
 - __Serializable schedule__: Produces the same results as some serial schedule on the same transactions (by *schedule equivalence*)
 - The class of acceptable schedules produced by a scheduler depends on the cost of equivalence checking: scheduling must happen in real time, and the more optimized the scheduling, the more computational power is needed to obtain it.
-### CSR and VSR
+## CSR and VSR
 $CSR\subset VSR$
-#### View-serializability
+### View-serializability
 ###### NOTE: what is a read-from operation?
 -  $r_i(x)$ *reads-from* $w_j(x)$ in a schedule S when $w_j(x)$ precedes $r_i(x)$ in S and there is no $w_k(x)$ between $r_i(x)$ and $w_j(x)$ in S
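The reads-from relation can be computed with a single pass over a schedule. The Python sketch below assumes a schedule is encoded as a list of `(op, transaction, item)` tuples and uses the standard definition of view-equivalence (same reads-from relation and same final writes). The function names and the encoding are illustrative assumptions, not part of the notes.

```python
from collections import defaultdict


def reads_from(schedule):
    """Return a set of (reader, item, k, writer): the k-th read of item by
    reader sees the value written by writer (None = initial value)."""
    last_writer = {}            # item -> transaction of the latest write so far
    seen = defaultdict(int)     # (transaction, item) -> reads counted so far
    relation = set()
    for op, tx, item in schedule:
        if op == "w":
            last_writer[item] = tx
        else:
            seen[(tx, item)] += 1
            relation.add((tx, item, seen[(tx, item)], last_writer.get(item)))
    return relation


def final_writes(schedule):
    """Map each item to the transaction that writes it last."""
    last = {}
    for op, tx, item in schedule:
        if op == "w":
            last[item] = tx
    return last


def view_equivalent(s1, s2):
    return reads_from(s1) == reads_from(s2) and final_writes(s1) == final_writes(s2)


# w0(x) r2(x) r1(x) w2(x) w2(z)  versus the serial  w0(x) r1(x) r2(x) w2(x) w2(z)
a = [("w", "T0", "x"), ("r", "T2", "x"), ("r", "T1", "x"), ("w", "T2", "x"), ("w", "T2", "z")]
b = [("w", "T0", "x"), ("r", "T1", "x"), ("r", "T2", "x"), ("w", "T2", "x"), ("w", "T2", "z")]

# r1(x) r2(x) w1(x) w2(x)  versus the serial  r1(x) w1(x) r2(x) w2(x)
c = [("r", "T1", "x"), ("r", "T2", "x"), ("w", "T1", "x"), ("w", "T2", "x")]
d = [("r", "T1", "x"), ("w", "T1", "x"), ("r", "T2", "x"), ("w", "T2", "x")]
```

Here `view_equivalent(a, b)` holds because both reads see the write of T0 and T2 writes last in both, while `view_equivalent(c, d)` fails because in the serial schedule T2 reads from T1.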
 
 
@@ -132,7 +143,7 @@ But is vast and costly to evaluate
 #### CSR
 CSR is a subset of VSR, used because it keeps the cost of checking contained.
 
-##### Example of View-serializability
+#### Example of View-serializability
 ```
 S3: w0(x) r2(x) r1(x) w2(x) w2(z)
 S4: w0(x) r1(x) r2(x) w2(x) w2(z)
@@ -150,7 +161,7 @@ Meanwhile S5 and S6 are view serializable because in both schedules:
 - r2(x) reads from w1(x)
 - w1(z) is the final write
 
-##### Another Example
+#### Another Example
 ```
 S7: r1(x) r2(x) w1(x) w2(x)
 S8: r1(x) r2(x) w2(x) r1(x)
@@ -162,22 +173,25 @@ S9: r1(x) r1(y) r2(z) r2(y) w2(y) w2(z) r1(z)
 - S9 corresponds to a ghost update
 - None of them is view-serializable
 
-##### Complexity
+#### Complexity
 Deciding view-equivalence of two given schedules can be done in polynomial time  
 Deciding view-serializability of a generic schedule is an NP-complete problem
 
-#### CSR
+### CSR
 
-An action ai is conflicting with aj (i!=j) if both are operations on common data and at least one of them is a write operation.
+An action ai is __conflicting__ with aj (i!=j) if both are operations on common data and at least one of them is a write operation.
 - read-write conflicts (rw, wr)
 - write-write conflicts (ww)
 
-Two schedules are conflict-equivalent if they contain the same operations and all conflicting operation pair occur in the same order.
-One schedule is conflict-serializable if it is conflict-equivalent to a serial schedule.
+__Conflict-equivalent schedules__ $S_i\approx_c S_j$:
+- $S_i$ and $S_j$ contain the same operations
+- all conflicting operation pairs occur in the same order
+
+One schedule is __conflict-serializable__ if it is *conflict-equivalent* to a serial schedule.
 
 
 CSR is the set of conflict-serializable schedules.
 
-##### CSR and VSR
+#### CSR and VSR
 
 Every conflict-serializable schedule is also view-serializable, but the converse is not necessarily true
 
@@ -188,11 +202,3 @@ Let S1 and S2 be two conflict-equivalent schedules:
 - They have the same reads-from relations; if they didn't, there would be at least one read-write pair with a different order
 
 So this implies that S1 and S2 are also view-equivalent.
-
-#### Testing conflict-serializability
-
-It is done with a conflict graph that has:
-- One node for each transaction Ti
-- One arc from Ti to Tj if it exists at least one conflict between an action of Ti and an action of Tj such as ai precedes aj.
-
-A schedule is in CSR iff its conflict graph is acyclic.

+ 118 - 0
Data Bases 2/lesson_03.md

@@ -0,0 +1,118 @@
+# DB2 - lesson 03
+#### Paraboschi
+##### 13 October 2015
+## Concurrency Control pt.II
+
+### Conflict Graph
+It is used to test __conflict-serializability__
+
+#### Structure:
+- One node for each transaction $T_i$
+- One arc from $T_i$ to $T_j$ if there exists at least one conflict between an action $a_i$ of $T_i$ and an action $a_j$ of $T_j$ such that $a_i$ precedes $a_j$.
+
+A schedule is in __CSR__ iff its __conflict graph__ is __acyclic__.
+
+#### Properties:
+- if $S$'s graph is *acyclic*, then it has a
+    - __topological sort__: An ordering of the nodes such that the graph only contains arcs (i,j) with i<j
+- The serial schedule whose transactions are ordered according to the *topological sort* is __conflict-equivalent__ to $S$, because for all conflicting pairs (i,j) it is always i<j
+- In general there can be many topological sorts, i.e. serializations for the same acyclic graph
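A minimal Python sketch of the conflict-graph test, assuming a schedule is encoded as a list of `(op, transaction, item)` tuples. The encoding and the helper names are illustrative assumptions; the topological sort relies on the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import CycleError, TopologicalSorter


def conflict_graph(schedule):
    """Map every transaction to the set of transactions that must precede it."""
    preds = {tx: set() for _, tx, _ in schedule}
    for i, (op_i, tx_i, item_i) in enumerate(schedule):
        for op_j, tx_j, item_j in schedule[i + 1:]:
            # Conflict: different transactions, same item, at least one write.
            if tx_i != tx_j and item_i == item_j and "w" in (op_i, op_j):
                preds[tx_j].add(tx_i)   # arc tx_i -> tx_j
    return preds


def serial_order(schedule):
    """Return one topological sort of the conflict graph, or None on a cycle."""
    try:
        return list(TopologicalSorter(conflict_graph(schedule)).static_order())
    except CycleError:
        return None


# r1(x) w1(x) r2(x) w2(x) r3(y) w1(y): acyclic, hence conflict-serializable
s = [("r", "T1", "x"), ("w", "T1", "x"), ("r", "T2", "x"),
     ("w", "T2", "x"), ("r", "T3", "y"), ("w", "T1", "y")]

# r1(x) r2(x) w1(x) w2(x): arcs in both directions between T1 and T2
lost_update = [("r", "T1", "x"), ("r", "T2", "x"), ("w", "T1", "x"), ("w", "T2", "x")]
```

For `s` the only admissible order is T3, T1, T2; for `lost_update` the graph contains a cycle, so `serial_order` returns `None`.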
+
+#### Concurrency control in practice
+The *conflict-graph* technique would be efficient if we knew the graph from the beginning, but we don't.  
+
+A scheduler must work __incrementally__: decide, for each operation, whether to execute it or not.
+
+It is not *feasible* to maintain the graph, update it, and verify its acyclicity at each operation request.
+
+### Locking
+
+It's the most common method in commercial systems
+
+A transaction is well-formed with respect to locking if:
+- __read__ operations are preceded by __r_lock__ (shared lock) and followed by __unlock__
+- __write__ operations are preceded by __w_lock__ (exclusive lock) and followed by __unlock__
+
+When a transaction first reads and then writes an object, it can:
+- Use a __w_lock__
+- Modify an __r_lock__ into a __w_lock__ (lock escalation)
+
+#### Lock primitives
+-  __r_lock__: read lock
+-  __w_lock__: write lock
+- __unlock__
+#### Possible states of an object
+- __free__
+- __r_locked__: locked by a reader
+- __w_locked__: locked by a writer
+
+#### Behaviour of the lock manager
+
+The lock manager receives the primitives from the transactions and grants resources according to the __conflict table__
+- When a __lock__ request is granted, the resource is acquired
+- When an __unlock__ is executed, the resource becomes available.
+
+Request|free|r_locked|w_locked
+---|---|---|---
+__r_lock__|OK - __r_locked__|OK - __r_locked__|NO - __w_locked__
+__w_lock__|OK - __w_locked__|NO - __r_locked__|NO - __w_locked__
+__unlock__|__ERROR__|OK - __DEPENDS__|OK - __FREE__
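A minimal Python sketch of a lock manager with the usual shared/exclusive semantics: read locks are compatible with each other, a write lock is granted only on a free object, and unlocking a free object is an error. The "depends" outcome of an unlock on an r_locked object is modelled here with a per-object reader counter. The class and attribute names are illustrative assumptions, and the sketch ignores lock escalation and queuing.

```python
class LockManager:
    def __init__(self):
        self.state = {}     # object -> "r_locked" | "w_locked"; absent = free
        self.readers = {}   # object -> number of read locks currently held

    def r_lock(self, obj):
        if self.state.get(obj) == "w_locked":
            return False                      # NO: a writer holds the object
        self.state[obj] = "r_locked"
        self.readers[obj] = self.readers.get(obj, 0) + 1
        return True

    def w_lock(self, obj):
        if obj in self.state:
            return False                      # NO: object is r_locked or w_locked
        self.state[obj] = "w_locked"
        return True

    def unlock(self, obj):
        if obj not in self.state:
            raise ValueError("unlock on a free object")
        if self.state[obj] == "r_locked":
            self.readers[obj] -= 1
            if self.readers[obj] > 0:
                return                        # other readers remain: stays r_locked
            del self.readers[obj]
        del self.state[obj]                   # object becomes free


lm = LockManager()
first_read = lm.r_lock("x")      # granted
second_read = lm.r_lock("x")     # granted, counter goes to 2
write_attempt = lm.w_lock("x")   # refused while readers hold x
```

Returning `False` stands in for "request not granted"; a real lock manager would queue and suspend the requesting transaction instead.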
+
+### Two-Phase Locking
+Requirements:
+- A transaction cannot acquire any other lock after releasing a lock
+- In the strict variant, locks of a transaction can be released only after commit/abort operations
+
+A scheduler which:
+- uses well-formed transactions
+- grants locks according to conflicts
+- is Two-Phase
+
+produces the schedule class called __2PL__
+
+Schedules in __2PL__ are __serializable__
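A minimal Python sketch of the two-phase check for a single transaction, covering only the rule that no lock is acquired after a lock has been released. The sequence of lock primitives is encoded as `(primitive, object)` tuples; the encoding and the function name are illustrative assumptions.

```python
def is_two_phase(primitives):
    """True if every r_lock/w_lock comes before the first unlock."""
    shrinking = False
    for name, _obj in primitives:
        if name == "unlock":
            shrinking = True      # the shrinking phase has started
        elif shrinking:
            return False          # a lock acquired after a release
    return True


growing_then_shrinking = [("r_lock", "x"), ("w_lock", "y"), ("unlock", "x"), ("unlock", "y")]
acquire_after_release = [("w_lock", "x"), ("unlock", "x"), ("w_lock", "y"), ("unlock", "y")]
```

The first sequence satisfies the rule; the second releases `x` and then acquires `y`, so it does not.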
+
+#### 2PL and CSR
+Every __2PL__ schedule is also *conflict-serializable*, but the converse is not necessarily true.
+
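+##### Counterexample  
+$r_1(x)w_1(x)r_2(x)w_2(x)r_3(y)w_1(y)$  
+- It violates 2PL: T1 must release its lock on x before T2 can use x, but it acquires the lock on y only afterwards  
+
+$r_1(x)w_1(x)$ [T1 releases x] $r_2(x)w_2(x)r_3(y)$ [T1 acquires y] $w_1(y)$  
+- It is conflict-serializable, with serial order T3 < T1 < T2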
+
+#### 2PL implies CSR
+$$2PL\subset CSR\subset VSR$$
+- Consider for each transaction the moment in which it has all resources and is going to release the first one
+- We sort the transactions by this temporal value and consider the corresponding serial schedule
+- We want to prove that this schedule is conflict-equivalent to $S$
+    - We then consider a conflict between an action from $t_i$ and an action from $t_j$, with $i<j$
+    - Can they occur in the reverse order in $S$?
+    - No, because then $t_j$ would have had to release the resource in question before $t_i$ acquired it.
+
+#### Strict 2PL
+
+We were still using the hypothesis of commit-projection
+
+To remove this hypothesis we need to add a constraint to __2PL__, thus obtaining __strict 2PL__
+
+>Locks on a transaction can be released only after commit/rollback
+
+This version of 2PL is used in commercial DBMSs
+
+#### Implementation of 2-Phase Locking
+
+Lock tables are in reality __main memory data structures__
+- Resource state can be:
+    - free
+    - read-locked
+    - write-locked
+- Every resource has also a __read counter__
+- Some late-'90s systems only supported exclusive locks (binary info, no counter)
+
+A transaction asking for a lock is either granted the lock or __queued and suspended__.  
+The queue is FIFO, and there is a danger of:
+ - __Deadlock__: endless wait
+ - __Starvation__: an individual transaction waiting forever  
+ Starvation can occur for write transactions waiting for resources which are highly used for reading (e.g. index roots)

+ 2 - 2
README.md

@@ -9,7 +9,7 @@ The idea is the following:
 ### Status
 Subject|completed to|last lesson
 ---|---|---
-Artificial Intelligence | lesson 4|lesson 4
-Data Bases 2|lesson 2|lesson 2
+Artificial Intelligence|lesson 4|lesson 4
+Data Bases 2|lesson 3|lesson 3
 Formal Languages and Compilers| lesson 3 | lesson 5
 Software Engineering 2 | missing content| lesson 4