|
@@ -0,0 +1,137 @@
|
|
|
+# DB2 - lesson 02
|
|
|
+#### Paraboschi
|
|
|
+##### 12 October 2015
|
|
|
+## Concurrency Control
|
|
|
+Tens, hundredss, thousands of transactions per second cannot be executed serially
|
|
|
+
|
|
|
+Concurrency need to be controlled to avoid anomalies.
|
|
|
+
|
|
|
+### Concurrent executions
|
|
|
+b(Tx) = begin transaction x
|
|
|
+e(Tx) = end transaction x
|
|
|
+- __serial__ b(T1) e(T1) b(T2) e(T2)
|
|
|
+- or b(T2) e(T2) b(T1) e(T1)
|
|
|
+- __interleaved__ b(T1) b(T2) e(T1) e(T2)
|
|
|
+- __nested__ b(T1) b(T2) e(T2) e(T1)
|
|
|
+
|
|
|
+### Problems due to concurrency
|
|
|
+Given these two transactions:
|
|
|
+>T1: UPDATE account
|
|
|
+ SET balance = balance + 3
|
|
|
+ WHERE client = 'Smith'
|
|
|
+
|
|
|
+>T2: UPDATE account
|
|
|
+ SET balance = balance + 3
|
|
|
+ WHERE client = 'Smith'
|
|
|
+
|
|
|
+#### Execution with lost UPDATE
|
|
|
+As the name states, one o two changes to the data is lost.
|
|
|
+The error is produced by:
|
|
|
+- R1 R2 W1 W2
|
|
|
+- R1 R2 W2 W1
|
|
|
+
|
|
|
+> D=100
|
|
|
+ T1: R(D,V1)
|
|
|
+ V1 = V1 + 3
|
|
|
+ T2: R(D,V2)
|
|
|
+ V2 = V2 + 6
|
|
|
+ T1: W(V1,D)
|
|
|
+ T2: W(V2,D)
|
|
|
+ D=103
|
|
|
+ D=106!
|
|
|
+
|
|
|
+#### Dirty read
|
|
|
+The read of the second transaction happen before the rollback of T1,
|
|
|
+therefore a wrong value is used for T2.
|
|
|
+- R1 W1 R2 abort1 W2
|
|
|
+> D=100
|
|
|
+T1: R(D,V1)
|
|
|
+T1: V1 = V1 + 3
|
|
|
+T1: W(V1,D) D=103
|
|
|
+T2: R(D,V2)
|
|
|
+T1: ROLLBACK
|
|
|
+T2: V2 = V2 + 6
|
|
|
+T2: W(V2,D) D=109!
|
|
|
+
|
|
|
+#### Nonrepeatable read
|
|
|
+The first read (to V1) and the second read (to V3) of the same value D give different results because D is changed in the meantime.
|
|
|
+- R1 R2 W2 R1
|
|
|
+>D=100
|
|
|
+T1: R(D,V1)
|
|
|
+T2: R(D,V2)
|
|
|
+T2: V2 = V2 + 6
|
|
|
+T2: W(V2,D) D=106
|
|
|
+T1: R(D,V3) V3<>V1!
|
|
|
+
|
|
|
+#### Ghost update
|
|
|
+T1 reads X and Y, T2 writes Y and Z, T1 has still the old value of Y.
|
|
|
+- R1 R1 R2 R2 W2 W2 R1
|
|
|
+> X+Y+Z=100, X=50, Y=30, Z=20
|
|
|
+T1: R(X,V1), R(Y,V2)
|
|
|
+T2: R(Y,V3), R(Z,V4)
|
|
|
+T2: V3 = V3 + 10, V4=V4-10
|
|
|
+T2: W(V3,Y), W(V4,Z) (Y=40, Z=10)
|
|
|
+T1: R(Z,V5) (for T1, V1+V2+V5=90!)
|
|
|
+
|
|
|
+#### Phantom insert
|
|
|
+This anomaly is due to the insertion of a "phantom" tuple that satisfies the conditions of a previous query.
|
|
|
+- R1 W2 (new data) R1
|
|
|
+>T1: C=AVG(B:A=1)
|
|
|
+T2: Insert (A=1,B=2)
|
|
|
+T1: C=AVG(B: A=1)
|
|
|
+
|
|
|
+### Schedule
|
|
|
+Sequence of input/output operations performed by concurrent transactions.
|
|
|
+es:
|
|
|
+>S<sub>1</sub>: r<sub>1</sub>(x) r<sub>2</sub>(z) w<sub>1</sub>(x) w<sub>2</sub>
|
|
|
+r<sub>1</sub>,w<sub>1</sub>
|
|
|
+
|
|
|
+$r_1,w_1\in T_1$
|
|
|
+$r_2,w_2\in T_2$
|
|
|
+
|
|
|
+#### Principles of Concurrent Control
|
|
|
+- __Goal__: to reject schedules that cause anomalies
|
|
|
+- __Scheduler__: component that accepts or rejects the operations requested by the transactions
|
|
|
+- __Serial schedule__: the actions of each transaction occur in contiguous sequences
|
|
|
+- __Serializable schedule__: Produces the same results as some serial schedule on the same transactions (by *schedule equivalence*)
|
|
|
+- The class of acceptable schedules produced by a scheduler depends on the cost of equivalence checking (???)
|
|
|
+### CSR and VSR
|
|
|
+$CSR\subset VSR$
|
|
|
+#### View-serializability
|
|
|
+###### NOTE: what is a read-from operation?
|
|
|
+- $r_i(x)$ *reads-from* $w_j(x)$ in a schedule S when $w_j(x)$ precedes $r_i(x)$ in S and there is no $w_k(x)$ between $r_i(x)$ and $w_j(x)$ in S
|
|
|
+
|
|
|
+- $w_j(x)$ is a *final write* if it is the last write on x that occurs in S
|
|
|
+
|
|
|
+Two schedules are __view-equivalent__ $S_i {\approx}_v S_j$ if they have the same reads-from relations and the same final writes.
|
|
|
+
|
|
|
+A schedule is __view-serializable__ if it is equivalent to a serial schedule.
|
|
|
+
|
|
|
+__VSR__ is the set of *view-serializable* schedules.
|
|
|
+
|
|
|
+#### VSR
|
|
|
+defines the schedule which are:
|
|
|
+- serializable
|
|
|
+- anomaly-free
|
|
|
+But is vast and costly to evaluate
|
|
|
+#### CSR
|
|
|
+Is a subset of VSR solutions, used because it contains costs.
|
|
|
+
|
|
|
+##### Example of View-serializability
|
|
|
+```
|
|
|
+S3: w0(x) r2(x) r2(x) w2(x) w2(z)
|
|
|
+S4: w0(X) r1(x) r2(x) w2(x) w2(z)
|
|
|
+S5: w0(x) r2(x) r2(x) w2(x) w2(z) //incorrect
|
|
|
+S6: w0(X) r1(x) r2(x) w2(x) w2(z) //incorrect
|
|
|
+```
|
|
|
+##### Another Example
|
|
|
+```
|
|
|
+S7: r1(x) r2(x) w1(x) w2(x)
|
|
|
+S8: r1(X) r2(x) w2(x) r1(x)
|
|
|
+S9: r1(x) r1(y) r2(z) r2(y) w2(y) w2(z) r1(z)
|
|
|
+```
|
|
|
+#### Complexity
|
|
|
+Deciding view-equivalence of two given schedules can be done in polynomial time
|
|
|
+Deciding View-serializability of a generic schedule is a NP-complete problem
|
|
|
+
|
|
|
+Every conflict serializable schedule is also view-serializable, but the converse is not necessarily true
|