# DB2 - lesson 02 #### Paraboschi ##### 12 October 2015 ## Concurrency Control Tens, hundredss, thousands of transactions per second cannot be executed serially Concurrency need to be controlled to avoid anomalies. ### Concurrent executions b(Tx) = begin transaction x e(Tx) = end transaction x - __serial__ b(T1) e(T1) b(T2) e(T2) - or b(T2) e(T2) b(T1) e(T1) - __interleaved__ b(T1) b(T2) e(T1) e(T2) - __nested__ b(T1) b(T2) e(T2) e(T1) ### Problems due to concurrency Given these two transactions: >T1: UPDATE account SET balance = balance + 3 WHERE client = 'Smith' >T2: UPDATE account SET balance = balance + 3 WHERE client = 'Smith' #### Execution with lost UPDATE As the name states, one or more changes to the data are lost. The error is produced by: - R1 R2 W1 W2 - R1 R2 W2 W1 > D=100 T1: R(D,V1) V1 = V1 + 3 T2: R(D,V2) V2 = V2 + 6 T1: W(V1,D) T2: W(V2,D) D=103 D=106! #### Dirty read The read of the second transaction happen before the rollback of T1, therefore a wrong value is used for T2. - R1 W1 R2 abort1 W2 > D=100 T1: R(D,V1) T1: V1 = V1 + 3 T1: W(V1,D) D=103 T2: R(D,V2) T1: ROLLBACK T2: V2 = V2 + 6 T2: W(V2,D) D=109! #### Nonrepeatable read The first read (to V1) and the second read (to V3) of the same value D give different results because D is changed in the meantime. - R1 R2 W2 R1 >D=100 T1: R(D,V1) T2: R(D,V2) T2: V2 = V2 + 6 T2: W(V2,D) D=106 T1: R(D,V3) V3<>V1! #### Ghost update T1 reads X and Y, T2 writes Y and Z, T1 has still the old value of Y. - R1 R1 R2 R2 W2 W2 R1 > X+Y+Z=100, X=50, Y=30, Z=20 T1: R(X,V1), R(Y,V2) T2: R(Y,V3), R(Z,V4) T2: V3 = V3 + 10, V4=V4-10 T2: W(V3,Y), W(V4,Z) (Y=40, Z=10) T1: R(Z,V5) (for T1, V1+V2+V5=90!) #### Phantom insert This anomaly is due to the insertion of a "phantom" tuple that satisfies the conditions of a previous query. - R1 W2 (new data) R1 >T1: C=AVG(B:A=1) T2: Insert (A=1,B=2) T1: C=AVG(B: A=1) ### Schedule Sequence of input/output operations performed by concurrent transactions. es: >S1: r1(x) r2(z) w1(x) w2 r1,w1 $r_1,w_1\in T_1$ $r_2,w_2\in T_2$ #### Principles of Concurrent Control - __Goal__: to reject schedules that cause anomalies - __Scheduler__: component that accepts or rejects the operations requested by the transactions - __Serial schedule__: the actions of each transaction occur in contiguous sequences - __Serializable schedule__: Produces the same results as some serial schedule on the same transactions (by *schedule equivalence*) - The class of acceptable schedules produced by a scheduler depends on the cost of equivalence checking, because scheduling must happen in real-time and the more is optimized my sheduling the more computational power I will need to obtain it. ### CSR and VSR $CSR\subset VSR$ #### View-serializability ###### NOTE: what is a read-from operation? - $r_i(x)$ *reads-from* $w_j(x)$ in a schedule S when $w_j(x)$ precedes $r_i(x)$ in S and there is no $w_k(x)$ between $r_i(x)$ and $w_j(x)$ in S - $w_j(x)$ is a *final write* if it is the last write on x that occurs in S Two schedules are __view-equivalent__ $S_i {\approx}_v S_j$ if they have the same reads-from relations and the same final writes. A schedule is __view-serializable__ if it is equivalent to a serial schedule. __VSR__ is the set of *view-serializable* schedules. #### VSR defines the schedules which are: - serializable - anomaly-free But is vast and costly to evaluate #### CSR Is a subset of VSR solutions, used because it contains costs. ##### Example of View-serializability ``` S3: w0(x) r2(x) r1(x) w2(x) w2(z) S4: w0(X) r1(x) r2(x) w2(x) w2(z) S5: w0(x) r1(x) w1(x) r2(x) w1(z) S6: w0(x) r1(x) w1(x) w1(z) r2(z) ``` In this example S3 is view serializable to S4 because both schedules have the following properties: - r2(x) reads from w0(x) - r1(x) reads from w0(x) - w2(z) is the final write Meanwhile S5 and S6 are view serializable because in both schedules: - r1(x) reads from w0(x) - r2(x) reads from w1(x) - w1(z) is the final write ##### Another Example ``` S7: r1(x) r2(x) w1(x) w2(x) S8: r1(X) r2(x) w2(x) r1(x) S9: r1(x) r1(y) r2(z) r2(y) w2(y) w2(z) r1(z) ``` - S7 correspond to a lost update - S8 correspond to a non repeatable read - S9 correspond to a ghost update - They are all non view serializable ##### Complexity Deciding view-equivalence of two given schedules can be done in polynomial time Deciding View-serializability of a generic schedule is a NP-complete problem #### CSR An action ai is conflicting with aj (i!=j) if both are operations on common data and at least one of them is a write operation. - read-write conflicts (rw, wr) - write-write conflicts (ww) Two schedules are conflict-equivalent if they contain the same operations and all conflicting operation pair occur in the same order. One schedule is conflict-serializable if it is conflict-equivalent to a serial schedule. CSR is the set of conflict-equivalent schedules. ##### CSR and VSR Every conflict-serializable schedule is also view-serializable, but the converse is not necessarily true In order to prove that $\;CSR\subset VSR\;$ we have to prove that conflict-equivalence implies view-equivalence. Let S1 and S2 be two conflict-equivalent schedules: - They have the same final writes, if they didn't, there would be at least two writes with a different order - They have the same reads-from relations, if they didn't, there would be at least one read-write pair with a different order So this implies that S1 and S2 are also view-equivalent. #### Testing conflict-serializability It is done with a conflict graph that has: - One node for each transaction Ti - One arc from Ti to Tj if it exists at least one conflict between an action of Ti and an action of Tj such as ai precedes aj. A schedule is in CSR iff its conflict graph is acyclic.