# DB2 - lesson 02 #### Paraboschi ##### 12 October 2015 ## Concurrency Control pt.I ### Advantages of Concurrency Modern computers are capable of executing different processes in a concurrent way, Concurrency is also important in DBMS because it allows multiple Transactions at the same time ex: Two students enroll contemporarily on the Politecnico site. Concurrency is made by - time sharing on a single processor - or sharing the processes on multiple CPU cores. ### Concurrenct executions A problem may rise when we heve two or more concurrent operations modifying the same data. Tens, hundredss, thousands of transactions per second cannot be executed serially Concurrency need to be controlled to avoid anomalies. ### Types of concurrency b(Tx) = begin transaction x e(Tx) = end transaction x - __serial__ b(T1) e(T1) b(T2) e(T2) - or b(T2) e(T2) b(T1) e(T1) - __interleaved__ b(T1) b(T2) e(T1) e(T2) - __nested__ b(T1) b(T2) e(T2) e(T1) ### Problems due to concurrency Given these two transactions: ``` T1: UPDATE account SET balance = balance + 3 WHERE client = 'Smith' T2: UPDATE account SET balance = balance + 3 WHERE client = 'Smith' ``` #### Update Loss As the name states, one or more changes to the data are lost. The error is produced by: - R1 R2 W1 W2 - R1 R2 W2 W1 ``` D=100 T1: R(D,V1) V1 = V1 + 3 T2: R(D,V2) V2 = V2 + 6 T1: W(V1,D) T2: W(V2,D) D=103 D=106! ``` #### Dirty read The read of the second transaction happen before the rollback of T1, therefore a wrong value is used for T2. - R1 W1 R2 abort1 W2 ``` D=100 T1: R(D,V1) T1: V1 = V1 + 3 T1: W(V1,D) D=103 T2: R(D,V2) T1: ROLLBACK T2: V2 = V2 + 6 T2: W(V2,D) D=109! ``` #### Nonrepeatable read The first read (to V1) and the second read (to V3) of the same value D give different results because D is changed in the meantime. - R1 R2 W2 R1 ``` D=100 T1: R(D,V1) T2: R(D,V2) T2: V2 = V2 + 6 T2: W(V2,D) D=106 T1: R(D,V3) V3<>V1! ``` #### Ghost update T1 reads X and Y, T2 writes Y and Z, T1 has still the old value of Y. - R1 R1 R2 R2 W2 W2 R1 ``` X+Y+Z=100, X=50, Y=30, Z=20 T1: R(X,V1), R(Y,V2) T2: R(Y,V3), R(Z,V4) T2: V3 = V3 + 10, V4=V4-10 T2: W(V3,Y), W(V4,Z) (Y=40, Z=10) T1: R(Z,V5) (for T1, V1+V2+V5=90!) ``` #### Phantom insert This anomaly is due to the insertion of a "phantom" tuple that satisfies the conditions of a previous query. - R1 W2 (new data) R1 ``` T1: C=AVG(B:A=1) T2: Insert (A=1,B=2) T1: C=AVG(B: A=1) ``` ## Concurrency Control __Schedule__: Sequence of input/output operations performed by concurrent transactions. es: >S1: r1(x) r2(z) w1(x) w2 r1,w1 $r_1,w_1\in T_1$ $r_2,w_2\in T_2$ ### Principles of Concurrency Control - __Goal__: to reject schedules that cause anomalies - __Scheduler__: component that accepts or rejects the operations requested by the transactions - __Serial schedule__: the actions of each transaction occur in contiguous sequences - __Serializable schedule__: Produces the same results as some serial schedule on the same transactions (by *schedule equivalence*) - The class of acceptable schedules produced by a scheduler depends on the cost of equivalence checking, because scheduling must happen in real-time and the more is optimized my sheduling the more computational power I will need to obtain it. ### Concurrency Control techniques: - __Theoretical__: CSR and VSR, $CSR\subset VSR$ - __Practical__: 2PL, 2PL Strict, TS Mono, TS Multi ### View-serializability ###### NOTE: what is a read-from operation? - $r_i(x)$ *reads-from* $w_j(x)$ in a schedule S when $w_j(x)$ precedes $r_i(x)$ in S and there is no $w_k(x)$ between $r_i(x)$ and $w_j(x)$ in S - $w_j(x)$ is a *final write* if it is the last write on x that occurs in S Two schedules are __view-equivalent__ $S_i {\approx}_v S_j$ if they have the same reads-from relations and the same final writes. A schedule is __view-serializable__ if it is equivalent to a serial schedule. __VSR__ is the set of *view-serializable* schedules. #### VSR defines the schedules which are: - serializable - anomaly-free But is vast and costly to evaluate #### CSR Is a subset of VSR solutions, used because it contains costs. #### Example of View-serializability ``` S3: w0(x) r2(x) r1(x) w2(x) w2(z) S4: w0(X) r1(x) r2(x) w2(x) w2(z) S5: w0(x) r1(x) w1(x) r2(x) w1(z) S6: w0(x) r1(x) w1(x) w1(z) r2(z) ``` In this example S3 is view serializable to S4 because both schedules have the following properties: - r2(x) reads from w0(x) - r1(x) reads from w0(x) - w2(z) is the final write Meanwhile S5 and S6 are view serializable because in both schedules: - r1(x) reads from w0(x) - r2(x) reads from w1(x) - w1(z) is the final write #### Another Example ``` S7: r1(x) r2(x) w1(x) w2(x) S8: r1(X) r2(x) w2(x) r1(x) S9: r1(x) r1(y) r2(z) r2(y) w2(y) w2(z) r1(z) ``` - S7 correspond to a lost update - S8 correspond to a non repeatable read - S9 correspond to a ghost update - They are all non view serializable #### Complexity Deciding view-equivalence of two given schedules can be done in polynomial time Deciding View-serializability of a generic schedule is a NP-complete problem ### CSR An action ai is __conflicting__ with aj (i!=j) if both are operations on common data and at least one of them is a write operation. - read-write conflicts (rw, wr) - write-write conflicts (ww) __Conflict-equivalent schedules__ $S_i\approx_c S_j$: - $S_i$ and $S_J$ contain the same operations - all conficting operations pairs occur in the same order One schedule is __conflict-serializable__ if it is *conflict-equivalent* to a serial schedule. CSR is the set of conflict-equivalent schedules. #### CSR and VSR Every conflict-serializable schedule is also view-serializable, but the converse is not necessarily true In order to prove that $\;CSR\subset VSR\;$ we have to prove that conflict-equivalence implies view-equivalence. Let S1 and S2 be two conflict-equivalent schedules: - They have the same final writes, if they didn't, there would be at least two writes with a different order - They have the same reads-from relations, if they didn't, there would be at least one read-write pair with a different order So this implies that S1 and S2 are also view-equivalent.