# DB2 - lesson 02
#### Paraboschi
##### 12 October 2015
## Concurrency Control
Tens, hundredss, thousands of transactions per second cannot be executed serially
Concurrency need to be controlled to avoid anomalies.
### Concurrent executions
b(Tx) = begin transaction x
e(Tx) = end transaction x
- __serial__ b(T1) e(T1) b(T2) e(T2)
- or b(T2) e(T2) b(T1) e(T1)
- __interleaved__ b(T1) b(T2) e(T1) e(T2)
- __nested__ b(T1) b(T2) e(T2) e(T1)
### Problems due to concurrency
Given these two transactions:
>T1: UPDATE account
SET balance = balance + 3
WHERE client = 'Smith'
>T2: UPDATE account
SET balance = balance + 3
WHERE client = 'Smith'
#### Execution with lost UPDATE
As the name states, one o two changes to the data is lost.
The error is produced by:
- R1 R2 W1 W2
- R1 R2 W2 W1
> D=100
T1: R(D,V1)
V1 = V1 + 3
T2: R(D,V2)
V2 = V2 + 6
T1: W(V1,D)
T2: W(V2,D)
D=103
D=106!
#### Dirty read
The read of the second transaction happen before the rollback of T1,
therefore a wrong value is used for T2.
- R1 W1 R2 abort1 W2
> D=100
T1: R(D,V1)
T1: V1 = V1 + 3
T1: W(V1,D) D=103
T2: R(D,V2)
T1: ROLLBACK
T2: V2 = V2 + 6
T2: W(V2,D) D=109!
#### Nonrepeatable read
The first read (to V1) and the second read (to V3) of the same value D give different results because D is changed in the meantime.
- R1 R2 W2 R1
>D=100
T1: R(D,V1)
T2: R(D,V2)
T2: V2 = V2 + 6
T2: W(V2,D) D=106
T1: R(D,V3) V3<>V1!
#### Ghost update
T1 reads X and Y, T2 writes Y and Z, T1 has still the old value of Y.
- R1 R1 R2 R2 W2 W2 R1
> X+Y+Z=100, X=50, Y=30, Z=20
T1: R(X,V1), R(Y,V2)
T2: R(Y,V3), R(Z,V4)
T2: V3 = V3 + 10, V4=V4-10
T2: W(V3,Y), W(V4,Z) (Y=40, Z=10)
T1: R(Z,V5) (for T1, V1+V2+V5=90!)
#### Phantom insert
This anomaly is due to the insertion of a "phantom" tuple that satisfies the conditions of a previous query.
- R1 W2 (new data) R1
>T1: C=AVG(B:A=1)
T2: Insert (A=1,B=2)
T1: C=AVG(B: A=1)
### Schedule
Sequence of input/output operations performed by concurrent transactions.
es:
>S1: r1(x) r2(z) w1(x) w2
r1,w1
$r_1,w_1\in T_1$
$r_2,w_2\in T_2$
#### Principles of Concurrent Control
- __Goal__: to reject schedules that cause anomalies
- __Scheduler__: component that accepts or rejects the operations requested by the transactions
- __Serial schedule__: the actions of each transaction occur in contiguous sequences
- __Serializable schedule__: Produces the same results as some serial schedule on the same transactions (by *schedule equivalence*)
- The class of acceptable schedules produced by a scheduler depends on the cost of equivalence checking (???)
### CSR and VSR
$CSR\subset VSR$
#### View-serializability
###### NOTE: what is a read-from operation?
- $r_i(x)$ *reads-from* $w_j(x)$ in a schedule S when $w_j(x)$ precedes $r_i(x)$ in S and there is no $w_k(x)$ between $r_i(x)$ and $w_j(x)$ in S
- $w_j(x)$ is a *final write* if it is the last write on x that occurs in S
Two schedules are __view-equivalent__ $S_i {\approx}_v S_j$ if they have the same reads-from relations and the same final writes.
A schedule is __view-serializable__ if it is equivalent to a serial schedule.
__VSR__ is the set of *view-serializable* schedules.
#### VSR
defines the schedule which are:
- serializable
- anomaly-free
But is vast and costly to evaluate
#### CSR
Is a subset of VSR solutions, used because it contains costs.
##### Example of View-serializability
```
S3: w0(x) r2(x) r2(x) w2(x) w2(z)
S4: w0(X) r1(x) r2(x) w2(x) w2(z)
S5: w0(x) r2(x) r2(x) w2(x) w2(z) //incorrect
S6: w0(X) r1(x) r2(x) w2(x) w2(z) //incorrect
```
##### Another Example
```
S7: r1(x) r2(x) w1(x) w2(x)
S8: r1(X) r2(x) w2(x) r1(x)
S9: r1(x) r1(y) r2(z) r2(y) w2(y) w2(z) r1(z)
```
#### Complexity
Deciding view-equivalence of two given schedules can be done in polynomial time
Deciding View-serializability of a generic schedule is a NP-complete problem
Every conflict serializable schedule is also view-serializable, but the converse is not necessarily true