lesson_02.md 4.0 KB

DB2 - lesson 02

Paraboschi

12 October 2015

Concurrency Control

Tens, hundredss, thousands of transactions per second cannot be executed serially

Concurrency need to be controlled to avoid anomalies.

Concurrent executions

b(Tx) = begin transaction x
e(Tx) = end transaction x

  • serial b(T1) e(T1) b(T2) e(T2)
  • or b(T2) e(T2) b(T1) e(T1)
  • interleaved b(T1) b(T2) e(T1) e(T2)
  • nested b(T1) b(T2) e(T2) e(T1)

Problems due to concurrency

Given these two transactions:

T1: UPDATE account

 SET balance = balance + 3  
 WHERE client = 'Smith'  

T2: UPDATE account

 SET balance = balance + 3  
 WHERE client = 'Smith'  

Execution with lost UPDATE

As the name states, one o two changes to the data is lost.
The error is produced by:

  • R1 R2 W1 W2
  • R1 R2 W2 W1

D=100
T1: R(D,V1)
V1 = V1 + 3
T2: R(D,V2)
V2 = V2 + 6
T1: W(V1,D)
T2: W(V2,D)
D=103
D=106!

Dirty read

The read of the second transaction happen before the rollback of T1,
therefore a wrong value is used for T2.

  • R1 W1 R2 abort1 W2 > D=100
    T1: R(D,V1)
    T1: V1 = V1 + 3
    T1: W(V1,D) D=103
    T2: R(D,V2)
    T1: ROLLBACK
    T2: V2 = V2 + 6
    T2: W(V2,D) D=109!

Nonrepeatable read

The first read (to V1) and the second read (to V3) of the same value D give different results because D is changed in the meantime.

  • R1 R2 W2 R1 >D=100
    T1: R(D,V1)
    T2: R(D,V2)
    T2: V2 = V2 + 6
    T2: W(V2,D) D=106
    T1: R(D,V3) V3<>V1!

Ghost update

T1 reads X and Y, T2 writes Y and Z, T1 has still the old value of Y.

  • R1 R1 R2 R2 W2 W2 R1 > X+Y+Z=100, X=50, Y=30, Z=20
    T1: R(X,V1), R(Y,V2)
    T2: R(Y,V3), R(Z,V4)
    T2: V3 = V3 + 10, V4=V4-10
    T2: W(V3,Y), W(V4,Z) (Y=40, Z=10)
    T1: R(Z,V5) (for T1, V1+V2+V5=90!)

Phantom insert

This anomaly is due to the insertion of a "phantom" tuple that satisfies the conditions of a previous query.

  • R1 W2 (new data) R1 >T1: C=AVG(B:A=1)
    T2: Insert (A=1,B=2)
    T1: C=AVG(B: A=1)

Schedule

Sequence of input/output operations performed by concurrent transactions. es:

S1: r1(x) r2(z) w1(x) w2
r1,w1

$r_1,w_1\in T_1$ $r_2,w_2\in T_2$

Principles of Concurrent Control

  • Goal: to reject schedules that cause anomalies
  • Scheduler: component that accepts or rejects the operations requested by the transactions
  • Serial schedule: the actions of each transaction occur in contiguous sequences
  • Serializable schedule: Produces the same results as some serial schedule on the same transactions (by schedule equivalence)
  • The class of acceptable schedules produced by a scheduler depends on the cost of equivalence checking (???)

    CSR and VSR

    $CSR\subset VSR$

    View-serializability

    NOTE: what is a read-from operation?
  • $r_i(x)$ reads-from $w_j(x)$ in a schedule S when $w_j(x)$ precedes $r_i(x)$ in S and there is no $w_k(x)$ between $r_i(x)$ and $w_j(x)$ in S

  • $w_j(x)$ is a final write if it is the last write on x that occurs in S

Two schedules are view-equivalent $S_i {\approx}_v S_j$ if they have the same reads-from relations and the same final writes.

A schedule is view-serializable if it is equivalent to a serial schedule.

VSR is the set of view-serializable schedules.

VSR

defines the schedule which are:

  • serializable
  • anomaly-free But is vast and costly to evaluate

    CSR

    Is a subset of VSR solutions, used because it contains costs.

Example of View-serializability
S3: w0(x) r2(x) r2(x) w2(x) w2(z)
S4: w0(X) r1(x) r2(x) w2(x) w2(z)
S5: w0(x) r2(x) r2(x) w2(x) w2(z) //incorrect
S6: w0(X) r1(x) r2(x) w2(x) w2(z) //incorrect
Another Example
S7: r1(x) r2(x) w1(x) w2(x)
S8: r1(X) r2(x) w2(x) r1(x)
S9: r1(x) r1(y) r2(z) r2(y) w2(y) w2(z) r1(z)

Complexity

Deciding view-equivalence of two given schedules can be done in polynomial time Deciding View-serializability of a generic schedule is a NP-complete problem

Every conflict serializable schedule is also view-serializable, but the converse is not necessarily true