DB2 - lesson 02

Paraboschi

12 October 2015

Concurrency Control

Tens, hundredss, thousands of transactions per second cannot be executed serially

Concurrency need to be controlled to avoid anomalies.

Concurrent executions

b(Tx) = begin transaction x
e(Tx) = end transaction x

serial b(T1) e(T1) b(T2) e(T2)
or b(T2) e(T2) b(T1) e(T1)
interleaved b(T1) b(T2) e(T1) e(T2)
nested b(T1) b(T2) e(T2) e(T1)

Problems due to concurrency

Given these two transactions:

T1: UPDATE account

 SET balance = balance + 3  
 WHERE client = 'Smith'

T2: UPDATE account

 SET balance = balance + 3  
 WHERE client = 'Smith'

Execution with lost UPDATE

As the name states, one or more changes to the data are lost.
The error is produced by:

R1 R2 W1 W2
R1 R2 W2 W1

D=100
T1: R(D,V1)
V1 = V1 + 3
T2: R(D,V2)
V2 = V2 + 6
T1: W(V1,D)
T2: W(V2,D)
D=103
D=106!

Dirty read

The read of the second transaction happen before the rollback of T1,
therefore a wrong value is used for T2.

R1 W1 R2 abort1 W2 > D=100
T1: R(D,V1)
T1: V1 = V1 + 3
T1: W(V1,D) D=103
T2: R(D,V2)
T1: ROLLBACK
T2: V2 = V2 + 6
T2: W(V2,D) D=109!

Nonrepeatable read

The first read (to V1) and the second read (to V3) of the same value D give different results because D is changed in the meantime.

R1 R2 W2 R1 >D=100
T1: R(D,V1)
T2: R(D,V2)
T2: V2 = V2 + 6
T2: W(V2,D) D=106
T1: R(D,V3) V3<>V1!

Ghost update

T1 reads X and Y, T2 writes Y and Z, T1 has still the old value of Y.

R1 R1 R2 R2 W2 W2 R1 > X+Y+Z=100, X=50, Y=30, Z=20
T1: R(X,V1), R(Y,V2)
T2: R(Y,V3), R(Z,V4)
T2: V3 = V3 + 10, V4=V4-10
T2: W(V3,Y), W(V4,Z) (Y=40, Z=10)
T1: R(Z,V5) (for T1, V1+V2+V5=90!)

Phantom insert

This anomaly is due to the insertion of a "phantom" tuple that satisfies the conditions of a previous query.

R1 W2 (new data) R1 >T1: C=AVG(B:A=1)
T2: Insert (A=1,B=2)
T1: C=AVG(B: A=1)

Schedule

Sequence of input/output operations performed by concurrent transactions. es:

S₁: r₁(x) r₂(z) w₁(x) w₂
r₁,w₁

$r_1,w_1\in T_1$ $r_2,w_2\in T_2$

Principles of Concurrent Control

Goal: to reject schedules that cause anomalies
Scheduler: component that accepts or rejects the operations requested by the transactions
Serial schedule: the actions of each transaction occur in contiguous sequences
Serializable schedule: Produces the same results as some serial schedule on the same transactions (by schedule equivalence)
The class of acceptable schedules produced by a scheduler depends on the cost of equivalence checking, because scheduling must happen in real-time and the more is optimized my sheduling the more computational power I will need to obtain it.

CSR and VSR

$CSR\subset VSR$

View-serializability

NOTE: what is a read-from operation?
$r_i(x)$ reads-from $w_j(x)$ in a schedule S when $w_j(x)$ precedes $r_i(x)$ in S and there is no $w_k(x)$ between $r_i(x)$ and $w_j(x)$ in S
$w_j(x)$ is a final write if it is the last write on x that occurs in S

Two schedules are view-equivalent $S_i {\approx}_v S_j$ if they have the same reads-from relations and the same final writes.

A schedule is view-serializable if it is equivalent to a serial schedule.

VSR is the set of view-serializable schedules.

VSR

defines the schedules which are:

serializable
anomaly-free
But is vast and costly to evaluate

CSR

Is a subset of VSR solutions, used because it contains costs.

Example of View-serializability

S3: w0(x) r2(x) r1(x) w2(x) w2(z)
S4: w0(X) r1(x) r2(x) w2(x) w2(z)
S5: w0(x) r1(x) w1(x) r2(x) w1(z)
S6: w0(x) r1(x) w1(x) w1(z) r2(z)

In this example S3 is view serializable to S4 because both schedules have the following properties:

r2(x) reads from w0(x)
r1(x) reads from w0(x)
w2(z) is the final write

Meanwhile S5 and S6 are view serializable because in both schedules:

r1(x) reads from w0(x)
r2(x) reads from w1(x)
w1(z) is the final write

Another Example

S7: r1(x) r2(x) w1(x) w2(x)
S8: r1(X) r2(x) w2(x) r1(x)
S9: r1(x) r1(y) r2(z) r2(y) w2(y) w2(z) r1(z)

S7 correspond to a lost update
S8 correspond to a non repeatable read
S9 correspond to a ghost update
They are all non view serializable

Complexity

Deciding view-equivalence of two given schedules can be done in polynomial time
Deciding View-serializability of a generic schedule is a NP-complete problem

CSR

An action ai is conflicting with aj (i!=j) if both are operations on common data and at least one of them is a write operation.

read-write conflicts (rw, wr)
write-write conflicts (ww)

Two schedules are conflict-equivalent if they contain the same operations and all conflicting operation pair occur in the same order. One schedule is conflict-serializable if it is conflict-equivalent to a serial schedule.

CSR is the set of conflict-equivalent schedules.

CSR and VSR

Every conflict-serializable schedule is also view-serializable, but the converse is not necessarily true

In order to prove that $\;CSR\subset VSR\;$ we have to prove that conflict-equivalence implies view-equivalence.

Let S1 and S2 be two conflict-equivalent schedules:

They have the same final writes, if they didn't, there would be at least two writes with a different order
They have the same reads-from relations, if they didn't, there would be at least one read-write pair with a different order

So this implies that S1 and S2 are also view-equivalent.

Testing conflict-serializability

It is done with a conflict graph that has:

One node for each transaction Ti
One arc from Ti to Tj if it exists at least one conflict between an action of Ti and an action of Tj such as ai precedes aj.

A schedule is in CSR iff its conflict graph is acyclic.

lesson_02.md 5.9 KB Cronologia Originale

DB2 - lesson 02

Paraboschi

12 October 2015

Concurrency Control

Concurrent executions

Problems due to concurrency

Execution with lost UPDATE

Dirty read

Nonrepeatable read

Ghost update

Phantom insert

Schedule

Principles of Concurrent Control

CSR and VSR

View-serializability

NOTE: what is a read-from operation?

VSR

CSR

Example of View-serializability

Another Example

Complexity

CSR

CSR and VSR

Testing conflict-serializability

lesson_02.md 5.9 KB

Cronologia Originale