Tens, hundredss, thousands of transactions per second cannot be executed serially
Concurrency need to be controlled to avoid anomalies.
b(Tx) = begin transaction x
e(Tx) = end transaction x
Given these two transactions:
T1: UPDATE account
SET balance = balance + 3 WHERE client = 'Smith'
T2: UPDATE account
SET balance = balance + 3 WHERE client = 'Smith'
As the name states, one or more changes to the data are lost.
The error is produced by:
D=100
T1: R(D,V1)
V1 = V1 + 3
T2: R(D,V2)
V2 = V2 + 6
T1: W(V1,D)
T2: W(V2,D)
D=103
D=106!
The read of the second transaction happen before the rollback of T1,
therefore a wrong value is used for T2.
The first read (to V1) and the second read (to V3) of the same value D give different results because D is changed in the meantime.
T1 reads X and Y, T2 writes Y and Z, T1 has still the old value of Y.
This anomaly is due to the insertion of a "phantom" tuple that satisfies the conditions of a previous query.
Sequence of input/output operations performed by concurrent transactions. es:
S1: r1(x) r2(z) w1(x) w2
r1,w1
$r_1,w_1\in T_1$ $r_2,w_2\in T_2$
The class of acceptable schedules produced by a scheduler depends on the cost of equivalence checking, because scheduling must happen in real-time and the more is optimized my sheduling the more computational power I will need to obtain it.
$CSR\subset VSR$
$r_i(x)$ reads-from $w_j(x)$ in a schedule S when $w_j(x)$ precedes $r_i(x)$ in S and there is no $w_k(x)$ between $r_i(x)$ and $w_j(x)$ in S
$w_j(x)$ is a final write if it is the last write on x that occurs in S
Two schedules are view-equivalent $S_i {\approx}_v S_j$ if they have the same reads-from relations and the same final writes.
A schedule is view-serializable if it is equivalent to a serial schedule.
VSR is the set of view-serializable schedules.
defines the schedules which are:
anomaly-free
But is vast and costly to evaluate
Is a subset of VSR solutions, used because it contains costs.
S3: w0(x) r2(x) r1(x) w2(x) w2(z)
S4: w0(X) r1(x) r2(x) w2(x) w2(z)
S5: w0(x) r1(x) w1(x) r2(x) w1(z)
S6: w0(x) r1(x) w1(x) w1(z) r2(z)
In this example S3 is view serializable to S4 because both schedules have the following properties:
Meanwhile S5 and S6 are view serializable because in both schedules:
S7: r1(x) r2(x) w1(x) w2(x)
S8: r1(X) r2(x) w2(x) r1(x)
S9: r1(x) r1(y) r2(z) r2(y) w2(y) w2(z) r1(z)
Deciding view-equivalence of two given schedules can be done in polynomial time
Deciding View-serializability of a generic schedule is a NP-complete problem
An action ai is conflicting with aj (i!=j) if both are operations on common data and at least one of them is a write operation.
Two schedules are conflict-equivalent if they contain the same operations and all conflicting operation pair occur in the same order. One schedule is conflict-serializable if it is conflict-equivalent to a serial schedule.
CSR is the set of conflict-equivalent schedules.
Every conflict-serializable schedule is also view-serializable, but the converse is not necessarily true
In order to prove that $\;CSR\subset VSR\;$ we have to prove that conflict-equivalence implies view-equivalence.
Let S1 and S2 be two conflict-equivalent schedules:
So this implies that S1 and S2 are also view-equivalent.
It is done with a conflict graph that has:
A schedule is in CSR iff its conflict graph is acyclic.