Minimum I/O cost for any join: |R| + |S|, needing |R| + 1 + 1 buffer pages (|R| pages to hold all of R in memory, 1 input, 1 output)
Sort-Merge Join
Sort both R and S, then join
Each tuple in R partition merges with all tuples in matching S-partition
Advance the pointer at the smaller tuple; rewind the S-pointer as necessary (duplicate join values in R)
I/O cost = (cost to sort R) + (cost to sort S) + merging cost; merging cost = |R| + |S| if no rewind, |R| + ||R|| ∗ |S| if rewinding every time (see the merge-phase sketch below)
Optimised (partial sorting): if B > N(R,i) + N(S,j), stop sorting and merge-join the remaining runs directly; N(R,i) = total number of sorted runs of R after pass i
I/O cost if B > √(2|S|) (taking S as the larger relation): 3 ∗ (|R| + |S|) => 2 ∗ (|R| + |S|) to create initial sorted runs (one pass suffices: read + write), |R| + |S| for the merge
else 3 ∗ (|R| + |S|) + 2c ∗ |R| + 2d ∗ |S|, where c and d are the numbers of extra merge passes for R and S (each extra pass reads and writes the relation)
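A minimal in-memory sketch of the merge phase (Python; assumes R and S are already-sorted lists of (key, payload) tuples and ignores pages, buffers, and I/O):
    def merge_join(R, S):
        out, i, j = [], 0, 0
        while i < len(R) and j < len(S):
            if R[i][0] < S[j][0]:
                i += 1                              # advance pointer on smaller tuple
            elif R[i][0] > S[j][0]:
                j += 1
            else:
                mark = j                            # start of matching S-partition
                while i < len(R) and R[i][0] == S[mark][0]:
                    j = mark                        # rewind S-pointer for each R tuple
                    while j < len(S) and S[j][0] == R[i][0]:
                        out.append((R[i], S[j]))    # R tuple merges with all matching S tuples
                        j += 1
                    i += 1
        return out
e.g. merge_join([(1,'a'),(2,'b')], [(2,'x'),(2,'y')]) returns the two pairs for key 2; the rewind only matters when several R tuples share a join value.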
Grace Hash Join (Hybrid Hash Join not covered)
Split R and S into k partitions each using the same hash function; join corresponding partition pairs (Ri with Si) in the probing phase
read Ri to build hash table (build relation) - pick the smaller relation for the build (each of its partitions must fit in memory)
read Si to probe hash table (probe relation)
partitioning phase: 1 input buffer, k output buffers (one per partition); once a buffer is full, flush it to a page on disk
probing phase: 1 input buffer, 1 output buffer, 1 hash table; use a different h'(.) to build a hash table for each Ri, then probe with Si; on a match, add the joined tuple to the output buffer (see the sketch after this list)
build R1, probe S1, build R2, ...
once output buffer is full, flush (don't flush between partitions)
set k=B−1 to minimise partition sizes
assuming uniform hashing distribution:
size of each partition Ri: |R| / (B−1)
size of hash table for Ri: f ∗ |R| / (B−1), where f is a fudge factor
during probing phase, need B > f ∗ |R| / (B−1) + 2 (one page each for input & output)
approximately, B > √(f ∗ |R|)
Partition Overflow Problem: hash table doesn't fit in memory, recursively partition overflowed partitions
I/O cost = 3(|R| + |S|) if no partition overflow
I/O cost = (2c + 1)(|R| + |S|), where c is the number of partitioning rounds (c > 1 when overflow forces recursive partitioning)
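A minimal in-memory sketch of both Grace hash join phases (Python; disk partitions are modelled as lists, Python's built-in hash stands in for h(.) and a dict for the h'(.) hash table; buffers and overflow handling are omitted):
    def grace_hash_join(R, S, k=4):
        # partitioning phase: the same h(.) splits both relations into k partitions
        h = lambda key: hash(key) % k
        R_parts = [[] for _ in range(k)]
        S_parts = [[] for _ in range(k)]
        for t in R:
            R_parts[h(t[0])].append(t)
        for t in S:
            S_parts[h(t[0])].append(t)
        # probing phase: per partition, build on Ri, probe with Si
        out = []
        for Ri, Si in zip(R_parts, S_parts):
            table = {}                              # dict stands in for the h'(.) hash table
            for r in Ri:
                table.setdefault(r[0], []).append(r)
            for s in Si:
                for r in table.get(s[0], []):
                    out.append((r, s))              # match => add to output
        return out
Using the same h(.) on both relations guarantees matching tuples land in the same partition pair, so each Si only needs to probe its own Ri.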
Multiversion Concurrency Control
monoversion: each read action returns the most recently created object version
VSS ⊆ MVSS (but not the other way round)
SI: Xact T takes snapshot of committed state of DB at start of T
can't read from concurrent Xacts
Xacts are concurrent if their start-commit intervals overlap
Oi is more recent than Oj if Ti commits after Tj
Concurrent Update Property: if multiple concurrent Xacts update the same object, only one can commit (otherwise the schedule may not be serialisable)
First Committer Wins (FCW): check performed at the point of commit
First Updater Wins (FUW) - locks are used only for this check (NOT a lock-based protocol)
to update O: request X-lock on O; when commit / abort, release locks
if not held by anyone:
if O has been updated by concurrent Xact: abort
else: grant lock
else: wait for T' to abort / commit
if T' commit: abort
else: apply the "not held by anyone" case above (see the sketch after this list)
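A minimal sketch of the FUW decision as a pure function (Python; xlock, committed_writers, and is_concurrent are illustrative stand-ins for scheduler state, not any real DBMS API):
    def fuw_decide(T, O, xlock, committed_writers, is_concurrent):
        # returns the action FUW takes when Xact T requests an X-lock on O
        holder = xlock.get(O)
        if holder is None:
            if any(is_concurrent(T, W) for W in committed_writers(O)):
                return "abort T"                    # O updated by a concurrent committed Xact
            xlock[O] = T                            # grant lock
            return "lock granted"
        # lock held by T': wait; if T' commits abort T, else retry the no-holder case
        return "wait for " + str(holder)
The waiting branch is summarised as a returned instruction, since the actual blocking and retry belong to the scheduler.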
Garbage Collection: delete version Oi if there exists a newer version Oj s.t. for every active Xact Tk that started after the commit of Ti, Tj commits before Tk starts (i.e. every active Xact can refer to Oj or newer; see the predicate sketch below)
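A minimal sketch of the GC rule as a predicate (Python; assumes plain numeric commit/start timestamps; names are illustrative):
    def reclaimable(commit_i, commit_j, active_starts):
        # O_i (committed at commit_i) can be deleted given a newer O_j
        # (committed at commit_j) if every active Xact that started after
        # commit_i also started after commit_j (it can read O_j instead)
        return all(commit_j < s for s in active_starts if s > commit_i)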
SI performs similarly to Read Committed but with different anomalies; it does not guarantee serialisability either (a schedule can violate MVSS without being detected)
Write Skew Anomaly
both Xacts read the initial values, then each updates a different object; disjoint write sets let both commit under SI (example below)
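e.g. constraint x + y ≥ 0 with x = y = 50: T1 reads x, y and sets x = x − 100; T2 reads x, y and sets y = y − 100; each sees 50 + 50 − 100 ≥ 0 in its snapshot and they write disjoint objects, so both commit under SI, yet the final state x + y = −100 violates the constraint (impossible in any serial order)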
Read-Only Xact Anomaly
a read-only Xact observes a state that no serial execution could produce (example below)
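e.g. x = 0, y = 0: T2 reads x, y (snapshot before T1 commits) and sets x = x − 11 (withdraw 10 plus a penalty of 1 since x + y < 10); T1 sets y = y + 20 and commits; read-only T3 starts after T1 commits (concurrent with T2) and sees x = 0, y = 20; T3's view forces T1 before T2 in any serial order, but then T2 would have seen y = 20 and charged no penalty (x = −10), contradicting the final x = −11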
SSI: keep track of rw dependencies among concurrent Xacts
Ti −rw−> Tj −rw−> Tk (two consecutive rw edges): abort one of them (has false positives)
ww from T1→T2 if T1 writes O and T2 writes the immediate successor version of O
(i.e. T1 commits before T2 and no Xact that commits between them writes O)
wr from T1→T2 if T1 writes O and T2 reads this version of O
rw from T1→T2 if T1 reads a version of O and T2 writes its immediate successor version
Dependency Serialisation Graph (DSG): edge dashed if the two Xacts are concurrent, solid if not
if S is an SI schedule that is not MVSS, then (1) there is at least one cycle in the DSG, and (2) each cycle contains Ti, Tj, Tk s.t.
Ti and Tk might be the same Xact
Ti and Tj are concurrent with Ti −rw−> Tj
AND Tj and Tk are concurrent with Tj −rw−> Tk
Recovery Manager
Undo: remove effects of aborted Xact to preserve atomicity
Redo: re-installing effects of committed Xact to preserve durability
Failure
transaction failure: transaction aborts
application rolls back transaction (voluntary)
DBMS rolls back transaction (e.g. deadlock, violation of integrity constraint)
system crash: loss of volatile memory contents
power failure
bug in DBMS / OS
hardware malfunction
media failures: data is lost / corrupted on non-volatile storage
disk head crash / failure during data transfer
Buffer Pool
Can evict dirty uncommitted pages? (yes => steal, no => no-steal)
Must all dirty pages be flushed before Xact commits? (yes => force => no redo, no => no-force)
in practice: use steal, no-force (need undo & need redo)
no-steal => no undo needed (not practical: uncommitted dirty pages must stay pinned, so the buffer pool can run out and block Xacts)
force => no redo needed (hurts performance of commit because random I/O)
Log-Based DB Recovery
Log (trail / journal): history of actions executed by the DBMS - stored as a sequential file of records in stable storage - each record uniquely identified by its LSN
Algorithm for Recovery and Isolation Exploiting Semantics (ARIES) - designed for steal, no-force approach, assumes strict 2PL
Xact Table (TT): one entry per active Xact, contains: XactID, lastLSN (LSN of the Xact's most recent log record), status (C = committed or U = uncommitted); entry kept until the Xact's end log record
Dirty Page Table (DPT): one entry per dirty page, contains: pageID, recLSN (LSN of the earliest update that dirtied the page)
Normal Processing
Updating TT (Xact ID, lastLSN, status):
first log record for Xact T: create new entry in TT with status = U
for new log record: update lastLSN
when commit: update status = C
when end log record: remove from TT
Updating DPT (pageID, recLSN):
when page P is updated (and not in DPT): create new entry with recLSN = LSN (don't update this)
when flushed: remove from DPT
Log Records
default fields: LSN, type, XactID, prevLSN (previous log record of the same Xact; NULL for the first)
update log record (ULR): pageID, byte offset (within page), length (in bytes of update), before-image (for undo), after-image (for redo)
compensation log record (CLR) - written when a ULR is undone: pageID, undoNextLSN (= prevLSN of the undone ULR), action taken to undo
commit log record
abort log record - created when aborting Xact: undo is initiated for this Xact
end log record - created once the book-keeping following commit / abort is done
(simple) checkpoint log record: stores Xact table
(fuzzy) begin_checkpoint log record: time of snapshot of DPT & TT
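A minimal sketch of these record layouts as Python dataclasses (field names are illustrative and mirror the fields listed above):
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class LogRecord:                                # default fields
        lsn: int
        xact_id: int
        prev_lsn: Optional[int]                     # None for the Xact's first record

    @dataclass
    class UpdateLogRecord(LogRecord):               # ULR
        page_id: int
        offset: int                                 # byte offset within page
        length: int                                 # length of the update in bytes
        before_image: bytes                         # for undo
        after_image: bytes                          # for redo

    @dataclass
    class CompensationLogRecord(LogRecord):         # CLR
        page_id: int
        undo_next_lsn: Optional[int]                # = prevLSN of the undone ULR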
Write-ahead logging (WAL) protocol: do not flush uncommitted update until log record is flushed
i.e. the changes needed for undo (before-images) must be logged before the page reaches disk
to enforce, each DB page stores a pageLSN (LSN of the most recent log record that updated it); before flushing page P, ensure all log records up to P's pageLSN are flushed (see the sketch below)
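A minimal sketch of the WAL check on page flush (Python; the Log class and names are illustrative):
    class Log:
        def __init__(self):
            self.flushed_lsn = 0                    # largest LSN durably on disk

        def flush(self, up_to):
            # write log records with LSN <= up_to to stable storage (elided)
            self.flushed_lsn = max(self.flushed_lsn, up_to)

    def flush_page(page_lsn, write_page_to_disk, log):
        # WAL: log records up to the page's pageLSN must reach stable
        # storage before the data page itself does
        if log.flushed_lsn < page_lsn:
            log.flush(up_to=page_lsn)
        write_page_to_disk()                        # only now is the page flushed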
Implementing Commit
Force-at-commit protocol: do not commit until after-images of all updated records are in stable storage
to enforce, write a commit log record for the Xact, then flush all log records up to and including it (data pages need not be flushed)
Xact is committed ⟺ its commit log record is written to stable storage
Implementing Restarts (order matters)
Analysis phase: determines point in log to start Redo phase, identifies superset of dirtied buffer pool pages & active Xacts at time of crash
Redo phase: redo actions to restore DB state
Undo phase: undo actions of uncommitted Xacts
Analysis Phase
init DPT and TT to be empty
sequentially scan logs
for each log record r (of Xact T):
  if r is an end log record: remove T from TT
  else:
    add T to TT if not in TT
    set lastLSN = r's LSN
    status = C if r is a commit log record
  if (r is redoable) & (its page P not in DPT): add P to DPT (pageID = P, recLSN = r's LSN)
Redo Phase
opt cond (update already reflected on disk) = (P is not in DPT) or (P's recLSN in DPT > r's LSN)
redoLSN = smallest recLSN in DPT; let r = log record with LSN = redoLSN
start scan from r:
  if (r is ULR or CLR) & (not opt cond):
    fetch page P for r
    if P's pageLSN < r's LSN: (not yet redone)
      redo action
      set P's pageLSN = r's LSN
    else: (updates with LSN <= P's pageLSN are already on P)
      set P's recLSN in DPT = P's pageLSN + 1
at end: create end log records for Xacts with status = C, & remove from TT
Undo Phase: abort loser Xacts
init L = lastLSNs of loser Xacts (status = U) from TT
repeat until L is empty:
  delete largest lastLSN from L; let r be its log record
  if r is ULR:
    create CLR r2 with undoNextLSN = r's prevLSN, prevLSN = r's LSN
    undo action; update P's pageLSN = r2's LSN
    UpdateLAndTT(r's prevLSN)
  if r is CLR: UpdateLAndTT(r's undoNextLSN)
  if r is abort log record: UpdateLAndTT(r's prevLSN)
def UpdateLAndTT(lsn):
  if lsn is not null: add lsn to L
  else: (reached T's first log record => done)
    create end log record for T
    remove T from TT
Simple Checkpointing
periodically perform checkpointing: suspend normal processing, wait until all in-flight operations finish, flush all dirty pages in the buffer pool (to sync log & DB), write a checkpoint log record containing the TT, then resume
during Analysis Phase, start the scan from the checkpoint log record, init TT from it (DPT starts empty since all dirty pages were flushed)
Fuzzy Checkpointing (no suspension)
snapshot DPT & TT, write begin_ log record
write end_ log record containing the DPT & TT snapshot (can be large, hence slow to write)
write special master record containing LSN of begin_ to known location (for fast retrieval) in stable storage
during Analysis Phase, start the scan from begin_ (found via the master record), init TT and DPT from end_