Problems and Solutions in Transaction Management

TJT E54 Kehittämismenetelmät ja arkkitehtuurit liiketoiminnassa (Development Methods and Architectures in Business), Spring 2006
Ville Seppänen <rissepp@cc.jyu.fi>

ACID
- Atomicity?
- Consistency?
- Isolation?
- Durability?

Transaction properties: Atomicity
- The changes a transaction T makes to the system are committed only if all of T's operations succeed. Otherwise T is aborted.
- When an error occurs, any changes already made to the system are rolled back (roll-back, backward error recovery): the original state of the system is restored from the log records.

Transaction vs. operation
- A transaction T consists of operations t_1, ..., t_n.
- Example 1. T = transfer 100 from account A to account B
  t_1 = read A's balance AS
  t_2 = write AS = AS - 100
  t_3 = read B's balance BS
  t_4 = write BS = BS + 100
- Example 2. T = process Teuvo's order for the CD "Musa"
  t_1 = decrease the stock count of Musa
  t_2 = ship Musa to Teuvo
  t_3 = send the invoice to Teuvo
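Example 1 can be sketched with Python's standard sqlite3 module. This is only an illustration: the table name `account`, the helper names `transfer` and `balances`, and the "insufficient funds" rule are assumptions made for the sketch and do not come from the slides.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one atomic transaction (hypothetical helper)."""
    try:
        # `with conn` commits if the block succeeds and rolls back if it raises.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM account WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                # Assumed business rule: abort T, undoing the first update.
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False


def balances(conn):
    """Return the committed balances as a dict."""
    return dict(conn.execute("SELECT name, balance FROM account"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 150), ("B", 0)])
conn.commit()
```

Calling `transfer(conn, "A", "B", 100)` twice shows both outcomes: the first call commits, the second aborts and leaves the balances as they were.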

Transaction properties: Consistency
- The changes T makes to the state of the system must correspond to the changes taking place in the phenomenon being modeled, and they must stay within the system's integrity constraints.
- In most cases the system is unavoidably in an inconsistent state at some intermediate stage of T.
- Integrity is usually checked only when T requests a commit (deferred integrity constraints).

[Figure: ACID transactions take the system from State n to State n+1 to State n+2, or abort. Within a transaction, between write(a, A-10), read(b) and write(b, B+10), the state is temporarily inconsistent.]
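The idea of deferred integrity checking can be illustrated with a small in-memory sketch. The class `DeferredCheckTransaction`, the invariant `total_is_100` and the starting balances are all hypothetical and chosen only to mirror the write(a, A-10) / write(b, B+10) schedule above.

```python
class DeferredCheckTransaction:
    """Hypothetical transaction that validates an invariant only at commit."""

    def __init__(self, state, invariant):
        self.state = state            # committed state, shared dict
        self.invariant = invariant    # integrity constraint, checked at commit
        self.working = dict(state)    # private working copy of the state

    def read(self, key):
        return self.working[key]

    def write(self, key, value):
        # No check here: intermediate states may violate the invariant.
        self.working[key] = value

    def commit(self):
        if not self.invariant(self.working):
            self.working = dict(self.state)   # abort: discard the changes
            return False
        self.state.update(self.working)
        return True


def total_is_100(s):
    """Assumed integrity constraint: the two balances always sum to 100."""
    return s["a"] + s["b"] == 100


state = {"a": 60, "b": 40}
tx = DeferredCheckTransaction(state, total_is_100)
tx.write("a", tx.read("a") - 10)
inconsistent_midway = not total_is_100(tx.working)   # temporary inconsistency
tx.write("b", tx.read("b") + 10)
committed = tx.commit()
```

A transaction that performs only the first write would fail the check at commit time and leave `state` untouched.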

Transaction properties: Isolation
- Transactions that the system executes concurrently must be isolated from each other; that is, they must not get access to resources that are in an inconsistent state (cf. the previous slide). Otherwise the consequences include:
  - incorrect results
  - cascading rollbacks
- Serializability requirement: the result of a non-serial execution of transactions must be the same as if the operations of each T had been executed one transaction after another.
- Control mechanisms include optimistic concurrency control, timestamping, and locking methods.
- The suitable method is chosen case by case.

[Figure: interleaved schedules illustrating four anomalies. Update loss: two transactions both perform write(a, A+10). Dirty read: a transaction reads a value written by write(a, A+10) that is later aborted. Incorrect summary: SUM = A+B+C is computed while write(b, B-10) and write(c, C+10) are in progress, although A+B+C = 100. Inconsistent read.]
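The update-loss anomaly and one locking-based remedy can be sketched as follows. The interleaving is written out by hand so that the result is deterministic; the function names are hypothetical, and a single `threading.Lock` stands in for a real lock manager.

```python
import threading


def lost_update_interleaving(initial):
    """Replay a non-serializable schedule: both transactions read before either writes."""
    a = initial
    t1_read = a          # T1: read(a)
    t2_read = a          # T2: read(a)
    a = t1_read + 10     # T1: write(a, A+10)
    a = t2_read + 10     # T2: write(a, A+10), overwriting T1's update
    return a


def locked_updates(initial, workers=2):
    """Run the same updates concurrently, holding a lock across read and write."""
    account = {"a": initial}
    lock = threading.Lock()

    def add_ten():
        with lock:                       # read and write form one isolated unit
            value = account["a"]
            account["a"] = value + 10

    threads = [threading.Thread(target=add_ten) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return account["a"]
```

Starting from 100, the hand-written schedule ends at 110 (one update lost), while the locked version ends at 120, the same result as a serial execution.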

Transaction properties: Durability
- The changes a transaction makes to the system must not be lost unintentionally or by accident.

Distributed transaction management

Two-phase commit (2-phase commit protocol)
- Atomic commitment protocol: an algorithm that ensures that all the processes involved in a distributed transaction either commit or abort.
- Coordinator process: responsible for controlling the overall atomicity; controls the voting.
- A number of participating processes: each executes part of a distributed transaction.

Two-phase commit: voting procedure, phase one
- The coordinator sends vote requests to all participants.
- When a participant receives the vote request, it replies by voting either Yes or No, according to whether it is able to carry out the task.
- Participants that voted Yes start waiting for a confirmation message from the coordinator.
- Participants that voted No can unilaterally abort.

Two-phase commit: voting procedure, phase two
- The coordinator collects all vote messages.
- If all the participants voted Yes, the coordinator decides to commit and sends Commit messages to all participants.
- Otherwise, the coordinator decides to abort and sends Abort messages to all participants that voted Yes.
- According to the received message, a participant decides to commit or abort.

Two-phase commit: failures
- Messages may fail to arrive because of a failure, and processes could then wait forever: a timeout mechanism must be able to interrupt the waiting period.
- In addition, the coordinator may attach the list of participants to the vote request message and thereby let the participants know about each other: cooperative termination protocol.
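The two voting phases described above can be sketched in a single process. This is a simplified illustration, not a networked implementation: the class `Participant`, its fields and state names, and the function `two_phase_commit` are hypothetical, and a missing vote (`None`) stands in for a timeout.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """Hypothetical participant in a distributed transaction."""
    name: str
    can_commit: bool = True
    reachable: bool = True
    state: str = "INIT"

    def vote(self):
        if not self.reachable:
            return None                # models a vote lost to a timeout
        if self.can_commit:
            self.state = "READY"       # voted Yes, now waits for the decision
            return "YES"
        self.state = "ABORTED"         # voted No, aborts unilaterally
        return "NO"

    def receive(self, decision):
        self.state = "COMMITTED" if decision == "COMMIT" else "ABORTED"


def two_phase_commit(participants):
    # Phase one: the coordinator sends vote requests and collects the votes.
    votes = {p.name: p.vote() for p in participants}
    # Phase two: commit only if every participant voted Yes.
    decision = "COMMIT" if all(v == "YES" for v in votes.values()) else "ABORT"
    for p in participants:
        # Abort messages go only to the participants that voted Yes.
        if decision == "COMMIT" or votes[p.name] == "YES":
            p.receive(decision)
    return decision
```

For example, `two_phase_commit([Participant("p"), Participant("q", can_commit=False)])` yields an abort decision, and both participants end up in the aborted state.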

Cooperative termination protocol
- If participant p hits the timeout while waiting for the Commit or Abort message from the coordinator, it can request the decision from another participant q.
- If q has already decided to commit (or abort), it sends Commit (or Abort) to p.
- If q has not voted yet, it can decide to abort and send Abort to p.
- If q has voted Yes but has not received the final Commit or Abort message from the coordinator, it cannot help p in making the decision.

Nested transactions
- A mechanism to facilitate transactions in distributed systems (Moss 1981).
- A tree-like model with parents, children, a top level (root), and leaves.

Nested transactions: rules
- A parent can spawn any number of children.
- Any number of children may be active concurrently.
- A parent cannot access data while its children are alive.
- A child can inherit a lock held by any ancestor.
- When a child commits, its locks are inherited by the parent (anti-inheritance).

Nested transactions: rules, continued
- Commit dependency: a parent can commit only after all its children have terminated (commit or abort).
- Abort dependency: when a parent aborts, even the updates of committed children are undone.
- Updates persist only if all ancestors commit.
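The commit and abort dependencies above can be illustrated with a minimal in-memory sketch. It is an assumption-heavy simplification: the class `NestedTransaction` is hypothetical, locks are not modeled at all (only the visibility of updates), and a parent must wait for its children before it may either commit or abort.

```python
class NestedTransaction:
    """Hypothetical nested transaction over a shared dict; locking is not modeled."""

    def __init__(self, store, parent=None):
        self.store = store
        self.parent = parent
        self.writes = {}              # updates not yet passed upward
        self.active_children = 0
        if parent is not None:
            parent.active_children += 1

    def spawn(self):
        return NestedTransaction(self.store, parent=self)

    def _check_no_children(self):
        if self.active_children:
            raise RuntimeError("not allowed while children are alive")

    def read(self, key):
        self._check_no_children()
        tx = self
        while tx is not None:         # a child sees its ancestors' updates
            if key in tx.writes:
                return tx.writes[key]
            tx = tx.parent
        return self.store.get(key)

    def write(self, key, value):
        self._check_no_children()
        self.writes[key] = value

    def _terminate(self):
        self._check_no_children()     # commit dependency
        if self.parent is not None:
            self.parent.active_children -= 1

    def commit(self):
        self._terminate()
        if self.parent is not None:
            self.parent.writes.update(self.writes)   # parent inherits the updates
        else:
            self.store.update(self.writes)           # top level: updates persist

    def abort(self):
        self._terminate()
        self.writes.clear()           # also discards committed children's updates
```

A child's committed update becomes visible to its parent but reaches the shared store only when the top-level transaction commits; if the root aborts instead, the update is lost.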

Nested transactions: benefits
- Intra-transactional parallelism
  - Safe concurrency
  - Reduced response time
- Intra-transactional recovery control
  - Finer control of error handling
  - Improves availability
- System modularity
  - Composition of separately developed modules

Save points
- A save point can be seen as a checkpoint in a transaction that forces the system to save the state of the running application and return a save point identifier for future reference.
- Instead of removing an entire transaction after the failure of a single operation (subprocess), backward recovery can return to the last valid state of the transaction, as saved at the save point.

Compensation & Sagas
- Long-living and distributed transactions are not reasonable to implement as a single ACID transaction.
- They keep resources locked for long periods of time, which can significantly delay the commit of other simultaneously executed transactions.
- Deadlock frequency grows with the fourth power of the transaction size.
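Save points are available in SQL as named savepoints, which gives a compact way to illustrate the idea with Python's sqlite3 module. The table `step`, the savepoint name `after_reserve` and the step descriptions are assumptions made for this sketch.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode,
# so the transaction boundaries below are issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE step (name TEXT)")

conn.execute("BEGIN")
conn.execute("INSERT INTO step VALUES ('reserve stock')")
conn.execute("SAVEPOINT after_reserve")                # save the state so far
conn.execute("INSERT INTO step VALUES ('charge card')")
# Suppose charging failed: return to the save point, not to the start.
conn.execute("ROLLBACK TO SAVEPOINT after_reserve")
conn.execute("INSERT INTO step VALUES ('send invoice')")
conn.execute("COMMIT")

steps = [row[0] for row in conn.execute("SELECT name FROM step ORDER BY rowid")]
```

After the commit, `steps` contains the work done before the save point and the work done after returning to it, but not the step that was rolled back.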

Compensation & Sagas
- Atomic commitment protocols ease the implementation of ACID transactions in distributed systems, but they make transactions even more long-living.
- Backward recovery is expensive and difficult to implement.
- The idea of compensating transactions (processes, operations) was introduced to simulate transactional properties in such applications.

Compensation & Sagas
- In the Saga model (Garcia-Molina & Salem 1987), a long-living transaction is broken up into a collection of subtransactions that interleave with other transactions.
- The results of a subtransaction can be made visible immediately after it is committed.

Compensation & Sagas
- Saga subtransactions can be executed independently, but together they must finally form an atomic unit.
- Should any of the subtransactions fail, the failed transaction is aborted and the already committed subtransactions are undone.
- Removing a committed transaction means that the results of subsequent transactions may become inconsistent and cause cascading rollbacks.
- Instead of removing the results of committed transactions with a rollback, the idea of a compensating transaction is introduced.

Compensation & Sagas
- Each Saga subtransaction T_i is provided with a compensating transaction C_i.
- If compensation is not needed, the saga T_1, T_2, ..., T_n executes as a sequence of transactions.
- If compensation is needed, the sequence is T_1, T_2, ..., T_k, C_k, ..., C_2, C_1, where 0 <= k < n and C_i is the predefined compensating transaction of T_i.
- A compensating transaction does not necessarily restore the state that prevailed before the execution of T_i; rather, it undoes the operation(s) of T_i from a semantic point of view.
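The sequence T_1, ..., T_k, C_k, ..., C_1 can be sketched as a small saga runner. All names here (`run_saga`, the step functions, the `stock` and `events` variables) are hypothetical; the order scenario is loosely modeled on the CD order used earlier in the slides, with the invoicing step made to fail on purpose.

```python
def run_saga(steps):
    """Run (name, action, compensation) steps; compensate in reverse on failure."""
    log = []
    completed = []                       # committed subtransactions T_1..T_k
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            log.append(f"{name} failed")
            # Execute C_k, ..., C_1 for the subtransactions already committed.
            for done_name, done_compensate in reversed(completed):
                done_compensate()
                log.append(f"compensate {done_name}")
            return False, log
        completed.append((name, compensate))
        log.append(f"{name} done")
    return True, log


stock = {"Musa": 5}
events = []

def reserve(): stock["Musa"] -= 1
def release(): stock["Musa"] += 1
def ship(): events.append("shipped")
def request_return(): events.append("return requested")   # semantic undo only
def invoice(): raise RuntimeError("billing unavailable")   # simulated failure
def cancel_invoice(): events.append("invoice cancelled")

ok, log = run_saga([
    ("T1", reserve, release),
    ("T2", ship, request_return),
    ("T3", invoice, cancel_invoice),
])
```

Note that compensating the shipment does not erase it: `events` still records that the item was shipped, followed by the return request, which is the semantic rather than state-restoring undo described above.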

Compensation & Sagas
- Contrary to the traditional backward recovery mechanism, which is based on compensating operations automatically deduced from the schedule, semantically compensating transactions may not be generated automatically.
- Even relatively simple actions (such as subtraction) must follow the integrity constraints.
- Not everything can be undone. For example, real actions with real consequences are impossible to undo, but it may be possible to override their effect with a new action.
- Committed-acceptable and aborted-acceptable termination states: business considerations.

Forward recovery
- Combines save points, compensation, and recovery of the failed transaction.
- The transaction that caused the failure is aborted using a conventional rollback.
- Committed transactions are undone in reverse order using their compensating counterparts until the save point is found (backward recovery).
- Finally, the transaction is restarted from the location of the found save point (forward recovery).

[Figure: forward recovery with save points]

Forward recovery
- When recovering, it is not always feasible or even possible to complete the transaction by re-executing the original steps.
- However, the same objective can often be achieved in different ways. Alternative methods of forward recovery should be considered during the process design phase.
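Forward recovery through alternative methods can be sketched as follows. The function `forward_recover`, the delivery alternatives, and the use of a deep copy as the save point are assumptions made for illustration; a real system would restore persistent state rather than a Python dict.

```python
import copy


def forward_recover(state, alternatives):
    """Try alternative steps from the same save point until one succeeds."""
    save_point = copy.deepcopy(state)        # state to return to on failure
    failures = []
    for name, step in alternatives:
        try:
            step(state)
            return name, failures            # objective reached by this route
        except Exception:
            failures.append(name)
            state.clear()                    # backward recovery to the save point
            state.update(copy.deepcopy(save_point))
    raise RuntimeError("no alternative succeeded")


def by_courier(state):
    state["label"] = "courier"               # partial work before the failure
    raise RuntimeError("courier unavailable")


def by_post(state):
    state["shipped_by"] = "post"


state = {"order": "Musa"}
chosen, failures = forward_recover(state, [("courier", by_courier), ("post", by_post)])
```

The partial work of the failed courier attempt is discarded before the second alternative runs, so the final state contains only the effects of the route that succeeded.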

Transactional processes
- Typical scenario: a top-level process consists of subcomponents that are logically connected to the higher-level process but are executed virtually independently.
- A global transaction:
  - is a collage of all the separate transactions included in the process
  - is often long-living and distributed
- Each global transaction:
  - can be divided into a set of local transactions
  - can be represented as a tree of hierarchically ordered tasks, which may also include manual operations

Transactional processes
- The lowest level consists of DB ACID transactions.
