accumulo-dev mailing list archives

From Keith Turner <ke...@deenlo.com>
Subject Re: FATE
Date Fri, 24 Feb 2012 23:18:51 GMT
One other thing.  The Accumulo master runs FATE.  If the master dies,
when it starts again it will start FATE.  FATE will resume executing
transactions.  A master could die and restart multiple times on
different machines during a create table operation and the client
requesting the table creation would never know there was a problem.

On Fri, Feb 24, 2012 at 5:55 PM, Keith Turner <keith@deenlo.com> wrote:
> These slides have some info.
>
> http://people.apache.org/~kturner/accumulo14_15.pdf
>
> FATE makes it very easy to write fault tolerant multi-step Accumulo
> operations, like create table and delete table.  Create table has to
> modify ZooKeeper and the Accumulo metadata table.  It has many steps.  If
> there is a machine failure at any step, it could leave Accumulo in an
> inconsistent state.  An operation like create table is broken into
> lots of little FATE operations.
>
> FATE operations are called Repos (repeatable persisted operations).
> These operations must be written in such a way that it is safe to call
> them one or more times.  A repo is submitted to FATE.  FATE pushes the
> repo into ZooKeeper before executing it.  Then FATE executes the repo;
> if the repo returns another repo, FATE pushes that into ZooKeeper and
> then executes it.  If a repo does not return another repo, FATE
> considers the transaction done.  If a repo throws an exception,
> FATE pops all of the repos for the transaction out of
> ZooKeeper, calling undo on each one.  A repo has an isReady method;
> this is provided so a FATE operation can wait for a condition on the
> cluster to become true without tying up a thread.  If a repo is
> not ready then it will not be executed; later it will be read from
> ZooKeeper and isReady called again.
>
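
The execution loop described above can be sketched roughly as follows.
This is a minimal illustration in Python, not Accumulo's actual Java
code: the names Repo, Step and run_transaction are hypothetical, and a
plain dict of lists stands in for the per-transaction stack that FATE
keeps in ZooKeeper.

```python
class Repo:
    """Hypothetical stand-in for a FATE repo (names are illustrative)."""

    def is_ready(self, txid):
        return True

    def call(self, txid):
        """Do one step; return the next Repo, or None when finished."""
        return None

    def undo(self, txid):
        pass


class Step(Repo):
    """A toy repo that records what happens to it in a shared log."""

    def __init__(self, name, log, next_repo=None, fail=False):
        self.name, self.log = name, log
        self.next_repo, self.fail = next_repo, fail

    def call(self, txid):
        if self.fail:
            raise RuntimeError(self.name + " failed")
        self.log.append("call " + self.name)
        return self.next_repo

    def undo(self, txid):
        self.log.append("undo " + self.name)


def run_transaction(stacks, txid):
    """Drive one transaction; `stacks` (txid -> list) stands in for ZooKeeper."""
    stack = stacks[txid]
    while True:
        repo = stack[-1]
        if not repo.is_ready(txid):
            return "WAITING"  # retried later without tying up a thread
        try:
            next_repo = repo.call(txid)
        except Exception:
            # Pop every repo of the transaction, calling undo on each one.
            while stack:
                stack.pop().undo(txid)
            return "FAILED"
        if next_repo is None:
            return "DONE"
        stack.append(next_repo)  # persisted before it is executed


log = []
stacks = {1: [Step("a", log, Step("b", log, Step("c", log, fail=True)))]}
result = run_transaction(stacks, 1)
```

Here step "c" throws, so the transaction fails and undo runs on c, b
and a in that order.  Because the stack is persisted, a restarted
process could call run_transaction again and pick up at the top repo,
which is why each repo has to be safe to call more than once.
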
> Clients submit FATE operations by first requesting a FATE transaction
> id (txid).  If the server fails while the client is making this call,
> it is safe to request a txid again.  After the client gets a txid, it
> seeds it with a repo.  It is safe for the client to attempt to seed the
> FATE operation multiple times in the case of server failures.  After
> the operation is seeded, the client waits for the txid to finish.  If the
> server fails while the client is waiting, it can wait on the txid again.
>
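
A rough sketch of that client-side protocol, in Python for brevity.
Everything here is hypothetical: FlakyFateServer is an in-memory fake
that fails every other call to mimic a server dying mid-request, and
begin, seed, wait and with_retry are illustrative names rather than
Accumulo's client API.  The point is only that each of the three steps
can be retried blindly.

```python
import itertools


class FlakyFateServer:
    """Hypothetical fake server; every odd-numbered call fails."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._calls = 0
        self.ops = {}  # txid -> seeded operation

    def _maybe_fail(self):
        self._calls += 1
        if self._calls % 2 == 1:
            raise ConnectionError("server died")

    def begin(self):
        self._maybe_fail()
        return next(self._ids)

    def seed(self, txid, op):
        # Record first, then fail: the seed was applied but the reply was
        # lost.  setdefault makes a repeated seed of the same txid a no-op.
        self.ops.setdefault(txid, op)
        self._maybe_fail()

    def wait(self, txid):
        self._maybe_fail()
        return "done: " + self.ops[txid]


def with_retry(call, attempts=5):
    """Retry a call that is safe to repeat after a server failure."""
    for attempt in range(attempts):
        try:
            return call()
        except ConnectionError:
            if attempt == attempts - 1:
                raise


def submit(server, op):
    txid = with_retry(server.begin)                 # safe to request again
    with_retry(lambda: server.seed(txid, op))       # safe to seed again
    return txid, with_retry(lambda: server.wait(txid))  # safe to wait again


server = FlakyFateServer()
outcome = submit(server, "create table t")
```

Each call fails once and succeeds on the retry, and the operation is
still seeded exactly once under its txid.
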
> FATE operations acquire persistent read/write locks in ZooKeeper on
> Accumulo tables.  These locks are slightly different from the
> ZooKeeper lock recipe.  First, the locks use persistent sequential
> nodes instead of ephemeral nodes.  There is no need to use an
> ephemeral node because the lock is not related to a process, but
> rather a persistent FATE op that will continue to execute even if a
> process dies.  The lock data is the FATE txid plus W or R for read or
> write lock.  This type of locking is so much easier to deal with than
> ephemeral locks where you are never quite sure if the other process is
> really dead.
>
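
To make the lock scheme concrete, here is a small in-memory sketch in
Python.  It is not Accumulo's implementation: TableLockQueue and its
methods are hypothetical, the dict stands in for persistent sequential
nodes under a table's lock path, and the grant rule (a write lock needs
no earlier entries, a read lock needs no earlier write) is assumed from
the usual ZooKeeper shared-lock recipe.  Finding an existing entry by
txid after a restart is likewise an assumption about how the stored
txid might be used.

```python
class TableLockQueue:
    """Hypothetical stand-in for persistent sequential lock nodes."""

    def __init__(self):
        self._next_seq = 0
        self.entries = {}  # seq -> (txid, "R" or "W"), i.e. the lock data

    def request(self, txid, kind):
        # A restarted FATE op finds its own entry by txid instead of
        # queueing a second one; nothing disappears when a process dies.
        for seq, entry in self.entries.items():
            if entry == (txid, kind):
                return seq
        seq = self._next_seq
        self._next_seq += 1
        self.entries[seq] = (txid, kind)
        return seq

    def is_held(self, seq):
        earlier = [kind for s, (_, kind) in self.entries.items() if s < seq]
        if self.entries[seq][1] == "W":
            return not earlier        # writer waits for everyone ahead
        return "W" not in earlier     # reader waits only for writers ahead

    def release(self, seq):
        del self.entries[seq]


q = TableLockQueue()
r1 = q.request(1, "R")
w = q.request(2, "W")
r2 = q.request(3, "R")
```

With these three requests, txid 1 holds its read lock immediately,
txid 2 waits for it, and txid 3 waits behind the writer.  Entries are
removed only by an explicit release, never by a process exiting.
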
> Keith
>
>
> On Fri, Feb 24, 2012 at 5:24 PM, Mubarak Seyed <seyed@apple.com> wrote:
>> Hi Dev,
>>
>> Can someone please explain how the FATE (Fault tolerant executor) framework
>> works in Accumulo?
>>
>> Thanks,
>> Mubarak
