Plamen Jeliazkov (JIRA)
[jira] [Commented] (HADOOP-10641) Introduce Coordination Engine
Date Sun, 08 Jun 2014 22:57:03 GMT

    https://issues.apache.org/jira/browse/HADOOP-10641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14021473#comment-14021473

Plamen Jeliazkov commented on HADOOP-10641:

Hi Lohit, thanks for your comments!
# checkQuorum is an optimization that some coordination engines may choose to implement in order
to fail fast on client requests. In the NameNode case, if quorum loss is suspected, that
NameNode could start issuing StandbyExceptions.
# You are correct that the ZKCoordinationEngine does not currently implement ZNode clean-up.
That is because it was written as a proof of concept for the CoordinationEngine API. Nonetheless,
proper clean-up can be implemented. All it requires is deleting the ZNodes for agreements that
every node has already learned.
## Suppose you have Nodes A, B, and C, and Agreements 1, 2, 3, 4, and 5.
## Nodes A and B learn Agreement 1 first. Node C is a lagging node. A & B contain 1. C
contains nothing.
## Nodes A and B continue onwards, learning up to Agreement 4. A & B contain 1, 2, 3, and
4 now. C contains nothing.
## Node C finally learns Agreement 1. A & B contain 1, 2, 3, and 4 now. C contains 1.
## We can now discard Agreement 1 from persistence because we know that all the Nodes, A,
B, and C, have safely learned about and applied Agreement 1.
## The same process applies to every other Agreement.
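
The clean-up rule in the example above can be sketched in a few lines of Java. This is only an illustration under assumptions: the class AgreementCleanup and all of its methods are hypothetical names, not part of the attached patch or of the CoordinationEngine API, and an in-memory set stands in for the ZNodes. It tracks the highest Agreement each Node has learned and discards everything up to the minimum across all Nodes.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical helper for illustration; not part of the HADOOP-10641 patch. */
final class AgreementCleanup {
    // Highest agreement number each node has learned and applied (0 = none yet).
    private final Map<String, Long> learned = new HashMap<>();
    // Agreement numbers still held in persistence (stand-in for ZNodes).
    private final TreeSet<Long> persisted = new TreeSet<>();

    AgreementCleanup(String... nodes) {
        for (String node : nodes) {
            learned.put(node, 0L);
        }
    }

    /** Record that an agreement has been written to persistence. */
    void persist(long agreement) {
        persisted.add(agreement);
    }

    /** Record that a node has learned every agreement up to and including this one. */
    void recordLearned(String node, long agreement) {
        learned.merge(node, agreement, Math::max);
    }

    /** Every node has learned all agreements up to this number. */
    long safeToDiscardUpTo() {
        return Collections.min(learned.values());
    }

    /** Discard the agreements that all nodes have already learned. */
    void cleanUp() {
        persisted.headSet(safeToDiscardUpTo(), true).clear();
    }

    /** Read-only view of the agreements still persisted. */
    SortedSet<Long> persisted() {
        return Collections.unmodifiableSortedSet(persisted);
    }
}
```

Replaying the scenario: with Agreements 1 through 4 persisted and only A and B at 4, cleanUp() removes nothing because C is still at 0. Once C records Agreement 1, cleanUp() removes Agreement 1 and leaves 2, 3, and 4. A real implementation would also need each Node to publish its progress somewhere the others can read it, which this sketch leaves out.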

> Introduce Coordination Engine
> -----------------------------
>                 Key: HADOOP-10641
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10641
>             Project: Hadoop Common
>          Issue Type: New Feature
>    Affects Versions: 3.0.0
>            Reporter: Konstantin Shvachko
>            Assignee: Plamen Jeliazkov
>         Attachments: HADOOP-10641.patch, HADOOP-10641.patch, HADOOP-10641.patch
> Coordination Engine (CE) is a system that allows agreement on a sequence of events in
a distributed system. To be reliable, the CE should itself be distributed.
> A Coordination Engine can be based on different algorithms (Paxos, Raft, 2PC, ZAB) and
can have different implementations, depending on use cases, reliability, availability, and performance.
> The CE should have a common API so that it can serve as a pluggable component in different
projects. The immediate beneficiaries are HDFS (HDFS-6469) and HBase (HBASE-10909).
> The first implementation is proposed to be based on ZooKeeper.

