hbase-dev mailing list archives

From Mikhail Antonov <olorinb...@gmail.com>
Subject Re: Hadoop Summit EU
Date Mon, 07 Apr 2014 22:47:26 GMT
Well... since that was mentioned anyway, allow me a tiny
correction/clarification. :)

It's ConsensusNode, not ConsistencyNode, and it's not really a custom Paxos
implementation. It's more like an interface for a coordination service atop
a standard NameNode, which may be backed by any consensus library/algorithm,
be it a variation of Paxos, ZooKeeper/ZAB, Raft or anything else. The
consensus API itself (the ConsensusNode code) and a ZooKeeper-based
implementation of the consensus protocol are going to be open-sourced (we're
working on it), and once they're out, authors of consensus libraries are
welcome to start integrating their libs too.
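
As a rough illustration of the "interface backed by any consensus library"
idea, here is a toy sketch. It is not the actual ConsensusNode API, which has
not been published yet. Every name in it (CoordinationEngine, InMemoryEngine,
CoordinatedNode, submit, register_learner) is hypothetical, and the in-memory
engine only stands in for a real Paxos/ZAB/Raft backend.

```python
from abc import ABC, abstractmethod
from typing import Callable, List


class CoordinationEngine(ABC):
    """Hypothetical pluggable consensus backend (Paxos, ZAB, Raft, ...)."""

    @abstractmethod
    def submit(self, proposal: str) -> int:
        """Submit a proposal; return the sequence number it was agreed at."""

    @abstractmethod
    def register_learner(self, on_agreed: Callable[[int, str], None]) -> None:
        """Register a callback invoked for every agreed proposal, in order."""


class InMemoryEngine(CoordinationEngine):
    """Toy single-process stand-in: every proposal is trivially 'agreed'."""

    def __init__(self) -> None:
        self._next_seq = 1
        self._learners: List[Callable[[int, str], None]] = []

    def register_learner(self, on_agreed: Callable[[int, str], None]) -> None:
        self._learners.append(on_agreed)

    def submit(self, proposal: str) -> int:
        seq = self._next_seq
        self._next_seq += 1
        # Deliver the agreed proposal to every registered node, in order.
        for learner in self._learners:
            learner(seq, proposal)
        return seq


class CoordinatedNode:
    """Hypothetical wrapper: applies namespace edits only after agreement."""

    def __init__(self, name: str, engine: CoordinationEngine) -> None:
        self.name = name
        self.engine = engine
        self.applied: List[str] = []
        self.last_applied_seq = 0
        engine.register_learner(self._apply)

    def _apply(self, seq: int, op: str) -> None:
        self.applied.append(op)
        self.last_applied_seq = seq

    def mkdir(self, path: str) -> int:
        # The node does not mutate its own state directly; it proposes.
        return self.engine.submit(f"mkdir {path}")


engine = InMemoryEngine()
a = CoordinatedNode("cnode-a", engine)
b = CoordinatedNode("cnode-b", engine)
a.mkdir("/data")
b.mkdir("/logs")
```

In this sketch both nodes end up with the same ordered list of applied
operations regardless of which node received the request, and swapping the
backend only means providing another CoordinationEngine subclass.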

Regarding HBase - that's actually what's being developed under HBASE-10909,
HBASE-10866 and the referenced JIRAs (anyone interested is welcome to
discuss and give feedback).

-Mikhail


2014-04-07 11:36 GMT-07:00 Enis Söztutar <enis@hortonworks.com>:

> Oops, sorry, this was intended for internal lists. Apologies for any
> confusion.
>
> Enis
>
> On Monday, April 7, 2014, Enis Söztutar <enis@hortonworks.com> wrote:
>
> > Devaraj and I attended their talk on their solution for paxos based
> > namenode and HBase replication.
> >
> > They have two solutions, one for a single datacenter, and the other for
> > multi-DC geo replication.
> >
> > For the namenode, there is a wrapper, called ConsistencyNode, that
> > basically gets the requests and replicates them in the edit log, via their
> > consistency protocol (paxos based), to the other CNodes within the DC. If
> > the proposal is accepted, the changes are made durable. However, from my
> > understanding, on the read side the client chooses only one replica to
> > read from. The client decides to connect to one of the replica namenodes,
> > which means that it is not doing a paxos read. I think they also wrapped
> > the client, so that if it gets a FileNotFoundException or something
> > similar, it will retry on a different server. From my understanding, they
> > also track the last seen proposal id as a transaction id for this (so
> > read-what-you-write consistency maybe?). The full details of the
> > consistency guarantees were not clear to me from the presentation.
> > For their multi-DC replication, they are doing a similar thing, but only
> > the namenode metadata is handled by paxos, not the data replication. For
> > each datacenter, they have a target replication factor (it can be set
> > differently for each DC, e.g. 0 for regulatory reasons). The NN metadata
> > is replicated via a similar mechanism. The data replication is async to
> > the metadata replication though. When a block is finalized, the CNode
> > quorum in that particular DC schedules a remote copy to one of the other
> > datacenters. That copy job writes the block directly from the datanode to
> > a remote datanode. The block in the remote DC is then replicated up to the
> > target replication factor by that DC's CNode quorum. When the target is
> > reached, that DC will create another proposal saying that the data
> > replication is complete. So the state machine probably tracks where each
> > piece of data is replicated, but they were still mentioning the client
> > getting DataNotReplicatedException or something.
> >
> > Their work on HBase is still WIP. I do not remember many details of the
> > protocol, except that it uses the same replication protocol (their
> > "patented" paxos based replication).
> >
> > Of course the devil is in the details, and I did not get those from the
> > presentation.
> >
> > As a side note, Doug, when asked, was saying that they are cooking
> > something for backups, so maybe their "secret project" also contains
> > multi-DC consistent state?
> >
> > Enis
> >
> >
> > On Sat, Apr 5, 2014 at 1:55 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> >
> >> Enis:
> >> There was a talk by Konstantin Boudnik
> >> <http://hadoopsummit.org/amsterdam/speakers/#konstantin-boudnik>.
> >>
> >> Any interesting material from his presentation?
> >>
> >> Cheers
> >>
> >
> >
>



-- 
Thanks,
Michael Antonov
