From: Mikhail Antonov
Date: Mon, 7 Apr 2014 15:47:26 -0700
Subject: Re: Hadoop Summit EU
To: dev@hbase.apache.org
Cc: Enis Söztutar, Ted Yu, "hbase-dev@hadoop.apache.org", "hdfs-dev@hadoop.apache.org", Sanjay Radia

Well... since that was mentioned anyway, allow me a tiny
correction/clarification. :)

It's ConsensusNode, not ConsistencyNode, and it's not really a custom Paxos
implementation. It's more like an interface for a coordination service atop
the standard NameNode, which may be backed by any consensus
library/algorithm, be it a variation of Paxos, ZooKeeper/ZAB, Raft, or
anything else.

The consensus API itself (the ConsensusNode code) and a ZooKeeper-based
implementation of the consensus protocol are going to be open-sourced (we're
working on it), and once they're out, authors of consensus libraries are
welcome to start integrating their libs too.

Regarding HBase: that's actually what's being developed under HBASE-10909,
HBASE-10866, and the referenced JIRAs (everybody interested is welcome to
discuss and give feedback).

-Mikhail

2014-04-07 11:36 GMT-07:00 Enis Söztutar:

> Oops, sorry, this was intended for internal lists. Apologies for any
> confusion.
>
> Enis
>
> On Monday, April 7, 2014, Enis Söztutar wrote:
>
> > Devaraj and I attended their talk on their solution for Paxos-based
> > NameNode and HBase replication.
> >
> > They have two solutions, one for a single datacenter and the other for
> > multi-DC geo-replication.
> >
> > For the NameNode, there is a wrapper, called ConsistencyNode, that
> > basically gets the requests and replicates them via their consistency
> > protocol to other CNodes within the DC (Paxos-based) in the edit log.
> > If the proposal for this is accepted, the changes are made durable.
> > However, from my understanding, on the read side the client chooses
> > only one replica to read from. The client decides to connect to one of
> > the replica NameNodes, which means that it is not doing a Paxos read. I
> > think they also wrapped the client, so that if it gets a
> > FileNotFoundException or something similar, it will retry on a
> > different server. They also track the last seen proposal id as a
> > transaction id for this, from my understanding (so read-what-you-write
> > consistency, maybe?). The full details of the consistency model were
> > not clear to me from the presentation.
> >
> > For their multi-DC replication, they are doing a similar thing, but the
> > data replication is not handled by Paxos, only the NameNode metadata.
> > For each datacenter, they have a target replication factor (it can be
> > set differently for each DC, e.g. 0 for regulatory reasons). The NN
> > metadata is replicated via a similar mechanism. The data replication is
> > async to the metadata replication, though. When a block is finalized,
> > the CNode quorum in that particular DC schedules a remote copy to one
> > of the datacenters. That copy job copies the block by writing it
> > directly from the DataNode to a remote DataNode. Then that remote DC's
> > block is replicated to the target replication factor by that DC's CNode
> > quorum. When the target is reached, that DC will create another
> > proposal about the data replication being complete. So the state
> > machine probably contains where each piece of data is replicated, but
> > they were still mentioning the client getting a
> > DataNotReplicatedException or something.
> >
> > Their work on HBase is still WIP. I do not remember many details of the
> > protocol, except that it uses the same replication protocol (their
> > "patented" Paxos-based replication).
> >
> > Of course the devil is in the details. I did not get that from the
> > presentation.
> >
> > As a side note, Doug, when asked, was saying that they are cooking
> > something for backups, so maybe their "secret project" also contains
> > multi-DC consistent state?
> >
> > Enis
> >
> >
> > On Sat, Apr 5, 2014 at 1:55 AM, Ted Yu wrote:
> >
> >> Enis:
> >> There was a talk by Konstantin Boudnik
> >> <http://hadoopsummit.org/amsterdam/speakers/#konstantin-boudnik>.
> >>
> >> Any interesting material from his presentation?
> >>
> >> Cheers
> >
>

--
Thanks,
Michael Antonov