Date: Tue, 12 Jul 2011 18:46:46 -0700
Subject: Re: question on ZAB protocol
From: Yang
To: user@zookeeper.apache.org

btw, to give proper credit, I thought about this question after reading
http://www.vldb.org/pvldb/vol4/p243-rao.pdf
which actually just waits for 1 reply.

On Tue, Jul 12, 2011 at 6:45 PM, Yang wrote:
> I read the ZAB paper a while ago and never thought about this question,
> but I found out today that I can't answer it, so I'm bringing it up here.
>
> According to the paper
>
> B. Reed and F. P. Junqueira. A simple totally ordered broadcast
> protocol. In LADIS ’08: Proceedings of the 2nd Workshop
> on Large-Scale Distributed Systems and Middleware, pages 1–6,
> New York, NY, USA, 2008. ACM.
>
> the leader broadcasts a write to all replicas and then waits for a
> quorum to reply before sending out the COMMIT.
> Why is the quorum necessary (i.e. why can't the leader just wait for
> one reply and then start sending the COMMIT)?
>
> Now that I think about it, it seems that waiting for just one reply is
> enough: because the connections from the leader to the replicas are FIFO,
> as long as the replicas do not die, they will eventually get the writes,
> even if the writes arrive after the leader starts the COMMIT.
>
> The only reason I can think of for using a quorum is to tolerate more
> failures: if the only replica that replied dies and the leader dies, then
> we lose that latest write.
> By requiring f ACKs, you can tolerate f-1 failures. But then you don't
> really need 2f+1 nodes in the ZK cluster; f+1 is enough.
>
> Thanks a lot
> Yang
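
The failure arithmetic in the quoted message can be checked with a small brute-force sketch. It is not taken from the ZAB paper or from the ZooKeeper code. The function names (survives_all_failures, min_overlap) and the convention of counting the leader as one of the nodes that hold the write are assumptions made here for illustration only. The first function checks that a write held by c nodes survives any c-1 crashes but not c. The second gives the smallest overlap two groups of nodes can have, which may be relevant to the f+1 versus 2f+1 question.

```python
from itertools import combinations


def survives_all_failures(n, copies, failures):
    """Return True if a write stored on `copies` of `n` nodes survives
    every possible set of `failures` crashed nodes (brute force)."""
    holders = set(range(copies))  # which nodes hold the write does not matter
    for failed in combinations(range(n), failures):
        if holders <= set(failed):  # every holder crashed: the write is gone
            return False
    return True


def min_overlap(n, size_a, size_b):
    """Smallest number of nodes that two groups of the given sizes,
    both drawn from the same n nodes, are forced to share."""
    return max(0, size_a + size_b - n)


if __name__ == "__main__":
    # Leader plus one ACK means 2 copies in a 5-node ensemble (assumed sizes).
    print(survives_all_failures(5, copies=2, failures=1))  # True
    print(survives_all_failures(5, copies=2, failures=2))  # False
    # Two majorities of 3 out of 5 always share at least one node ...
    print(min_overlap(5, 3, 3))  # 1
    # ... but a group of 2 and a majority of 3 may share none.
    print(min_overlap(5, 2, 3))  # 0
```

With these assumed sizes, the leader plus a single ACK gives two copies, which is lost only when both of those nodes crash. That matches the scenario described above where the replying replica and the leader both die.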