From: Ted Dunning
Date: Thu, 21 Jul 2011 15:09:58 -0700
Subject: Re: what would happen with this case ? (ZAB protocol question)
To: user@zookeeper.apache.org
Cc: Yang

I think the message ordering constraints combined with the quorum deal with
this situation.

On Thu, Jul 21, 2011 at 1:42 PM, Alexander Shraer wrote:

> Hi Ted,
>
> In your scenario there is no problem I can see. The problem is in another
> scenario I described in the JIRA - there C has seen more proposals than B
> but B has seen more commits than C. When leader election happens (and
> assuming they don't restart beforehand), B will be elected as leader and not
> C, which is a problem because C's suffix of transactions which were acked by
> both A and C will be truncated.
>
> Alex
>
> > -----Original Message-----
> > From: Ted Dunning [mailto:ted.dunning@gmail.com]
> > Sent: Thursday, July 21, 2011 1:25 PM
> > To: user@zookeeper.apache.org
> > Cc: Yang
> > Subject: Re: what would happen with this case ? (ZAB protocol question)
> >
> > Alex,
> >
> > Are you sure that this is a bug?
> >
> > Take the case of three servers A, B and C with A being leader.
> >
> > If transactions 1, 2 and 3 are committed, then a majority of the nodes,
> > including at least A, must have seen these transactions. Moreover,
> > transactions cannot be committed on a node unless all previous
> > transactions have been seen on that node as well. Thus, by symmetry, we
> > can consider cases where B alone committed these transactions or where B
> > and C committed them. Only the first case is problematic.
> >
> > Now, assume further that transaction 4 has arrived at B and been
> > forwarded to A but neither B nor C has committed it.
> >
> > The situation now is that in this first epoch, A has seen 1-4, B has seen
> > 1-3 and C has seen nothing. At least two nodes know the current epoch
> > because we obviously have a quorum, and we know that B knows the current
> > epoch because it has seen transactions from this epoch. Thus the
> > collection of machines that know the current epoch can be A+B or A+B+C.
> >
> > If all three nodes now die simultaneously and B and C come back up, the
> > question is what will happen. We know that the two nodes will agree on
> > the epoch because at least B has the last epoch. Node B will be elected
> > leader because it has seen later transactions than C. C will now get the
> > transactions and we have a quorum in a new epoch.
> >
> > If A returns at this point, it will know about transactions 1, 2, 3 and
> > 4. Further, it will know that 1, 2, and 3 have been committed in the
> > first epoch and that 4 was proposed, but never committed. As it joins, it
> > will find that a new epoch has started and will recognize B as master. B
> > will tell it to truncate the log by deleting 4, but 4 was never committed
> > anyway.
> >
> > Where is the problem?
> >
> > On Thu, Jul 21, 2011 at 1:11 PM, Alexander Shraer wrote:
> >
> > > The problem is in leader election - if the server doesn't reboot before
> > > running leader election (the usual case) then only the transactions for
> > > which it received a commit count, and it might not be elected leader,
> > > even if it has seen more transactions than the others. This may lead to
> > > transactions being dropped.
> > >
> > > I opened a JIRA for this.
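
The two scenarios in this thread can be replayed in a toy model. The sketch below is not ZooKeeper code: `Server`, `elect`, `dropped`, `by_logged` and `by_committed` are hypothetical names, and the election is reduced to "pick the server with the largest key". The key is a parameter because the thread's disagreement is exactly about whether a server votes with the last transaction it has seen or the last one it has received a commit for.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    last_logged: int     # highest transaction this server has seen (logged/acked)
    last_committed: int  # highest transaction it has received a commit for


def elect(servers, key):
    """Return the server with the largest key; ties go to the later name."""
    return max(servers, key=lambda s: (key(s), s.name))


def dropped(servers, leader):
    """Transactions each other server must truncate to match the leader's log."""
    return {
        s.name: list(range(leader.last_logged + 1, s.last_logged + 1))
        for s in servers
        if s is not leader and s.last_logged > leader.last_logged
    }


def by_logged(s):
    return s.last_logged


def by_committed(s):
    return s.last_committed


# Ted's scenario: A saw 1-4, B saw 1-3, C saw nothing; B and C restart first.
a, b, c = Server("A", 4, 3), Server("B", 3, 3), Server("C", 0, 0)
leader = elect([b, c], by_logged)
print(leader.name, dropped([a, b, c], leader))  # A rejoins and drops 4

# Alex's scenario: C saw more proposals than B, B saw more commits than C.
b2, c2 = Server("B", 3, 3), Server("C", 4, 2)
for rule in (by_committed, by_logged):
    leader = elect([b2, c2], rule)
    print(rule.__name__, leader.name, dropped([b2, c2], leader))
```

In this model Ted's scenario only ever drops transaction 4 from A, which was never committed. In Alex's scenario, electing by last commit picks B and makes C drop transaction 4 even though A and C both acked it, while electing by last logged transaction picks C and drops nothing.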