From: Vamsi Devaki
Date: Sat, 15 Nov 2014 07:38:12 -0800
Subject: Re: A suggestion about the design for znode version in ZooKeeper
To: user@zookeeper.apache.org

Hi Robin,

One way to handle this situation is to use the multi / transaction API. You can check the version of the parent and operate on child nodes atomically. A quick explanation can be found at
http://tdunning.blogspot.com/2011/06/tour-of-multi-update-for-zookeeper.html

Regards,
Vamsi

On Sat, Nov 15, 2014 at 2:00 AM, "Jürgen Wagner (DVT)" <juergen.wagner@devoteam.com> wrote:

> ZooKeeper uses an optimistic approach in this case. The "problem" will
> only occur if you simply use the optimistic mode in your application as
> well.
>
> So, you have to implement a pessimistic version, i.e., create a lock and
> then perform the update, or otherwise guarantee that the required
> operations will be atomic. In that case, you can guarantee that nobody
> will delete the node while you're busy with the update.
>
> Cheers,
> --Jürgen
>
> On 15.11.2014 10:25, Ivan Kelly wrote:
>
> Another option would be to start the znode version at the parent
> znode's child version (cversion), which will be different between each
> deletion and creation of child nodes. One problem with this though
> (apart from being limited to 2^31 values) is that the API doesn't have
> any way to return the initial znode version on creation. Fixing this in
> a backward-compatible, non-ugly way would be hard, I think.
>
> -Ivan
>
> On 15 November 2014 03:48, Robin wrote:
>
> Hi zookeepers,
>
> While digging into ZooKeeper's internals, I learned of the following
> flaw in the znode version: a znode's version is reset when the znode is
> deleted and re-created. This is a trap for operations that make updates
> conditional on the znode version.
>
> Let's see an example: a client gets the data of a znode (e.g., /test)
> and its version (e.g., 1), changes the data, and writes it back on the
> condition that the version has not changed (is still 1). If another
> client deletes and re-creates this znode while the first client is
> updating the data, the version can match again, and the write succeeds
> against a znode that now holds different data.
>
> The problem I can see is that the znode version is designed to be a
> monotonically increasing integer. If we include the birth date
> (timestamp) of the znode, or the zxid of its creation, as part of the
> znode's version, and only the integer part of the version increases
> every time the znode is updated, while the birth-date or zxid part stays
> unchanged, we can avoid the problem.
>
> Of course, the new design has a cost: the version field needs to be
> larger.
>
> Thanks,
> - Robin

--
Vamsi
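
P.S. To make the delete/re-create trap from Robin's example concrete, here is a minimal in-memory sketch. It is a toy model, not the ZooKeeper client API: ToyStore, BadVersion, stale_write_accepted and the expected_czxid parameter are hypothetical names made up for illustration. The only behaviour it assumes from the thread is that a re-created znode starts counting its version from the beginning again, and that each znode records the zxid of the transaction that created it (czxid). The optional czxid comparison models Robin's proposed "creation zxid plus integer" version; the real setData call only takes an expected version.

```python
from dataclasses import dataclass


class BadVersion(Exception):
    """Raised when a conditional update's expectation does not match."""


@dataclass
class Znode:
    data: bytes
    czxid: int        # zxid of the transaction that created this znode
    version: int = 0  # data version; a re-created znode starts at 0 again


class ToyStore:
    """In-memory stand-in for a server (hypothetical, not the real API)."""

    def __init__(self):
        self._nodes = {}
        self._zxid = 0

    def _next_zxid(self):
        # Every state change gets a new, strictly increasing transaction id.
        self._zxid += 1
        return self._zxid

    def create(self, path, data):
        if path in self._nodes:
            raise KeyError(f"{path} already exists")
        self._nodes[path] = Znode(data, czxid=self._next_zxid())

    def delete(self, path):
        del self._nodes[path]
        self._next_zxid()

    def get(self, path):
        node = self._nodes[path]
        return node.data, node.czxid, node.version

    def set_data(self, path, data, expected_version, expected_czxid=None):
        node = self._nodes[path]
        if node.version != expected_version:
            raise BadVersion(path)
        # Hypothetical extra check modelling the proposed composite version.
        if expected_czxid is not None and node.czxid != expected_czxid:
            raise BadVersion(path)
        node.data = data
        node.version += 1
        self._next_zxid()


def stale_write_accepted(check_czxid):
    """Replay Robin's scenario; return True if the stale write goes through."""
    store = ToyStore()
    store.create("/test", b"v0")
    store.set_data("/test", b"v1", expected_version=0)

    # Client A reads the znode and remembers what it saw (version is 1).
    _, czxid, version = store.get("/test")

    # Client B deletes and re-creates the znode, then updates it once,
    # so the new incarnation also reaches version 1.
    store.delete("/test")
    store.create("/test", b"other")
    store.set_data("/test", b"other-1", expected_version=0)

    # Client A writes back, conditional on what it read earlier.
    try:
        store.set_data(
            "/test", b"stale", expected_version=version,
            expected_czxid=czxid if check_czxid else None,
        )
        return True
    except BadVersion:
        return False


if __name__ == "__main__":
    print("version-only check accepts stale write:", stale_write_accepted(False))
    print("version + czxid check accepts stale write:", stale_write_accepted(True))
```

In this toy model the version-only check lets the stale write through, while the check that also compares the creation zxid rejects it, because the re-created znode has a different czxid even though its version number matches.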