From: Baskar Duraikannu
Date: Wed, 31 Jul 2013 19:57:28 -0400
Subject: Re: Zookeeper performance

Yes, I am coming to the realization that zookeeper
may not be the right solution. I might need to use the data store
primitives to solve the issue.

Thanks

--- Original Message ---
From: "Ted Dunning"
Sent: July 31, 2013 7:15 PM
To: user@zookeeper.apache.org
Subject: Re: Zookeeper performance

Generally, ZK is much better as a coordination layer. Starting with an
expected transaction load well above the normal limits of operation is not
a grand idea.

It is much better to do something simpler, such as having ZK coordinate
shard masters that each use conventional methods for handling transactions
(see VoltDB for one approach to sharding well, so that a transaction never
spans shards).

Similarly, you can also shard and maintain version numbers, transaction
IDs and an in-memory transaction table. This allows multi-shard MVCC
commit semantics, but dealing with transactions stalled by dead nodes can
be a bit tricky.

Using ZK for the raw transaction stream isn't a grand idea, however.

On Wed, Jul 31, 2013 at 4:05 PM, Henry Robinson wrote:

> So how about the following optimistic approach:
>
> 1. Read the current version of the database (stored in a znode's version
> metadata). If it is even, wait and try again; even numbers mean someone is
> committing and the DB might be in an inconsistent state. Then read the
> state from the database your update will rely upon (user1.name, in this
> instance). You must also be able to atomically read the current version
> from the database as well as zookeeper, to ensure that the data is from the
> version you think it is. If the DB version does not match the ZK version,
> restart.
> 2. Once an update is ready to commit, test-and-increment the current
> version in ZK to an even number, write your update to the DB, along with
> the eventual version of the data (the next odd number).
> 3. Increment the current version in ZK to an odd number.
>
> The even / odd distinction means that you can detect when someone else is
> updating the database, since otherwise there's no way to do so atomically
> with an update to ZK (so another transaction can't tell whether you've
> finished your update, and so doesn't know how long to wait).
>
> The problem is failure - what happens if a client fails while it's writing
> a transaction? Eventually someone can increment the transaction number, and
> if you provide an 'undo' log before you make any changes, that client can
> possibly recover from a partial commit. But at this point you need to
> understand your application's requirements in much more detail than we do
> to make recommendations.
>
> In particular, your storage layer may offer sufficiently powerful
> primitives such that you don't need ZK; although if it's a filesystem then
> that probably isn't true.
>
> Henry
>
>
> On 31 July 2013 15:51, Baskar Duraikannu wrote:
>
> > We cannot always resolve conflicts ourselves. For example, let us say that
> > a) user1 changed the name from 'Kathy' to 'Katherine'
> > b) user2 changes the name from 'Kathy' to 'Kat'
> > Both read 'Kathy' as input; user1's update succeeded. We need to let
> > user2 know that something has changed, as this may result in the user not
> > changing 'Kathy' to 'Kat' (as an example).
> > Hope this explains.
> >
> > > Date: Wed, 31 Jul 2013 07:49:39 -0400
> > > Subject: Re: Zookeeper performance
> > > From: camille@apache.org
> > > To: user@zookeeper.apache.org
> > >
> > > This sounds highly error prone to me regardless of whether or not zookeeper
> > > can handle the load. Why not just use a standard transaction model with a
> > > vector clock or other timing device to detect conflicts so you don't have
> > > to worry about a second server to talk to (zookeeper) to do an update?
> > > On Jul 31, 2013 7:17 AM, "Baskar Duraikannu" <baskar.duraikannu@outlook.com>
> > > wrote:
> > >
> > > > Hello
> > > >
> > > > We are looking to use zookeeper for optimistic concurrency. Basically, when
> > > > the user saves data on a screen, we need to lock, read to ensure that no
> > > > one else has changed the row while the user is editing data, persist data and
> > > > unlock the znode.
> > > >
> > > > If the app/thread does not get a lock, we may set a watch so that polling
> > > > is avoided.
> > > >
> > > > Our application is write-intensive at certain times of the day. We may get
> > > > about 100k requests per second. Can zookeeper handle this volume?
> > > >
> > > >
>
>
> --
> Henry Robinson
> Software Engineer
> Cloudera
> 415-994-6679
>
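
For readers following the thread, the even/odd version scheme Henry outlines can be sketched roughly as below. This is only an illustration under stated assumptions: `VersionCell` and `Database` are hypothetical in-memory stand-ins (not ZooKeeper or data store APIs), the counter starts at an odd value meaning "stable", and the failure case Henry raises (a client dying between steps 2 and 3, which leaves the version even) is deliberately not handled.

```python
import threading


class VersionCell:
    """Hypothetical in-memory stand-in for the version kept in a znode."""

    def __init__(self, start=1):
        self._value = start  # odd = stable, even = commit in progress
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self._value

    def compare_and_increment(self, expected):
        """Increment only if the version still equals `expected`."""
        with self._lock:
            if self._value != expected:
                return False
            self._value += 1
            return True


class Database:
    """Hypothetical stand-in for the data store: rows plus a data version."""

    def __init__(self):
        self._rows = {"user1.name": "Kathy"}
        self._version = 1
        self._lock = threading.Lock()

    def snapshot(self, key):
        """Atomically read a value together with the version it belongs to."""
        with self._lock:
            return self._rows[key], self._version

    def write(self, key, value, version):
        with self._lock:
            self._rows[key] = value
            self._version = version


def begin(zk, db, key):
    """Step 1: return (value, version), or None if the caller should retry."""
    zk_version = zk.read()
    if zk_version % 2 == 0:
        return None  # even: someone is committing
    value, db_version = db.snapshot(key)
    if db_version != zk_version:
        return None  # data is not from the version we think it is
    return value, zk_version


def commit(zk, db, key, new_value, read_version):
    """Steps 2 and 3: return False if another writer got in first."""
    if not zk.compare_and_increment(read_version):  # odd -> even
        return False
    db.write(key, new_value, read_version + 2)  # tag with the next odd version
    zk.compare_and_increment(read_version + 1)  # even -> odd
    return True


# The 'Kathy' example from the thread: both users read the same version.
zk, db = VersionCell(), Database()
_, v1 = begin(zk, db, "user1.name")
_, v2 = begin(zk, db, "user1.name")
user1_ok = commit(zk, db, "user1.name", "Katherine", v1)
user2_ok = commit(zk, db, "user1.name", "Kat", v2)
```

In this run user1's commit goes through and user2's is rejected, which is the signal the application would use to tell user2 that the row changed underneath them. Every commit still costs two writes to the shared counter, so the sketch does nothing to ease the throughput concern raised in the thread.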