Mailing-List: contact user-help@zookeeper.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@zookeeper.apache.org
Received-SPF: pass (nike.apache.org: domain of baskar.duraikannu@outlook.com
 designates 65.54.190.155 as permitted sender)
Message-ID: <BAY402-EAS17924C49309E1E5B65048E2EE570@phx.gbl>
Date: Wed, 31 Jul 2013 19:57:28 -0400
Subject: Re: Zookeeper performance
From: Baskar Duraikannu <baskar.duraikannu@outlook.com>
To: <user@zookeeper.apache.org>
MIME-Version: 1.0
Importance: normal
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"


Yes, I am coming to the realization that ZooKeeper may not be the right
solution. I might need to use the data store primitives to solve the issue.
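
A minimal sketch of the kind of data store primitive meant here, assuming
the store supports a conditional write. Python's sqlite3 only stands in for
the real database, which this thread does not name; the `users` table, its
`version` column, and `save_name` are hypothetical names for illustration.

```python
import sqlite3

def save_name(conn, user_id, new_name, version_read):
    """Write new_name only if the row still has the version the editor read."""
    cur = conn.execute(
        "UPDATE users SET name = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_name, user_id, version_read),
    )
    conn.commit()
    # rowcount is 0 when someone else changed the row first.
    return cur.rowcount == 1

# Assumed schema: one row per user, plus a version counter.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, version INTEGER)"
)
conn.execute("INSERT INTO users VALUES (1, 'Kathy', 1)")
conn.commit()

# Both users loaded the screen and saw ('Kathy', version 1).
user1_ok = save_name(conn, 1, "Katherine", 1)
user2_ok = save_name(conn, 1, "Kat", 1)
final_name = conn.execute("SELECT name FROM users WHERE id = 1").fetchone()[0]
```

With this check the second save is rejected rather than silently
overwriting the first, so the application can tell user2 that the row
changed.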

Thanks

--- Original Message ---

From: "Ted Dunning" <ted.dunning@gmail.com>
Sent: July 31, 2013 7:15 PM
To: user@zookeeper.apache.org
Subject: Re: Zookeeper performance

Generally, ZK is much better as a coordination layer.

Starting with an expected transaction load well above the normal limits of
operation is not a grand idea.

Much better to do something simpler like have ZK coordinate shard masters
that each use conventional methods for handling transactions (see VoltDB
for one approach to sharding well enough that no transaction ever spans
shards).
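
As a rough sketch of the routing half of that idea (hypothetical names
throughout: `NUM_SHARDS`, `shard_for`, and `ShardMaster` are assumptions,
and the ZK-coordinated choice of which process is master for each shard is
left out), every row key maps to exactly one shard, so a single-row update
never spans shards.

```python
import hashlib

NUM_SHARDS = 4  # assumed shard count, for illustration only

def shard_for(row_key: str) -> int:
    """Map a row key to one shard with a stable hash."""
    digest = hashlib.sha256(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

class ShardMaster:
    """Owns the rows of one shard and checks versions locally."""
    def __init__(self):
        self.rows = {}  # key -> (value, version)

    def update(self, key, new_value, version_read):
        _, current = self.rows.get(key, (None, 0))
        if current != version_read:
            return False  # the row changed since the caller read it
        self.rows[key] = (new_value, current + 1)
        return True

masters = [ShardMaster() for _ in range(NUM_SHARDS)]

def update(key, new_value, version_read):
    """Route the update to the single shard master that owns the key."""
    return masters[shard_for(key)].update(key, new_value, version_read)

created = update("user1", "Kathy", 0)
renamed = update("user1", "Katherine", 1)
stale = update("user1", "Kat", 1)  # still holds version 1, so it is rejected
```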

Similarly, you can also shard and maintain version numbers, transaction
IDs, and an in-memory transaction table.  This allows multi-shard MVCC
commit semantics, but dealing with transactions stalled by dead nodes can
be a bit tricky.

Using ZK for the raw transaction stream isn't a grand idea, however.


On Wed, Jul 31, 2013 at 4:05 PM, Henry Robinson <henry@cloudera.com> wrote:

> So how about the following optimistic approach:
>
> 1. Read the current version of the database (stored in a znode's version
> metadata). If it is even, wait and try again; even numbers mean someone is
> committing and the DB might be in an inconsistent state. Then read the
> state from the database your update will rely upon (user1.name, in this
> instance). You must also be able to atomically read the current version
> from the database as well as zookeeper, to ensure that the data is from the
> version you think it is. If the DB version does not match the ZK version,
> restart.
> 2. Once an update is ready to commit, test-and-increment the current
> version in ZK to an even number, write your update to the DB, along with
> the eventual version of the data (the next odd number).
> 3. Increment the current version in ZK to an odd number.
>
> The even / odd distinction means that you can detect when someone else is
> updating the database, since otherwise there's no way to do so atomically
> with an update to ZK (so another transaction can't tell if you've finished
> your update or not, and so doesn't know how long to wait).
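
The three steps above can be sketched as follows. This is only an
illustration under stated assumptions: `VersionCell` and `Database` are
hypothetical in-memory stand-ins for the znode version and the data store,
not ZooKeeper client calls, and the failure case (a client dying between
steps 2 and 3) is not handled.

```python
import threading

class VersionCell:
    """Hypothetical stand-in for the version kept in ZK (odd = stable)."""
    def __init__(self, value=1):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        with self._lock:
            return self._value

    def test_and_increment(self, expected):
        """Increment only if the current value is the expected one."""
        with self._lock:
            if self._value != expected:
                return False
            self._value += 1
            return True

class Database:
    """Hypothetical store that returns a value together with its version."""
    def __init__(self):
        self._data = {"user1.name": "Kathy"}
        self._version = 1

    def read(self, key):
        return self._data[key], self._version

    def write(self, key, value, version):
        self._data[key] = value
        self._version = version

def begin(zk, db, key):
    """Step 1: return (value, version), or None if the caller must retry."""
    v = zk.get()
    if v % 2 == 0:
        return None  # even: someone is committing
    value, db_version = db.read(key)
    if db_version != v:
        return None  # DB and ZK disagree: restart
    return value, v

def commit(zk, db, key, new_value, v):
    """Steps 2 and 3: return False if another writer committed since v."""
    if not zk.test_and_increment(v):   # odd -> even, claims the commit
        return False
    db.write(key, new_value, v + 2)    # tag data with the eventual odd version
    zk.test_and_increment(v + 1)       # even -> odd, publishes the commit
    return True

zk, db = VersionCell(), Database()
name1, v1 = begin(zk, db, "user1.name")
name2, v2 = begin(zk, db, "user1.name")
first = commit(zk, db, "user1.name", "Katherine", v1)
second = commit(zk, db, "user1.name", "Kat", v2)  # rejected: version moved on
```
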
>
> The problem is failure - what happens if a client fails while it's writing
> a transaction? Eventually someone can increment the transaction number, and
> if you provide an 'undo' log before you make any changes, that client can
> possibly recover from a partial commit. But at this point you need to
> understand your application's requirements in much more detail than we do
> to make recommendations.
>
> In particular, your storage layer may offer sufficiently powerful
> primitives such that you don't need ZK; although if it's a filesystem then
> that probably isn't true.
>
> Henry
>
>
> On 31 July 2013 15:51, Baskar Duraikannu <baskar.duraikannu@outlook.com>
> wrote:
>
> > We cannot always resolve conflicts ourselves. For example, let us say
> > that
> > a) user1 changes the name from 'Kathy' to 'Katherine'
> > b) user2 changes the name from 'Kathy' to 'Kat'
> > Both read 'Kathy' as input; user1's update succeeded. We need to let
> > user2 know that something has changed, as this may result in the user
> > not changing 'Kathy' to 'Kat' (as an example).
> > Hope this explains
> >
> > > Date: Wed, 31 Jul 2013 07:49:39 -0400
> > > Subject: Re: Zookeeper performance
> > > From: camille@apache.org
> > > To: user@zookeeper.apache.org
> > >
> > > This sounds highly error prone to me regardless of whether or not
> > > zookeeper can handle the load. Why not just use a standard transaction
> > > model with a vector clock or other timing device to detect conflicts so
> > > you don't have to worry about a second server to talk to (zookeeper) to
> > > do an update?
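
A minimal sketch of conflict detection with a vector clock, assuming each
writer keeps one counter per node; `compare` and `tick` are hypothetical
helper names, not part of any library.

```python
def compare(a: dict, b: dict) -> str:
    """Order two vector clocks: 'equal', 'before', 'after', or 'concurrent'."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"  # neither saw the other's write: a conflict

def tick(clock: dict, node: str) -> dict:
    """Return a copy of the clock with this node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated

base = {"server": 1}              # clock on the row both users read
user1_edit = tick(base, "user1")  # 'Kathy' -> 'Katherine'
user2_edit = tick(base, "user2")  # 'Kathy' -> 'Kat'

base_vs_user1 = compare(base, user1_edit)
user1_vs_user2 = compare(user1_edit, user2_edit)
```

Two edits that compare as "concurrent" were both made from the same
starting state, which is the case where the second user needs to be told.
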
> > > On Jul 31, 2013 7:17 AM, "Baskar Duraikannu" <baskar.duraikannu@outlook.com>
> > > wrote:
> > >
> > > > Hello
> > > >
> > > > We are looking to use zookeeper for optimistic concurrency. Basically,
> > > > when the user saves data on a screen, we need to lock, read to ensure
> > > > that no one else has changed the row while the user is editing data,
> > > > persist data, and unlock the znode.
> > > >
> > > > If the app/thread does not get a lock, we may set a watch so that
> > > > polling is avoided.
> > > >
> > > > Our application is write intensive at certain times of the day. We may
> > > > get about 100k requests per second.  Can zookeeper handle this volume?
> >
> >
>
>
>
> --
> Henry Robinson
> Software Engineer
> Cloudera
> 415-994-6679
>