hadoop-zookeeper-user mailing list archives

From Adam Rosien <a...@rosien.net>
Subject Re: Sequence Number Generation With Zookeeper
Date Thu, 12 Aug 2010 01:17:35 GMT
Ah thanks, I forgot the "majority-commit" property because I also
forgot that all servers know what the cluster should look like, rather
than acting adaptively (which wouldn't make sense after all).

.. Adam

On Wed, Aug 11, 2010 at 3:23 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> Can't happen.
>
> In a network partition, the side without a quorum can't update the file
> version.
>
> On Wed, Aug 11, 2010 at 3:11 PM, Adam Rosien <adam@rosien.net> wrote:
>
>> What happens during a network partition and different clients are
>> incrementing "different" counters, and then the partition goes away?
>> Won't (potentially) the same sequence value be given out to two
>> clients?
>>
>> .. Adam
>>
>> On Thu, Aug 5, 2010 at 5:38 PM, Jonathan Holloway
>> <jonathan.holloway@gmail.com> wrote:
>> > Hi Ted,
>> >
>> > Thanks for the comments.
>> >
>> > I might have overlooked something here, but is it also possible to do the
>> > following:
>> >
>> > 1. Create a PERSISTENT node
>> > 2. Have multiple clients set the data on the node, e.g.  Stat stat =
>> > zookeeper.setData(SEQUENCE, ArrayUtils.EMPTY_BYTE_ARRAY, -1);
>> > 3. Use the version number from stat.getVersion() as the sequence
>> > (obviously I'm limited to Integer.MAX_VALUE)
>> >
>> > Are there any weird race conditions involved here which would mean that a
>> > client would receive the wrong Stat object back?
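[Editor's note: the version-as-sequence idea above can be sketched without a running ensemble. In this self-contained sketch the znode's version counter is simulated with an AtomicInteger; with the real client you would instead use the Stat returned by zookeeper.setData(path, data, -1). The class and method names here are hypothetical, not from the thread.]

```java
import java.util.Set;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

// Simulated version-as-sequence counter. Each "setData" bumps the
// znode's version exactly once and returns the new value, which is
// what stat.getVersion() would report after a real setData call.
public class VersionSequence {
    private final AtomicInteger version = new AtomicInteger(0);

    // Stand-in for setData(path, data, -1): one successful write,
    // one new version.
    public int setData() {
        return version.incrementAndGet();
    }

    public static void main(String[] args) throws Exception {
        VersionSequence node = new VersionSequence();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        Set<Integer> seen = ConcurrentHashMap.newKeySet();
        for (int i = 0; i < 1000; i++) {
            pool.submit(() -> seen.add(node.setData()));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        // Every writer observed a distinct sequence number.
        System.out.println(seen.size()); // 1000
    }
}
```

The uniqueness property holds because each successful write advances the version by exactly one; the Integer.MAX_VALUE ceiling Jon mentions applies equally here.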
>> >
>> > Many thanks again,
>> > Jon.
>> >
>> > On 5 August 2010 16:09, Ted Dunning <ted.dunning@gmail.com> wrote:
>> >
>> >> (b)
>> >>
>> >> BUT:
>> >>
>> >> Sequential numbering is a special case of "now".  In large diameters,
>> >> now gets very expensive.  This is a special case of that assertion.
>> >> If there is a way to get away from this presumption of the need for
>> >> sequential numbering, you will be miles better off.
>> >>
>> >> HOWEVER:
>> >>
>> >> ZK can do better than you suggest.  Incrementing a counter does involve
>> >> potential contention, but you will very likely be able to get to pretty
>> >> high rates before the optimistic locking begins to fail.  If you code
>> >> your update with a few tries at full speed followed by some form of
>> >> retry back-off, you should get pretty close to the best possible
>> >> performance.
>> >>
>> >> You might also try building a lock with an ephemeral file before
>> >> updating the counter.  I would expect that this will be slower than the
>> >> back-off option if only because it involves more transactions in ZK.  If
>> >> you wanted to get too complicated for your own good, you could have a
>> >> secondary strategy flag that is only sampled by all clients every few
>> >> seconds and is updated whenever a client needs to back off more than,
>> >> say, 5 steps.  If this flag has been updated recently, then clients
>> >> should switch to the locking protocol.  You might even have several
>> >> locks so that you don't exclude all other updaters, merely thin them
>> >> out a bit.  This flagged strategy would run as fast as optimistic
>> >> locking as long as optimistic locking is fast and then would limit the
>> >> total number of transactions needed under very high load.
>> >>
>> >> On Thu, Aug 5, 2010 at 3:31 PM, Jonathan Holloway <
>> >> jonathan.holloway@gmail.com> wrote:
>> >>
>> >> > My ideas so far involve:
>> >> > a) Creating a node with PERSISTENT_SEQUENTIAL then deleting it - this
>> >> > gives me the monotonically increasing number, but the sequence number
>> >> > isn't contiguous
>> >> > b) Storing the sequence number in the data portion of a persistent
>> >> > node - then updating this (using the version number - aka optimistic
>> >> > locking).  The problem with this is that under high load I'm assuming
>> >> > there'll be a lot of contention and hence failures with regards to
>> >> > updates.
>> >> >
>> >>
>> >
>>
>
