Mailing-List: contact user-help@zookeeper.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@zookeeper.apache.org
Received-SPF: pass (athena.apache.org: domain of rajkumar.w93@gmail.com
 designates 209.85.161.42 as permitted sender)
MIME-Version: 1.0
Sender: rajkumar.w93@gmail.com
In-Reply-To: <C9918290.2EDD4%aebaugh@real.com>
References: <AANLkTim27eiNVL6MHn6r7-ATE+ni3Q85SSGiVP7RUcaJ@mail.gmail.com>
	<C9918290.2EDD4%aebaugh@real.com>
Date: Tue, 1 Mar 2011 12:55:52 +0530
Message-ID: <AANLkTim3GWCJUReozeCWXbhLxZa5Od6C8jMTrhp7usYM@mail.gmail.com>
Subject: Re: Zookeeper for generating sequential IDs
From: Ertio Lew <ertiop93@gmail.com>
To: user@zookeeper.apache.org, jhodges@twitter.com
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

Thanks Andrew; however, I would prefer to stay away from super columns
because of their well-known limitations.

Regarding Snowflake, I think I can make it useful for me by shrinking
the current 12-bit sequence number to 8 bits and using the 4 freed
bits to store the category of the data. That would reduce the
theoretical limit from 4096 IDs per millisecond per machine to 256 IDs
per millisecond per machine, which is still plenty for my use case.
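A minimal Java sketch of the layout proposed above. It assumes the 4 category bits sit in the lowest bits (the message does not say where they would go) and keeps Snowflake's 41-bit timestamp and 10-bit worker fields; `CategorizedId` and its methods are hypothetical names, not part of Snowflake, and the timestamp is taken as milliseconds since whatever epoch the generator uses.

```java
/**
 * Hypothetical 64-bit layout: a Snowflake-style ID whose 12-bit sequence is
 * cut to 8 bits, freeing the 4 lowest bits for a data category.
 *
 *   sign (1) | timestamp ms (41) | worker (10) | sequence (8) | category (4)
 */
public final class CategorizedId {
    static final int CATEGORY_BITS = 4;
    static final int SEQUENCE_BITS = 8;
    static final int WORKER_BITS = 10;
    static final int TIMESTAMP_BITS = 41;

    static final int SEQUENCE_SHIFT = CATEGORY_BITS;                 // 4
    static final int WORKER_SHIFT = SEQUENCE_SHIFT + SEQUENCE_BITS;  // 12
    static final int TIMESTAMP_SHIFT = WORKER_SHIFT + WORKER_BITS;   // 22

    static final long CATEGORY_MASK = (1L << CATEGORY_BITS) - 1;     // 0xF
    static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;      // 255, i.e. 256 IDs per ms per worker
    static final long MAX_WORKER = (1L << WORKER_BITS) - 1;          // 1023

    static long compose(long timestampMs, long workerId, long sequence, int category) {
        if (timestampMs < 0 || timestampMs >= (1L << TIMESTAMP_BITS)) {
            throw new IllegalArgumentException("timestamp out of range");
        }
        if (workerId < 0 || workerId > MAX_WORKER) {
            throw new IllegalArgumentException("worker out of range");
        }
        if (sequence < 0 || sequence > MAX_SEQUENCE) {
            throw new IllegalArgumentException("sequence out of range");
        }
        if (category < 0 || category > CATEGORY_MASK) {
            throw new IllegalArgumentException("category out of range");
        }
        return (timestampMs << TIMESTAMP_SHIFT)
                | (workerId << WORKER_SHIFT)
                | (sequence << SEQUENCE_SHIFT)
                | category;
    }

    /** Derives the key for another category of the same entity from one generated ID. */
    static long withCategory(long id, int category) {
        return (id & ~CATEGORY_MASK) | (category & CATEGORY_MASK);
    }

    static long timestamp(long id) { return id >>> TIMESTAMP_SHIFT; }
    static long worker(long id)    { return (id >>> WORKER_SHIFT) & MAX_WORKER; }
    static long sequence(long id)  { return (id >>> SEQUENCE_SHIFT) & MAX_SEQUENCE; }
    static int category(long id)   { return (int) (id & CATEGORY_MASK); }

    public static void main(String[] args) {
        long id = compose(1_000L, 7, 42, 3);
        System.out.println(timestamp(id) + " " + worker(id) + " " + sequence(id) + " " + category(id));
        // prints "1000 7 42 3"
    }
}
```

Putting the category in the low bits keeps IDs ordered by time, and `withCategory` derives sibling keys for the other categories of one entity from a single generated ID, provided the generator itself only ever issues one category value.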

@Jeff, what do you think of this idea?

Thank you all.
Ertio


On Tue, Mar 1, 2011 at 6:44 AM, Andrew Ebaugh <aebaugh@real.com> wrote:
> Getting a bit into the Cassandra weeds, but what about using a super column
> and TimeUUIDType keys? IMO splitting data for one unique item into
> multiple manipulated keys sounds complex, and more like what a super column
> was made for.
>
> So instead of having:
>
> TimeIDA-Name -> {name column data}
> TimeIDA-Blah -> {blah column data}
> TimeIDB-Name -> ...
>
> You'd have:
>
> TimeIDA ->
>   {
>   Name -> {name column data}
>   Blah -> {blah column data}
>   }
>
> TimeIDB ->
>   ...
>
>
> This would give you the advantage of being able to query key slices based
> on time ranges.
> Here's a good article (seems a bit outdated for 0.7):
> http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model
>
>
>
> On 2/28/11 9:50 AM, "Ertio Lew" <ertiop93@gmail.com> wrote:
>
>>Thanks Jeff!
>>
>>Your point is valid! However, my idea is also "not to store
>>information about the data/entities in the Id"; rather, it is to split
>>the data of an entity into several rows (according to the category of
>>that data) in the same CF in Cassandra.
>>So, for example, if you want to split the information about a tweet
>>into two rows according to the 'type of information', then you need
>>two keys generated from the same ID.
>>
>>For this purpose you definitely need some kind of manipulation of
>>your IDs. Otherwise you cannot split the data for a particular entity
>>(in the same CF) into two rows according to data category. Of course,
>>you could also store different types of data in different CFs, but
>>sometimes it is better to keep a limit on the number of CFs in
>>Cassandra.
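For comparison, a hedged Java sketch of deriving two row keys from the same ID without giving up any of its bits: the unmodified 64-bit ID is followed by one category byte. This is a swapped-in alternative to packing the category into the ID, not what the thread proposes; `RowKeys` and the `NAME`/`BODY` categories are hypothetical, and it assumes row keys can be arbitrary byte arrays.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public final class RowKeys {
    /** Hypothetical categories of data stored for one entity. */
    enum Category { NAME, BODY }

    /** Builds a 9-byte row key: the unmodified 64-bit ID followed by one category byte. */
    static byte[] rowKey(long entityId, Category category) {
        return ByteBuffer.allocate(Long.BYTES + 1)
                .putLong(entityId)
                .put((byte) category.ordinal())
                .array();
    }

    /** Recovers the entity ID from the first 8 bytes of a row key. */
    static long entityId(byte[] rowKey) {
        return ByteBuffer.wrap(rowKey).getLong();
    }

    /** Recovers the category from the trailing byte of a row key. */
    static Category category(byte[] rowKey) {
        return Category.values()[rowKey[Long.BYTES]];
    }

    public static void main(String[] args) {
        long tweetId = 123456789L;
        byte[] nameKey = rowKey(tweetId, Category.NAME);
        byte[] bodyKey = rowKey(tweetId, Category.BODY);
        System.out.println(Arrays.equals(nameKey, bodyKey));         // prints false: two distinct rows
        System.out.println(entityId(nameKey) == entityId(bodyKey));  // prints true: same entity
    }
}
```

Because the ID itself is untouched, the generator keeps its full sequence range; the cost is one extra byte per key.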
>>
>>Regards
>>Ertio Lew
>>
>>
>>
>>On Mon, Feb 28, 2011 at 10:50 PM, Jeff Hodges <jhodges@twitter.com> wrote:
>>> Also, feel free to mock me for the phrase "identifying id".
>>>
>>> On Mon, Feb 28, 2011 at 9:04 AM, Jeff Hodges <jhodges@twitter.com> wrote:
>>>> If you patch snowflake to remove 4 bits from the timestamp section,
>>>> you will take the time that it takes before the IDs generated overflow
>>>> the JVM 63-bit limit from about 70 years (2 ** 41 milliseconds) to a
>>>> little over 4 years (2 ** 37 milliseconds). This is likely
>>>> unacceptable for your use case.
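The arithmetic behind these figures can be checked with a few lines of Java. The sketch assumes a 365.25-day year and counts from the generator's own epoch, since that is where the timestamp field starts at zero; `TimestampHorizon` is a hypothetical name.

```java
/** Checks how long a millisecond timestamp field of a given width lasts. */
public final class TimestampHorizon {
    // Milliseconds in an average (365.25-day) year.
    static final double MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000;

    /** Years until a timestamp field of the given width overflows, counted from the generator's epoch. */
    static double yearsUntilOverflow(int timestampBits) {
        return (double) (1L << timestampBits) / MS_PER_YEAR;
    }

    public static void main(String[] args) {
        System.out.printf("41 bits: %.1f years%n", yearsUntilOverflow(41)); // about 69.7
        System.out.printf("37 bits: %.1f years%n", yearsUntilOverflow(37)); // about 4.4
    }
}
```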
>>>>
>>>> However, the larger point to discuss is that encoding additional
>>>> information about your data in the identifying id is, in general, a
>>>> bad idea. It means your architecture is strictly coupled to your
>>>> current and likely less-than-perfect understanding of the problem and
>>>> makes it harder to iterate towards a better one. For instance, we had
>>>> to rewrite certain parts of our search infrastructure when migrating
>>>> to snowflake because it had assumed that the generated id space of
>>>> tweets was uniform across time.
>>>>
>>>> But, of course, I'm just some dude on the internet who doesn't know
>>>> your particular problem or design in detail. Godspeed and good luck.
>>>>
>>>> On Mon, Feb 28, 2011 at 8:35 AM, Ertio Lew <ertiop93@gmail.com> wrote:
>>>>> Yes, I think we could reduce the precision of its timestamp (41 bits
>>>>> of milliseconds, I think) to whatever extent matches our needs.
>>>>>
>>>>> On Mon, Feb 28, 2011 at 9:38 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
>>>>>> So patch it!
>>>>>>
>>>>>> On Mon, Feb 28, 2011 at 7:59 AM, Ertio Lew <ertiop93@gmail.com> wrote:
>>>>>>
>>>>>>> First, it does not start at 0, since it is composed of a timestamp,
>>>>>>> a workerId and noOfGeneratedIds. Thus it is not sequential! Second,
>>>>>>> if I insert my 4 bits into this ID, I risk overwriting bits of it,
>>>>>>> so the result could collide with an ID it has already created.
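To illustrate both points, a hedged Java sketch that composes and decodes IDs using the field widths commonly described for Snowflake (41-bit millisecond timestamp, 10-bit worker ID, 12-bit sequence). `SnowflakeFields` is a hypothetical helper, not Snowflake's own code. The first check shows consecutive milliseconds producing IDs 2^22 apart; the second shows that OR-ing 4 category bits into an unshrunk ID can reproduce another ID.

```java
/**
 * Hypothetical helper using the field widths commonly described for Snowflake:
 * 41-bit millisecond timestamp, 10-bit worker ID, 12-bit sequence.
 */
public final class SnowflakeFields {
    static final int WORKER_BITS = 10;
    static final int SEQUENCE_BITS = 12;
    static final int TIMESTAMP_SHIFT = WORKER_BITS + SEQUENCE_BITS; // 22

    static long compose(long timestampMs, long workerId, long sequence) {
        return (timestampMs << TIMESTAMP_SHIFT) | (workerId << SEQUENCE_BITS) | sequence;
    }

    static long timestamp(long id) { return id >>> TIMESTAMP_SHIFT; }
    static long worker(long id)    { return (id >>> SEQUENCE_BITS) & ((1L << WORKER_BITS) - 1); }
    static long sequence(long id)  { return id & ((1L << SEQUENCE_BITS) - 1); }

    public static void main(String[] args) {
        // First ID from the same worker in two consecutive milliseconds.
        long a = compose(1_000L, 1, 0);
        long b = compose(1_001L, 1, 0);
        System.out.println(b - a); // prints 4194304 (2^22): increasing, but far from dense

        // OR-ing a 4-bit category (3) into the low bits without shrinking any field
        // turns sequence 0x10 into 0x13, an ID the generator may already have issued.
        long tagged = compose(1_000L, 1, 0x10) | 0x3;
        long existing = compose(1_000L, 1, 0x13);
        System.out.println(tagged == existing); // prints true
    }
}
```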
>>>>>>>
>>>>>>> On Mon, Feb 28, 2011 at 9:16 PM, Ted Dunning <ted.dunning@gmail.com>
>>>>>>> wrote:
>>>>>>> > Uh.... any sequential generator that starts at zero will take a LONG
>>>>>>> > time until it generates a value > 2^60.
>>>>>>> >
>>>>>>> > If you generate a million ids per second (about 2^20), then it will
>>>>>>> > be longer than 30,000 years before you get past 2^60.
>>>>>>> >
>>>>>>> > Is this *really* a problem?
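Ted's estimate, worked out in Java. This is a sketch assuming a 365.25-day year and a rate of exactly 2^20 ids per second (slightly more than a million); `CounterHorizon` is a hypothetical name.

```java
/** Back-of-the-envelope check: time for a counter starting at 0 to pass 2^60. */
public final class CounterHorizon {
    static final double SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60;

    /** Years until a counter starting at 0 exceeds 2^limitBits at 2^rateBits ids per second. */
    static double years(int limitBits, int rateBits) {
        double seconds = Math.pow(2, limitBits - rateBits); // 2^60 / 2^20 = 2^40 seconds
        return seconds / SECONDS_PER_YEAR;
    }

    public static void main(String[] args) {
        System.out.printf("%.0f years%n", years(60, 20)); // roughly 34,800 years
    }
}
```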
>>>>>>> >
>>>>>>> > On Mon, Feb 28, 2011 at 7:25 AM, Ertio Lew <ertiop93@gmail.com> wrote:
>>>>>>> >
>>>>>>> >> Could you recommend any other ID generator that could help me with
>>>>>>> >> increasing IDs (not necessarily sequential) with size <= 60 bits?
>>>>>>> >>
>>>>>>> >> Thanks
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> On Mon, Feb 28, 2011 at 8:30 PM, Ertio Lew <ertiop93@gmail.com> wrote:
>>>>>>> >> > Thanks Patrick,
>>>>>>> >> >
>>>>>>> >> > I considered your suggestion, but sadly it does not fit my use case.
>>>>>>> >> > I am looking for a solution that can generate 64-bit IDs, but
>>>>>>> >> > within those 64 bits I would like at least 4 free bits that I
>>>>>>> >> > could use to distinguish the type of data for a particular entity
>>>>>>> >> > in the same column family.
>>>>>>> >> >
>>>>>>> >> > If I could keep Snowflake's ID size to around 60 bits, that would
>>>>>>> >> > have been great.
>>>>>>> >> >
>>>>>>> >> >
>>>>>>> >> >
>>>>>>> >> >
>>>>>>> >> >
>>>>>>> >> > On Sat, Feb 26, 2011 at 5:13 AM, Patrick Hunt <phunt@apache.org> wrote:
>>>>>>> >> >> Keep in mind that blog post is pretty old. I see comments like
>>>>>>> >> >> this in the commit log:
>>>>>>> >> >>
>>>>>>> >> >> "hard to call it alpha/experimental after serving billions of ids"
>>>>>>> >> >>
>>>>>>> >> >> so it seems it's in production at Twitter at least...
>>>>>>> >> >>
>>>>>>> >> >> Patrick
>>>>>>> >> >>
>>>>>>> >> >> On Fri, Feb 25, 2011 at 2:58 PM, Ertio Lew <ertiop93@gmail.com> wrote:
>>>>>>> >> >>> Thanks Patrick,
>>>>>>> >> >>>
>>>>>>> >> >>> The fact that it is still in the alpha stage and Twitter is not
>>>>>>> >> >>> yet using it makes me look at other solutions as well, ones that
>>>>>>> >> >>> have a large community/user base and are more mature.
>>>>>>> >> >>>
>>>>>>> >> >>> I do not know much about Snowflake, or whether anyone is using it
>>>>>>> >> >>> in production.
>>>>>>> >> >>>
>>>>>>> >> >>>
>>>>>>> >> >>>
>>>>>>> >> >>>
>>>>>>> >> >>> On Fri, Feb 25, 2011 at 11:21 PM, Patrick Hunt <phunt@apache.org> wrote:
>>>>>>> >> >>>> Have you looked at snowflake?
>>>>>>> >> >>>>
>>>>>>> >> >>>> http://engineering.twitter.com/2010/06/announcing-snowflake.html
>>>>>>> >> >>>>
>>>>>>> >> >>>> Patrick
>>>>>>> >> >>>>
>>>>>>> >> >>>> On Fri, Feb 25, 2011 at 9:43 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
>>>>>>> >> >>>>> If your ids don't need to be exactly sequential, or if the
>>>>>>> >> >>>>> generation rate is less than a few thousand per second, ZK is a
>>>>>>> >> >>>>> fine choice.
>>>>>>> >> >>>>>
>>>>>>> >> >>>>> To get very high generation rates, what is typically done is to
>>>>>>> >> >>>>> allocate blocks of ids using ZK and then allocate out of the
>>>>>>> >> >>>>> block locally. This can leave you with a slightly swiss-cheesed
>>>>>>> >> >>>>> id space, and it means that the ordering of ids only
>>>>>>> >> >>>>> approximates the time ordering of when the ids were assigned.
>>>>>>> >> >>>>> Neither of these is typically a problem.
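The block-allocation pattern described above can be sketched in Java as follows. `SharedCounter` is a hypothetical interface standing in for a counter kept in ZooKeeper, and the in-memory implementation exists only so the sketch runs on its own. A ZooKeeper-backed `reserve` might read the counter znode with `getData`, then call `setData` with the version it read and retry if the version has changed; that wiring is an untested assumption and is not shown.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Sketch of block-based ID allocation; all names here are hypothetical. */
public final class BlockIdAllocator {
    /** Atomically reserves the range [start, start + size) and returns start. */
    interface SharedCounter {
        long reserve(long size);
    }

    /** In-memory stand-in for the shared counter (illustration only). */
    static final class InMemoryCounter implements SharedCounter {
        private final AtomicLong next = new AtomicLong();

        public long reserve(long size) {
            return next.getAndAdd(size);
        }
    }

    private final SharedCounter counter;
    private final long blockSize;
    private long nextId;   // next ID to hand out from the current block
    private long blockEnd; // exclusive end of the current block

    BlockIdAllocator(SharedCounter counter, long blockSize) {
        this.counter = counter;
        this.blockSize = blockSize;
    }

    /** Hands out IDs locally; contacts the shared counter only when the block is used up. */
    synchronized long nextId() {
        if (nextId == blockEnd) {
            nextId = counter.reserve(blockSize);
            blockEnd = nextId + blockSize;
        }
        return nextId++;
    }

    public static void main(String[] args) {
        SharedCounter shared = new InMemoryCounter();
        BlockIdAllocator a = new BlockIdAllocator(shared, 100);
        BlockIdAllocator b = new BlockIdAllocator(shared, 100);
        System.out.println(a.nextId()); // prints 0   (a reserves [0, 100))
        System.out.println(b.nextId()); // prints 100 (b reserves [100, 200))
        System.out.println(a.nextId()); // prints 1   (served locally, issued after 100)
    }
}
```

The output shows both effects Ted mentions: id 1 is issued after id 100, so ordering only approximates time, and any IDs left in a block when its process stops are never used, which produces the swiss-cheesed id space.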
>>>>>>> >> >>>>>
>>>>>>> >> >>>>> On Fri, Feb 25, 2011 at 1:50 AM, Ertio Lew <ertiop93@gmail.com> wrote:
>>>>>>> >> >>>>>
>>>>>>> >> >>>>>> Hi all,
>>>>>>> >> >>>>>>
>>>>>>> >> >>>>>> I am involved in a project where we're building a social
>>>>>>> >> >>>>>> application using Cassandra DB and Java. I am looking for a
>>>>>>> >> >>>>>> solution to generate unique sequential IDs for the content on
>>>>>>> >> >>>>>> the application. Some people have suggested that I look at
>>>>>>> >> >>>>>> ZooKeeper for this. I would highly appreciate it if anyone could
>>>>>>> >> >>>>>> say whether ZooKeeper is suitable for this purpose and point me
>>>>>>> >> >>>>>> to good resources for learning about it.
>>>>>>> >> >>>>>>
>>>>>>> >> >>>>>> Since the application is based on an eventually consistent
>>>>>>> >> >>>>>> distributed platform (Cassandra), we have felt a need to look
>>>>>>> >> >>>>>> at other solutions instead of building our own on top of our DB.
>>>>>>> >> >>>>>>
>>>>>>> >> >>>>>> Any comments or suggestions are highly welcome! :)
>>>>>>> >> >>>>>>
>>>>>>> >> >>>>>> Regards
>>>>>>> >> >>>>>> Ertio Lew.
>>>>>>> >> >>>>>>
>>>>>>> >> >>>>>
>>>>>>> >> >>>>
>>>>>>> >> >>>
>>>>>>> >> >>
>>>>>>> >> >
>>>>>>> >>
>>>>>>> >
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>
>