zookeeper-user mailing list archives

From Ertio Lew <ertio...@gmail.com>
Subject Re: Zookeeper for generating sequential IDs
Date Tue, 01 Mar 2011 07:25:52 GMT
Thanks Andrew, however I would prefer to stay away from supercolumns
because of their well-known limitations.

Regarding snowflake, I think I can make it useful for me by shrinking the
current 12-bit sequence number to 8 bits and using the freed-up 4 bits to
store the category of the data. That would reduce the theoretical limit of
4096 ids per millisecond per machine to 256 ids per ms per machine, which
is still more than enough for my use case.
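
For illustration, here is a minimal Java sketch of that modified layout
(41-bit timestamp | 10-bit worker | 4-bit category | 8-bit sequence). This
is not snowflake itself; the class, method names and custom epoch below are
hypothetical:

public class CategoryIdLayout {
    // Assumed custom epoch (snowflake-style), purely for illustration.
    static final long EPOCH = 1288834974657L;

    // Pack the fields: 41 bits timestamp | 10 bits worker | 4 bits category | 8 bits sequence.
    static long makeId(long timestampMs, long workerId, long category, long sequence) {
        return ((timestampMs - EPOCH) << 22)  // 10 + 4 + 8 bits sit to the right of the timestamp
                | (workerId << 12)            // 4 + 8 bits sit to the right of the worker id
                | (category << 8)
                | sequence;
    }

    // Recover the 4 category bits from an id.
    static long categoryOf(long id) {
        return (id >> 8) & 0xF;
    }

    public static void main(String[] args) {
        long id = makeId(System.currentTimeMillis(), 3, 9, 42);
        System.out.println(id + " category=" + categoryOf(id)); // prints "... category=9"
    }
}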

@Jeff, would you like to say something about this idea?

Thank you all..
Ertio



On Tue, Mar 1, 2011 at 6:44 AM, Andrew Ebaugh <aebaugh@real.com> wrote:
> Getting a bit into Cassandra weeds, but what about using a super column
> and TimeUUIDType keys? IMO splitting data for one unique item into
> multiple manipulated keys sounds complex, and more what a super column was
> made for.
>
> So instead of having:
>
> TimeIDA-Name -> {name column data}
> TimeIDA-Blah -> {blah column data}
> TimeIDB-Name -> ...
>
> You'd have :
>
> TimeIDA ->
>  {
>  Name -> {name column data}
>  Blah -> {blah column data}
>  }
>
> TimeIDB ->
>  ...
>
>
> This would give you the advantage of being able to query key slices based
> on time ranges.
> Here's a good article (seems a bit outdated for 0.7):
> http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model
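
As a rough illustration of that layout in plain Java, with a TreeMap and
Map standing in for the super column family (no Cassandra client API
implied; names and values are made up):

import java.util.Map;
import java.util.TreeMap;

public class SuperColumnSketch {
    public static void main(String[] args) {
        // Outer key stands in for the TimeUUID row key (a plain timestamp here);
        // the inner map stands in for the super column's sub-columns.
        TreeMap<Long, Map<String, String>> rows = new TreeMap<>();

        long timeIdA = System.currentTimeMillis();
        rows.put(timeIdA, Map.of("Name", "name column data",
                                 "Blah", "blah column data"));

        // "Query key slices based on time ranges" is conceptually a range scan
        // over the time-ordered row keys:
        Map<Long, Map<String, String>> lastHour =
                rows.subMap(System.currentTimeMillis() - 3_600_000, Long.MAX_VALUE);
        System.out.println(lastHour);
    }
}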
>
>
>
> On 2/28/11 9:50 AM, "Ertio Lew" <ertiop93@gmail.com> wrote:
>
>>Thanks Jeff !
>>
>>Your point is truly valid! However... even my idea is "not to store
>>information about the data/entities in the Id" but to split an entity's
>>data into several rows (according to the category of that data) in the
>>same CF in Cassandra. So, for example, if you want to split the
>>information about a tweet into two rows according to the 'type of
>>information', then you want two keys generated from the same ID.
>>
>>For this purpose you definitely need some kind of manipulation of your
>>IDs. Otherwise you cannot split the data for a particular entity (in the
>>same CF) into two rows according to data category. Of course you could
>>suggest storing different types of data in different CFs, but sometimes
>>it is better to keep a limit on the number of CFs in Cassandra.
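
For illustration only, one simple way to derive several row keys from a
single ID is to append a category suffix; the key format and names here
are hypothetical, not something defined in the thread:

public class RowKeys {
    // e.g. rowKey(8231775943L, "name") -> "8231775943:name"
    static String rowKey(long entityId, String category) {
        return entityId + ":" + category;
    }

    public static void main(String[] args) {
        long tweetId = 8231775943L;               // one ID for the entity
        String nameRow = rowKey(tweetId, "name"); // row holding 'name'-type data
        String blahRow = rowKey(tweetId, "blah"); // row holding 'blah'-type data
        System.out.println(nameRow + " / " + blahRow);
    }
}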
>>
>>Regards
>>Ertio Lew
>>
>>
>>
>>On Mon, Feb 28, 2011 at 10:50 PM, Jeff Hodges <jhodges@twitter.com> wrote:
>>> Also, feel free to mock me for the phrase "identifying id".
>>>
>>> On Mon, Feb 28, 2011 at 9:04 AM, Jeff Hodges <jhodges@twitter.com>
>>>wrote:
>>>> If you patch snowflake to remove 4 bits from the timestamp section,
>>>> you will reduce the time it takes before the generated IDs overflow
>>>> the JVM's 63-bit limit from about 70 years (2 ** 41 milliseconds) to a
>>>> little over 4 years (2 ** 37 milliseconds). This is likely
>>>> unacceptable for your use case.
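
A quick back-of-the-envelope check of those figures in plain Java (the
numbers are approximate):

public class SnowflakeLifetime {
    public static void main(String[] args) {
        long msPerYear = 365L * 24 * 60 * 60 * 1000;
        System.out.printf("41-bit timestamp: %.1f years%n", (1L << 41) / (double) msPerYear); // ~69.7
        System.out.printf("37-bit timestamp: %.1f years%n", (1L << 37) / (double) msPerYear); // ~4.4
    }
}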
>>>>
>>>> However, the larger point to discuss is that encoding additional
>>>> information about your data in the identifying id is, in general, a
>>>> bad idea. It means your architecture is strictly coupled to your
>>>> current and likely less-than-perfect understanding of the problem and
>>>> makes it harder to iterate towards a better one. For instance, we had
>>>> to rewrite certain parts of our search infrastructure when migrating
>>>> to snowflake because it had assumed that the generated id space of
>>>> tweets was uniform across time.
>>>>
>>>> But, of course, I'm just some dude on the internet who doesn't know
>>>> your particular problem or design in detail. God speed and good luck.
>>>>
>>>> On Mon, Feb 28, 2011 at 8:35 AM, Ertio Lew <ertiop93@gmail.com> wrote:
>>>>> Yes, I think we could perhaps reduce the millisecond timestamp
>>>>> precision it provides (I think 41 bits) to an appropriate extent to
>>>>> match our needs.
>>>>>
>>>>> On Mon, Feb 28, 2011 at 9:38 PM, Ted Dunning <ted.dunning@gmail.com>
>>>>>wrote:
>>>>>> So patch it!
>>>>>>
>>>>>> On Mon, Feb 28, 2011 at 7:59 AM, Ertio Lew <ertiop93@gmail.com>
>>>>>>wrote:
>>>>>>
>>>>>>> First, it does not start at 0, since it comprises a timestamp,
>>>>>>> workerId and noOfGeneratedIds. Thus it is not sequential! Secondly,
>>>>>>> if I insert my 4 bits into this ID then I risk that it might
>>>>>>> overwrite an already existing ID created by it.
>>>>>>>
>>>>>>> On Mon, Feb 28, 2011 at 9:16 PM, Ted Dunning <ted.dunning@gmail.com>
>>>>>>> wrote:
>>>>>>> > Uh.... any sequential generator that starts at zero will take a
>>>>>>> > LONG time until it generates a value > 2^60.
>>>>>>> >
>>>>>>> > If you generate a million id's per second (= 2^20) then it will be
>>>>>>> > longer than 30,000 years before you get past 2^60.
>>>>>>> >
>>>>>>> > Is this *really* a problem?
>>>>>>> >
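
A similar sanity check of that estimate, in plain Java:

public class SequentialLifetime {
    public static void main(String[] args) {
        double idsPerSecond = 1 << 20;                    // ~a million ids per second
        double seconds = Math.pow(2, 60) / idsPerSecond;  // 2^40 seconds to reach 2^60
        System.out.printf("%.0f years%n", seconds / (365.0 * 24 * 60 * 60)); // ~34,865 years
    }
}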
>>>>>>> > On Mon, Feb 28, 2011 at 7:25 AM, Ertio Lew <ertiop93@gmail.com>
>>>>>>>wrote:
>>>>>>> >
>>>>>>> >> Could you recommend any other ID generator that could help me
>>>>>>> >> with increasing IDs (not necessarily sequential) with size <= 60 bits?
>>>>>>> >>
>>>>>>> >> Thanks
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> On Mon, Feb 28, 2011 at 8:30 PM, Ertio Lew <ertiop93@gmail.com>
>>>>>>>wrote:
>>>>>>> >> > Thanks Patrick,
>>>>>>> >> >
>>>>>>> >> > I considered your suggestion. But sadly it could not fit my use
>>>>>>> >> > case. I am looking for a solution that could help me generate
>>>>>>> >> > 64-bit IDs, but in those 64 bits I would like at least 4 free
>>>>>>> >> > bits so that I could use those free bits to distinguish the type
>>>>>>> >> > of data for a particular entity in the same column family.
>>>>>>> >> >
>>>>>>> >> > If I could keep snowflake's ID size to around 60 bits, that
>>>>>>> >> > would have been great..
>>>>>>> >> >
>>>>>>> >> >
>>>>>>> >> >
>>>>>>> >> >
>>>>>>> >> >
>>>>>>> >> > On Sat, Feb 26, 2011 at 5:13 AM, Patrick Hunt <phunt@apache.org> wrote:
>>>>>>> >> >> Keep in mind that blog post is pretty old. I see comments like
>>>>>>> >> >> this in the commit log
>>>>>>> >> >>
>>>>>>> >> >> "hard to call it alpha/experimental after serving billions of ids"
>>>>>>> >> >>
>>>>>>> >> >> so it seems it's in production at twitter at least...
>>>>>>> >> >>
>>>>>>> >> >> Patrick
>>>>>>> >> >>
>>>>>>> >> >>> On Fri, Feb 25, 2011 at 2:58 PM, Ertio Lew <ertiop93@gmail.com> wrote:
>>>>>>> >> >>> Thanks Patrick,
>>>>>>> >> >>>
>>>>>>> >> >>> The fact that it is still in the alpha stage and twitter is
>>>>>>> >> >>> not yet using it makes me look at other solutions as well,
>>>>>>> >> >>> which have a large community/user base & are more mature.
>>>>>>> >> >>>
>>>>>>> >> >>> I do not know much about whether snowflake is being used in
>>>>>>> >> >>> production by anyone..
>>>>>>> >> >>>
>>>>>>> >> >>>
>>>>>>> >> >>>
>>>>>>> >> >>>
>>>>>>> >> >>> On Fri, Feb 25, 2011 at 11:21 PM, Patrick Hunt <phunt@apache.org> wrote:
>>>>>>> >> >>>> Have you looked at snowflake?
>>>>>>> >> >>>>
>>>>>>> >> >>>> http://engineering.twitter.com/2010/06/announcing-snowflake.html
>>>>>>> >> >>>>
>>>>>>> >> >>>> Patrick
>>>>>>> >> >>>>
>>>>>>> >> >>>> On Fri, Feb 25, 2011 at 9:43 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
>>>>>>> >> >>>>> If your id's don't need to be exactly sequential or if the
>>>>>>> >> >>>>> generation rate is less than a few thousand per second, ZK is
>>>>>>> >> >>>>> a fine choice.
>>>>>>> >> >>>>>
>>>>>>> >> >>>>> To get very high generation rates, what is typically done is
>>>>>>> >> >>>>> to allocate blocks of id's using ZK and then allocate out of
>>>>>>> >> >>>>> the block locally.  This can cause you to wind up with a
>>>>>>> >> >>>>> slightly swiss-cheesed id space and it means that the ordering
>>>>>>> >> >>>>> of id's only approximates the time ordering of when the id's
>>>>>>> >> >>>>> were assigned.  Neither of these is typically a problem.
>>>>>>> >> >>>>>
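
For reference, a minimal sketch of the block-allocation approach Ted
describes, using the ZooKeeper Java client. It assumes a pre-created
counter znode holding an 8-byte big-endian value; the path, class name and
block size are hypothetical:

import java.nio.ByteBuffer;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

/** Hands out ids from blocks reserved atomically in ZooKeeper. */
public class BlockIdAllocator {
    private final ZooKeeper zk;
    private final String counterPath;   // e.g. "/idgen/counter", created beforehand
    private final long blockSize;

    private long next;   // next id to hand out locally
    private long limit;  // exclusive upper bound of the current block

    public BlockIdAllocator(ZooKeeper zk, String counterPath, long blockSize) {
        this.zk = zk;
        this.counterPath = counterPath;
        this.blockSize = blockSize;
        this.next = 0;
        this.limit = 0;  // forces a block reservation on first use
    }

    public synchronized long nextId() throws KeeperException, InterruptedException {
        if (next >= limit) {
            reserveBlock();
        }
        return next++;
    }

    /** Reserve [current, current + blockSize) via compare-and-swap on the znode version. */
    private void reserveBlock() throws KeeperException, InterruptedException {
        while (true) {
            Stat stat = new Stat();
            byte[] data = zk.getData(counterPath, false, stat);
            long current = ByteBuffer.wrap(data).getLong();
            byte[] updated = ByteBuffer.allocate(8).putLong(current + blockSize).array();
            try {
                // Succeeds only if nobody else bumped the counter since our read.
                zk.setData(counterPath, updated, stat.getVersion());
                next = current;
                limit = current + blockSize;
                return;
            } catch (KeeperException.BadVersionException e) {
                // Another worker grabbed a block first; re-read and retry.
            }
        }
    }
}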
>>>>>>> >> >>>>> On Fri, Feb 25, 2011 at 1:50 AM, Ertio Lew <ertiop93@gmail.com> wrote:
>>>>>>> >> >>>>>
>>>>>>> >> >>>>>> Hi all,
>>>>>>> >> >>>>>>
>>>>>>> >> >>>>>> I am involved in a project where we're building a social
>>>>>>> >> >>>>>> application using Cassandra DB and Java. I am looking for a
>>>>>>> >> >>>>>> solution to generate unique sequential IDs for the content on
>>>>>>> >> >>>>>> the application. Some people have suggested that I have a look
>>>>>>> >> >>>>>> at ZooKeeper for this. I would highly appreciate it if anyone
>>>>>>> >> >>>>>> could say whether ZooKeeper is suitable for this purpose, and
>>>>>>> >> >>>>>> point me to any good resources for learning about ZooKeeper.
>>>>>>> >> >>>>>>
>>>>>>> >> >>>>>> Since the application is based on an eventually consistent
>>>>>>> >> >>>>>> distributed platform using Cassandra, we have felt a need to
>>>>>>> >> >>>>>> look at other solutions instead of building our own using our DB.
>>>>>>> >> >>>>>>
>>>>>>> >> >>>>>> Any kind of comments and suggestions are highly welcome! :)
>>>>>>> >> >>>>>>
>>>>>>> >> >>>>>> Regards
>>>>>>> >> >>>>>> Ertio Lew.
>>>>>>> >> >>>>>>
>>>>>>> >> >>>>>
>>>>>>> >> >>>>
>>>>>>> >> >>>
>>>>>>> >> >>
>>>>>>> >> >
>>>>>>> >>
>>>>>>> >
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>
>
