zookeeper-user mailing list archives

From Andrew Ebaugh <aeba...@real.com>
Subject Re: Zookeeper for generating sequential IDs
Date Tue, 01 Mar 2011 01:14:53 GMT
Getting a bit into the Cassandra weeds, but what about using a super column
and TimeUUIDType keys? IMO splitting the data for one unique item across
multiple manipulated keys sounds complex, and is more like what a super
column was made for.

So instead of having:

TimeIDA-Name -> {name column data}
TimeIDA-Blah -> {blah column data}
TimeIDB-Name -> ...

You'd have:

TimeIDA ->
 {
  Name -> {name column data}
  Blah -> {blah column data}
 }

TimeIDB ->
  ...


This would give you the advantage of being able to query key slices based
on time ranges.
Here's a good article (seems a bit outdated for 0.7):
http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model
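To make the two layouts concrete, here is a plain-Java sketch of the shapes using nested maps; the row keys are the ones from the example above, and the column names and placeholder values are hypothetical (a real super column family in 0.7 would be declared in the schema, not built from maps):

```java
import java.util.Map;

public class SuperColumnLayout {
    // Flat model: one row per (entity, category); the category is
    // mangled into the row key, so one entity's data is spread
    // across several rows.
    public static Map<String, Map<String, String>> flat() {
        return Map.of(
                "TimeIDA-Name", Map.of("name", "..."),
                "TimeIDA-Blah", Map.of("blah", "..."));
    }

    // Super-column model: one row per entity, one nested map per
    // category, so a key-slice query over a time range of row keys
    // returns whole entities at once.
    public static Map<String, Map<String, Map<String, String>>> nested() {
        return Map.of("TimeIDA", Map.of(
                "Name", Map.of("name", "..."),
                "Blah", Map.of("blah", "...")));
    }
}
```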



On 2/28/11 9:50 AM, "Ertio Lew" <ertiop93@gmail.com> wrote:

>Thanks Jeff !
>
>Your point is truly valid! However, even my idea is not to store
>information about the data/entities in the ID, but to split the
>data of an entity into several rows (according to the category of
>that data) in the same CF in Cassandra.
>So, for example, if you want to split the information about a tweet
>into two rows according to the type of information, you want two
>keys generated from the same ID.
>
>For this you definitely need some kind of manipulation of your IDs;
>otherwise you cannot split the data for a particular entity (in the
>same CF) into two rows according to data category. Of course you
>could instead store different types of data in different CFs, but
>sometimes it is better to keep a limit on the number of CFs in
>Cassandra.
>
>Regards
>Ertio Lew
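The key manipulation described above (reserving a few low bits of a 64-bit key for the data category) can be sketched as below. The 4-bit category width is the figure from this thread; note that the base ID must then fit in 59 bits if the packed key is to stay a positive signed long:

```java
public class TypedKeys {
    private static final int CATEGORY_BITS = 4;

    // Pack an entity ID and a small category code into one key.
    // The ID must fit in 59 bits so the result stays a positive long.
    public static long pack(long entityId, int category) {
        if (entityId < 0 || (entityId >>> (63 - CATEGORY_BITS)) != 0)
            throw new IllegalArgumentException("entity ID too large");
        if (category < 0 || category >= (1 << CATEGORY_BITS))
            throw new IllegalArgumentException("category out of range");
        return (entityId << CATEGORY_BITS) | category;
    }

    // Recover the two halves of a packed key.
    public static long entityId(long key) { return key >>> CATEGORY_BITS; }
    public static int category(long key)  { return (int) (key & ((1 << CATEGORY_BITS) - 1)); }
}
```

With this scheme, `pack(id, 0)` and `pack(id, 1)` give two distinct row keys for the same entity, and both decode back to the original ID.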
>
>
>
>On Mon, Feb 28, 2011 at 10:50 PM, Jeff Hodges <jhodges@twitter.com> wrote:
>> Also, feel free to mock me for the phrase "identifying id".
>>
>> On Mon, Feb 28, 2011 at 9:04 AM, Jeff Hodges <jhodges@twitter.com> wrote:
>>> If you patch snowflake to remove 4 bits from the timestamp section,
>>> you will cut the time it takes before the generated IDs overflow
>>> the JVM's 63-bit limit from about 70 years (2 ** 41 milliseconds) to
>>> a little over 4 years (2 ** 37 milliseconds). This is likely
>>> unacceptable for your use case.
>>>
>>> However, the larger point to discuss is that encoding additional
>>> information about your data in the identifying id is, in general, a
>>> bad idea. It means your architecture is strictly coupled to your
>>> current and likely less-than-perfect understanding of the problem and
>>> makes it harder to iterate towards a better one. For instance, we had
>>> to rewrite certain parts of our search infrastructure when migrating
>>> to snowflake because it had assumed that the generated id space of
>>> tweets was uniform across time.
>>>
>>> But, of course, I'm just some dude on the internet who doesn't know
>>> your particular problem or design in detail. God speed and good luck.
>>>
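Jeff's arithmetic can be checked directly. Snowflake's announced layout packs a millisecond timestamp (41 bits), a worker ID (10 bits) and a per-worker sequence (12 bits) into the positive half of a 64-bit long; shrinking the timestamp field shrinks the usable lifetime:

```java
public class SnowflakeMath {
    private static final double MS_PER_YEAR = 365.25 * 24 * 3600 * 1000.0;

    // Compose an ID from the three fields; the shift widths follow
    // the 41/10/12-bit layout described for snowflake.
    public static long compose(long timestampMs, long workerId, long sequence) {
        return (timestampMs << 22) | (workerId << 12) | sequence;
    }

    // Years until an n-bit millisecond timestamp field wraps around.
    public static double yearsUntilWrap(int timestampBits) {
        return Math.pow(2, timestampBits) / MS_PER_YEAR;
    }

    public static void main(String[] args) {
        System.out.printf("41 bits: ~%.1f years%n", yearsUntilWrap(41)); // ~69.7
        System.out.printf("37 bits: ~%.1f years%n", yearsUntilWrap(37)); // ~4.4
    }
}
```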
>>> On Mon, Feb 28, 2011 at 8:35 AM, Ertio Lew <ertiop93@gmail.com> wrote:
>>>> Yes, I think we could perhaps reduce the timestamp precision it
>>>> provides (I think 41 bits of milliseconds) to an appropriate
>>>> extent to match our needs.
>>>>
>>>> On Mon, Feb 28, 2011 at 9:38 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
>>>>> So patch it!
>>>>>
>>>>> On Mon, Feb 28, 2011 at 7:59 AM, Ertio Lew <ertiop93@gmail.com> wrote:
>>>>>
>>>>>> First, it does not start at 0, since it comprises a timestamp,
>>>>>> workerId and noOfGeneratedIds; thus it is not sequential.
>>>>>> Secondly, if I insert my 4 bits into this ID, I risk
>>>>>> overwriting an already existing ID created by it.
>>>>>>
>>>>>> On Mon, Feb 28, 2011 at 9:16 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
>>>>>> > Uh.... any sequential generator that starts at zero will take
>>>>>> > a LONG time until it generates a value > 2^60.
>>>>>> >
>>>>>> > If you generate a million id's per second (= 2^20) then it
>>>>>> > will be longer than 30,000 years before you get past 2^60.
>>>>>> >
>>>>>> > Is this *really* a problem?
>>>>>> >
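Ted's estimate checks out: at 2^20 IDs per second, reaching 2^60 takes 2^40 seconds, which is roughly 35,000 years:

```java
public class SequentialExhaustion {
    // Years before a generator emitting idsPerSecond IDs reaches limit.
    public static double yearsToReach(double limit, double idsPerSecond) {
        double seconds = limit / idsPerSecond;       // 2^60 / 2^20 = 2^40 s
        return seconds / (365.25 * 24 * 3600);       // ~35,000 years
    }

    public static void main(String[] args) {
        System.out.printf("~%.0f years%n",
                yearsToReach(Math.pow(2, 60), Math.pow(2, 20)));
    }
}
```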
>>>>>> > On Mon, Feb 28, 2011 at 7:25 AM, Ertio Lew <ertiop93@gmail.com>
>>>>>>wrote:
>>>>>> >
>>>>>> >> Could you recommend any other ID generator that could help me
>>>>>> >> with increasing IDs (not necessarily sequential) with size <=
>>>>>> >> 60 bits?
>>>>>> >>
>>>>>> >> Thanks
>>>>>> >>
>>>>>> >>
>>>>>> >> On Mon, Feb 28, 2011 at 8:30 PM, Ertio Lew <ertiop93@gmail.com>
>>>>>>wrote:
>>>>>> >> > Thanks Patrick,
>>>>>> >> >
>>>>>> >> > I considered your suggestion, but sadly it does not fit my
>>>>>> >> > use case. I am looking for a solution that could help me
>>>>>> >> > generate 64-bit IDs, but within those 64 bits I would like
>>>>>> >> > at least 4 free bits that I could use to distinguish the
>>>>>> >> > type of data for a particular entity in the same column
>>>>>> >> > family.
>>>>>> >> >
>>>>>> >> > If I could keep snowflake's ID size to around 60 bits, that
>>>>>> >> > would have been great.
>>>>>> >> >
>>>>>> >> > On Sat, Feb 26, 2011 at 5:13 AM, Patrick Hunt <phunt@apache.org> wrote:
>>>>>> >> >> Keep in mind that blog post is pretty old. I see comments
>>>>>> >> >> like this in the commit log
>>>>>> >> >>
>>>>>> >> >> "hard to call it alpha/experimental after serving billions
>>>>>> >> >> of ids"
>>>>>> >> >>
>>>>>> >> >> so it seems it's in production at twitter at least...
>>>>>> >> >>
>>>>>> >> >> Patrick
>>>>>> >> >>
>>>>>> >> >> On Fri, Feb 25, 2011 at 2:58 PM, Ertio Lew <ertiop93@gmail.com> wrote:
>>>>>> >> >>> Thanks Patrick,
>>>>>> >> >>>
>>>>>> >> >>> The fact that it is still in the alpha stage and Twitter
>>>>>> >> >>> is not yet using it makes me look at other solutions as
>>>>>> >> >>> well, which have a large community/user base & are more
>>>>>> >> >>> mature.
>>>>>> >> >>>
>>>>>> >> >>> I do not know whether snowflake is being used in
>>>>>> >> >>> production by anyone.
>>>>>> >> >>>
>>>>>> >> >>> On Fri, Feb 25, 2011 at 11:21 PM, Patrick Hunt <phunt@apache.org> wrote:
>>>>>> >> >>>> Have you looked at snowflake?
>>>>>> >> >>>>
>>>>>> >> >>>> http://engineering.twitter.com/2010/06/announcing-snowflake.html
>>>>>> >> >>>>
>>>>>> >> >>>> Patrick
>>>>>> >> >>>>
>>>>>> >> >>>> On Fri, Feb 25, 2011 at 9:43 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
>>>>>> >> >>>>> If your id's don't need to be exactly sequential or if
>>>>>> >> >>>>> the generation rate is less than a few thousand per
>>>>>> >> >>>>> second, ZK is a fine choice.
>>>>>> >> >>>>>
>>>>>> >> >>>>> To get very high generation rates, what is typically
>>>>>> >> >>>>> done is to allocate blocks of id's using ZK and then
>>>>>> >> >>>>> allocate out of the block locally.  This can cause you
>>>>>> >> >>>>> to wind up with a slightly swiss-cheesed id space and it
>>>>>> >> >>>>> means that the ordering of id's only approximates the
>>>>>> >> >>>>> time ordering of when the id's were assigned.  Neither
>>>>>> >> >>>>> of these is typically a problem.
>>>>>> >> >>>>>
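The block-allocation pattern Ted describes can be sketched as below. The `reserveBlock` function is a stand-in for the shared counter you would keep in ZooKeeper (for example, a znode value advanced with a versioned setData compare-and-set loop), so only one ZK round trip is paid per block rather than per ID:

```java
import java.util.function.LongUnaryOperator;

public class BlockIdAllocator {
    private final long blockSize;
    private final LongUnaryOperator reserveBlock; // given a size, returns the first ID of a fresh block
    private long next;   // next ID to hand out
    private long limit;  // first ID past the current block

    public BlockIdAllocator(long blockSize, LongUnaryOperator reserveBlock) {
        this.blockSize = blockSize;
        this.reserveBlock = reserveBlock;
        this.next = 0;
        this.limit = 0;  // empty: the first call reserves a block
    }

    public synchronized long nextId() {
        if (next == limit) {  // current block exhausted: reserve another
            next = reserveBlock.applyAsLong(blockSize);
            limit = next + blockSize;
        }
        return next++;
    }
}
```

If a process dies mid-block, the unused remainder of its block is simply skipped, which is exactly the "swiss-cheesed" ID space Ted mentions.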
>>>>>> >> >>>>> On Fri, Feb 25, 2011 at 1:50 AM, Ertio Lew <ertiop93@gmail.com> wrote:
>>>>>> >> >>>>>
>>>>>> >> >>>>>> Hi all,
>>>>>> >> >>>>>>
>>>>>> >> >>>>>> I am involved in a project where we're building a
>>>>>> >> >>>>>> social application using Cassandra and Java. I am
>>>>>> >> >>>>>> looking for a solution to generate unique sequential
>>>>>> >> >>>>>> IDs for the content in the application. Some people
>>>>>> >> >>>>>> have suggested I have a look at ZooKeeper for this. I
>>>>>> >> >>>>>> would highly appreciate it if anyone could say whether
>>>>>> >> >>>>>> ZooKeeper is suitable for this purpose, and point to
>>>>>> >> >>>>>> any good resources for information about ZooKeeper.
>>>>>> >> >>>>>>
>>>>>> >> >>>>>> Since the application is based on an eventually
>>>>>> >> >>>>>> consistent distributed platform using Cassandra, we
>>>>>> >> >>>>>> have felt a need to look at other solutions instead of
>>>>>> >> >>>>>> building our own using our DB.
>>>>>> >> >>>>>>
>>>>>> >> >>>>>> Any comments or suggestions are highly welcome! :)
>>>>>> >> >>>>>>
>>>>>> >> >>>>>> Regards
>>>>>> >> >>>>>> Ertio Lew.
>>>>>> >> >>>>>>
>>>>>> >> >>>>>
>>>>>> >> >>>>
>>>>>> >> >>>
>>>>>> >> >>
>>>>>> >> >
>>>>>> >>
>>>>>> >
>>>>>>
>>>>>
>>>>
>>>
>>

