zookeeper-user mailing list archives

From Ertio Lew <ertio...@gmail.com>
Subject Re: Zookeeper for generating sequential IDs
Date Mon, 28 Feb 2011 17:50:51 GMT
Thanks Jeff!

Your point is valid. However, my idea is also "not to store
information about the data/entities in the ID". What I want is to split
the data of one entity across several rows, one per category of data,
within the same column family (CF) in Cassandra.
For example, if you want to split the information about a tweet into two
rows according to the 'type of information', you need two keys
generated from the same ID.

For this purpose you need some way to manipulate your IDs. Otherwise
you cannot split the data for a particular entity into two rows of the
same CF according to data category. Of course, you could also store
different types of data in different CFs, but sometimes it is better to
limit the number of CFs in Cassandra.

Regards
Ertio Lew
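
As a rough illustration of the "two keys from the same ID" idea above,
here is a minimal Java sketch. It assumes the base ID fits in 60 bits
and that the low 4 bits of the row key carry the data category. The
class and method names (TypedRowKey, rowKey, baseId, type) are made up
for this example and are not part of snowflake or Cassandra.

```java
// Sketch only: packs a 60-bit base id and a 4-bit data category into one long.
class TypedRowKey {
    static final int TYPE_BITS = 4;                         // assumption: 4 category bits
    static final long TYPE_MASK = (1L << TYPE_BITS) - 1;    // 0xF
    static final long MAX_BASE_ID = (1L << (64 - TYPE_BITS)) - 1; // 2^60 - 1

    /** Builds the row key for one category of data belonging to baseId. */
    static long rowKey(long baseId, int type) {
        if (baseId < 0 || baseId > MAX_BASE_ID) {
            throw new IllegalArgumentException("base id does not fit in 60 bits");
        }
        if (type < 0 || type > TYPE_MASK) {
            throw new IllegalArgumentException("type does not fit in 4 bits");
        }
        // Keys whose base id uses bit 59 come out negative as a signed long.
        return (baseId << TYPE_BITS) | type;
    }

    /** Recovers the shared base id; unsigned shift so the top bit is not sign-extended. */
    static long baseId(long rowKey) {
        return rowKey >>> TYPE_BITS;
    }

    /** Recovers the data category stored in the low bits. */
    static int type(long rowKey) {
        return (int) (rowKey & TYPE_MASK);
    }
}
```

With this layout, rowKey(id, 0) and rowKey(id, 1) give two distinct
rows for the same entity, and either key maps back to the shared ID.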



On Mon, Feb 28, 2011 at 10:50 PM, Jeff Hodges <jhodges@twitter.com> wrote:
> Also, feel free to mock me for the phrase "identifying id".
>
> On Mon, Feb 28, 2011 at 9:04 AM, Jeff Hodges <jhodges@twitter.com> wrote:
>> If you patch snowflake to remove 4 bits from the timestamp section,
>> you will cut the time before the generated IDs overflow the JVM's
>> 63-bit limit from about 70 years (2 ** 41 milliseconds) to a
>> little over 4 years (2 ** 37 milliseconds). This is likely
>> unacceptable for your use case.
>>
>> However, the larger point to discuss is that encoding additional
>> information about your data in the identifying id is, in general, a
>> bad idea. It means your architecture is strictly coupled to your
>> current and likely less-than-perfect understanding of the problem and
>> makes it harder to iterate towards a better one. For instance, we had
>> to rewrite certain parts of our search infrastructure when migrating
>> to snowflake because it had assumed that the generated id space of
>> tweets was uniform across time.
>>
>> But, of course, I'm just some dude on the internet who doesn't know
>> your particular problem or design in detail. God speed and good luck.
>>
>> On Mon, Feb 28, 2011 at 8:35 AM, Ertio Lew <ertiop93@gmail.com> wrote:
>>> Yes, I think we could perhaps reduce the millisecond-precision
>>> timestamp it provides (41 bits, I think) to an extent that matches
>>> our needs.
>>>
>>> On Mon, Feb 28, 2011 at 9:38 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
>>>> So patch it!
>>>>
>>>> On Mon, Feb 28, 2011 at 7:59 AM, Ertio Lew <ertiop93@gmail.com> wrote:
>>>>
>>>>> First, it does not start at 0, since it is composed of a timestamp,
>>>>> workerId and noOfGeneratedIds.
>>>>> Thus it is not sequential! Second, if I insert my 4 bits into this ID,
>>>>> I risk colliding with an ID it has already created.
>>>>>
>>>>> On Mon, Feb 28, 2011 at 9:16 PM, Ted Dunning <ted.dunning@gmail.com>
>>>>> wrote:
>>>>> > Uh.... any sequential generator that starts at zero will take a LONG
>>>>> > time until it generates a value > 2^60.
>>>>> >
>>>>> > If you generate a million id's per second (= 2^20) then it will be
>>>>> > longer than 30,000 years before you get past 2^60.
>>>>> >
>>>>> > Is this *really* a problem?
>>>>> >
>>>>> > On Mon, Feb 28, 2011 at 7:25 AM, Ertio Lew <ertiop93@gmail.com> wrote:
>>>>> >
>>>>> >> Could you recommend any other ID generator that could give me
>>>>> >> increasing IDs (not necessarily sequential) of size <= 60 bits?
>>>>> >>
>>>>> >> Thanks
>>>>> >>
>>>>> >>
>>>>> >> On Mon, Feb 28, 2011 at 8:30 PM, Ertio Lew <ertiop93@gmail.com> wrote:
>>>>> >> > Thanks Patrick,
>>>>> >> >
>>>>> >> > I considered your suggestion, but sadly it does not fit my use case.
>>>>> >> > I am looking for a solution that generates 64-bit IDs in which
>>>>> >> > at least 4 bits are left free, so that I can use those bits to
>>>>> >> > distinguish the type of data for a particular entity within the
>>>>> >> > same column family.
>>>>> >> >
>>>>> >> > If I could keep snowflake's ID size to around 60 bits, that would
>>>>> >> > be great.
>>>>> >> >
>>>>> >> >
>>>>> >> >
>>>>> >> >
>>>>> >> >
>>>>> >> > On Sat, Feb 26, 2011 at 5:13 AM, Patrick Hunt <phunt@apache.org> wrote:
>>>>> >> >> Keep in mind that blog post is pretty old. I see comments like
>>>>> >> >> this in the commit log
>>>>> >> >>
>>>>> >> >> "hard to call it alpha/experimental after serving billions of ids"
>>>>> >> >>
>>>>> >> >> so it seems it's in production at twitter at least...
>>>>> >> >>
>>>>> >> >> Patrick
>>>>> >> >>
>>>>> >> >> On Fri, Feb 25, 2011 at 2:58 PM, Ertio Lew <ertiop93@gmail.com> wrote:
>>>>> >> >>> Thanks Patrick,
>>>>> >> >>>
>>>>> >> >>> The fact that it is still in the alpha stage and Twitter is not
>>>>> >> >>> yet using it makes me look at other solutions as well, ones that
>>>>> >> >>> have a larger community/user base and are more mature.
>>>>> >> >>>
>>>>> >> >>> I do not know whether snowflake is being used in production by
>>>>> >> >>> anyone.
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>> On Fri, Feb 25, 2011 at 11:21 PM, Patrick Hunt <phunt@apache.org> wrote:
>>>>> >> >>>> Have you looked at snowflake?
>>>>> >> >>>>
>>>>> >> >>>> http://engineering.twitter.com/2010/06/announcing-snowflake.html
>>>>> >> >>>>
>>>>> >> >>>> Patrick
>>>>> >> >>>>
>>>>> >> >>>> On Fri, Feb 25, 2011 at 9:43 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
>>>>> >> >>>>> If your id's don't need to be exactly sequential or if the
>>>>> >> >>>>> generation rate is less than a few thousand per second, ZK is
>>>>> >> >>>>> a fine choice.
>>>>> >> >>>>>
>>>>> >> >>>>> To get very high generation rates, what is typically done is to
>>>>> >> >>>>> allocate blocks of id's using ZK and then allocate out of the
>>>>> >> >>>>> block locally.  This can cause you to wind up with a slightly
>>>>> >> >>>>> swiss-cheesed id space and it means that the ordering of id's
>>>>> >> >>>>> only approximates the time ordering of when the id's were
>>>>> >> >>>>> assigned.  Neither of these is typically a problem.
>>>>> >> >>>>>
>>>>> >> >>>>> On Fri, Feb 25, 2011 at 1:50 AM, Ertio Lew <ertiop93@gmail.com> wrote:
>>>>> >> >>>>>
>>>>> >> >>>>>> Hi all,
>>>>> >> >>>>>>
>>>>> >> >>>>>> I am involved in a project where we're building a social
>>>>> >> >>>>>> application using Cassandra DB and Java. I am looking for a
>>>>> >> >>>>>> solution to generate unique sequential IDs for the content on
>>>>> >> >>>>>> the application. Some people have suggested that I look at
>>>>> >> >>>>>> ZooKeeper for this. I would highly appreciate it if anyone
>>>>> >> >>>>>> could say whether ZooKeeper is suitable for this purpose and
>>>>> >> >>>>>> point to any good resources for learning about ZooKeeper.
>>>>> >> >>>>>>
>>>>> >> >>>>>> Since the application is based on an eventually consistent
>>>>> >> >>>>>> distributed platform using Cassandra, we have felt the need to
>>>>> >> >>>>>> look at other solutions instead of building our own on top of
>>>>> >> >>>>>> our DB.
>>>>> >> >>>>>>
>>>>> >> >>>>>> Any comments and suggestions are very welcome! :)
>>>>> >> >>>>>>
>>>>> >> >>>>>> Regards
>>>>> >> >>>>>> Ertio Lew.
>>>>> >> >>>>>>
>>>>> >> >>>>>
>>>>> >> >>>>
>>>>> >> >>>
>>>>> >> >>
>>>>> >> >
>>>>> >>
>>>>> >
>>>>>
>>>>
>>>
>>
>
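
For reference, the block-allocation approach Ted describes in the
quoted thread can be sketched roughly as follows. This is only an
illustration: BlockSource is a hypothetical interface standing in for
whatever shared counter would be kept in ZooKeeper, and
InMemoryBlockSource is a local stand-in so the example runs without a
ZooKeeper ensemble. None of these names come from the ZooKeeper API.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for a shared counter that would be kept in ZooKeeper. */
interface BlockSource {
    /** Atomically reserves `size` consecutive ids and returns the first one. */
    long reserve(int size);
}

/** In-memory BlockSource for demonstration only; not a distributed counter. */
class InMemoryBlockSource implements BlockSource {
    private final AtomicLong next = new AtomicLong(0);

    @Override
    public long reserve(int size) {
        return next.getAndAdd(size);
    }
}

/** Hands out ids locally and only contacts the shared source once per block. */
class BlockIdAllocator {
    private final BlockSource source;
    private final int blockSize;
    private long cursor = 0;   // next id to hand out from the current block
    private long blockEnd = 0; // exclusive end of the current block

    BlockIdAllocator(BlockSource source, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        this.source = source;
        this.blockSize = blockSize;
    }

    synchronized long nextId() {
        if (cursor == blockEnd) {
            // Current block is used up (or none reserved yet): fetch a new one.
            cursor = source.reserve(blockSize);
            blockEnd = cursor + blockSize;
        }
        return cursor++;
    }
}
```

Two allocators sharing one source with a block size of 3 would hand out
0, 1, 2 and 3, 4, 5 respectively, so ids interleave across workers and
only approximate time order. Any ids left in a block when a worker
stops are never issued, which is the "swiss-cheesed" id space.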

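The figures quoted in the thread (about 70 years for a 41-bit
millisecond timestamp, a little over 4 years for 37 bits, and more than
30,000 years to pass 2^60 at 2^20 ids per second) can be checked with a
few lines of Java. This is a back-of-the-envelope sketch; the class
name IdBitBudget and the 365.25-day year are assumptions of the
example.

```java
// Sketch only: rough lifetime estimates for timestamp fields and counters.
class IdBitBudget {
    // Assumption: an average year of 365.25 days.
    static final double SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60;
    static final double MS_PER_YEAR = SECONDS_PER_YEAR * 1000;

    /** Years until a millisecond timestamp of the given bit width wraps around. */
    static double yearsForTimestampBits(int bits) {
        return Math.pow(2, bits) / MS_PER_YEAR;
    }

    /** Years until a counter issuing idsPerSecond ids passes 2^bits. */
    static double yearsToExhaust(int bits, double idsPerSecond) {
        return Math.pow(2, bits) / idsPerSecond / SECONDS_PER_YEAR;
    }

    public static void main(String[] args) {
        System.out.printf("41-bit ms timestamp: %.1f years%n", yearsForTimestampBits(41));
        System.out.printf("37-bit ms timestamp: %.1f years%n", yearsForTimestampBits(37));
        System.out.printf("2^60 ids at 2^20 ids/s: %.0f years%n",
                yearsToExhaust(60, 1 << 20));
    }
}
```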