zookeeper-user mailing list archives

From Jeff Hodges <jhod...@twitter.com>
Subject Re: Zookeeper for generating sequential IDs
Date Mon, 28 Feb 2011 17:20:17 GMT
Also, feel free to mock me for the phrase "identifying id".

On Mon, Feb 28, 2011 at 9:04 AM, Jeff Hodges <jhodges@twitter.com> wrote:
> If you patch snowflake to remove 4 bits from the timestamp section,
> you will cut the time before the generated IDs overflow
> the JVM 63-bit limit from about 70 years (2 ** 41 milliseconds) to a
> little over 4 years (2 ** 37 milliseconds). This is likely
> unacceptable for your use case.
>
> However, the larger point to discuss is that encoding additional
> information about your data in the identifying id is, in general, a
> bad idea. It means your architecture is tightly coupled to your
> current and likely less-than-perfect understanding of the problem and
> makes it harder to iterate towards a better one. For instance, we had
> to rewrite certain parts of our search infrastructure when migrating
> to snowflake because it had assumed that the generated id space of
> tweets was uniform across time.
>
> But, of course, I'm just some dude on the internet who doesn't know
> your particular problem or design in detail. Godspeed and good luck.
>
> On Mon, Feb 28, 2011 at 8:35 AM, Ertio Lew <ertiop93@gmail.com> wrote:
>> Yes, I think we could perhaps reduce the microsecond precision
>> it provides (41 bits, I think) to an appropriate extent to match our
>> needs.
>>
>> On Mon, Feb 28, 2011 at 9:38 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
>>> So patch it!
>>>
>>> On Mon, Feb 28, 2011 at 7:59 AM, Ertio Lew <ertiop93@gmail.com> wrote:
>>>
>>>> First, it does not start at 0, since it comprises timestamp,
>>>> workerId and noOfGeneratedIds.
>>>> Thus it is not sequential! Secondly, if I insert my 4 bits into this ID
>>>> then I risk overwriting part of the ID it has already
>>>> generated.
>>>>
>>>> On Mon, Feb 28, 2011 at 9:16 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
>>>> > Uh.... any sequential generator that starts at zero will take a LONG time
>>>> > until it generates a value > 2^60.
>>>> >
>>>> > If you generate a million ids per second (about 2^20) then it will be longer
>>>> > than 30,000 years before you get past 2^60.
>>>> >
>>>> > Is this *really* a problem?
>>>> >
>>>> > On Mon, Feb 28, 2011 at 7:25 AM, Ertio Lew <ertiop93@gmail.com> wrote:
>>>> >
>>>> >> Could you recommend any other ID generator that could help me with
>>>> >> increasing IDs (not necessarily sequential) with size <= 60 bits?
>>>> >>
>>>> >> Thanks
>>>> >>
>>>> >>
>>>> >> On Mon, Feb 28, 2011 at 8:30 PM, Ertio Lew <ertiop93@gmail.com> wrote:
>>>> >> > Thanks Patrick,
>>>> >> >
>>>> >> > I considered your suggestion. But sadly it could not fit my use case.
>>>> >> > I am looking for a solution that could help me generate 64-bit IDs,
>>>> >> > but in those 64 bits I would like at least 4 free bits so that I could
>>>> >> > use those free bits to distinguish the type of data for a
>>>> >> > particular entity in the same column family.
>>>> >> >
>>>> >> > If I could keep snowflake's ID size to around 60 bits, that would
>>>> >> > have been great.
>>>> >> >
>>>> >> >
>>>> >> >
>>>> >> >
>>>> >> >
>>>> >> > On Sat, Feb 26, 2011 at 5:13 AM, Patrick Hunt <phunt@apache.org> wrote:
>>>> >> >> Keep in mind that blog post is pretty old. I see comments like this in
>>>> >> >> the commit log
>>>> >> >>
>>>> >> >> "hard to call it alpha/experimental after serving billions of ids"
>>>> >> >>
>>>> >> >> so it seems it's in production at twitter at least...
>>>> >> >>
>>>> >> >> Patrick
>>>> >> >>
>>>> >> >> On Fri, Feb 25, 2011 at 2:58 PM, Ertio Lew <ertiop93@gmail.com> wrote:
>>>> >> >>> Thanks Patrick,
>>>> >> >>>
>>>> >> >>> The fact that it is still in the alpha stage and Twitter is not yet
>>>> >> >>> using it makes me look to other solutions as well, which have a large
>>>> >> >>> community/user base & are more mature.
>>>> >> >>>
>>>> >> >>> I do not know much about snowflake or whether it is being used in
>>>> >> >>> production by anyone.
>>>> >> >>>
>>>> >> >>>
>>>> >> >>>
>>>> >> >>>
>>>> >> >>> On Fri, Feb 25, 2011 at 11:21 PM, Patrick Hunt <phunt@apache.org> wrote:
>>>> >> >>>> Have you looked at snowflake?
>>>> >> >>>>
>>>> >> >>>> http://engineering.twitter.com/2010/06/announcing-snowflake.html
>>>> >> >>>>
>>>> >> >>>> Patrick
>>>> >> >>>>
>>>> >> >>>> On Fri, Feb 25, 2011 at 9:43 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
>>>> >> >>>>> If your ids don't need to be exactly sequential or if the generation rate
>>>> >> >>>>> is less than a few thousand per second, ZK is a fine choice.
>>>> >> >>>>>
>>>> >> >>>>> To get very high generation rates, what is typically done is to allocate
>>>> >> >>>>> blocks of ids using ZK and then allocate out of the block locally. This
>>>> >> >>>>> can cause you to wind up with a slightly swiss-cheesed id space and it means
>>>> >> >>>>> that the ordering of ids only approximates the time ordering of when the
>>>> >> >>>>> ids were assigned. Neither of these is typically a problem.
>>>> >> >>>>>
>>>> >> >>>>> On Fri, Feb 25, 2011 at 1:50 AM, Ertio Lew <ertiop93@gmail.com> wrote:
>>>> >> >>>>>
>>>> >> >>>>>> Hi all,
>>>> >> >>>>>>
>>>> >> >>>>>> I am involved in a project where we're building a social application
>>>> >> >>>>>> using Cassandra DB and Java. I am looking for a solution to generate
>>>> >> >>>>>> unique sequential IDs for the content on the application. Some people
>>>> >> >>>>>> have suggested that I have a look at Zookeeper for this. I
>>>> >> >>>>>> would highly appreciate it if anyone can say whether Zookeeper is suitable
>>>> >> >>>>>> for this purpose and point to any good resources for learning about
>>>> >> >>>>>> Zookeeper.
>>>> >> >>>>>>
>>>> >> >>>>>> Since the application is based on an eventually consistent distributed
>>>> >> >>>>>> platform using Cassandra, we have felt a need to look at other
>>>> >> >>>>>> solutions instead of building our own using our DB.
>>>> >> >>>>>>
>>>> >> >>>>>> Any comments or suggestions are highly welcome! :)
>>>> >> >>>>>>
>>>> >> >>>>>> Regards
>>>> >> >>>>>> Ertio Lew.
>>>> >> >>>>>>
>>>> >> >>>>>
>>>> >> >>>>
>>>> >> >>>
>>>> >> >>
>>>> >> >
>>>> >>
>>>> >
>>>>
>>>
>>
>
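
The time spans quoted in this thread (about 70 years for 41 timestamp bits, a little over 4 years for 37, and more than 30,000 years to count past 2^60) can be checked with a few lines of arithmetic. This is only a rough sketch: it assumes a 365.25-day year, and the function names are made up for illustration and are not part of snowflake.

```python
# Assumption: one year = 365.25 days. Names here are illustrative only.
MS_PER_YEAR = 1000 * 60 * 60 * 24 * 365.25


def years_until_timestamp_overflow(timestamp_bits: int) -> float:
    """Years before a millisecond counter of the given bit width wraps around."""
    return (2 ** timestamp_bits) / MS_PER_YEAR


def years_to_count_past(id_bits: int, ids_per_second: int) -> float:
    """Years a counter starting at zero needs to pass 2 ** id_bits."""
    seconds = (2 ** id_bits) / ids_per_second
    return seconds / (MS_PER_YEAR / 1000)


print(years_until_timestamp_overflow(41))  # 41-bit millisecond timestamp
print(years_until_timestamp_overflow(37))  # same field with 4 bits removed
print(years_to_count_past(60, 2 ** 20))    # roughly a million ids per second
```

Each bit removed from the timestamp halves its lifetime, so giving up 4 bits divides it by 16.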

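The block-allocation approach described at the start of the thread (reserve a block of ids from ZK, then hand ids out of that block locally) might look roughly like the sketch below. It is not ZooKeeper client code: InMemoryCounter is a hypothetical stand-in for a shared counter that would live in ZooKeeper, and the class names and block size are assumptions chosen for illustration.

```python
import threading
from typing import Callable


class InMemoryCounter:
    """Hypothetical stand-in for a shared counter stored in ZooKeeper."""

    def __init__(self) -> None:
        self._next = 0
        self._lock = threading.Lock()

    def reserve(self, size: int) -> int:
        """Atomically reserve `size` consecutive ids and return the first."""
        with self._lock:
            start = self._next
            self._next += size
            return start


class BlockIdAllocator:
    """Hands out ids locally, contacting the shared counter once per block."""

    def __init__(self, reserve: Callable[[int], int], block_size: int = 1000) -> None:
        self._reserve = reserve
        self._block_size = block_size
        self._next = 0
        self._end = 0  # exclusive bound; an empty block forces a reservation on first use
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            if self._next >= self._end:
                # Local block exhausted: reserve a fresh one from the shared counter.
                self._next = self._reserve(self._block_size)
                self._end = self._next + self._block_size
            value = self._next
            self._next += 1
            return value


counter = InMemoryCounter()
worker_a = BlockIdAllocator(counter.reserve)
worker_b = BlockIdAllocator(counter.reserve)
print(worker_a.next_id(), worker_b.next_id(), worker_a.next_id())
```

With two workers sharing one counter, each draws from its own block, so ids issued later by one worker can be smaller than ids issued earlier by the other, and a worker that stops leaves the rest of its block unused. That is the swiss-cheesed, only approximately time-ordered id space mentioned above.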