Reply-To: user@zookeeper.apache.org
MIME-Version: 1.0
Sender: rajkumar.w93@gmail.com
Date: Mon, 28 Feb 2011 23:20:51 +0530
Message-ID: <AANLkTim27eiNVL6MHn6r7-ATE+ni3Q85SSGiVP7RUcaJ@mail.gmail.com>
Subject: Re: Zookeeper for generating sequential IDs
From: Ertio Lew <ertiop93@gmail.com>
To: jhodges@twitter.com, user@zookeeper.apache.org
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

Thanks, Jeff!

Your point is valid. However, my idea is also "not to store
information about the data/entities in the ID", but to split the
data of an entity across several rows (one per category of data)
in the same CF in Cassandra.
For example, if you want to split the information about a tweet into
two rows according to the type of information, you need two keys
generated from the same ID.

For this you need some way to manipulate your IDs; otherwise you
cannot split the data for a particular entity (in the same CF) into
two rows by data category. Of course, you could instead store
different types of data in different CFs, but sometimes it is better
to limit the number of CFs in Cassandra.
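
To make the idea concrete, here is a minimal sketch in Java. It is only
an illustration under stated assumptions: the class name RowKeys, the
choice of putting the 4 category bits in the low end of the key, and
the category numbers are all hypothetical. It also assumes the key
should stay non-negative in a signed Java long, which leaves 59 bits
for the ID rather than 60.

```java
// Hypothetical helper: derives several row keys from one entity ID by
// appending a 4-bit data category to the low end of the key.
final class RowKeys {
    static final int CATEGORY_BITS = 4;
    static final long CATEGORY_MASK = (1L << CATEGORY_BITS) - 1;   // 0xF
    // Largest ID that still fits once 4 bits are reserved and the
    // sign bit of the long is left unused (2^59 - 1).
    static final long MAX_ID = (1L << (63 - CATEGORY_BITS)) - 1;

    // Builds the row key for one category of data of the given entity.
    static long rowKey(long id, int category) {
        if (id < 0 || id > MAX_ID) {
            throw new IllegalArgumentException("id out of range: " + id);
        }
        if (category < 0 || category > CATEGORY_MASK) {
            throw new IllegalArgumentException("category out of range: " + category);
        }
        return (id << CATEGORY_BITS) | category;
    }

    // Recovers the entity ID from a row key.
    static long entityId(long key) {
        return key >>> CATEGORY_BITS;
    }

    // Recovers the data category from a row key.
    static int category(long key) {
        return (int) (key & CATEGORY_MASK);
    }

    public static void main(String[] args) {
        long tweetId = 123456789L;            // made-up ID for illustration
        long contentKey = rowKey(tweetId, 0); // e.g. the tweet content row
        long statsKey = rowKey(tweetId, 1);   // e.g. the tweet statistics row
        System.out.println(contentKey + " " + statsKey);
    }
}
```

With this layout the rows of one entity get adjacent key values, and
the generator only has to guarantee that IDs stay within MAX_ID.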

Regards
Ertio Lew


On Mon, Feb 28, 2011 at 10:50 PM, Jeff Hodges <jhodges@twitter.com> wrote:
> Also, feel free to mock me for the phrase "identifying id".
>
> On Mon, Feb 28, 2011 at 9:04 AM, Jeff Hodges <jhodges@twitter.com> wrote:
>> If you patch snowflake to remove 4 bits from the timestamp section,
>> you will cut the time before the generated IDs overflow the JVM 63-bit
>> limit from about 70 years (2 ** 41 milliseconds) to a little over 4
>> years (2 ** 37 milliseconds). This is likely unacceptable for your use
>> case.
>>
>> However, the larger point to discuss is that encoding additional
>> information about your data in the identifying id is, in general, a
>> bad idea. It means your architecture is strictly coupled to your
>> current and likely less-than-perfect understanding of the problem and
>> makes it harder to iterate towards a better one. For instance, we had
>> to rewrite certain parts of our search infrastructure when migrating
>> to snowflake because it had assumed that the generated id space of
>> tweets was uniform across time.
>>
>> But, of course, I'm just some dude on the internet who doesn't know
>> your particular problem or design in detail. God speed and good luck.
>>
>> On Mon, Feb 28, 2011 at 8:35 AM, Ertio Lew <ertiop93@gmail.com> wrote:
>>> Yes, I think we could perhaps reduce the timestamp precision it
>>> provides (I think 41 bits) to an extent appropriate to our needs.
>>>
>>> On Mon, Feb 28, 2011 at 9:38 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
>>>> So patch it!
>>>>
>>>> On Mon, Feb 28, 2011 at 7:59 AM, Ertio Lew <ertiop93@gmail.com> wrote:
>>>>
>>>>> First, it does not start at 0, since it comprises timestamp,
>>>>> workerId and noOfGeneratedIds, so it is not sequential! Secondly,
>>>>> if I insert my 4 bits into this ID, I risk that it might clash
>>>>> with an ID it has already generated.
>>>>>
>>>>> On Mon, Feb 28, 2011 at 9:16 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
>>>>> > Uh.... any sequential generator that starts at zero will take a LONG time
>>>>> > until it generates a value > 2^60.
>>>>> >
>>>>> > If you generate a million id's per second (= 2^20) then it will be longer
>>>>> > than 30,000 years before you get past 2^60.
>>>>> >
>>>>> > Is this *really* a problem?
>>>>> >
>>>>> > On Mon, Feb 28, 2011 at 7:25 AM, Ertio Lew <ertiop93@gmail.com> wrote:
>>>>> >
>>>>> >> Could you recommend any other ID generator that could help me with
>>>>> >> increasing IDs (not necessarily sequential) with size <= 60 bits?
>>>>> >>
>>>>> >> Thanks
>>>>> >>
>>>>> >>
>>>>> >> On Mon, Feb 28, 2011 at 8:30 PM, Ertio Lew <ertiop93@gmail.com> wrote:
>>>>> >> > Thanks Patrick,
>>>>> >> >
>>>>> >> > I considered your suggestion, but sadly it does not fit my use case.
>>>>> >> > I am looking for a solution that could help me generate 64-bit IDs,
>>>>> >> > but within those 64 bits I would like at least 4 free bits that I
>>>>> >> > could use to distinguish the type of data for a particular entity in
>>>>> >> > the same column family.
>>>>> >> >
>>>>> >> > If I could keep the snowflake ID size to around 60 bits, that would
>>>>> >> > be great.
>>>>> >> >
>>>>> >> >
>>>>> >> >
>>>>> >> >
>>>>> >> >
>>>>> >> > On Sat, Feb 26, 2011 at 5:13 AM, Patrick Hunt <phunt@apache.org> wrote:
>>>>> >> >> Keep in mind that blog post is pretty old. I see comments like this in
>>>>> >> >> the commit log
>>>>> >> >>
>>>>> >> >> "hard to call it alpha/experimental after serving billions of ids"
>>>>> >> >>
>>>>> >> >> so it seems it's in production at twitter at least...
>>>>> >> >>
>>>>> >> >> Patrick
>>>>> >> >>
>>>>> >> >> On Fri, Feb 25, 2011 at 2:58 PM, Ertio Lew <ertiop93@gmail.com> wrote:
>>>>> >> >>> Thanks Patrick,
>>>>> >> >>>
>>>>> >> >>> The fact that it is still in the alpha stage and twitter is not yet
>>>>> >> >>> using it, makes me look to other solutions as well, which have a large
>>>>> >> >>> community/users base & are more mature.
>>>>> >> >>>
>>>>> >> >>> I do not know much about the snowflake if it is being used in
>>>>> >> >>> production by anyone ..
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>> On Fri, Feb 25, 2011 at 11:21 PM, Patrick Hunt <phunt@apache.org> wrote:
>>>>> >> >>>> Have you looked at snowflake?
>>>>> >> >>>>
>>>>> >> >>>> http://engineering.twitter.com/2010/06/announcing-snowflake.html
>>>>> >> >>>>
>>>>> >> >>>> Patrick
>>>>> >> >>>>
>>>>> >> >>>> On Fri, Feb 25, 2011 at 9:43 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
>>>>> >> >>>>> If your id's don't need to be exactly sequential or if the generation rate
>>>>> >> >>>>> is less than a few thousand per second, ZK is a fine choice.
>>>>> >> >>>>>
>>>>> >> >>>>> To get very high generation rates, what is typically done is to allocate
>>>>> >> >>>>> blocks of id's using ZK and then allocate out of the block locally.  This
>>>>> >> >>>>> can cause you to wind up with a slightly swiss-cheesed id space and it means
>>>>> >> >>>>> that the ordering of id's only approximates the time ordering of when the
>>>>> >> >>>>> id's were assigned.  Neither of these is typically a problem.
>>>>> >> >>>>>
>>>>> >> >>>>> On Fri, Feb 25, 2011 at 1:50 AM, Ertio Lew <ertiop93@gmail.com> wrote:
>>>>> >> >>>>>
>>>>> >> >>>>>> Hi all,
>>>>> >> >>>>>>
>>>>> >> >>>>>> I am involved in a project where we're building a social application
>>>>> >> >>>>>> using Cassandra DB and Java. I am looking for a solution to generate
>>>>> >> >>>>>> unique sequential IDs for the content on the application. Some people
>>>>> >> >>>>>> have suggested that I have a look at Zookeeper for this. I would
>>>>> >> >>>>>> highly appreciate it if anyone could tell me whether zookeeper is
>>>>> >> >>>>>> suitable for this purpose and point me to any good resources for
>>>>> >> >>>>>> learning about zookeeper.
>>>>> >> >>>>>>
>>>>> >> >>>>>> Since the application is based on an eventually consistent distributed
>>>>> >> >>>>>> platform using Cassandra, we have felt a need to look at other
>>>>> >> >>>>>> solutions instead of building our own using our DB.
>>>>> >> >>>>>>
>>>>> >> >>>>>> Any kind of comments, suggestions are highly welcomed! :)
>>>>> >> >>>>>>
>>>>> >> >>>>>> Regards
>>>>> >> >>>>>> Ertio Lew.
>>>>> >> >>>>>>
>>>>> >> >>>>>
>>>>> >> >>>>
>>>>> >> >>>
>>>>> >> >>
>>>>> >> >
>>>>> >>
>>>>> >
>>>>>
>>>>
>>>
>>
>
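
The block-allocation scheme Ted describes in the quoted thread (reserve
blocks of ids through ZK, then hand them out locally) could look
roughly like the sketch below. It is not taken from any message in the
thread: BlockSource, InMemoryBlockSource and BlockIdGenerator are
hypothetical names, and the in-memory source is only a stand-in for a
counter that would really be kept in ZooKeeper.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical abstraction over the shared counter. A real deployment
// would implement this against ZooKeeper; that part is not shown here.
interface BlockSource {
    // Reserves blockSize consecutive ids and returns the first one.
    long nextBlockStart(int blockSize);
}

// In-memory stand-in so the sketch runs without a ZooKeeper ensemble.
final class InMemoryBlockSource implements BlockSource {
    private final AtomicLong next = new AtomicLong(0);

    @Override
    public long nextBlockStart(int blockSize) {
        return next.getAndAdd(blockSize);
    }
}

// Hands out ids locally and contacts the shared source only when the
// current block is used up.
final class BlockIdGenerator {
    private final BlockSource source;
    private final int blockSize;
    private long cursor = 0;    // next id to hand out
    private long blockEnd = 0;  // exclusive end of the current block

    BlockIdGenerator(BlockSource source, int blockSize) {
        this.source = source;
        this.blockSize = blockSize;
    }

    synchronized long nextId() {
        if (cursor >= blockEnd) {
            // Current block exhausted (or none reserved yet): get a new one.
            cursor = source.nextBlockStart(blockSize);
            blockEnd = cursor + blockSize;
        }
        return cursor++;
    }

    public static void main(String[] args) {
        BlockSource shared = new InMemoryBlockSource();
        BlockIdGenerator a = new BlockIdGenerator(shared, 1000);
        BlockIdGenerator b = new BlockIdGenerator(shared, 1000);
        // Ids from a and b come from different blocks, so their order
        // only approximates the order in which they were requested.
        System.out.println(a.nextId() + " " + b.nextId() + " " + a.nextId());
    }
}
```

If a generator stops before using up its block, the rest of that block
is never issued, which is the "swiss-cheesed" id space mentioned in the
thread.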