Date: Tue, 1 Mar 2011 12:55:52 +0530
Subject: Re: Zookeeper for generating sequential IDs
From: Ertio Lew
To: user@zookeeper.apache.org, jhodges@twitter.com

Thanks Andrew, however I would prefer to stay away from supercolumns
because of their well-known limitations.

Regarding Snowflake, I think I can make it useful for me by limiting the
sequence number, currently 12 bits, to 8 bits and using the 4 bits saved
to store the category of data. That would reduce the theoretical limit of
4096 IDs per millisecond per machine to 256 IDs per ms per machine, which
is still more than enough for my use case.

@Jeff, would you like to say something on this idea?

Thank you all.
Ertio

On Tue, Mar 1, 2011 at 6:44 AM, Andrew Ebaugh wrote:
> Getting a bit into Cassandra weeds, but what about using a super column
> and TimeUUIDType keys? IMO splitting data for one unique item into
> multiple manipulated keys sounds complex, and more what a super column was
> made for.
>
> So instead of having:
>
> TimeIDA-Name -> {name column data}
> TimeIDA-Blah -> {blah column data}
> TimeIDB-Name -> ...
>
> You'd have:
>
> TimeIDA ->
>  {
>  Name -> {name column data}
>  Blah -> {blah column data}
>  }
>
> TimeIDB ->
>  ...
>
>
> This would give you the advantage of being able to query key slices based
> on time ranges.
> Here's a good article (seems a bit outdated for 0.7):
> http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model
>
>
>
> On 2/28/11 9:50 AM, "Ertio Lew" wrote:
>
>>Thanks Jeff !
>>
>>Your point is truly valid! However... even my idea is "not to store
>>information about the data/entities in the Id" but to split the
>>several data of an entity into several rows(according to category of
>>that data) in same CF in Cassandra.
>>So for e.g. if you want to split the information about a tweet in two
>>rows according to the 'type of information', then you want two keys
>>generated using the same ID.
>>
>>For this purpose you definitely need to have some kind of manipulation
>>required with your Ids. Or otherwise you cannot split the data for a
>>particular entity (in same CF) in two rows, according to data
>>category. Of course you can also suggest to store different types of
>>data in different CFs but sometimes it is more optimal to keep a limit
>>on the no of CFs in Cassandra.
>>
>>Regards
>>Ertio Lew
>>
>>
>>
>>On Mon, Feb 28, 2011 at 10:50 PM, Jeff Hodges wrote:
>>> Also, feel free to mock me for the phrase "identifying id".
>>>
>>> On Mon, Feb 28, 2011 at 9:04 AM, Jeff Hodges wrote:
>>>> If you patch snowflake to remove 4 bits from the timestamp section,
>>>> you will take the time that it takes before the IDs generated overflow
>>>> the JVM 63-bit limit from about 70 years (2 ** 41 milliseconds) to a
>>>> little over 4 years (2 ** 37 milliseconds). This is likely
>>>> unacceptable for your use case.
>>>>
>>>> However, the larger point to discuss is that encoding additional
>>>> information about your data in the identifying id is, in general, a
>>>> bad idea. It means your architecture is strictly coupled to your
>>>> current and likely less-than-perfect understanding of the problem and
>>>> makes it harder to iterate towards a better one.
>>>> For instance, we had
>>>> to rewrite certain parts of our search infrastructure when migrating
>>>> to snowflake because it had assumed that the generated id space of
>>>> tweets was uniform across time.
>>>>
>>>> But, of course, I'm just some dude on the internet who doesn't know
>>>> your particular problem or design in detail. God speed and good luck.
>>>>
>>>> On Mon, Feb 28, 2011 at 8:35 AM, Ertio Lew wrote:
>>>>> Yes I think we could perhaps reduce the micro seconds precision
>>>>> provided by it(I think 41 bits) to an appropriate extent to match our
>>>>> needs.
>>>>>
>>>>> On Mon, Feb 28, 2011 at 9:38 PM, Ted Dunning wrote:
>>>>>> So patch it!
>>>>>>
>>>>>> On Mon, Feb 28, 2011 at 7:59 AM, Ertio Lew wrote:
>>>>>>
>>>>>>> First that it does not start at 0 since it comprises timestamp,
>>>>>>> workerId and noOfGeneratedIds.
>>>>>>> Thus it is not sequential! Secondly if I insert my 4 bits into this ID
>>>>>>> then I risk* that it might overwrite the already existing ID created
>>>>>>> by it.
>>>>>>>
>>>>>>> On Mon, Feb 28, 2011 at 9:16 PM, Ted Dunning wrote:
>>>>>>> > Uh.... any sequential generator that starts at zero will take a LONG time
>>>>>>> > until it generates a value > 2^60.
>>>>>>> >
>>>>>>> > If you generator a million id's per second (= 2^20) then it will be longer
>>>>>>> > than 30,000 years before you get past 2^60.
>>>>>>> >
>>>>>>> > Is this *really* a problem?
>>>>>>> >
>>>>>>> > On Mon, Feb 28, 2011 at 7:25 AM, Ertio Lew wrote:
>>>>>>> >
>>>>>>> >> Could you recommend any other ID generator that could help me with
>>>>>>> >> increasing Ids(not necessarily sequential) with size<= 60 bits ?
>>>>>>> >>
>>>>>>> >> Thanks
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> On Mon, Feb 28, 2011 at 8:30 PM, Ertio Lew wrote:
>>>>>>> >> > Thanks Patrick,
>>>>>>> >> >
>>>>>>> >> > I considered your suggestion. But sadly it could not fit my use case.
>>>>>>> >> > I am looking for a solution that could help me generate 64 bits Ids
>>>>>>> >> > but in those 64 bits I would like atleast 4 free bits so that I could
>>>>>>> >> > manage with those free bits to distinguish the type of data for a
>>>>>>> >> > particular entity in the same columnfamily.
>>>>>>> >> >
>>>>>>> >> > If I could keep the snowflake's Id size to around 60 bits, that would
>>>>>>> >> > have been great..
>>>>>>> >> >
>>>>>>> >> >
>>>>>>> >> > On Sat, Feb 26, 2011 at 5:13 AM, Patrick Hunt wrote:
>>>>>>> >> >> Keep in mind that blog post is pretty old. I see comments like this in
>>>>>>> >> >> the commit log
>>>>>>> >> >>
>>>>>>> >> >> "hard to call it alpha/experimental after serving billions of ids"
>>>>>>> >> >>
>>>>>>> >> >> so it seems it's in production at twitter at least...
>>>>>>> >> >>
>>>>>>> >> >> Patrick
>>>>>>> >> >>
>>>>>>> >> >> On Fri, Feb 25, 2011 at 2:58 PM, Ertio Lew wrote:
>>>>>>> >> >>> Thanks Patrick,
>>>>>>> >> >>>
>>>>>>> >> >>> The fact that it is still in the alpha stage and twitter is not yet
>>>>>>> >> >>> using it, makes me look to other solutions as well, which have a large
>>>>>>> >> >>> community/users base & are more mature.
>>>>>>> >> >>>
>>>>>>> >> >>> I do not know much about the snowflake if it is being used in
>>>>>>> >> >>> production by anyone ..
>>>>>>> >> >>>
>>>>>>> >> >>>
>>>>>>> >> >>> On Fri, Feb 25, 2011 at 11:21 PM, Patrick Hunt wrote:
>>>>>>> >> >>>> Have you looked at snowflake?
>>>>>>> >> >>>>
>>>>>>> >> >>>> http://engineering.twitter.com/2010/06/announcing-snowflake.html
>>>>>>> >> >>>>
>>>>>>> >> >>>> Patrick
>>>>>>> >> >>>>
>>>>>>> >> >>>> On Fri, Feb 25, 2011 at 9:43 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
>>>>>>> >> >>>>> If your id's don't need to be exactly sequential or if the generation rate
>>>>>>> >> >>>>> is less than a few thousand per second, ZK is a fine choice.
>>>>>>> >> >>>>>
>>>>>>> >> >>>>> To get very high generation rates, what is typically done is to allocate
>>>>>>> >> >>>>> blocks of id's using ZK and then allocate out of the block locally. This
>>>>>>> >> >>>>> can cause you to wind up with a slightly swiss-cheesed id space and it means
>>>>>>> >> >>>>> that the ordering of id's only approximates the time ordering of when the
>>>>>>> >> >>>>> id's were assigned. Neither of these is typically a problem.
>>>>>>> >> >>>>>
>>>>>>> >> >>>>> On Fri, Feb 25, 2011 at 1:50 AM, Ertio Lew wrote:
>>>>>>> >> >>>>>
>>>>>>> >> >>>>>> Hi all,
>>>>>>> >> >>>>>>
>>>>>>> >> >>>>>> I am involved in a project where we're building a social application
>>>>>>> >> >>>>>> using Cassandra DB and Java. I am looking for a solution to generate
>>>>>>> >> >>>>>> unique sequential IDs for the content on the application. I have been
>>>>>>> >> >>>>>> suggested by some people to have a look to Zookeeper for this. I
>>>>>>> >> >>>>>> would highly appreciate if anyone can suggest if zookeeper is suitable
>>>>>>> >> >>>>>> for this purpose and any good resources to gain information about
>>>>>>> >> >>>>>> zookeeper.
>>>>>>> >> >>>>>>
>>>>>>> >> >>>>>> Since the application is based on a eventually consistent distributed
>>>>>>> >> >>>>>> platform using Cassandra, we have felt a need to look over to other
>>>>>>> >> >>>>>> solutions instead of building our own using our DB.
>>>>>>> >> >>>>>>
>>>>>>> >> >>>>>> Any kind of comments, suggestions are highly welcomed! :)
>>>>>>> >> >>>>>>
>>>>>>> >> >>>>>> Regards
>>>>>>> >> >>>>>> Ertio Lew.
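
For anyone following the bit arithmetic in this thread, here is a rough sketch of the layout being discussed. It is not Snowflake's code. The 41-bit timestamp, the sequence number cut from 12 to 8 bits, and the 4 category bits come from the messages above. The 10-bit worker field (what is left of 63 bits), the placement of the category in the lowest bits, and every function name are assumptions made for illustration.

```python
# Illustrative sketch only; this is NOT Snowflake's source code.
# From the thread: 41-bit timestamp, sequence reduced from 12 to 8 bits,
# 4 bits freed for a data category, 63 usable bits in a signed JVM long.
# Assumed here: a 10-bit worker field and the category in the lowest bits.

TIMESTAMP_BITS = 41
WORKER_BITS = 10      # assumption: 63 - 41 - 12
SEQUENCE_BITS = 8     # reduced from 12
CATEGORY_BITS = 4     # the bits taken from the sequence number

CATEGORY_SHIFT = 0
SEQUENCE_SHIFT = CATEGORY_SHIFT + CATEGORY_BITS
WORKER_SHIFT = SEQUENCE_SHIFT + SEQUENCE_BITS
TIMESTAMP_SHIFT = WORKER_SHIFT + WORKER_BITS

CATEGORY_MASK = (1 << CATEGORY_BITS) - 1
MS_PER_YEAR = 1000 * 60 * 60 * 24 * 365.25


def _check(name, value, bits):
    """Reject values that would spill into a neighbouring field."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name}={value} does not fit in {bits} bits")


def make_id(timestamp_ms, worker_id, sequence, category=0):
    """Pack the four fields into one non-negative integer below 2**63.

    timestamp_ms is milliseconds since some custom epoch (hypothetical).
    """
    _check("timestamp_ms", timestamp_ms, TIMESTAMP_BITS)
    _check("worker_id", worker_id, WORKER_BITS)
    _check("sequence", sequence, SEQUENCE_BITS)
    _check("category", category, CATEGORY_BITS)
    return (
        (timestamp_ms << TIMESTAMP_SHIFT)
        | (worker_id << WORKER_SHIFT)
        | (sequence << SEQUENCE_SHIFT)
        | (category << CATEGORY_SHIFT)
    )


def with_category(base_id, category):
    """Derive a second row key for the same entity by swapping the category bits."""
    _check("category", category, CATEGORY_BITS)
    return (base_id & ~CATEGORY_MASK) | category


def years_until_overflow(timestamp_bits):
    """Years before a millisecond timestamp of the given width wraps around."""
    return (1 << timestamp_bits) / MS_PER_YEAR


def years_to_exhaust(id_bits, ids_per_second):
    """Years for a counter starting at zero to pass 2**id_bits at a fixed rate."""
    return (1 << id_bits) / ids_per_second / (MS_PER_YEAR / 1000)
```

With these definitions, `years_until_overflow(41)` is about 69.7 and `years_until_overflow(37)` about 4.4, matching Jeff's figures, and `years_to_exhaust(60, 2**20)` is roughly 34,800, matching Ted's "longer than 30,000 years". Keeping the category in the lowest bits means all keys for one entity differ only in their last 4 bits, so they sort next to each other. Whether that ordering is wanted depends on the partitioner and the data model.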