Date: Mon, 28 Feb 2011 23:20:51 +0530
Subject: Re: Zookeeper for generating sequential IDs
From: Ertio Lew
To: jhodges@twitter.com, user@zookeeper.apache.org

Thanks, Jeff!

Your point is valid. However, my idea is also "not to store information
about the data/entities in the Id", but to split an entity's data across
several rows (one per category of data) in the same CF in Cassandra. For
example, if you want to split the information about a tweet into two rows
according to the 'type of information', you need two keys generated from
the same ID. That requires some manipulation of the IDs; otherwise you
cannot split the data for a particular entity (in the same CF) into two
rows by data category.

Of course, you could also suggest storing the different types of data in
different CFs, but sometimes it is better to keep a limit on the number of
CFs in Cassandra.

Regards
Ertio Lew

On Mon, Feb 28, 2011 at 10:50 PM, Jeff Hodges wrote:
> Also, feel free to mock me for the phrase "identifying id".
>
> On Mon, Feb 28, 2011 at 9:04 AM, Jeff Hodges wrote:
>> If you patch snowflake to remove 4 bits from the timestamp section,
>> you will take the time that it takes before the IDs generated overflow
>> the JVM 63-bit limit from about 70 years (2 ** 41 milliseconds) to a
>> little over 4 years (2 ** 37 milliseconds). This is likely
>> unacceptable for your use case.
>>
>> However, the larger point to discuss is that encoding additional
>> information about your data in the identifying id is, in general, a
>> bad idea. It means your architecture is strictly coupled to your
>> current and likely less-than-perfect understanding of the problem and
>> makes it harder to iterate towards a better one. For instance, we had
>> to rewrite certain parts of our search infrastructure when migrating
>> to snowflake because it had assumed that the generated id space of
>> tweets was uniform across time.
>>
>> But, of course, I'm just some dude on the internet who doesn't know
>> your particular problem or design in detail. God speed and good luck.
>>
>> On Mon, Feb 28, 2011 at 8:35 AM, Ertio Lew wrote:
>>> Yes I think we could perhaps reduce the micro seconds precision
>>> provided by it (I think 41 bits) to an appropriate extent to match our
>>> needs.
>>>
>>> On Mon, Feb 28, 2011 at 9:38 PM, Ted Dunning wrote:
>>>> So patch it!
>>>>
>>>> On Mon, Feb 28, 2011 at 7:59 AM, Ertio Lew wrote:
>>>>
>>>>> First that it does not start at 0 since it comprises timestamp,
>>>>> workerId and noOfGeneratedIds.
>>>>> Thus it is not sequential! Secondly if I insert my 4 bits into this ID
>>>>> then I risk* that it might overwrite the already existing ID created
>>>>> by it.
>>>>>
>>>>> On Mon, Feb 28, 2011 at 9:16 PM, Ted Dunning wrote:
>>>>> > Uh.... any sequential generator that starts at zero will take a LONG time
>>>>> > until it generates a value > 2^60.
>>>>> >
>>>>> > If you generate a million id's per second (= 2^20) then it will be longer
>>>>> > than 30,000 years before you get past 2^60.
>>>>> >
>>>>> > Is this *really* a problem?
>>>>> >
>>>>> > On Mon, Feb 28, 2011 at 7:25 AM, Ertio Lew wrote:
>>>>> >
>>>>> >> Could you recommend any other ID generator that could help me with
>>>>> >> increasing Ids (not necessarily sequential) with size <= 60 bits?
>>>>> >>
>>>>> >> Thanks
>>>>> >>
>>>>> >> On Mon, Feb 28, 2011 at 8:30 PM, Ertio Lew wrote:
>>>>> >> > Thanks Patrick,
>>>>> >> >
>>>>> >> > I considered your suggestion. But sadly it could not fit my use case.
>>>>> >> > I am looking for a solution that could help me generate 64 bits Ids
>>>>> >> > but in those 64 bits I would like at least 4 free bits so that I could
>>>>> >> > manage with those free bits to distinguish the type of data for a
>>>>> >> > particular entity in the same columnfamily.
>>>>> >> >
>>>>> >> > If I could keep the snowflake's Id size to around 60 bits, that would
>>>>> >> > have been great..
>>>>> >> >
>>>>> >> > On Sat, Feb 26, 2011 at 5:13 AM, Patrick Hunt wrote:
>>>>> >> >> Keep in mind that blog post is pretty old. I see comments like this in
>>>>> >> >> the commit log
>>>>> >> >>
>>>>> >> >> "hard to call it alpha/experimental after serving billions of ids"
>>>>> >> >>
>>>>> >> >> so it seems it's in production at twitter at least...
>>>>> >> >>
>>>>> >> >> Patrick
>>>>> >> >>
>>>>> >> >> On Fri, Feb 25, 2011 at 2:58 PM, Ertio Lew wrote:
>>>>> >> >>> Thanks Patrick,
>>>>> >> >>>
>>>>> >> >>> The fact that it is still in the alpha stage and twitter is not yet
>>>>> >> >>> using it, makes me look to other solutions as well, which have a large
>>>>> >> >>> community/users base & are more mature.
>>>>> >> >>>
>>>>> >> >>> I do not know much about the snowflake if it is being used in
>>>>> >> >>> production by anyone ..
>>>>> >> >>>
>>>>> >> >>> On Fri, Feb 25, 2011 at 11:21 PM, Patrick Hunt wrote:
>>>>> >> >>>> Have you looked at snowflake?
>>>>> >> >>>>
>>>>> >> >>>> http://engineering.twitter.com/2010/06/announcing-snowflake.html
>>>>> >> >>>>
>>>>> >> >>>> Patrick
>>>>> >> >>>>
>>>>> >> >>>> On Fri, Feb 25, 2011 at 9:43 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
>>>>> >> >>>>> If your id's don't need to be exactly sequential or if the generation rate
>>>>> >> >>>>> is less than a few thousand per second, ZK is a fine choice.
>>>>> >> >>>>>
>>>>> >> >>>>> To get very high generation rates, what is typically done is to allocate
>>>>> >> >>>>> blocks of id's using ZK and then allocate out of the block locally. This
>>>>> >> >>>>> can cause you to wind up with a slightly swiss-cheesed id space and it means
>>>>> >> >>>>> that the ordering of id's only approximates the time ordering of when the
>>>>> >> >>>>> id's were assigned. Neither of these is typically a problem.
>>>>> >> >>>>>
>>>>> >> >>>>> On Fri, Feb 25, 2011 at 1:50 AM, Ertio Lew wrote:
>>>>> >> >>>>>
>>>>> >> >>>>>> Hi all,
>>>>> >> >>>>>>
>>>>> >> >>>>>> I am involved in a project where we're building a social application
>>>>> >> >>>>>> using Cassandra DB and Java. I am looking for a solution to generate
>>>>> >> >>>>>> unique sequential IDs for the content on the application. I have been
>>>>> >> >>>>>> suggested by some people to have a look to Zookeeper for this. I
>>>>> >> >>>>>> would highly appreciate if anyone can suggest if zookeeper is suitable
>>>>> >> >>>>>> for this purpose and any good resources to gain information about
>>>>> >> >>>>>> zookeeper.
>>>>> >> >>>>>>
>>>>> >> >>>>>> Since the application is based on a eventually consistent distributed
>>>>> >> >>>>>> platform using Cassandra, we have felt a need to look over to other
>>>>> >> >>>>>> solutions instead of building our own using our DB.
>>>>> >> >>>>>>
>>>>> >> >>>>>> Any kind of comments, suggestions are highly welcomed! :)
>>>>> >> >>>>>>
>>>>> >> >>>>>> Regards
>>>>> >> >>>>>> Ertio Lew.
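
The bit arithmetic discussed in this thread can be checked with a short
Java sketch. It is illustrative only: the class name IdBits, the helper
names, and the choice to put the 4-bit type tag in the low bits of the row
key are assumptions made for this example, not part of snowflake or of any
proposal quoted above.

```java
// Minimal sketch, assuming a 60-bit ID plus a 4-bit type tag in one 64-bit key.
class IdBits {
    static final double SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60;

    // Years until a millisecond counter with the given bit width overflows.
    static double yearsForMillisBits(int bits) {
        return Math.pow(2, bits) / 1000.0 / SECONDS_PER_YEAR;
    }

    // Years for a counter starting at zero to pass 2^limitBits
    // when it hands out 2^rateBits IDs per second.
    static double yearsToReach(int limitBits, int rateBits) {
        return Math.pow(2, limitBits - rateBits) / SECONDS_PER_YEAR;
    }

    // Hypothetical layout: the ID occupies the high 60 bits, the type tag the low 4.
    // With a full 60-bit ID the resulting long may be negative (sign bit in use).
    static long tagKey(long id, int type) {
        if (id < 0 || id >= (1L << 60)) {
            throw new IllegalArgumentException("id must fit in 60 bits");
        }
        if (type < 0 || type > 0xF) {
            throw new IllegalArgumentException("type must fit in 4 bits");
        }
        return (id << 4) | type;
    }

    // Unsigned shift recovers the ID even when the key is negative.
    static long idOf(long key) {
        return key >>> 4;
    }

    static int typeOf(long key) {
        return (int) (key & 0xF);
    }

    public static void main(String[] args) {
        System.out.printf("41-bit ms timestamp: %.1f years%n", yearsForMillisBits(41));
        System.out.printf("37-bit ms timestamp: %.1f years%n", yearsForMillisBits(37));
        System.out.printf("2^60 at 2^20 ids/s:  %.0f years%n", yearsToReach(60, 20));

        long id = 123456789L;
        long rowA = tagKey(id, 0); // e.g. one category of data
        long rowB = tagKey(id, 1); // e.g. another category, same entity
        System.out.println(idOf(rowA) == idOf(rowB)); // both keys map back to the same ID
    }
}
```

With a 365.25-day year, the first two figures come out at roughly 69.7 and
4.4 years, and the third at a little under 35,000 years, which agrees with
the estimates quoted above. Deriving two keys from one ID this way still
couples the key format to the data model, which is the concern raised
earlier in the thread.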