From: Michi Mutsuzaki
To: "user@zookeeper.apache.org"
CC: Otis Gospodnetic
Date: Fri, 18 Mar 2011 22:05:12 -0700
Subject: Re: Using ZK for real-time group membership notification

Hi Otis,

Would it be possible to change your app to use a producer-consumer queue? That
way, no document will go unprocessed when an instance goes down.

http://zookeeper.apache.org/doc/r3.3.2/zookeeperTutorial.html#sc_producerConsumerQueues
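A trimmed-down sketch of that tutorial's queue recipe with the Java client --
the /queue path, class name, and "element" prefix here are illustrative, not
taken from the tutorial:

    import java.util.Collections;
    import java.util.List;
    import org.apache.zookeeper.*;
    import org.apache.zookeeper.data.Stat;

    public class DocQueue {
        private static final String ROOT = "/queue";
        private final ZooKeeper zk;

        public DocQueue(ZooKeeper zk) throws Exception {
            this.zk = zk;
            if (zk.exists(ROOT, false) == null) {
                try {
                    zk.create(ROOT, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE,
                              CreateMode.PERSISTENT);
                } catch (KeeperException.NodeExistsException e) {
                    // another client created it first; fine
                }
            }
        }

        // Producer: each document becomes a sequential child of /queue.
        public void produce(byte[] doc) throws Exception {
            zk.create(ROOT + "/element", doc, ZooDefs.Ids.OPEN_ACL_UNSAFE,
                      CreateMode.PERSISTENT_SEQUENTIAL);
        }

        // Consumer: take the lowest-numbered element. The delete() succeeds
        // for exactly one consumer, so each document is processed once even
        // if the instance that "should" have handled it has died.
        public byte[] consume() throws Exception {
            while (true) {
                List<String> children = zk.getChildren(ROOT, false);
                if (children.isEmpty()) {
                    Thread.sleep(100); // real code would block on a watch
                    continue;
                }
                Collections.sort(children);
                String path = ROOT + "/" + children.get(0);
                try {
                    byte[] data = zk.getData(path, false, new Stat());
                    zk.delete(path, -1);
                    return data;
                } catch (KeeperException.NoNodeException e) {
                    // another consumer grabbed it first; try the next element
                }
            }
        }
    }

One caveat: this puts each document's bytes into a znode, so it only works if
documents are small (the default limit on znode size is about 1MB).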
--Michi

On Mar 18, 2011, at 9:41 PM, Otis Gospodnetic wrote:

> Hi,
>
> Thanks Ben. Let me describe the context a bit more - once you know what I'm
> trying to do you may have suggestions for how to solve the problem with or
> without ZK.
>
> I have a continuous "stream" of documents that I need to process. The stream is
> pretty fast (I don't know the exact number, but it's many docs a second). Docs
> live only in-memory in that stream and cannot be saved to disk at any point up
> the stream.
>
> My app listens to this stream. Because of the high document ingestion rate I
> need N instances of my app to listen to this stream. So all N apps listen and
> they all "get" the same documents, but only 1 app actually processes each
> document -- "if (docID mod N == appID) then process doc" -- the usual consistent
> hashing stuff.
>
> I'd like to be able to add and remove apps dynamically and have the remaining
> apps realize that "N" has changed. Similarly, when some app instance dies and
> thus "N" changes, I'd like all the remaining instances to know about it.
>
> If my apps don't know the correct "N" then 1/Nth of docs will go unprocessed (if
> the app died or was removed) until the remaining apps adjust their local value
> of "N".
>
>> to deal with this applications can use views, which allow clients to
>> reconcile differences. for example, if two processes communicate and
>
> Hm, this requires apps to communicate with each other. If each app was aware of
> other apps, then I could get the membership count directly using that mechanism,
> although I still wouldn't be able to immediately detect when some app died; at
> least I'm not sure how I could do that.
>
>> one has a different list of members than the other then they can both
>> consult zookeeper to reconcile or use the membership list with the
>> highest zxid. the other option is to count on eventually everyone
>> converging.
>
> Right, if I could live with eventually having the right "N", then I could use ZK
> as described on
> http://eng.wealthfront.com/2010/01/actually-implementing-group-management.html
>
> But say I'm OK with "eventually everyone converging". Can I use ZK then? And
> if so, how "eventually" is this "eventually"? That is, if an app dies, how
> quickly can ZK notify all watchers of that znode change? A few milliseconds
> or more?
>
> In general, how does one deal with situations like the one I described above,
> where each app is responsible for 1/Nth of the work and where N can
> uncontrollably and unexpectedly change?
>
> Thanks!
> Otis
>
> ----- Original Message ----
>> From: Benjamin Reed
>> To: user@zookeeper.apache.org
>> Sent: Fri, March 18, 2011 5:59:43 PM
>> Subject: Re: Using ZK for real-time group membership notification
>>
>> in a distributed setting such an answer is impossible. especially
>> given the theory of relativity and the speed of light. a machine may
>> fail right after sending a heartbeat or another may come online right
>> after sending a report. even if zookeeper could provide this you would
>> still have thread scheduling issues on a local machine that mean that
>> you are operating on old information.
>>
>> to deal with this applications can use views, which allow clients to
>> reconcile differences. for example, if two processes communicate and
>> one has a different list of members than the other then they can both
>> consult zookeeper to reconcile or use the membership list with the
>> highest zxid. the other option is to count on eventually everyone
>> converging.
>>
>> i would not develop a distributed system with the assumption that "all
>> group members know *the exact number of members at all times*".
>>
>> ben
>>
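As an aside, the "highest zxid" reconciliation ben mentions could look roughly
like this with the Java client. This is a sketch with made-up names (MemberView,
the group path); the one real API detail it relies on is that getChildren() can
fill in a Stat, whose pzxid is the zxid of the last change to that node's
children:

    import java.util.List;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    class MemberView {
        final List<String> members;
        final long zxid; // pzxid of the group node when the list was read

        MemberView(List<String> members, long zxid) {
            this.members = members;
            this.zxid = zxid;
        }

        // Read the member list and its pzxid in one call, so the two match.
        static MemberView read(ZooKeeper zk, String groupPath) throws Exception {
            Stat stat = new Stat();
            List<String> members = zk.getChildren(groupPath, false, stat);
            return new MemberView(members, stat.getPzxid());
        }

        // When two apps compare notes and disagree, the view read at the
        // higher zxid reflects a more recent state of the group.
        static MemberView newer(MemberView a, MemberView b) {
            return a.zxid >= b.zxid ? a : b;
        }
    }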
>> On Fri, Mar 18, 2011 at 2:02 PM, Otis Gospodnetic wrote:
>>> Hi,
>>>
>>> Short version:
>>> How can ZK be used to make sure that all group members know *the exact
>>> number of members at all times*?
>>>
>>> I have an app that can be run on 1 or more servers. New instances of the app
>>> come and go, may die, etc. -- the number of app instances is completely
>>> dynamic. At any one time, as these apps come and go, each live instance of the
>>> app needs to know how many instances there are in total. If a new instance of
>>> the app is started, all instances need to know the new total number of
>>> instances. If an app is stopped or if it dies, the remaining apps need to know
>>> the new number of app instances.
>>>
>>> Also, and this is critical, they need to know about these additions/removals of
>>> apps right away, and they all need to find out about them at the same time.
>>> Basically, all members of some group need to know *the exact number of members
>>> at all times*.
>>>
>>> This sounds almost like we need to watch a "parent group znode" and monitor the
>>> number of its ephemeral children, which represent each app instance that is
>>> watching the "parent group znode". Is that right? If so, then all I'd need to
>>> know is the answer to "How many watchers are watching this znode?" or "How many
>>> kids does this znode have?". And I'd need ZK to notify all watchers whenever the
>>> answer to this question changes. Ideally it would send/push the answer (the
>>> number of watchers) to all watchers, but if not, I assume any watcher that is
>>> notified about the change would go poll ZK to get the number of ephemeral kids.
>>>
>>> I think the above is essentially what's described on
>>> http://eng.wealthfront.com/2010/01/actually-implementing-group-management.html ,
>>> but it doesn't answer the part that's critical for me (the very first Q up above).
>>>
>>> Thanks,
>>> Otis
>>> ----
>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>>> Lucene ecosystem search :: http://search-lucene.com/
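For completeness, the ephemeral-children pattern Otis describes looks roughly
like this with the Java client. This is a minimal sketch; /group, the "member-"
prefix, the class name, and the 5000ms session timeout are all illustrative.
Note that ben's caveat still applies: between a membership change and the watch
firing, an instance is acting on a stale N.

    import java.util.List;
    import org.apache.zookeeper.*;

    public class GroupMember implements Watcher {
        private static final String GROUP = "/group";
        private final ZooKeeper zk;
        private volatile int n; // this instance's current view of the count

        public GroupMember(String hosts) throws Exception {
            zk = new ZooKeeper(hosts, 5000, this);
            if (zk.exists(GROUP, false) == null) {
                try {
                    zk.create(GROUP, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE,
                              CreateMode.PERSISTENT);
                } catch (KeeperException.NodeExistsException e) {
                    // another instance created it first; fine
                }
            }
            // EPHEMERAL_SEQUENTIAL: the znode disappears when this session
            // ends, which is what makes dead instances drop out of the count.
            zk.create(GROUP + "/member-", new byte[0],
                      ZooDefs.Ids.OPEN_ACL_UNSAFE,
                      CreateMode.EPHEMERAL_SEQUENTIAL);
            refresh();
        }

        // Re-read the children and re-arm the watch; watches are one-shot,
        // so this has to happen after every notification.
        private void refresh() throws Exception {
            List<String> members = zk.getChildren(GROUP, true);
            n = members.size();
        }

        public void process(WatchedEvent event) {
            if (event.getType() == Event.EventType.NodeChildrenChanged) {
                try {
                    refresh();
                } catch (Exception e) {
                    // real code would handle connection loss here
                }
            }
        }

        public int currentN() { return n; }
    }

On the "how eventually" question: a cleanly closed session removes its
ephemeral znode immediately, but if an instance crashes, the znode is only
removed after the session timeout expires, so in the worst case "N" stays
wrong for roughly the session timeout.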