From: "Fournier, Camille F. [Tech]"
To: user@zookeeper.apache.org
Date: Sun, 20 Mar 2011 22:38:14 -0400
Subject: RE: Using ZK for real-time group membership notification

To do this right you probably need message queues. I'd research the various MQ solutions out there; they are built to handle exactly this sort of issue.

You could try to implement it via a ZK distributed queue plus some sort of crazy transaction logic in each process (so that a document to process would return to the queue if the processor somehow didn't commit the transaction before dying, etc.), but frankly it sounds like a nightmare, even if you're comfortable with occasionally just losing documents.

> But say I'm OK with "eventually everyone converging". Can I use ZK then? And
> if so, how "eventually" is this "eventually"? That is, if an app dies, how
> quickly can ZK notify all znode watchers that the znode changed? A few
> milliseconds or more?
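The "ZK distributed queue plus transaction logic" idea mentioned above can be sketched without any ZooKeeper specifics. This is a hypothetical in-memory model (the `WorkQueue` class and its method names are invented for illustration, not a real ZooKeeper recipe): a consumer takes a document, and the document returns to the queue if the consumer dies before committing it.

```python
from collections import deque

class WorkQueue:
    """Toy model of a queue whose items return if never committed.
    Illustration only -- not ZooKeeper's distributed-queue recipe."""
    def __init__(self, docs):
        self.pending = deque(docs)
        self.in_flight = {}          # doc -> consumer holding it
        self.done = []

    def take(self, consumer):
        doc = self.pending.popleft()
        self.in_flight[doc] = consumer
        return doc

    def commit(self, doc):
        del self.in_flight[doc]
        self.done.append(doc)

    def consumer_died(self, consumer):
        # Requeue everything the dead consumer never committed.
        for doc, owner in list(self.in_flight.items()):
            if owner == consumer:
                del self.in_flight[doc]
                self.pending.append(doc)

q = WorkQueue(["d1", "d2", "d3"])
a = q.take("app-0")              # app-0 holds d1
q.commit(a)                      # d1 processed and committed
b = q.take("app-1")              # app-1 holds d2, then dies uncommitted
q.consumer_died("app-1")
print(sorted(q.pending))         # -> ['d2', 'd3'] -- d2 is back in the queue
```

Even in this toy form you can see the "crazy transaction logic": every consumer death needs detection plus requeue, and a consumer that commits after being declared dead would cause duplicate processing.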
If your app dies, it will live on in ZK until the session timeout is reached (usually on the order of seconds), unless we implement better session-death detection logic.

C

-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
Sent: Saturday, March 19, 2011 12:41 AM
To: user@zookeeper.apache.org
Subject: Re: Using ZK for real-time group membership notification

Hi,

Thanks Ben. Let me describe the context a bit more -- once you know what I'm trying to do, you may have suggestions for how to solve the problem with or without ZK.

I have a continuous "stream" of documents that I need to process. The stream is pretty fast (I don't know the exact number, but it's many docs a second). Docs live only in memory in that stream and cannot be saved to disk at any point up the stream.

My app listens to this stream. Because of the high document ingestion rate I need N instances of my app to listen to this stream. So all N apps listen and they all "get" the same documents, but only one app actually processes each document -- "if (docID mod N == appID) then process doc" -- the usual consistent hashing stuff. I'd like to be able to add and remove apps dynamically and have the remaining apps realize that "N" has changed. Similarly, when some app instance dies and thus "N" changes, I'd like all the remaining instances to know about it.

If my apps don't know the correct "N", then 1/Nth of docs will go unprocessed (if the app died or was removed) until the remaining apps adjust their local value of "N".

> to deal with this applications can use views, which allow clients to
> reconcile differences. for example, if two processes communicate and

Hm, this requires apps to communicate with each other.
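The mod-N dispatch rule and the 1/Nth gap described above can be made concrete with a small sketch (the app and doc IDs here are invented for illustration):

```python
def owner(doc_id, n):
    # The dispatch rule from the thread:
    # app i processes a doc iff doc_id mod n == i.
    return doc_id % n

docs = list(range(30))

# All three apps agree that N == 3: every doc has exactly one owner.
processed = [d for d in docs for app in (0, 1, 2) if owner(d, 3) == app]
assert sorted(processed) == docs

# App 2 dies, but the survivors still believe N == 3: the third of the
# stream that hashed to app 2 goes unprocessed until they learn the new N.
orphaned = [d for d in docs if owner(d, 3) == 2]
print(len(orphaned), "of", len(docs), "docs unprocessed")  # -> 10 of 30 docs unprocessed
```

This is exactly why the freshness of "N" matters: the loss rate is bounded by 1/N times the window between the death and the moment every survivor has adjusted its local "N".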
If each app was aware of other apps, then I could get the membership count directly using that mechanism, although I still wouldn't be able to immediately detect when some app died; at least I'm not sure how I could do that.

> one has a different list of members than the other then they can both
> consult zookeeper to reconcile or use the membership list with the
> highest zxid. the other option is to count on eventually everyone
> converging.

Right, if I could live with eventually having the right "N", then I could use ZK as described on
http://eng.wealthfront.com/2010/01/actually-implementing-group-management.html

But say I'm OK with "eventually everyone converging". Can I use ZK then? And if so, how "eventually" is this "eventually"? That is, if an app dies, how quickly can ZK notify all znode watchers that the znode changed? A few milliseconds or more?

In general, how does one deal with situations like the one I described above, where each app is responsible for 1/Nth of the work and where N can uncontrollably and unexpectedly change?

Thanks!
Otis

----- Original Message ----
> From: Benjamin Reed
> To: user@zookeeper.apache.org
> Sent: Fri, March 18, 2011 5:59:43 PM
> Subject: Re: Using ZK for real-time group membership notification
>
> in a distributed setting such an answer is impossible, especially
> given the theory of relativity and the speed of light. a machine may
> fail right after sending a heartbeat, or another may come online right
> after sending a report. even if zookeeper could provide this, you would
> still have thread scheduling issues on a local machine that mean that
> you are operating on old information.
>
> to deal with this applications can use views, which allow clients to
> reconcile differences.
> for example, if two processes communicate and
> one has a different list of members than the other then they can both
> consult zookeeper to reconcile or use the membership list with the
> highest zxid. the other option is to count on eventually everyone
> converging.
>
> i would not develop a distributed system with the assumption that "all
> group members know *the exact number of members at all times*".
>
> ben
>
> On Fri, Mar 18, 2011 at 2:02 PM, Otis Gospodnetic wrote:
> > Hi,
> >
> > Short version:
> > How can ZK be used to make sure that all group members know *the exact
> > number of members at all times*?
> >
> > I have an app that can be run on 1 or more servers. New instances of the
> > app come and go, may die, etc. -- the number of app instances is completely
> > dynamic. At any one time, as these apps come and go, each live instance of
> > the app needs to know how many instances there are in total. If a new
> > instance of the app is started, all instances need to know the new total
> > number of instances. If an app is stopped or if it dies, the remaining apps
> > need to know the new number of app instances.
> >
> > Also, and this is critical, they need to know about these additions/removals
> > of apps right away, and they all need to find out about them at the same time.
> > Basically, all members of some group need to know *the exact number of
> > members at all times*.
> >
> > This sounds almost like we need to watch a "parent group znode" and monitor
> > the number of its ephemeral children, which represent each app instance that
> > is watching the "parent group znode". Is that right? If so, then all I'd
> > need to know is the answer to "How many watchers are watching this znode?"
> > or "How many kids does this znode have?". And I'd need ZK to notify all
> > watchers whenever the answer to this question changes.
> > Ideally it would send/push the answer (the number of watchers) to all
> > watchers, but if not, I assume any watcher that is notified about the
> > change would go poll ZK to get the number of ephemeral kids.
> >
> > I think the above is essentially what's described on
> > http://eng.wealthfront.com/2010/01/actually-implementing-group-management.html ,
> > but it doesn't answer the part that's critical for me (the very first Q up above).
> >
> > Thanks,
> > Otis
> > ----
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > Lucene ecosystem search :: http://search-lucene.com/
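The "parent znode with ephemeral children" pattern asked about above can be mimicked in memory. This is a toy model, not ZooKeeper's actual API (the `Group` and `App` classes are invented; a real implementation would use `getChildren` with a watch, e.g. kazoo's `ChildrenWatch`), but it mirrors the watch semantics that answer Otis's question: ZK does not push the member count, it fires a one-shot notification, and each watcher must re-read the children and re-arm the watch, so survivors converge on the new "N" eventually rather than all at the same instant.

```python
class Group:
    """Toy stand-in for a parent znode whose ephemeral children are members.
    Watch semantics mimic ZooKeeper's: one-shot notify, then re-read."""
    def __init__(self):
        self.children = set()
        self.watchers = []           # one-shot callbacks for the next change

    def watch(self, callback):
        self.watchers.append(callback)

    def _fire(self):
        pending, self.watchers = self.watchers, []
        for cb in pending:
            cb()                     # each watcher re-reads and re-registers

    def join(self, member):
        self.children.add(member)
        self._fire()

    def leave(self, member):
        # In real ZK the session timeout deletes the ephemeral node.
        self.children.discard(member)
        self._fire()

class App:
    def __init__(self, name, group):
        self.name, self.group = name, group
        self.n = 0
        group.join(name)
        self._rewatch()

    def _rewatch(self):
        # Re-read the child count, then re-arm the one-shot watch.
        self.n = len(self.group.children)
        self.group.watch(self._rewatch)

g = Group()
apps = [App("app-%d" % i, g) for i in range(3)]
g.leave("app-2")                     # simulate app-2's session timing out
print([a.n for a in apps[:2]])       # -> [2, 2] -- survivors converged on new N
```

Note the window this leaves open: between the ephemeral node vanishing and each survivor's re-read completing, different apps can hold different values of "N", which is the "eventual convergence" both Ben and Camille describe.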