From: "Fournier, Camille F. [Tech]"
To: user@zookeeper.apache.org
Date: Sun, 20 Mar 2011 22:38:14 -0400
Subject: RE: Using ZK for real-time group membership notification

To do this right you probably need message queues. I'd research the various MQ solutions out there; they are built to handle exactly this sort of issue.

You could try to implement it via a ZK distributed queue plus some sort of crazy transaction logic in each process (so that a document to process would return to the queue if the processor somehow didn't commit the transaction before dying, etc.), but frankly it sounds like a nightmare, even if you're comfortable with occasionally just losing documents.

> But say I'm OK with "eventually everyone converging". Can I use ZK then? And
> if so, how "eventually" is this "eventually"? That is, if an app dies, how
> quickly can ZK notify all znode watchers that the znode changed? A few
> milliseconds or more?
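The "ZK distributed queue plus transaction logic" idea mentioned above can be sketched without any ZooKeeper specifics. This is a hypothetical in-memory model (the `WorkQueue` class and its method names are invented for illustration, not a real ZooKeeper recipe): a consumer takes a document, and the document returns to the queue if the consumer dies before committing it.

```python
from collections import deque

class WorkQueue:
    """Toy model of a queue whose items return if never committed.
    Illustration only -- not ZooKeeper's distributed-queue recipe."""
    def __init__(self, docs):
        self.pending = deque(docs)
        self.in_flight = {}          # doc -> consumer holding it
        self.done = []

    def take(self, consumer):
        doc = self.pending.popleft()
        self.in_flight[doc] = consumer
        return doc

    def commit(self, doc):
        del self.in_flight[doc]
        self.done.append(doc)

    def consumer_died(self, consumer):
        # Requeue everything the dead consumer never committed.
        for doc, owner in list(self.in_flight.items()):
            if owner == consumer:
                del self.in_flight[doc]
                self.pending.append(doc)

q = WorkQueue(["d1", "d2", "d3"])
a = q.take("app-0")              # app-0 holds d1
q.commit(a)                      # d1 processed and committed
b = q.take("app-1")              # app-1 holds d2, then dies uncommitted
q.consumer_died("app-1")
print(sorted(q.pending))         # -> ['d2', 'd3'] -- d2 is back in the queue
```

Even in this toy form you can see the "crazy transaction logic": every consumer death needs detection plus requeue, and a consumer that commits after being declared dead would cause duplicate processing.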
If your app dies, it will live on in ZK until the session timeout is reached (usually on the order of seconds), unless we implement better session-death detection logic.

C

-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
Sent: Saturday, March 19, 2011 12:41 AM
To: user@zookeeper.apache.org
Subject: Re: Using ZK for real-time group membership notification

Hi,

Thanks Ben. Let me describe the context a bit more -- once you know what I'm trying to do, you may have suggestions for how to solve the problem with or without ZK.

I have a continuous "stream" of documents that I need to process. The stream is pretty fast (I don't know the exact number, but it's many docs a second). Docs live only in memory in that stream and cannot be saved to disk at any point up the stream.

My app listens to this stream. Because of the high document ingestion rate I need N instances of my app to listen to this stream. So all N apps listen and they all "get" the same documents, but only one app actually processes each document -- "if (docID mod N == appID) then process doc" -- the usual consistent hashing stuff. I'd like to be able to add and remove apps dynamically and have the remaining apps realize that "N" has changed. Similarly, when some app instance dies and thus "N" changes, I'd like all the remaining instances to know about it.

If my apps don't know the correct "N", then 1/Nth of docs will go unprocessed (if the app died or was removed) until the remaining apps adjust their local value of "N".

> to deal with this applications can use views, which allow clients to
> reconcile differences. for example, if two processes communicate and

Hm, this requires apps to communicate with each other.
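The mod-N dispatch rule and the 1/Nth gap described above can be made concrete with a small sketch (the app and doc IDs here are invented for illustration):

```python
def owner(doc_id, n):
    # The dispatch rule from the thread:
    # app i processes a doc iff doc_id mod n == i.
    return doc_id % n

docs = list(range(30))

# All three apps agree that N == 3: every doc has exactly one owner.
processed = [d for d in docs for app in (0, 1, 2) if owner(d, 3) == app]
assert sorted(processed) == docs

# App 2 dies, but the survivors still believe N == 3: the third of the
# stream that hashed to app 2 goes unprocessed until they learn the new N.
orphaned = [d for d in docs if owner(d, 3) == 2]
print(len(orphaned), "of", len(docs), "docs unprocessed")  # -> 10 of 30 docs unprocessed
```

This is exactly why the freshness of "N" matters: the loss rate is bounded by 1/N times the window between the death and the moment every survivor has adjusted its local "N".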
If each app was aware of other apps, then I could get the membership count directly using that mechanism, although I still wouldn't be able to immediately detect when some app died; at least I'm not sure how I could do that.

> one has a different list of members than the other then they can both
> consult zookeeper to reconcile or use the membership list with the
> highest zxid. the other option is to count on eventually everyone
> converging.

Right, if I could live with eventually having the right "N", then I could use ZK as described on
http://eng.wealthfront.com/2010/01/actually-implementing-group-management.html

But say I'm OK with "eventually everyone converging". Can I use ZK then? And if so, how "eventually" is this "eventually"? That is, if an app dies, how quickly can ZK notify all znode watchers that the znode changed? A few milliseconds or more?

In general, how does one deal with situations like the one I described above, where each app is responsible for 1/Nth of the work and where N can uncontrollably and unexpectedly change?

Thanks!
Otis

----- Original Message ----
> From: Benjamin Reed
> To: user@zookeeper.apache.org
> Sent: Fri, March 18, 2011 5:59:43 PM
> Subject: Re: Using ZK for real-time group membership notification
>
> in a distributed setting such an answer is impossible, especially
> given the theory of relativity and the speed of light. a machine may
> fail right after sending a heartbeat, or another may come online right
> after sending a report. even if zookeeper could provide this, you would
> still have thread scheduling issues on a local machine that mean that
> you are operating on old information.
>
> to deal with this applications can use views, which allow clients to
> reconcile differences.
> for example, if two processes communicate and
> one has a different list of members than the other then they can both
> consult zookeeper to reconcile or use the membership list with the
> highest zxid. the other option is to count on eventually everyone
> converging.
>
> i would not develop a distributed system with the assumption that "all
> group members know *the exact number of members at all times*".
>
> ben
>
> On Fri, Mar 18, 2011 at 2:02 PM, Otis Gospodnetic wrote:
> > Hi,
> >
> > Short version:
> > How can ZK be used to make sure that all group members know *the exact
> > number of members at all times*?
> >
> > I have an app that can be run on 1 or more servers. New instances of the
> > app come and go, may die, etc. -- the number of app instances is completely
> > dynamic. At any one time, as these apps come and go, each live instance of
> > the app needs to know how many instances there are in total. If a new
> > instance of the app is started, all instances need to know the new total
> > number of instances. If an app is stopped or if it dies, the remaining apps
> > need to know the new number of app instances.
> >
> > Also, and this is critical, they need to know about these additions/removals
> > of apps right away, and they all need to find out about them at the same time.
> > Basically, all members of some group need to know *the exact number of
> > members at all times*.
> >
> > This sounds almost like we need to watch a "parent group znode" and monitor
> > the number of its ephemeral children, which represent each app instance that
> > is watching the "parent group znode". Is that right? If so, then all I'd
> > need to know is the answer to "How many watchers are watching this znode?"
> > or "How many kids does this znode have?". And I'd need ZK to notify all
> > watchers whenever the answer to this question changes.
> > Ideally it would send/push the answer (the number of watchers) to all
> > watchers, but if not, I assume any watcher that is notified about the
> > change would go poll ZK to get the number of ephemeral kids.
> >
> > I think the above is essentially what's described on
> > http://eng.wealthfront.com/2010/01/actually-implementing-group-management.html ,
> > but it doesn't answer the part that's critical for me (the very first Q up above).
> >
> > Thanks,
> > Otis
> > ----
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > Lucene ecosystem search :: http://search-lucene.com/
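The "parent znode with ephemeral children" pattern asked about above can be mimicked in memory. This is a toy model, not ZooKeeper's actual API (the `Group` and `App` classes are invented; a real implementation would use `getChildren` with a watch, e.g. kazoo's `ChildrenWatch`), but it mirrors the watch semantics that answer Otis's question: ZK does not push the member count, it fires a one-shot notification, and each watcher must re-read the children and re-arm the watch, so survivors converge on the new "N" eventually rather than all at the same instant.

```python
class Group:
    """Toy stand-in for a parent znode whose ephemeral children are members.
    Watch semantics mimic ZooKeeper's: one-shot notify, then re-read."""
    def __init__(self):
        self.children = set()
        self.watchers = []           # one-shot callbacks for the next change

    def watch(self, callback):
        self.watchers.append(callback)

    def _fire(self):
        pending, self.watchers = self.watchers, []
        for cb in pending:
            cb()                     # each watcher re-reads and re-registers

    def join(self, member):
        self.children.add(member)
        self._fire()

    def leave(self, member):
        # In real ZK the session timeout deletes the ephemeral node.
        self.children.discard(member)
        self._fire()

class App:
    def __init__(self, name, group):
        self.name, self.group = name, group
        self.n = 0
        group.join(name)
        self._rewatch()

    def _rewatch(self):
        # Re-read the child count, then re-arm the one-shot watch.
        self.n = len(self.group.children)
        self.group.watch(self._rewatch)

g = Group()
apps = [App("app-%d" % i, g) for i in range(3)]
g.leave("app-2")                     # simulate app-2's session timing out
print([a.n for a in apps[:2]])       # -> [2, 2] -- survivors converged on new N
```

Note the window this leaves open: between the ephemeral node vanishing and each survivor's re-read completing, different apps can hold different values of "N", which is the "eventual convergence" both Ben and Camille describe.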