zookeeper-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Using ZK for real-time group membership notification
Date Sat, 19 Mar 2011 15:55:35 GMT
I would recommend using KeptCollections to get a close estimate of the
currently live machines.

When you need to process a document, select a processor at random from the
collection.  Then if that processor manages to tell you that the document
has been accepted and successfully processed, you can move to the next
document.  If you don't get confirmation for any reason (timeout, connection
loss, whatever), you pick another random processor from the collection
(using the kept collection so that you know who is live) and also add the
misbehaving processor to a local or global temporary black-list.

If nothing ever goes down, this does a pretty much perfect job.  Adding a
node also works perfectly.  Taking a node out in an orderly way can also
work seamless if the node first removes itself from the collection of live
processors and then finishes all pending requests plus waits a few seconds
to see if any more requests come through.

In the presence you have a few (inevitable) problems.  For instance, a node
can fail in the tiny window between seeing that the node exists and sending
a request.  That shouldn't be much of a since the connect will fail.
 Another mild problem is that a node may fail in such a way that ZK takes 30
seconds or so to be sure that it is gone.  Again, connection failure will
let this be handled gracefully.

The only serious error here is when you send a document to a processor which
successfully handles the document, commits it downstream but then fails
before reporting the success to you.  This will lead to double commit of the
processed document.

Without a reliable central repository, it is impossible to avoid some error
like this.

On Fri, Mar 18, 2011 at 9:41 PM, Otis Gospodnetic <
otis_gospodnetic@yahoo.com> wrote:

> My app listens to this stream.  Because of the high document ingestion rate
> I
> need N instances of my app to listen to this stream.  So all N apps listen
> and
> they all "get" the same documents, but only 1 app actually processes each
> document -- "if (docID mod N == appID) then process doc" -- the usual
> consistent
> hashing stuff.  I'd like to be able to add and remove apps dynamically and
> have
> the remaining apps realize that "N" has changed.  Similarly, when some app
> instance dies and thus "N" changes, I'd like all the remaining instances to
> know
> about it.
> If my apps don't know the correct "N" then 1/Nth of docs will go
> unprocessed (if
> the app died or was removed) until the remaining apps adjust their local
> value
> of "N".

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message