From: Ted Dunning
Date: Mon, 3 Jan 2011 13:29:46 -0800
To: user@zookeeper.apache.org
Subject: Re: performance of watches

Btw... this is one of the motives for multi-update.
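(For reference: the multi-update Ted mentions eventually shipped as the multi() call in
ZooKeeper 3.4, which bundles several operations into one atomic request. Below is a minimal
sketch of how a host might claim two buckets and refresh its own claim list in a single round
trip; the /claims and /hosts paths, the bucket names, and the ClaimBatch class are invented
for illustration.)

    import java.nio.charset.StandardCharsets;
    import java.util.Arrays;
    import java.util.List;

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.Op;
    import org.apache.zookeeper.OpResult;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class ClaimBatch {
        // Claim two buckets and refresh this host's claim list in one atomic call.
        // If any operation fails (e.g. another host already created a claim node),
        // none of them are applied, so there are no half-finished claims.
        public static List<OpResult> claimTwoBuckets(ZooKeeper zk, String hostId,
                                                     byte[] newClaimList)
                throws KeeperException, InterruptedException {
            byte[] owner = hostId.getBytes(StandardCharsets.UTF_8);
            return zk.multi(Arrays.asList(
                Op.create("/claims/bucket-17", owner,
                          ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL),
                Op.create("/claims/bucket-42", owner,
                          ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL),
                Op.setData("/hosts/" + hostId, newClaimList, -1)));
        }
    }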
On Mon, Jan 3, 2011 at 12:54 PM, Mahadev Konar wrote:

> Sam,
>  I think the approach Ted described should have a response time of under
> a few seconds, and I think it is probably a more reasonable one for
> scaling up.
>
> Thanks
> mahadev
>
> On 12/16/10 10:17 PM, "Samuel Rash" wrote:
>
> > Can these approaches respond in under a few seconds? If a traffic source
> > remains unclaimed for even a short while, we have a problem.
> >
> > Also, a host may "shed" traffic manually by releasing a subset of its
> > paths. Having all the other hosts watch only its location does guard
> > against the herd when it dies, but how do they know when it releases
> > 50/625 traffic buckets?
> >
> > I agree we might be able to make a more intelligent design that trades
> > latency for watch efficiency, but the idea was that we'd use the
> > simplest approach that gave us the lowest latency *if* the throughput
> > of watches from zookeeper was sufficient (and it seems like it is from
> > Mahadev's link).
> >
> > Thx,
> > -sr
> >
> > On 12/16/10 9:58 PM, "Ted Dunning" wrote:
> >
> >> This really sounds like it might be refactored a bit to decrease the
> >> number of notifications and reads.
> >>
> >> In particular, it sounds like you have two problems.
> >>
> >> The first is that the 40 hosts need to claim various traffic sources,
> >> one host per traffic source, many sources per host. This is well solved
> >> by the standard winner-takes-all file create idiom.
> >>
> >> The second problem is that other hosts need to know when traffic
> >> sources need claiming.
> >>
> >> I think you might consider an approach to the second problem which has
> >> each host posting a single ephemeral file containing a list of all of
> >> the sources it has claimed. Whenever a host claims a new source, it can
> >> update this file. When a host dies or exits, all the others will wake
> >> due to having a watch on the directory containing these ephemerals,
> >> will read the remaining host/source lists, and determine which sources
> >> are insufficiently covered. There will need to be some care taken about
> >> race conditions on this, but I think they all go the right way.
> >>
> >> This means that a host dying will cause 40 notifications followed by
> >> 1600 reads and at most 40 attempted file creates. You might even be
> >> able to avoid the 1600 reads by having each of the source directories
> >> be watched by several of the 40 hosts. Then a host dying would cause
> >> just a few notifications and a few file creates.
> >>
> >> A background process on each node could occasionally scan the source
> >> lists for each host to make sure nothing drops through the cracks.
> >>
> >> This seems much more moderate than what you describe.
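(A rough sketch of the scheme Ted outlines, using the plain ZooKeeper Java client. The znode
layout, with per-host ephemeral claim lists under /hosts and winner-takes-all ownership nodes
under /claims, plus the comma-separated data format and the CoverageWatcher name, are all
assumptions made for illustration, not code from the thread.)

    import java.nio.charset.StandardCharsets;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class CoverageWatcher implements Watcher {
        private final ZooKeeper zk;
        private final String hostId;
        private final Set<String> allSources;  // the full set of traffic sources

        public CoverageWatcher(ZooKeeper zk, String hostId, Set<String> allSources) {
            this.zk = zk;
            this.hostId = hostId;
            this.allSources = allSources;
        }

        public void process(WatchedEvent event) {
            if (event.getType() == Event.EventType.NodeChildrenChanged) {
                try {
                    checkCoverage();
                } catch (Exception e) {
                    // real code would log and retry with backoff
                }
            }
        }

        // Re-read every host's claim list and try to claim anything uncovered.
        public void checkCoverage() throws KeeperException, InterruptedException {
            Set<String> covered = new HashSet<String>();
            // getChildren with a watcher re-arms the one-shot children watch.
            List<String> hosts = zk.getChildren("/hosts", this);
            for (String host : hosts) {
                try {
                    byte[] data = zk.getData("/hosts/" + host, false, null);
                    for (String source : new String(data, StandardCharsets.UTF_8).split(",")) {
                        covered.add(source);
                    }
                } catch (KeeperException.NoNodeException e) {
                    // that host died between getChildren and getData; skip it
                }
            }
            for (String source : allSources) {
                if (!covered.contains(source)) {
                    try {
                        // Winner takes all: the first create succeeds, everyone else loses.
                        zk.create("/claims/" + source,
                                  hostId.getBytes(StandardCharsets.UTF_8),
                                  ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
                        // a real implementation would now append the source to its
                        // own /hosts/<hostId> claim list
                    } catch (KeeperException.NodeExistsException e) {
                        // another host won the race for this source; nothing to do
                    }
                }
            }
        }
    }

(Each host would call checkCoverage() once at startup to arm the initial children watch;
every later change to /hosts re-runs the scan and re-arms the one-shot watch.)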
> >>
> >> On Thu, Dec 16, 2010 at 8:23 PM, Samuel Rash wrote:
> >>
> >>> Yea--one host going down should trigger 24k watches. Each host then
> >>> looks at its load and determines which paths to acquire (they
> >>> represent traffic flow). This could result in, at worst, 24k create()
> >>> attempts immediately after.
> >>>
> >>> I'll read the docs--thanks
> >>>
> >>> -sr
> >>>
> >>> On 12/16/10 8:06 PM, "Mahadev Konar" wrote:
> >>>
> >>>> Hi Sam,
> >>>> Just a clarification, will a host going down fire 625 * 39 watches?
> >>>> That is ~24,000 watches per host going down.
> >>>>
> >>>> You can take a look at
> >>>> http://wiki.apache.org/hadoop/ZooKeeper/ServiceLatencyOverview about
> >>>> watches, latencies, and hardware requirements. Please do take a look,
> >>>> and if it doesn't answer your questions, we should add more
> >>>> documentation.
> >>>>
> >>>> Thanks
> >>>> Mahadev
> >>>>
> >>>> On 12/16/10 7:42 PM, "Samuel Rash" wrote:
> >>>>
> >>>> Hello,
> >>>>
> >>>> I am looking to run about 40 zookeeper clients with the following
> >>>> watch properties:
> >>>>
> >>>> 1. Up to 25,000 paths that every host has a watch on (each path has
> >>>>    one child, and the watch is on that child, an ephemeral node,
> >>>>    being removed)
> >>>> 2. An individual host "owns" 625 of these paths in this example; one
> >>>>    going down will fire 625 watches to each of the other 39 hosts
> >>>>
> >>>> Is there any limit on the rate at which these watches can be sent
> >>>> off? What's the right size cluster? (3? 5?) Does it need to be
> >>>> dedicated hardware?
> >>>>
> >>>> Thanks,
> >>>> Sam
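(For completeness, the per-path watch described above is the usual exists()-based deletion
watch on an ephemeral node; a minimal sketch with the standard Java client follows. The
/buckets/<n>/owner layout and the OwnerWatcher class are hypothetical. Because ZooKeeper
watches are one-shot, register() has to be called again after every event to keep watching,
which is why a single failed host turns into roughly 625 * 39 notifications and
re-registrations in the setup above.)

    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    public class OwnerWatcher implements Watcher {
        private final ZooKeeper zk;
        private final String ownerPath;  // e.g. "/buckets/00017/owner" (hypothetical)

        public OwnerWatcher(ZooKeeper zk, String ownerPath) {
            this.zk = zk;
            this.ownerPath = ownerPath;
        }

        // exists() arms a watch whether or not the node is currently present,
        // so it also covers an owner that is already gone.
        public void register() throws KeeperException, InterruptedException {
            Stat stat = zk.exists(ownerPath, this);
            if (stat == null) {
                ownerGone();
            }
        }

        public void process(WatchedEvent event) {
            try {
                register();  // watches are one-shot: re-check and re-arm on every event
            } catch (Exception e) {
                // real code would log and retry with backoff
            }
        }

        private void ownerGone() {
            // the ephemeral owner vanished (its session died or the bucket was released);
            // this is where a host would decide whether to attempt a create() of its own
        }
    }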