incubator-drill-dev mailing list archives

From Constantine Peresypkin <constant...@litestack.com>
Subject Re: Jeff Dean on fast response in an unreliable world
Date Wed, 12 Sep 2012 23:00:08 GMT
You're absolutely correct.
My point was that even less than 2 is sufficient.


On Thu, Sep 13, 2012 at 1:40 AM, Ted Dunning <ted.dunning@gmail.com> wrote:

> It isn't a doubling.  It is a power.
>
> If probability of exceeding the SLA is p, then the probability that two
> independent resources will exceed the SLA is p^2.  For three, the
> probability is p^3.
>
> To be concrete, I just did a simulation with a mixture of two log-normal
> distributions.  Using a mixture distribution here is important to emulate
> the long-tailed nature of response time distributions ... it doesn't
> suffice to use normal distributions.
>
> With a long tailed distribution that has a median of 20 ms response, the
> raw distribution has about a 2% chance of having a response > 50ms.  Using
> the lesser of two responses gives a probability of a > 50 ms response of
> 0.04%.  Three responses give a probability of 0.0008%.  For most
> applications, the difference between 2 and 3 replicated queries is nil.
>
> Moreover, if the second query has an artificial delay of a few ms, you get
> nearly the same improvements in probability of meeting the SLA, but you pay
> much lower average cost because you rarely invoke the redundant queries.
>
> So the reason that 2 are used instead of 3 is that 2 helps a lot while 3
> only improves things slightly more.
>
> On Wed, Sep 12, 2012 at 1:01 PM, Constantine Peresypkin <
> pconstantine@gmail.com> wrote:
>
> > If you do a double query you're increasing your chances of success by a
> > factor of 2 only.
> > Why not triple or quadruple?
> >
> > On Wed, Sep 12, 2012 at 10:14 PM, Ted Dunning <ted.dunning@gmail.com>
> > wrote:
> >
> > > Heavens.... we can easily satisfy both needs.
> > >
> > > Just have a parameter that can be set to 0 (= universal double query) or
> > > Integer.MAX_INTEGER to get no backups at all.
> > >
> > > On Wed, Sep 12, 2012 at 11:47 AM, Constantine Peresypkin <
> > > pconstantine@gmail.com> wrote:
> > >
> > > > > The PowerDrill paper also mentions a variant of this where each query
> > > > > fragment is sent to two machines, and the results for that fragment are
> > > > > used from whatever machine responds first.
> > > >
> > > >
> > > > Sending each query or request twice will increase cluster load by 100%.
> > > >
> > >
> >
>
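
The p^k argument and the delayed-backup idea quoted above can be tried out with a small simulation. What follows is only a rough sketch, not the simulation Ted ran: the mixture weights and log-normal parameters are assumptions picked to give a median near 20 ms and roughly a 2% chance of exceeding 50 ms, and the function names and the 30 ms delay are hypothetical.

```python
import math
import random


def sample_latency_ms(rng):
    """Draw one response time from an assumed two-component log-normal mixture."""
    if rng.random() < 0.9:
        # Common case: median 20 ms, narrow spread (assumed parameters).
        return rng.lognormvariate(math.log(20.0), 0.3)
    # Slow case: median 30 ms, wide spread; this supplies the long tail.
    return rng.lognormvariate(math.log(30.0), 0.6)


def tail_probabilities(sla_ms=50.0, trials=200_000, seed=1):
    """Estimate P(fastest of k independent replicas > sla_ms) for k = 1, 2, 3."""
    rng = random.Random(seed)
    misses = [0, 0, 0]
    for _ in range(trials):
        draws = [sample_latency_ms(rng) for _ in range(3)]
        for k in (1, 2, 3):
            if min(draws[:k]) > sla_ms:
                misses[k - 1] += 1
    return [m / trials for m in misses]


def delayed_backup(delay_ms, sla_ms=50.0, trials=200_000, seed=1):
    """Send a backup only if the primary has not answered after delay_ms.

    Returns (estimated probability of missing the SLA,
             fraction of requests for which a backup was sent).
    """
    rng = random.Random(seed)
    misses = 0
    backups = 0
    for _ in range(trials):
        primary = sample_latency_ms(rng)
        latency = primary
        if primary > delay_ms:
            backups += 1
            # The backup starts delay_ms late; take whichever answers first.
            latency = min(primary, delay_ms + sample_latency_ms(rng))
        if latency > sla_ms:
            misses += 1
    return misses / trials, backups / trials


if __name__ == "__main__":
    p1, p2, p3 = tail_probabilities()
    print(f"k=1: {p1:.4%}  k=2: {p2:.4%}  k=3: {p3:.4%}")
    print(f"p1^2 = {p1 ** 2:.4%}  p1^3 = {p1 ** 3:.4%}")
    miss, backup_rate = delayed_backup(delay_ms=30.0)
    print(f"30 ms delay: miss = {miss:.4%}, backups sent = {backup_rate:.2%}")
```

The estimates for k = 2 and k = 3 can be compared against p1^2 and p1^3 from the same run. delayed_backup also reports how often the backup is actually sent, so the cost side can be checked for whatever delay is chosen.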
