aurora-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Bill Farner <wfar...@apache.org>
Subject Re: Proposal: External Update Coordination
Date Thu, 16 Oct 2014 02:49:53 GMT
David - the plan is to synthesize the waiting state.  Exactly how is not
yet certain.

On Wednesday, October 15, 2014, Maxim Khutornenko <maxim@apache.org> wrote:

> It is certainly possible to add new state or a status message but I
> don't think it's a blocker for the first iteration. Provided there is
> enough demand a state/message could be synthesized during the 'get'
> call based on the volatile state.
>
> On Wed, Oct 15, 2014 at 6:36 PM, David McLaughlin <david@dmclaughlin.com
> <javascript:;>> wrote:
> > +1 for pause being explicit RPC pauses, but does it really add complexity
> > to just add a new state (WAITING?) when no heartbeat is sent? Not being
> > able to see that an update was blocked because of a lack of heartbeat
> seems
> > like a missing feature.
> >
> > On Wed, Oct 15, 2014 at 5:12 PM, Maxim Khutornenko <maxim@apache.org
> <javascript:;>> wrote:
> >
> >> +1. Updated the doc:
> >>
> >>
> https://github.com/maxim111333/incubator-aurora/blob/hb_doc/docs/update-heartbeat.md
> >>
> >> On Wed, Oct 15, 2014 at 5:09 PM, Bill Farner <wfarner@apache.org
> <javascript:;>> wrote:
> >> > +1 to the scheduler not proceeding on an update when heartbeats are
> >> absent,
> >> > and requiring the heartbeat service to explicitly call pauseJobUpdate
> >> when
> >> > it detects problems.
> >> >
> >> > -=Bill
> >> >
> >> > On Wed, Oct 15, 2014 at 4:59 PM, Kevin Sweeney
> >> <ksweeney@twitter.com.invalid
> >> >> wrote:
> >> >
> >> >> Chatted with Maxim and Bill, I think we figured it out
> >> >>
> >> >> I think the confusion stems from the fact that there are two types
of
> >> >> pauses in this system, explicit, persisted pauses generated by the
> >> >> pauseJobUpdate RPC and implicit, volatile pauses caused due to the
> >> absence
> >> >> of a sufficiently fresh heartbeat (such as in the case of a network
> >> >> partition).
> >> >>
> >> >> In case a monitoring service detects a problem it should call the
> >> explicit
> >> >> pauseJobUpdate RPC, which will cause a state change that requires an
> >> >> explicit resumeJobUpdate RPC to resume. That feature already exists.
> >> >>
> >> >> But, we need one more thing to make this reliable - heartbeats to
> >> protect
> >> >> against network partitions between the scheduler and the monitoring
> >> >> service. These can be volatile and lightweight - the scheduler just
> >> checks
> >> >> for a sufficiently fresh heartbeat before it performs an update
> action,
> >> and
> >> >> if none is present it simply refuses to perform the action. If the
> >> >> partition heals a new heartbeat will arrive (if the update being
> >> monitored
> >> >> should still be allowed to proceed) and the scheduler will allow the
> >> update
> >> >> to proceed.
> >> >>
> >> >>
> >> >> On Wed, Oct 15, 2014 at 11:56 AM, Bill Farner <wfarner@apache.org
> <javascript:;>>
> >> wrote:
> >> >>
> >> >> > I think we should assess that after building the rest of the
> feature.
> >> >> IIUC
> >> >> > the rest of the code doesn't care if the update is initially
> paused.
> >> >> >
> >> >> > -=Bill
> >> >> >
> >> >> > On Wed, Oct 15, 2014 at 11:50 AM, Maxim Khutornenko <
> maxim@apache.org <javascript:;>
> >> >
> >> >> > wrote:
> >> >> >
> >> >> > > Can we get a consensus here? Looks like the only sticky point
> left
> >> is
> >> >> > > around starting an update in paused vs. non-paused state.
I can
> >> argue
> >> >> > > either way as it's easy to add later if needed.
> >> >> > >
> >> >> > > On Tue, Oct 14, 2014 at 1:03 PM, Bill Farner <wfarner@apache.org
> <javascript:;>>
> >> >> wrote:
> >> >> > > > I'm not arguing against the merits of the approach.
 Just
> feeling
> >> out
> >> >> > > > whether that should be done _after_ the rest of the
heartbeat
> >> >> support.
> >> >> > > > Seems like it can be cleanly added at the end to get
something
> >> usable
> >> >> > > > earlier.
> >> >> > > >
> >> >> > > > -=Bill
> >> >> > > >
> >> >> > > > On Tue, Oct 14, 2014 at 12:38 PM, Kevin Sweeney <
> >> kevints@apache.org <javascript:;>>
> >> >> > > wrote:
> >> >> > > >
> >> >> > > >> I'm +1 for using lack of heartbeats as a uniform
> >> >> unknown-or-unhealthy
> >> >> > > >> signal, and punting on a more complex NACK signal
(which we'd
> >> have
> >> >> to
> >> >> > > >> reliably persist).
> >> >> > > >>
> >> >> > > >> I think the only disagreement in this thread is
whether the
> >> default
> >> >> > > state
> >> >> > > >> for a new update should be running or waiting-for-heartbeat.
I
> >> think
> >> >> > > >> waiting for a heartbeat is not only a more correct
> implementation
> >> >> (no
> >> >> > > risk
> >> >> > > >> of acting after a failover but before the heartbeat
timeout)
> but
> >> >> > > simpler to
> >> >> > > >> implement (initialize the PulseMonitor data structure
as empty
> >> >> rather
> >> >> > > than
> >> >> > > >> with a synthetic heartbeat).
> >> >> > > >>
> >> >> > > >> From an API consumer perspective the sequence is:
> >> >> > > >>
> >> >> > > >> 1. API client sends a startUpdate RPC to the scheduler
> >> >> > > >> 2. API client receives an OK response, then arranges
for
> >> something
> >> >> to
> >> >> > > call
> >> >> > > >> heartbeat with that updateId on some interval
> >> >> > > >> 3. Whatever is supposed to send heartbeats sends
one
> immediately,
> >> >> then
> >> >> > > >> starts sending them on some smaller interval
> >> >> > > >>
> >> >> > > >> Waiting for the first heartbeat ensures that this
sequence has
> >> been
> >> >> > > >> completed successfully, while not waiting for it
only ensure
> that
> >> >> > step 1
> >> >> > > >> has happened.
> >> >> > > >>
> >> >> > > >>
> >> >> > > >> On Tue, Oct 14, 2014 at 12:18 PM, Bill Farner <
> >> wfarner@apache.org <javascript:;>>
> >> >> > > wrote:
> >> >> > > >>
> >> >> > > >> > Wait - simpler solution than what?  We're talking
about not
> >> doing
> >> >> > > either.
> >> >> > > >> >
> >> >> > > >> > -=Bill
> >> >> > > >> >
> >> >> > > >> > On Tue, Oct 14, 2014 at 12:16 PM, Kevin Sweeney
<
> >> >> kevints@apache.org <javascript:;>
> >> >> > >
> >> >> > > >> > wrote:
> >> >> > > >> >
> >> >> > > >> > > I think waiting for the first heartbeat
before taking any
> >> action
> >> >> > is
> >> >> > > the
> >> >> > > >> > > simpler solution here as it allows the
implementation to
> be
> >> >> > entirely
> >> >> > > >> > > soft-state and still catches the bugs
I described.
> >> >> > > >> > >
> >> >> > > >> > > The implementation is just PulseMonitorImpl<UpdateId>
-
> >> >> heartbeat
> >> >> > > calls
> >> >> > > >> > > pulse and mutation operations check isAlive.
I think the
> code
> >> >> > might
> >> >> > > >> > > actually work as-is.
> >> >> > > >> > >
> >> >> > > >> > > On Tue, Oct 14, 2014 at 12:11 PM, Maxim
Khutornenko <
> >> >> > > maxim@apache.org <javascript:;>>
> >> >> > > >> > > wrote:
> >> >> > > >> > >
> >> >> > > >> > > > Pausing update on creation seems
like a logical approach
> >> when
> >> >> > > dealing
> >> >> > > >> > > > with inverted dependency model. I.e.
updater is happy to
> >> act
> >> >> as
> >> >> > > long
> >> >> > > >> > > > as it's greenlighted by the external
signal. It's also
> >> aligned
> >> >> > > with a
> >> >> > > >> > > > failover experience where coordinated
updates are
> >> rehydrated
> >> >> in
> >> >> > > >> paused
> >> >> > > >> > > > state waiting for HB awakening. That
said, I am OK
> punting
> >> it
> >> >> > for
> >> >> > > the
> >> >> > > >> > > > sake of simplicity for now.
> >> >> > > >> > > >
> >> >> > > >> > > > Kevin?
> >> >> > > >> > > >
> >> >> > > >> > > > On Tue, Oct 14, 2014 at 12:05 PM,
Bill Farner <
> >> >> > wfarner@apache.org <javascript:;>
> >> >> > > >
> >> >> > > >> > > wrote:
> >> >> > > >> > > > > If the goal is to reduce complexity
now and add
> features
> >> >> > later,
> >> >> > > why
> >> >> > > >> > not
> >> >> > > >> > > > > nuke both for now - kick off
the update right away,
> and
> >> let
> >> >> > > lack of
> >> >> > > >> > > > > heartbeats serve as a uniform
"unknown or unhealthy"
> >> signal?
> >> >> > > >> > > > >
> >> >> > > >> > > > > -=Bill
> >> >> > > >> > > > >
> >> >> > > >> > > > > On Mon, Oct 13, 2014 at 5:25
PM, Maxim Khutornenko <
> >> >> > > >> maxim@apache.org <javascript:;>
> >> >> > > >> > >
> >> >> > > >> > > > wrote:
> >> >> > > >> > > > >
> >> >> > > >> > > > >> I am still +1 on the idea
to have default paused
> state
> >> on
> >> >> > > >> creation.
> >> >> > > >> > I
> >> >> > > >> > > > >> think we could still differentiate
between initially
> >> paused
> >> >> > and
> >> >> > > >> > timed
> >> >> > > >> > > > >> out states internally by
looking at pause reason.
> It's
> >> >> quite
> >> >> > > >> > different
> >> >> > > >> > > > >> if we want to store explicit
NACK reasons from the
> >> external
> >> >> > > >> service
> >> >> > > >> > > > >> though. That would require
persistence and a bit more
> >> >> > > complicated
> >> >> > > >> > > > >> logic.
> >> >> > > >> > > > >>
> >> >> > > >> > > > >> On Mon, Oct 13, 2014 at
5:15 PM, Kevin Sweeney <
> >> >> > > >> kevints@apache.org <javascript:;>>
> >> >> > > >> > > > wrote:
> >> >> > > >> > > > >> > I like the idea of
implementing this scheduler-side
> >> >> purely
> >> >> > > >> through
> >> >> > > >> > > > >> volatile
> >> >> > > >> > > > >> > state, but the lack
of feedback (generic vs
> specific
> >> >> error
> >> >> > > >> > messages
> >> >> > > >> > > > when
> >> >> > > >> > > > >> an
> >> >> > > >> > > > >> > update is paused) leaves
something to be desired.
> >> Maybe
> >> >> we
> >> >> > > can
> >> >> > > >> > > address
> >> >> > > >> > > > >> that
> >> >> > > >> > > > >> > with a metadata field
in the initial call to
> >> startUpdate
> >> >> > > (with
> >> >> > > >> an
> >> >> > > >> > > > >> optional
> >> >> > > >> > > > >> > link to a page where
one can get more rich
> information
> >> >> > about
> >> >> > > the
> >> >> > > >> > > > state of
> >> >> > > >> > > > >> > the monitor sending/not
sending heartbeats).
> >> >> > > >> > > > >> >
> >> >> > > >> > > > >> > The main drawback is
that we may have to wait a
> >> maximum
> >> >> of
> >> >> > > one
> >> >> > > >> > > > heartbeat
> >> >> > > >> > > > >> > interval to find out
that an update should be
> paused.
> >> >> > > >> > > > >> >
> >> >> > > >> > > > >> > On Mon, Oct 13, 2014
at 4:55 PM, Maxim Khutornenko
> <
> >> >> > > >> > > maxim@apache.org <javascript:;>>
> >> >> > > >> > > > >> wrote:
> >> >> > > >> > > > >> >
> >> >> > > >> > > > >> >> The main reason
I preferred the lack-of-ACK
> approach
> >> >> over
> >> >> > an
> >> >> > > >> > > explicit
> >> >> > > >> > > > >> >> NACK one is simplicity.
As Joshua pointed out
> there
> >> is
> >> >> > more
> >> >> > > >> state
> >> >> > > >> > > to
> >> >> > > >> > > > >> >> handle in that
case. The lack-of-ACK model can be
> >> >> > completely
> >> >> > > >> > > > >> >> implemented in
volatile memory sidestepping the
> >> >> persistent
> >> >> > > >> > storage
> >> >> > > >> > > > >> >> entirely. With
the NACK we would need to reliably
> >> >> persist
> >> >> > > >> > external
> >> >> > > >> > > > >> >> service call reasons
to survive scheduler
> failovers.
> >> >> Not a
> >> >> > > huge
> >> >> > > >> > > > >> >> challenge but something
to keep in mind.
> >> >> > > >> > > > >> >>
> >> >> > > >> > > > >> >> I still think the
simplicity/reliability tradeoff
> is
> >> >> > > acceptable
> >> >> > > >> > > here
> >> >> > > >> > > > >> >> if we rely on external
service to abort
> heartbeats in
> >> >> case
> >> >> > > of a
> >> >> > > >> > > > health
> >> >> > > >> > > > >> >> alert fired. This
can be explicitly documented as
> an
> >> >> > > external
> >> >> > > >> > > > >> >> integration requirement.
However, If the consensus
> >> is to
> >> >> > go
> >> >> > > a
> >> >> > > >> > more
> >> >> > > >> > > > >> >> reliable (though
more complicated) NACK route I am
> >> happy
> >> >> > to
> >> >> > > >> > > > reconsider
> >> >> > > >> > > > >> >> the current proposal.
> >> >> > > >> > > > >> >>
> >> >> > > >> > > > >> >> On Mon, Oct 13,
2014 at 3:50 PM, Joshua Cohen <
> >> >> > > >> > > > jcohen@twopensource.com <javascript:;>>
> >> >> > > >> > > > >> >> wrote:
> >> >> > > >> > > > >> >> > "The heratbeatJobUpdate
RPC serves as an ACK,
> but
> >> we
> >> >> > don't
> >> >> > > >> > have a
> >> >> > > >> > > > >> NACK.
> >> >> > > >> > > > >> >> If
> >> >> > > >> > > > >> >> > we are going
to let lack-of-ACK serve as the
> NACK,
> >> i
> >> >> > don't
> >> >> > > >> > think
> >> >> > > >> > > > it's
> >> >> > > >> > > > >> >> safe
> >> >> > > >> > > > >> >> > to resume
when we receive another ACK.  In other
> >> >> words,
> >> >> > a
> >> >> > > >> > service
> >> >> > > >> > > > >> >> toggling
> >> >> > > >> > > > >> >> > unhealthy
might not be deemed safe to proceed."
> >> >> > > >> > > > >> >> >
> >> >> > > >> > > > >> >> > Lack-of-ACK
is the scenario where connectivity
> >> between
> >> >> > the
> >> >> > > >> > > monitor
> >> >> > > >> > > > and
> >> >> > > >> > > > >> >> the
> >> >> > > >> > > > >> >> > scheduler
is unavailable. Shouldn't the NACK
> >> scenario
> >> >> > > >> > (everything
> >> >> > > >> > > > is
> >> >> > > >> > > > >> not
> >> >> > > >> > > > >> >> > ok!) be handled
by the monitoring service
> >> triggering
> >> >> an
> >> >> > > >> > explicit
> >> >> > > >> > > > >> pause?
> >> >> > > >> > > > >> >> > I.e. section
2 should be updated to say
> "External
> >> >> > service
> >> >> > > >> > detects
> >> >> > > >> > > > >> service
> >> >> > > >> > > > >> >> > health problems
and pauses the update" and
> section
> >> 4
> >> >> > > becomes
> >> >> > > >> > the
> >> >> > > >> > > > >> current
> >> >> > > >> > > > >> >> > section 2
(i.e. "Should a heartbeat not be
> received
> >> >> the
> >> >> > > >> > scheduler
> >> >> > > >> > > > >> pauses
> >> >> > > >> > > > >> >> > the update.").
> >> >> > > >> > > > >> >> >
> >> >> > > >> > > > >> >> > I agree that
it's unsafe to to resume updates
> after
> >> >> > > >> receiving a
> >> >> > > >> > > > >> heartbeat
> >> >> > > >> > > > >> >> > after previously
pausing due to a missed
> >> heartbeat. In
> >> >> > > that
> >> >> > > >> > > > scenario
> >> >> > > >> > > > >> I'd
> >> >> > > >> > > > >> >> > think we'd
want an explicit resumeJobUpdate. If
> the
> >> >> > > scenario
> >> >> > > >> > > we're
> >> >> > > >> > > > >> trying
> >> >> > > >> > > > >> >> > to handle
is *never* received a heartbeat,
> that's a
> >> >> > > separate
> >> >> > > >> > > > matter,
> >> >> > > >> > > > >> in
> >> >> > > >> > > > >> >> > that case
unpausing upon receiving the first
> >> heartbeat
> >> >> > > would
> >> >> > > >> > make
> >> >> > > >> > > > >> sense,
> >> >> > > >> > > > >> >> > but it feels
like that complicates things quite
> a
> >> bit
> >> >> > > (now we
> >> >> > > >> > > need
> >> >> > > >> > > > to
> >> >> > > >> > > > >> >> > differentiate
between heartbeat #1 and hearbeat
> >> #N).
> >> >> > > >> > > > >> >> >
> >> >> > > >> > > > >> >> > On Mon, Oct
13, 2014 at 2:50 PM, Bill Farner <
> >> >> > > >> > wfarner@apache.org <javascript:;>
> >> >> > > >> > > >
> >> >> > > >> > > > >> wrote:
> >> >> > > >> > > > >> >> >
> >> >> > > >> > > > >> >> >> What is
the guidance for deploying while the
> >> >> heartbeat
> >> >> > > >> service
> >> >> > > >> > > is
> >> >> > > >> > > > >> >> broken?
> >> >> > > >> > > > >> >> >> I think
i know the answer, but it's important
> to
> >> >> spell
> >> >> > > out.
> >> >> > > >> > > > >> >> >>
> >> >> > > >> > > > >> >> >>
> >> >> > > >> > > > >> >> >>
> >> >> > > >> > > > >> >> >> > Create
a new coordinated job update in a
> paused
> >> >> > > >> > > > >> (ROLL_FORWARD_PAUSED)
> >> >> > > >> > > > >> >> >> > state
to avoid any progress until the first
> >> >> heartbeat
> >> >> > > call
> >> >> > > >> > > > arrives.
> >> >> > > >> > > > >> >> >>
> >> >> > > >> > > > >> >> >>
> >> >> > > >> > > > >> >> >> I'm not
sold on this being ultimately
> >> beneficial.  In
> >> >> > the
> >> >> > > >> > worst
> >> >> > > >> > > > case,
> >> >> > > >> > > > >> >> >> impact
is still limited by the health check
> >> >> threshold.
> >> >> > > >> Seems
> >> >> > > >> > > like
> >> >> > > >> > > > >> >> >> premature
optimization at best, and an odd one
> if
> >> we
> >> >> > > proceed
> >> >> > > >> > > > without
> >> >> > > >> > > > >> a
> >> >> > > >> > > > >> >> >> 'NACK'
signal via the heartbeatJobUpdate RPC.
> >> >> > > >> > > > >> >> >>
> >> >> > > >> > > > >> >> >> Allow
resuming of the
> paused-due-to-no-heartbeat
> >> >> update
> >> >> > > via
> >> >> > > >> a
> >> >> > > >> > > > >> >> >> > resumeJobUpdate
call.
> >> >> > > >> > > > >> >> >>
> >> >> > > >> > > > >> >> >>
> >> >> > > >> > > > >> >> >> Are heartbeats
required while rolling back?  If
> >> so,
> >> >> > that
> >> >> > > >> might
> >> >> > > >> > > > impact
> >> >> > > >> > > > >> >> the
> >> >> > > >> > > > >> >> >> design
here and in other places.
> >> >> > > >> > > > >> >> >>
> >> >> > > >> > > > >> >> >> Allow
resuming of the
> paused-due-to-no-heartbeat
> >> >> update
> >> >> > > via
> >> >> > > >> a
> >> >> > > >> > > > fresh
> >> >> > > >> > > > >> >> >> > heartbeatJobUpdate
call.
> >> >> > > >> > > > >> >> >>
> >> >> > > >> > > > >> >> >>
> >> >> > > >> > > > >> >> >> The heratbeatJobUpdate
RPC serves as an ACK,
> but
> >> we
> >> >> > don't
> >> >> > > >> > have a
> >> >> > > >> > > > >> NACK.
> >> >> > > >> > > > >> >> If
> >> >> > > >> > > > >> >> >> we are
going to let lack-of-ACK serve as the
> >> NACK, i
> >> >> > > don't
> >> >> > > >> > think
> >> >> > > >> > > > it's
> >> >> > > >> > > > >> >> safe
> >> >> > > >> > > > >> >> >> to resume
when we receive another ACK.  In
> other
> >> >> > words, a
> >> >> > > >> > > service
> >> >> > > >> > > > >> >> toggling
> >> >> > > >> > > > >> >> >> unhealthy
might not be deemed safe to proceed.
> >> >> > > >> > > > >> >> >>
> >> >> > > >> > > > >> >> >> Perhaps
just sending OK (or a NOOP equivalent)
> in
> >> >> case
> >> >> > > of a
> >> >> > > >> > > > >> user-paused
> >> >> > > >> > > > >> >> job
> >> >> > > >> > > > >> >> >> > update
would make more sense as there is
> nothing
> >> >> > > >> monitoring
> >> >> > > >> > > > service
> >> >> > > >> > > > >> >> could
> >> >> > > >> > > > >> >> >> > do
in that case. This should work fine with
> >> >> > > pause/resume
> >> >> > > >> > > > >> >> -aware/-agnostic
> >> >> > > >> > > > >> >> >> > monitoring
service implementation.
> >> >> > > >> > > > >> >> >>
> >> >> > > >> > > > >> >> >>
> >> >> > > >> > > > >> >> >> This seems
reasonable to me - heartbeats for a
> >> paused
> >> >> > > update
> >> >> > > >> > > > should
> >> >> > > >> > > > >> not
> >> >> > > >> > > > >> >> >> pose a
risk, but can be safely ignored.
> >> >> > > >> > > > >> >> >>
> >> >> > > >> > > > >> >> >>
> >> >> > > >> > > > >> >> >>
> >> >> > > >> > > > >> >> >> -=Bill
> >> >> > > >> > > > >> >> >>
> >> >> > > >> > > > >> >> >> On Mon,
Oct 13, 2014 at 12:48 PM, Maxim
> >> Khutornenko <
> >> >> > > >> > > > >> maxim@apache.org <javascript:;>>
> >> >> > > >> > > > >> >> >> wrote:
> >> >> > > >> > > > >> >> >>
> >> >> > > >> > > > >> >> >> > Agreed.
That would be a logical
> generalization
> >> of
> >> >> the
> >> >> > > post
> >> >> > > >> > > > failover
> >> >> > > >> > > > >> >> >> > behavior.
> >> >> > > >> > > > >> >> >> >
> >> >> > > >> > > > >> >> >> > I
have updated the above document with the
> >> >> following
> >> >> > > >> > changes:
> >> >> > > >> > > > >> >> >> > -
Reply with PAUSED any time a job was
> paused by
> >> >> > user;
> >> >> > > >> > > > >> >> >> > -
Start in paused state by default.
> >> >> > > >> > > > >> >> >> >
> >> >> > > >> > > > >> >> >> > On
Mon, Oct 13, 2014 at 11:32 AM, Kevin
> Sweeney
> >> <
> >> >> > > >> > > > >> kevints@apache.org <javascript:;>>
> >> >> > > >> > > > >> >> >> > wrote:
> >> >> > > >> > > > >> >> >> > >
The doc mentioned that the scheduler will
> >> start
> >> >> an
> >> >> > > >> update
> >> >> > > >> > > > >> subject to
> >> >> > > >> > > > >> >> >> the
> >> >> > > >> > > > >> >> >> > >
heartbeat countdown, and if it doesn't
> >> receive a
> >> >> > > >> heartbeat
> >> >> > > >> > > it
> >> >> > > >> > > > >> will
> >> >> > > >> > > > >> >> >> pause
> >> >> > > >> > > > >> >> >> > >
the update. Why not start with the update
> >> >> > > >> > > > >> >> paused-due-to-no-heartbeat
to
> >> >> > > >> > > > >> >> >> > >
fail-fast any connectivity issues between
> the
> >> >> > service
> >> >> > > >> > > > providing
> >> >> > > >> > > > >> the
> >> >> > > >> > > > >> >> >> > >
heartbeats and the scheduler?
> >> >> > > >> > > > >> >> >> > >
> >> >> > > >> > > > >> >> >> > >
On Fri, Oct 10, 2014 at 12:47 PM, Maxim
> >> >> > Khutornenko <
> >> >> > > >> > > > >> >> maxim@apache.org
<javascript:;>>
> >> >> > > >> > > > >> >> >> > >
wrote:
> >> >> > > >> > > > >> >> >> > >
> >> >> > > >> > > > >> >> >> > >>
Hi all,
> >> >> > > >> > > > >> >> >> > >>
> >> >> > > >> > > > >> >> >> > >>
We are proposing a new feature for the
> >> scheduler
> >> >> > > >> updater,
> >> >> > > >> > > > which
> >> >> > > >> > > > >> you
> >> >> > > >> > > > >> >> >> > >>
may find helpful.
> >> >> > > >> > > > >> >> >> > >>
> >> >> > > >> > > > >> >> >> > >>
I have posed a brief feature summary here:
> >> >> > > >> > > > >> >> >> > >>
> >> >> > > >> > > > >> >> >> > >>
> >> >> > > >> > > > >> >> >> >
> >> >> > > >> > > > >> >> >>
> >> >> > > >> > > > >> >>
> >> >> > > >> > > > >>
> >> >> > > >> > > >
> >> >> > > >> > >
> >> >> > > >> >
> >> >> > > >>
> >> >> > >
> >> >> >
> >> >>
> >>
> https://github.com/maxim111333/incubator-aurora/blob/hb_doc/docs/update-heartbeat.md
> >> >> > > >> > > > >> >> >> > >>
> >> >> > > >> > > > >> >> >> > >>
Please, reply with your
> >> >> > feedback/concerns/comments.
> >> >> > > >> > > > >> >> >> > >>
> >> >> > > >> > > > >> >> >> > >>
Thanks,
> >> >> > > >> > > > >> >> >> > >>
Maxim
> >> >> > > >> > > > >> >> >> > >>
> >> >> > > >> > > > >> >> >> >
> >> >> > > >> > > > >> >> >>
> >> >> > > >> > > > >> >>
> >> >> > > >> > > > >>
> >> >> > > >> > > >
> >> >> > > >> > >
> >> >> > > >> >
> >> >> > > >>
> >> >> > >
> >> >> >
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Kevin Sweeney
> >> >> @kts
> >> >>
> >>
>


-- 
-=Bill

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message