aurora-dev mailing list archives

From David McLaughlin <dmclaugh...@apache.org>
Subject Re: Reducing Failover Time by Eagerly Reading/Replaying Log in Followers
Date Wed, 26 Jul 2017 22:25:29 GMT
One thing we should make clear: we already have a working prototype of the
'catch-up' logic in the replicated log. The next step is to use that
functionality in Aurora as a proof of concept before upstreaming it. The
main "threads" we're trying to explore are:

1) Reducing unplanned failovers (and API timeouts) due to stop-the-world GC
pauses.
2) Reducing write unavailability due to write lock contention (e.g. 40s
snapshot times leading to API timeouts every hour).
3) Reducing the cost of a failover by speeding up the leader recovery time.
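A toy sketch of thread (2), assuming a simplified in-memory store (the class
and method names below are illustrative, not Aurora's actual Storage API): if
the snapshot is serialized while holding the same write lock that API
mutations take, every snapshot stalls writes for its full duration; copying
the state under the lock and serializing the copy outside it keeps the stall
proportional to the cheap copy instead.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Toy model of write-lock contention during snapshotting. All names are
// illustrative; this is not Aurora's Storage implementation.
public class SnapshotContention {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<String, String> state = new TreeMap<>();

    void write(String key, String value) {
        lock.writeLock().lock();
        try {
            state.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Hold the lock only long enough to copy the state; the slow part
    // (serializing/compressing the snapshot) can then run off-lock,
    // so concurrent writes are not blocked for the full snapshot time.
    Map<String, String> copyForSnapshot() {
        lock.readLock().lock();
        try {
            return new TreeMap<>(state);
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        SnapshotContention s = new SnapshotContention();
        s.write("task-1", "RUNNING");
        Map<String, String> snap = s.copyForSnapshot();
        s.write("task-2", "PENDING"); // not blocked while snap is serialized
        System.out.println(snap.size() + " snapshotted, "
                + s.copyForSnapshot().size() + " live");
    }
}
```

The real design space is richer (deduplication, cutting snapshots off the
leader entirely), but the lock-hold shape above is the crux of the write
unavailability in (2).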

The proposal here is obviously targeted at (3), whereas my patches for
snapshot deduplication and the snapshot creation proposal were aimed more
at (2). The big idea we had for (1) was moving snapshots (and backups) into
followers, which would require that Jordan's proposal here ship first.

It wasn't clear to me how difficult this would be to add to the Scheduler,
so I wanted to make sure we shared our intentions before investing too much
effort, in case there was either some fundamental flaw in the approach or
some easier win.
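For readers trying to picture the follower side, here is a minimal sketch of
the catch-up idea, using hypothetical names (LogEntry, a staging map, catchUp)
rather than the actual Mesos log or Aurora APIs: a follower eagerly applies
log entries into a staging store kept separate from the serving Storage (in
line with the caution about CallOrderEnforcingStorage below), so that on
election it only replays the tail written since its last applied position.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of eager follower catch-up. Names are hypothetical; the real
// implementation would read from the Mesos replicated log.
public class FollowerCatchUp {
    record LogEntry(long position, String op) {}

    // Staging state, deliberately separate from the serving storage.
    private final Map<String, String> staging = new HashMap<>();
    private long lastApplied = -1;

    // Apply every entry past our last applied position; returns the
    // number of entries replayed. Called periodically while following,
    // and once more on winning leadership to pick up the remaining tail.
    int catchUp(List<LogEntry> log) {
        int applied = 0;
        for (LogEntry e : log) {
            if (e.position() > lastApplied) {
                apply(e);
                lastApplied = e.position();
                applied++;
            }
        }
        return applied;
    }

    private void apply(LogEntry e) {
        String[] kv = e.op().split("=", 2); // toy ops of the form key=value
        staging.put(kv[0], kv[1]);
    }

    public static void main(String[] args) {
        List<LogEntry> log = new ArrayList<>();
        for (long i = 0; i < 100; i++) {
            log.add(new LogEntry(i, "task" + i + "=RUNNING"));
        }
        FollowerCatchUp follower = new FollowerCatchUp();
        int whileFollowing = follower.catchUp(log); // eager replay
        log.add(new LogEntry(100, "task100=PENDING"));
        int onElection = follower.catchUp(log);     // only the new tail
        System.out.println(whileFollowing + " then " + onElection);
    }
}
```

The point of the sketch is the invariant, not the data structure: the cost
paid at election time is proportional to the unreplayed tail, not to the
whole log since the last snapshot.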


On Wed, Jul 26, 2017 at 12:03 PM, Bill Farner <wfarner@apache.org> wrote:

> Some (hopefully) constructive criticism:
>
> - the doc is very high-level on the problem statement and the proposal,
> making it difficult to agree with prioritization over cheaper snapshots or
> the oft-discussed support of an external DBMS.
>
> - the supporting data is a single data point of the
> scheduler_log_recover_nanos_total metric.  More data points and more detail
> on this data (how many entries/bytes did this represent?) would help
> normalize the metric, and possibly indicate whether recover time is linear
> or non-linear.  Finer-grained information would also help (where was time
> spent within the replay - GC?  reading log entries?  inflating snapshots?).
>
> - the doc calls out parts (1) mesos log support and (2) scheduler support.
> Is the planned approach to gain value from (1) before (2), or are both
> needed?
>
> - for (2) scheduler support, can you add detail on the implementation?
> Much of the scheduler code assumes it is the leader
> (CallOrderEnforcingStorage is currently a gatekeeper to avoid mistakes of
> this type), so I would caution against replaying directly into the main
> Storage.
>
>
> On Wed, Jul 26, 2017 at 1:56 PM, Santhosh Kumar Shanmugham
> <sshanmugham@twitter.com.invalid> wrote:
>
> > +1
> >
> > This sets the stage for more potential benefits: work that consumes
> > stable data (and is not affected by minor inconsistencies) can be
> > offloaded from the leading scheduler.
> >
> > On Wed, Jul 26, 2017 at 10:31 AM, David McLaughlin <dmclaughlin@apache.org>
> > wrote:
> >
> > > I'm +1 to this approach over my proposal. With the enforced daily
> > > failover, it's a much bigger win to make failovers "cheap" than making
> > > snapshots cheap, and this is going to be backwards compatible too.
> > >
> > > On Wed, Jul 26, 2017 at 9:51 AM, Jordan Ly <jordan.ly8@gmail.com> wrote:
> > >
> > > > Hello everyone!
> > > >
> > > > I've created a document with an initial proposal to reduce leader
> > > > failover time by eagerly reading and replaying the replicated log in
> > > > followers:
> > > >
> > > > https://docs.google.com/document/d/10SYOq0ehLMFKQ9rX2TGC_xpM--GBnstzMFP-tXGQaVI/edit?usp=sharing
> > > >
> > > > We wanted to open up this topic for discussion with the community and
> > > > see if anyone had any alternate opinions or recommendations before
> > > > starting the work.
> > > >
> > > > If this solution seems reasonable, we will write and release a design
> > > > document for a more formal discussion and review.
> > > >
> > > > Please feel free to comment on the doc, or let me know if you have
> > > > any concerns.
> > > >
> > > > -Jordan
> > > >
> > >
> >
>
