aurora-dev mailing list archives

From David McLaughlin <dmclaugh...@apache.org>
Subject Re: Future of storage in Aurora
Date Thu, 30 Mar 2017 22:53:31 GMT
So it sounds like before we make any decisions around removing the work
done in H2 so far, we should figure out what is remaining to move to
external storage (or if it's even still a goal).

I may still play around with reviving the in-memory stores, but will
separate that work from any goal to remove the H2 layer. Since it's
motivated by performance, I'd verify there is a benefit before submitting
any review.

Thanks all for the feedback.


On Thu, Mar 30, 2017 at 12:08 PM, Bill Farner <wfarnerapache@gmail.com>
wrote:

> Adding some background - there were several motivators to using SQL that
> come to mind:
> a) well-understood transaction isolation guarantees leading to a simpler
> programming model w.r.t. concurrency
> b) ability to offload storage to a separate system (e.g. Postgres) and
> scale it separately
> c) relief of computational burden of performing snapshots and backups due
> to (b)
> d) simpler code and operations model due to (b)
> e) schema backwards compatibility guarantees due to persistence-friendly
> migration-scripts
> f) straightforward normalization to facilitate sharing of
> otherwise-redundant state (i.e. TaskConfig)
>
> The storage overhaul comes with a huge caveat requiring the approach to
> scheduling rounds to change. I concur that the current model is hostile to
> offloaded storage, as ~all state must be read every scheduling round. If
> that cannot be worked around with lazy state or best-effort concurrency
> (i.e. in-memory caching), the approach is indeed flawed.
>
> On Mar 30, 2017, 10:29 AM -0700, Joshua Cohen <jcohen@apache.org>, wrote:
> > My understanding of the H2-backed stores is that at least part of the
> > original rationale behind them was that they were meant to be an interim
> > point on the way to external SQL-backed stores which should theoretically
> > provide significant benefits w.r.t. GC (obviously unproven, especially
> > at scale).
> >
> > I don't disagree that the H2 stores themselves are problematic (to say the
> > least); do we have evidence that returning to memory based stores will be
> > an improvement on that?
> >
> > On Thu, Mar 30, 2017 at 12:16 PM, David McLaughlin <dmclaughlin@apache.org>
> > wrote:
> >
> > > Hi all,
> > >
> > > I'd like to start a discussion around storage in Aurora.
> > >
> > > I think one of the biggest mistakes we made in migrating our storage to H2
> > > was deleting the memory stores as we moved. We made a pretty big bet that
> > > we could eventually make H2/relational databases work. I don't think that
> > > bet has paid off, and I think we need to revisit the direction we're taking.
> > >
> > > My belief is that the current H2/MyBatis approach is untenable for large
> > > production clusters, at least without changing our current single-master
> > > architecture. At Twitter we are already having to fight to keep GC
> > > manageable even without DbTaskStore enabled, so I don't see a path forward
> > > where we could eventually enable that. So far experiments with H2 off-heap
> > > storage have provided marginal (if any) gains.
> > >
> > > Would anyone object to restoring the in-memory stores and creating new
> > > implementations for the missing ones (UpdateStore)? I'd even go further and
> > > propose that we consider in-memory H2 and MyBatis a failed experiment and
> > > we drop that storage layer completely.
> > >
> > > Cheers,
> > > David
> > >
>
