aurora-dev mailing list archives

From Bill Farner <wfar...@apache.org>
Subject Re: Future of storage in Aurora
Date Sun, 01 Oct 2017 22:59:06 GMT
I would like to revive this discussion in light of some work I have been
doing around the storage system.  The DB storage system will require a lot
of additional effort to reach the beneficial outcomes I laid out above, and
I agree that we should cut our losses.

I plan to post patches soon that introduce non-H2 in-memory store
implementations.  *If anyone disagrees with removing the H2 implementations
as well, please chime in here.*

Disclaimer - I may propose an alternative for the persistent storage in the
near future.

On Mon, Apr 3, 2017 at 9:40 AM, Stephan Erb <serb@apache.org> wrote:

> H2 could give us fine-grained data access. However, most of our code
> performs massive joins to reconstruct fully hydrated thrift objects.
> Most of the time we are then only interested in very few properties of
> those thrift structs. This applies to internal usage, but also to how we
> use the API.
>
> I therefore believe we have to improve and refine our domain model in
> order to significantly improve the storage situation.
>
> I really liked Maxim's proposal from last year, and I think it is worth
> reconsidering:
> https://docs.google.com/document/d/1myYX3yuofGr8JIzud98xXd5mqgpZ8q_RqKBpSff4-WE/edit
>
> Best regards,
> Stephan
>
> On Thu, 2017-03-30 at 15:53 -0700, David McLaughlin wrote:
> > So it sounds like before we make any decisions around removing the work
> > done in H2 so far, we should figure out what is remaining to move to
> > external storage (or if it's even still a goal).
> >
> > I may still play around with reviving the in-memory stores, but will
> > separate that work from any goal to remove the H2 layer. Since it's
> > motivated by performance, I'd verify there is a benefit before submitting
> > any review.
> >
> > Thanks all for the feedback.
> >
> >
> > On Thu, Mar 30, 2017 at 12:08 PM, Bill Farner <wfarnerapache@gmail.com>
> > wrote:
> >
> > > Adding some background - there were several motivators to using SQL that
> > > come to mind:
> > > a) well-understood transaction isolation guarantees leading to a simpler
> > > programming model w.r.t. concurrency
> > > b) ability to offload storage to a separate system (e.g. Postgres) and
> > > scale it separately
> > > c) relief of computational burden of performing snapshots and backups due
> > > to (b)
> > > d) simpler code and operations model due to (b)
> > > e) schema backwards compatibility guarantees due to persistence-friendly
> > > migration-scripts
> > > f) straightforward normalization to facilitate sharing of
> > > otherwise-redundant state (i.e. TaskConfig)
> > >
> > > The storage overhaul comes with a huge caveat requiring the approach to
> > > scheduling rounds to change. I concur that the current model is hostile to
> > > offloaded storage, as ~all state must be read every scheduling round. If
> > > that cannot be worked around with lazy state or best-effort concurrency
> > > (i.e. in-memory caching), the approach is indeed flawed.
> > >
> > > On Mar 30, 2017, 10:29 AM -0700, Joshua Cohen <jcohen@apache.org>, wrote:
> > > > My understanding of the H2-backed stores is that at least part of the
> > > > original rationale behind them was that they were meant to be an interim
> > > > point on the way to external SQL-backed stores which should theoretically
> > > > provide significant benefits w.r.t. GC (obviously unproven, especially
> > > > at scale).
> > > >
> > > > I don't disagree that the H2 stores themselves are problematic (to say
> > > > the least); do we have evidence that returning to memory based stores
> > > > will be an improvement on that?
> > > >
> > > > On Thu, Mar 30, 2017 at 12:16 PM, David McLaughlin <dmclaughlin@apache.org>
> > > > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I'd like to start a discussion around storage in Aurora.
> > > > >
> > > > > I think one of the biggest mistakes we made in migrating our storage to H2
> > > > > was deleting the memory stores as we moved. We made a pretty big bet that
> > > > > we could eventually make H2/relational databases work. I don't think that
> > > > > bet has paid off, and I think we need to revisit the direction we're taking.
> > > > >
> > > > > My belief is that the current H2/MyBatis approach is untenable for large
> > > > > production clusters, at least without changing our current single-master
> > > > > architecture. At Twitter we are already having to fight to keep GC
> > > > > manageable even without DbTaskStore enabled, so I don't see a path forward
> > > > > where we could eventually enable that. So far experiments with H2 off-heap
> > > > > storage have provided marginal (if any) gains.
> > > > >
> > > > > Would anyone object to restoring the in-memory stores and creating new
> > > > > implementations for the missing ones (UpdateStore)? I'd even go further and
> > > > > propose that we consider in-memory H2 and MyBatis a failed experiment and
> > > > > we drop that storage layer completely.
> > > > >
> > > > > Cheers,
> > > > > David
> > > > >
>
