couchdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Robert Newson" <rnew...@apache.org>
Subject Re: [DISCUSS] Reduce on FDB take 3
Date Fri, 24 Jul 2020 14:29:44 GMT

I’m happy to restrict my PR comments to the actual diff, yes. So I’m not +1 yet. 

I fixed the spurious conflicts at https://github.com/apache/couchdb/pull/3033. 

-- 
  Robert Samuel Newson
  rnewson@apache.org

On Fri, 24 Jul 2020, at 14:59, Garren Smith wrote:
> Ok so just to confirm, we keep my PR as-is with ebtree only for reduce. We
> can get that ready to merge into fdb master. We can then use that to battle
> test ebtree and then look at using it for the map side as well. At that
> point we would combine the reduce and map index into a ebtree index. Are
> you happy with that?
> 
> Cheers
> Garren
> 
> On Fri, Jul 24, 2020 at 3:48 PM Robert Newson <rnewson@apache.org> wrote:
> 
> > Hi,
> >
> > It’s not as unknown as you think but certainly we need empirical data to
> > guide us on the reduce side. I’m also fine with continuing with the
> > map-only code as it stands today until such time as we demonstrate ebtree
> > meets or exceeds our needs (and I freely accept the possibility that it
> > might not).
> >
> > I think the principal enhancement to ebtree that would address most
> > concerns is if it could store the leaf entries vertically (as they are
> > currently). I have some thoughts which I’ll try to realise as working code.
> >
> > I’ve confirmed that I do create spurious conflicts and will have a PR up
> > today to fix that.
> >
> > --
> >   Robert Samuel Newson
> >   rnewson@apache.org
> >
> > On Fri, 24 Jul 2020, at 12:43, Garren Smith wrote:
> > > Hi Bob,
> > >
> > > Thanks for that explanation, that is really helpful and it is good we
> > have
> > > some options but it also does highlight a lot of unknowns. Whereas our
> > > current map indexes are really simple and we know its behaviour. There
> > are
> > > no real unknowns. Adding ebtree here could make map indexes a fair bit
> > more
> > > complicated since we don't know the effect of managing different node
> > > sizes, concurrent doc updates, and querying performance.
> > >
> > > Could we do a compromise here, could we look at using ebtree only for
> > > reduces now? That means that we can have reduce indexes working quite
> > soon
> > > on CouchDB on FDB. At the same time, we can work on ebtree and run
> > > performance tests on ebtree for map indexes. Some interesting tests we
> > can
> > > do is see if a user emits KV's near the limits (8KB for keys and 50KB for
> > > values) how does ebtree handle that? How it handles being updated in the
> > > doc update transaction. And general query performance. What do you think?
> > >
> > > Cheers
> > > Garren
> > >
> > >
> > > On Fri, Jul 24, 2020 at 10:06 AM Robert Samuel Newson <
> > rnewson@apache.org>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > A short preface at this stage is needed I think:
> > > >
> > > > My goal with ebtree was to implement a complete and correct b+tree that
> > > > also calculated and maintained inner reductions. I consciously chose
> > not to
> > > > go further than that before presenting it to the group for wider debate
> > > > (indeed, the very debate we're having in this thread).
> > > >
> > > > --
> > > >
> > > > Ebtree is not single writer, at least not inherently. Two updates to
> > the
> > > > same ebtree should both succeed as long as they don't modify the same
> > nodes.
> > > >
> > > > Modifying the same node is likely where there's a reduce function,
> > though
> > > > only if the reduction value actually changes, as that percolates up the
> > > > tree. Where the reduce value does not change and when neither
> > transaction
> > > > causes a split, rebalance, or merge that affects the other, they should
> > > > both commit without conflict. There is a rich history of optimizations
> > in
> > > > this space specifically around btrees. ebtree might cause spurious
> > > > conflicts today, I will investigate and propose fixes if so (e.g, I
> > think I
> > > > probably call erlfdb:set on nodes that have not changed).
> > > >
> > > > I certainly envisaged updating ebtree within the same txn as a doc
> > update,
> > > > which is why the first argument to all the public functions can be
> > either
> > > > an erlfdb Db or open Tx.
> > > >
> > > > Parallelising the initial build is more difficult with ebtree than
> > > > parallelising the existing couch_views map-only code, though it's worth
> > > > noting that ebtree:insert benefits a great deal from batching (multiple
> > > > calls to :insert in the same transaction). For an offline build (i.e,
> > an
> > > > index that the client cannot see until the entire build is complete)
> > the
> > > > batch size can be maximised. That is still a serial process in that
> > there
> > > > is only one transaction at a time updating the ebtree. I can't say
> > offhand
> > > > how fast that is in practice, but it is clearly less powerful than a
> > fully
> > > > parallelisable approach could be.
> > > >
> > > > Any parallel build would require a way to divide the database into
> > > > non-overlapping subsets of emitted keys. This is easy and natural if
> > the
> > > > fdb key is the emitted key, which is the case for the couch_views
> > map-only
> > > > code. For ebtree it might be enough to simply grab a large chunk of
> > > > documents, perform the map transform, and then issues multiple
> > transactions
> > > > on subsets of those.
> > > >
> > > > Another common technique for btrees is bulk loading (more or less
> > > > literally constructing the btree nodes directly from the source, as
> > long as
> > > > you can sort it), which might be an option as well.
> > > >
> > > > Parallelising a build _with_ a reduce function seems hard however we do
> > > > it. The non-ebtree approach is parallelisable by virtue of paring down
> > the
> > > > reduce functionality itself (only whole key groups, only those
> > functions
> > > > that fdb has atomic operations for).
> > > >
> > > > I will first of all verify the multi-writer nature of ebtree as it
> > stands
> > > > today and make a PR which fixes any spurious conflicts, and then ponder
> > > > further  how true parallel builds might be possible.
> > > >
> > > >
> > > > > On 24 Jul 2020, at 07:30, Garren Smith <garren@apache.org>
wrote:
> > > > >
> > > > > We haven't spoken much about updates with ebtree.
> > > > > From my understanding of ebtree it can only do single writer, is
that
> > > > > correct?. If that is true it means we would not be able to
> > parallelize
> > > > the
> > > > > background building of views to speed up view builds.
> > > > > The other thing that would mean is that we cannot use it for mango
> > where
> > > > we
> > > > > update the view in the doc transaction.
> > > > >
> > > > > Cheers
> > > > > Garren
> > > > >
> > > > > On Fri, Jul 24, 2020 at 2:35 AM Kyle Snavely <kjsnavely@gmail.com>
> > > > wrote:
> > > > >
> > > > >> When in doubt, throw us a build at Cloudant with ebtree maps
and
> > we'll
> > > > see
> > > > >> if it comes close to the crazy fast KV map queries.
> > > > >>
> > > > >> Kyle
> > > > >>
> > > > >> On Thu, Jul 23, 2020, 2:17 PM Robert Samuel Newson <
> > rnewson@apache.org>
> > > > >> wrote:
> > > > >>
> > > > >>> I (perhaps obviously) don't agree that I'm tying myself to
old
> > CouchDB
> > > > or
> > > > >>> failing to embrace FDB. FDB is a toolkit and does not, to
my mind,
> > > > force
> > > > >> us
> > > > >>> down any particular path.
> > > > >>>
> > > > >>> I haven't sat down to modify couch_views in the manner I've
> > suggested
> > > > >>> (where ebtree is used as designed; being fed the output of
the
> > emit()
> > > > >> calls
> > > > >>> and calculating reductions as it does so) but I think it's
a
> > worthwhile
> > > > >>> exercise. I'd be surprised if performance of map-only traversals
> > would
> > > > be
> > > > >>> disappointing but who knows? I also expect it would allow
for
> > > > significant
> > > > >>> simplification of the code, which is one of the higher virtues.
> > > > >>>
> > > > >>> Adam, can you describe in a little more detail how you picture
> > "b+tree
> > > > is
> > > > >>> only used for incremental aggregations,"? It's implied in
your
> > reply
> > > > that
> > > > >>> it would preserve the "interesting property" of keeping user
data
> > out
> > > > of
> > > > >>> FDB Keys (for casual readers: the new native database encryption,
> > > > called
> > > > >>> "aegis", only encrypts the FDB value. It can't encrypt the
key as
> > this
> > > > >>> would change the order of keys, which the current code depends
on).
> > > > Did I
> > > > >>> misread you?
> > > > >>>
> > > > >>> B.
> > > > >>>
> > > > >>>> On 23 Jul 2020, at 20:11, Adam Kocoloski <kocolosk@apache.org>
> > wrote:
> > > > >>>>
> > > > >>>> OK thanks for the clarification. As I said I wasn’t
all that
> > confident
> > > > >> I
> > > > >>> understood the design :)
> > > > >>>>
> > > > >>>> I like the idea that the b+tree is only used for incremental
> > > > >>> aggregations, rather than storing the entire materialized
view,
> > for the
> > > > >>> same reasons that Garren stated.
> > > > >>>>
> > > > >>>> An index maintained entirely in ebtree has the interesting
> > property
> > > > >> that
> > > > >>> it does not leak any user data into FDB Keys, which could
be
> > attractive
> > > > >> for
> > > > >>> security reasons.
> > > > >>>>
> > > > >>>> Adam
> > > > >>>>
> > > > >>>>> On Jul 23, 2020, at 1:54 PM, Garren Smith <garren@apache.org>
> > wrote:
> > > > >>>>>
> > > > >>>>> On Thu, Jul 23, 2020 at 6:55 PM Paul Davis <
> > > > >> paul.joseph.davis@gmail.com
> > > > >>>>
> > > > >>>>> wrote:
> > > > >>>>>
> > > > >>>>>>> I would like to keep ebtree to use just for
the reduce index.
> > > > >>>>>>
> > > > >>>>>> Could you expand on your reasoning here, Garren?
I haven't done
> > any
> > > > >>>>>> experiments on my own to understand if I'm missing
something
> > > > >>>>>> important. My initial reaction is to not diverge
too far from
> > the
> > > > >>>>>> previous shape of the implementation since we
have a decent
> > idea of
> > > > >>>>>> how that behaves already but perhaps you've seen
or measured
> > > > >> something
> > > > >>>>>> I'm not thinking of?
> > > > >>>>>>
> > > > >>>>>
> > > > >>>>> I think this must have been a misunderstanding on
my part. I
> > always
> > > > >>> thought
> > > > >>>>> of using ebtree to solve reduce and wasn't planning
to use it
> > for the
> > > > >>> map
> > > > >>>>> index.
> > > > >>>>> I don't like the idea that we have ordered distributed
key/value
> > > > store
> > > > >>> and
> > > > >>>>> then implementing a b-tree on top of that for indexing.
> > Especially
> > > > >>> since we
> > > > >>>>> know that the current map index is fast,
> > > > >>>>> easy to use, and follows recommended practices in
the FDB
> > community
> > > > on
> > > > >>> how
> > > > >>>>> to do indexing. ebtree makes sense for reduce and
is a means to
> > an
> > > > end
> > > > >>> to
> > > > >>>>> give us CouchDB's reduce api, which is heavily reliant
on a
> > b-tree,
> > > > >> with
> > > > >>>>> CouchDB on FDB. This feels like a step backward and
I worry we
> > are
> > > > >> tying
> > > > >>>>> ourselves heavily to old CouchDB instead of using
the fact that
> > we
> > > > >>> moving
> > > > >>>>> to FDB which then allows us to design new api's and
> > functionality.
> > > > >>>>
> > > > >>>
> > > > >>>
> > > > >>
> > > >
> > > >
> > >
> >
>

Mime
View raw message