couchdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paul Davis <paul.joseph.da...@gmail.com>
Subject Re: [DISCUSS] Reduce on FDB take 3
Date Fri, 24 Jul 2020 19:00:49 GMT
FWIW, a first pass at views entirely on ebtree turned out to be fairly
straightforward. Almost surprisingly simple in some cases.

https://github.com/apache/couchdb/compare/prototype/fdb-layer...prototype/fdb-layer-ebtree-views

Its currently passing all tests in `couch_views_map_test.erl` but is
failing on other suites. I only did a quick skim on the failures but
they all look superficial around some APIs I changed.

I haven't added the APIs to query reduce functions via HTTP but the
reduce functions are being executed to calculate row counts and KV
sizes. Adding the builtin reduce functions and extending those to user
defined reduce functions should be straightforward.

On Fri, Jul 24, 2020 at 9:39 AM Robert Newson <rnewson@apache.org> wrote:
>
>
> I’m happy to restrict my PR comments to the actual diff, yes. So I’m not +1 yet.
>
> I fixed the spurious conflicts at https://github.com/apache/couchdb/pull/3033.
>
> --
>   Robert Samuel Newson
>   rnewson@apache.org
>
> On Fri, 24 Jul 2020, at 14:59, Garren Smith wrote:
> > Ok so just to confirm, we keep my PR as-is with ebtree only for reduce. We
> > can get that ready to merge into fdb master. We can then use that to battle
> > test ebtree and then look at using it for the map side as well. At that
> > point we would combine the reduce and map index into a ebtree index. Are
> > you happy with that?
> >
> > Cheers
> > Garren
> >
> > On Fri, Jul 24, 2020 at 3:48 PM Robert Newson <rnewson@apache.org> wrote:
> >
> > > Hi,
> > >
> > > It’s not as unknown as you think but certainly we need empirical data to
> > > guide us on the reduce side. I’m also fine with continuing with the
> > > map-only code as it stands today until such time as we demonstrate ebtree
> > > meets or exceeds our needs (and I freely accept the possibility that it
> > > might not).
> > >
> > > I think the principal enhancement to ebtree that would address most
> > > concerns is if it could store the leaf entries vertically (as they are
> > > currently). I have some thoughts which I’ll try to realise as working code.
> > >
> > > I’ve confirmed that I do create spurious conflicts and will have a PR up
> > > today to fix that.
> > >
> > > --
> > >   Robert Samuel Newson
> > >   rnewson@apache.org
> > >
> > > On Fri, 24 Jul 2020, at 12:43, Garren Smith wrote:
> > > > Hi Bob,
> > > >
> > > > Thanks for that explanation, that is really helpful and it is good we
> > > have
> > > > some options but it also does highlight a lot of unknowns. Whereas our
> > > > current map indexes are really simple and we know its behaviour. There
> > > are
> > > > no real unknowns. Adding ebtree here could make map indexes a fair bit
> > > more
> > > > complicated since we don't know the effect of managing different node
> > > > sizes, concurrent doc updates, and querying performance.
> > > >
> > > > Could we do a compromise here, could we look at using ebtree only for
> > > > reduces now? That means that we can have reduce indexes working quite
> > > soon
> > > > on CouchDB on FDB. At the same time, we can work on ebtree and run
> > > > performance tests on ebtree for map indexes. Some interesting tests we
> > > can
> > > > do is see if a user emits KV's near the limits (8KB for keys and 50KB
for
> > > > values) how does ebtree handle that? How it handles being updated in the
> > > > doc update transaction. And general query performance. What do you think?
> > > >
> > > > Cheers
> > > > Garren
> > > >
> > > >
> > > > On Fri, Jul 24, 2020 at 10:06 AM Robert Samuel Newson <
> > > rnewson@apache.org>
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > A short preface at this stage is needed I think:
> > > > >
> > > > > My goal with ebtree was to implement a complete and correct b+tree
that
> > > > > also calculated and maintained inner reductions. I consciously chose
> > > not to
> > > > > go further than that before presenting it to the group for wider
debate
> > > > > (indeed, the very debate we're having in this thread).
> > > > >
> > > > > --
> > > > >
> > > > > Ebtree is not single writer, at least not inherently. Two updates
to
> > > the
> > > > > same ebtree should both succeed as long as they don't modify the
same
> > > nodes.
> > > > >
> > > > > Modifying the same node is likely where there's a reduce function,
> > > though
> > > > > only if the reduction value actually changes, as that percolates
up the
> > > > > tree. Where the reduce value does not change and when neither
> > > transaction
> > > > > causes a split, rebalance, or merge that affects the other, they
should
> > > > > both commit without conflict. There is a rich history of optimizations
> > > in
> > > > > this space specifically around btrees. ebtree might cause spurious
> > > > > conflicts today, I will investigate and propose fixes if so (e.g,
I
> > > think I
> > > > > probably call erlfdb:set on nodes that have not changed).
> > > > >
> > > > > I certainly envisaged updating ebtree within the same txn as a doc
> > > update,
> > > > > which is why the first argument to all the public functions can be
> > > either
> > > > > an erlfdb Db or open Tx.
> > > > >
> > > > > Parallelising the initial build is more difficult with ebtree than
> > > > > parallelising the existing couch_views map-only code, though it's
worth
> > > > > noting that ebtree:insert benefits a great deal from batching (multiple
> > > > > calls to :insert in the same transaction). For an offline build (i.e,
> > > an
> > > > > index that the client cannot see until the entire build is complete)
> > > the
> > > > > batch size can be maximised. That is still a serial process in that
> > > there
> > > > > is only one transaction at a time updating the ebtree. I can't say
> > > offhand
> > > > > how fast that is in practice, but it is clearly less powerful than
a
> > > fully
> > > > > parallelisable approach could be.
> > > > >
> > > > > Any parallel build would require a way to divide the database into
> > > > > non-overlapping subsets of emitted keys. This is easy and natural
if
> > > the
> > > > > fdb key is the emitted key, which is the case for the couch_views
> > > map-only
> > > > > code. For ebtree it might be enough to simply grab a large chunk
of
> > > > > documents, perform the map transform, and then issues multiple
> > > transactions
> > > > > on subsets of those.
> > > > >
> > > > > Another common technique for btrees is bulk loading (more or less
> > > > > literally constructing the btree nodes directly from the source,
as
> > > long as
> > > > > you can sort it), which might be an option as well.
> > > > >
> > > > > Parallelising a build _with_ a reduce function seems hard however
we do
> > > > > it. The non-ebtree approach is parallelisable by virtue of paring
down
> > > the
> > > > > reduce functionality itself (only whole key groups, only those
> > > functions
> > > > > that fdb has atomic operations for).
> > > > >
> > > > > I will first of all verify the multi-writer nature of ebtree as it
> > > stands
> > > > > today and make a PR which fixes any spurious conflicts, and then
ponder
> > > > > further  how true parallel builds might be possible.
> > > > >
> > > > >
> > > > > > On 24 Jul 2020, at 07:30, Garren Smith <garren@apache.org>
wrote:
> > > > > >
> > > > > > We haven't spoken much about updates with ebtree.
> > > > > > From my understanding of ebtree it can only do single writer,
is that
> > > > > > correct?. If that is true it means we would not be able to
> > > parallelize
> > > > > the
> > > > > > background building of views to speed up view builds.
> > > > > > The other thing that would mean is that we cannot use it for
mango
> > > where
> > > > > we
> > > > > > update the view in the doc transaction.
> > > > > >
> > > > > > Cheers
> > > > > > Garren
> > > > > >
> > > > > > On Fri, Jul 24, 2020 at 2:35 AM Kyle Snavely <kjsnavely@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > >> When in doubt, throw us a build at Cloudant with ebtree
maps and
> > > we'll
> > > > > see
> > > > > >> if it comes close to the crazy fast KV map queries.
> > > > > >>
> > > > > >> Kyle
> > > > > >>
> > > > > >> On Thu, Jul 23, 2020, 2:17 PM Robert Samuel Newson <
> > > rnewson@apache.org>
> > > > > >> wrote:
> > > > > >>
> > > > > >>> I (perhaps obviously) don't agree that I'm tying myself
to old
> > > CouchDB
> > > > > or
> > > > > >>> failing to embrace FDB. FDB is a toolkit and does not,
to my mind,
> > > > > force
> > > > > >> us
> > > > > >>> down any particular path.
> > > > > >>>
> > > > > >>> I haven't sat down to modify couch_views in the manner
I've
> > > suggested
> > > > > >>> (where ebtree is used as designed; being fed the output
of the
> > > emit()
> > > > > >> calls
> > > > > >>> and calculating reductions as it does so) but I think
it's a
> > > worthwhile
> > > > > >>> exercise. I'd be surprised if performance of map-only
traversals
> > > would
> > > > > be
> > > > > >>> disappointing but who knows? I also expect it would
allow for
> > > > > significant
> > > > > >>> simplification of the code, which is one of the higher
virtues.
> > > > > >>>
> > > > > >>> Adam, can you describe in a little more detail how you
picture
> > > "b+tree
> > > > > is
> > > > > >>> only used for incremental aggregations,"? It's implied
in your
> > > reply
> > > > > that
> > > > > >>> it would preserve the "interesting property" of keeping
user data
> > > out
> > > > > of
> > > > > >>> FDB Keys (for casual readers: the new native database
encryption,
> > > > > called
> > > > > >>> "aegis", only encrypts the FDB value. It can't encrypt
the key as
> > > this
> > > > > >>> would change the order of keys, which the current code
depends on).
> > > > > Did I
> > > > > >>> misread you?
> > > > > >>>
> > > > > >>> B.
> > > > > >>>
> > > > > >>>> On 23 Jul 2020, at 20:11, Adam Kocoloski <kocolosk@apache.org>
> > > wrote:
> > > > > >>>>
> > > > > >>>> OK thanks for the clarification. As I said I wasn’t
all that
> > > confident
> > > > > >> I
> > > > > >>> understood the design :)
> > > > > >>>>
> > > > > >>>> I like the idea that the b+tree is only used for
incremental
> > > > > >>> aggregations, rather than storing the entire materialized
view,
> > > for the
> > > > > >>> same reasons that Garren stated.
> > > > > >>>>
> > > > > >>>> An index maintained entirely in ebtree has the interesting
> > > property
> > > > > >> that
> > > > > >>> it does not leak any user data into FDB Keys, which
could be
> > > attractive
> > > > > >> for
> > > > > >>> security reasons.
> > > > > >>>>
> > > > > >>>> Adam
> > > > > >>>>
> > > > > >>>>> On Jul 23, 2020, at 1:54 PM, Garren Smith <garren@apache.org>
> > > wrote:
> > > > > >>>>>
> > > > > >>>>> On Thu, Jul 23, 2020 at 6:55 PM Paul Davis <
> > > > > >> paul.joseph.davis@gmail.com
> > > > > >>>>
> > > > > >>>>> wrote:
> > > > > >>>>>
> > > > > >>>>>>> I would like to keep ebtree to use just
for the reduce index.
> > > > > >>>>>>
> > > > > >>>>>> Could you expand on your reasoning here,
Garren? I haven't done
> > > any
> > > > > >>>>>> experiments on my own to understand if I'm
missing something
> > > > > >>>>>> important. My initial reaction is to not
diverge too far from
> > > the
> > > > > >>>>>> previous shape of the implementation since
we have a decent
> > > idea of
> > > > > >>>>>> how that behaves already but perhaps you've
seen or measured
> > > > > >> something
> > > > > >>>>>> I'm not thinking of?
> > > > > >>>>>>
> > > > > >>>>>
> > > > > >>>>> I think this must have been a misunderstanding
on my part. I
> > > always
> > > > > >>> thought
> > > > > >>>>> of using ebtree to solve reduce and wasn't planning
to use it
> > > for the
> > > > > >>> map
> > > > > >>>>> index.
> > > > > >>>>> I don't like the idea that we have ordered distributed
key/value
> > > > > store
> > > > > >>> and
> > > > > >>>>> then implementing a b-tree on top of that for
indexing.
> > > Especially
> > > > > >>> since we
> > > > > >>>>> know that the current map index is fast,
> > > > > >>>>> easy to use, and follows recommended practices
in the FDB
> > > community
> > > > > on
> > > > > >>> how
> > > > > >>>>> to do indexing. ebtree makes sense for reduce
and is a means to
> > > an
> > > > > end
> > > > > >>> to
> > > > > >>>>> give us CouchDB's reduce api, which is heavily
reliant on a
> > > b-tree,
> > > > > >> with
> > > > > >>>>> CouchDB on FDB. This feels like a step backward
and I worry we
> > > are
> > > > > >> tying
> > > > > >>>>> ourselves heavily to old CouchDB instead of
using the fact that
> > > we
> > > > > >>> moving
> > > > > >>>>> to FDB which then allows us to design new api's
and
> > > functionality.
> > > > > >>>>
> > > > > >>>
> > > > > >>>
> > > > > >>
> > > > >
> > > > >
> > > >
> > >
> >

Mime
View raw message