From dev-return-49432-archive-asf-public=cust-asf.ponee.io@couchdb.apache.org  Thu Jun 25 09:20:32 2020
Return-Path: <dev-return-49432-archive-asf-public=cust-asf.ponee.io@couchdb.apache.org>
X-Original-To: archive-asf-public@cust-asf.ponee.io
Delivered-To: archive-asf-public@cust-asf.ponee.io
Received: from mail.apache.org (hermes.apache.org [207.244.88.153])
	by mx-eu-01.ponee.io (Postfix) with SMTP id 67E1C180181
	for <archive-asf-public@cust-asf.ponee.io>; Thu, 25 Jun 2020 11:20:32 +0200 (CEST)
Received: (qmail 45083 invoked by uid 500); 25 Jun 2020 09:20:31 -0000
Mailing-List: contact dev-help@couchdb.apache.org; run by ezmlm
Precedence: bulk
List-Help: <mailto:dev-help@couchdb.apache.org>
List-Unsubscribe: <mailto:dev-unsubscribe@couchdb.apache.org>
List-Post: <mailto:dev@couchdb.apache.org>
List-Id: <dev.couchdb.apache.org>
Reply-To: dev@couchdb.apache.org
Delivered-To: mailing list dev@couchdb.apache.org
Received: (qmail 45071 invoked by uid 99); 25 Jun 2020 09:20:31 -0000
Received: from Unknown (HELO mailrelay1-lw-us.apache.org) (10.10.3.159)
    by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 25 Jun 2020 09:20:31 +0000
Received: from mail-lj1-f179.google.com (mail-lj1-f179.google.com [209.85.208.179])
	by mailrelay1-lw-us.apache.org (ASF Mail Server at mailrelay1-lw-us.apache.org) with ESMTPSA id EB8E140839
	for <dev@couchdb.apache.org>; Thu, 25 Jun 2020 09:20:30 +0000 (UTC)
Received: by mail-lj1-f179.google.com with SMTP id 9so5685383ljc.8
        for <dev@couchdb.apache.org>; Thu, 25 Jun 2020 02:20:30 -0700 (PDT)
X-Gm-Message-State: AOAM5320zHtnEhBGV+SR2Dx8f19+3jTP7v3AJzUWT/PfnVv1VTMqM6xo
	5KAjA4L3YYyC74VOhc/jpE6XwzTrGNFCryYW4VAy/Q==
X-Google-Smtp-Source: ABdhPJwmOuD7nigk+w+e17IzWdowzxCximvKWuvt/gwmp9turKeft22yjhaihPhYEI35VXqVNT0uXqrQ5ver1bFAFxQ=
X-Received: by 2002:a2e:9c97:: with SMTP id x23mr15968251lji.36.1593076829961;
 Thu, 25 Jun 2020 02:20:29 -0700 (PDT)
MIME-Version: 1.0
References: <CAOFTT0yp9o6uXzog6N7ba3sm+Bu-ni+80WY=-X2DPduJiBZsyQ@mail.gmail.com>
 <33b82b5d-0cb8-2b75-1122-14f3f7b5915c@apache.org> <CAOFTT0zcRntyuRESRKq3akreEag+BdCz7aD=dFbYch8Z-xvMuQ@mail.gmail.com>
 <d82545af-22cf-0a87-0473-733a69861b7b@apache.org>
In-Reply-To: <d82545af-22cf-0a87-0473-733a69861b7b@apache.org>
From: Garren Smith <garren@apache.org>
Date: Thu, 25 Jun 2020 11:20:18 +0200
X-Gmail-Original-Message-ID: <CAOFTT0wn=u1gLagRNm4+3nSMfS1_NGjOT5hEvroK+O7w8XfQww@mail.gmail.com>
Message-ID: <CAOFTT0wn=u1gLagRNm4+3nSMfS1_NGjOT5hEvroK+O7w8XfQww@mail.gmail.com>
Subject: Re: [DISCUSS] New Reduce design for FDB
To: CouchDB Developers <dev@couchdb.apache.org>
Content-Type: multipart/alternative; boundary="0000000000008ce8ad05a8e51ac2"

--0000000000008ce8ad05a8e51ac2
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Thanks for the example. The new design wouldn't support that. The idea of
supporting multiple group_levels for query at once is interesting.

On Wed, Jun 24, 2020 at 8:00 PM Joan Touzet <wohali@apache.org> wrote:

>
>
> On 2020-06-24 1:32 p.m., Garren Smith wrote:
> > On Wed, Jun 24, 2020 at 6:47 PM Joan Touzet <wohali@apache.org> wrote:
> >
> >> Hi Garren,
> >>
> >> If the "options" field is left out, what is the default behaviour?
> >>
> >
> > All group_levels will be indexed. I imagine this is what most CouchDB
> uses
> > will want.
>
> Great!
>
> >
> >
> >> Is there no way to specify multiple group_levels to get results that
> >> match the original CouchDB behaviour? Your changed behaviour would be
> >> acceptable if I could do something like `?group_level=3D2,3,4,5`.
> >>
> >
> > I imagine we could, it would make the code a lot more complex. What is
> the
> > reason for that?
> > I find the fact that we return multiple group_levels for a set
> group_level
> > very confusing. To me it feels like
> > the reason we return extra group_levels is because of how b-tree's work
> > rather than it being a useful thing for a user.
>
> This is the canonical example (and the previous 2-3 slides)
>
>
> https://speakerdeck.com/wohali/10-common-misconceptions-about-apache-couc=
hdb?slide=3D25
>
> There are ways to do this with your approach, but they'll require
> retooling.
>
> >
> >
> >> -Joan
> >>
> >> On 24/06/2020 08:03, Garren Smith wrote:
> >>> Quick Note I have a gist markdown version of this that might be easie=
r
> to
> >>> read
> >> https://gist.github.com/garrensmith/1ad1176e007af9c389301b1b6b00f180
> >>>
> >>> Hi Everyone,
> >>>
> >>> The team at Cloudant have been relooking at Reduce indexes for CouchD=
B
> on
> >>> FDB and we want to simply what we had initially planned and change so=
me
> >> of
> >>> the reduce behaviour compared to CouchDB 3.x
> >>>
> >>> Our initial design was to use a skip list. However this hasn=E2=80=99=
t proven
> to
> >> be
> >>> particularly useful approach. It would take very long to update and I
> >> can=E2=80=99t
> >>> find a good algorithm to query the skip list effectively.
> >>>
> >>> So instead I would like to propose a much simpler reduce
> implementation.
> >> I
> >>> would like to use this as the base for reduce and we can look at addi=
ng
> >>> more functionality later if we need to.
> >>>
> >>> For the new reduce design, instead of creating a skip list, we will
> >> instead
> >>> create group_level indexes for a key. For example say we have the
> >> following
> >>> keys we want to add to a reduce index:
> >>>
> >>> ```
> >>> ([2019, 6, 1] , 1)
> >>> ([2019, 6, 20] , 1)
> >>> ([2019, 7, 3] , 1)
> >>> ```
> >>>
> >>> We would then create the following group_level indexes:
> >>>
> >>> ```
> >>> Level 0:
> >>> (null, 3)
> >>>
> >>> Level=3D1:
> >>> ([2019], 3)
> >>>
> >>> Level 2:
> >>> ([2019,6], 2)
> >>> ([2019, 7] , 1)
> >>>
> >>> Level3:
> >>> ([2019, 6, 1,] , 1)
> >>> ([2019, 6, 20,] , 1)
> >>> ([2019, 7, 3,] , 1)
> >>> ```
> >>>
> >>> All of these group_level indexes would form part of the reduce index
> and
> >>> would be updated at the same time. We don=E2=80=99t need to know the =
actual
> >>> `group_levels` ahead of time as we would take any key we need to inde=
x
> >> look
> >>> at its length and add it to the group_levels it would belong to.
> >>>
> >>> Another nice optimization we can do with this is when a user creates =
a
> >> view
> >>> they can specify the number of group levels to index e.g:
> >>>
> >>> ```
> >>> {
> >>> _id: _design/my-ddoc
> >>>     views: {
> >>>        one: {
> >>>          map: function (doc) {emit(doc.val, 1)},
> >>>          reduce: "_sum"
> >>>        },
> >>>
> >>>        two: {
> >>>          map: function (doc) {emit(doc.age, 1)},
> >>>             reduce: "_count"
> >>>        }
> >>>      },
> >>>
> >>>      options: {group_levels: [1,3,5]}
> >>> }
> >>> ```
> >>> This gives the user the ability to trade off index build speed, stora=
ge
> >>> overhead and performance.
> >>>
> >>> One caveat of that, for now, is if a user changes the number of
> >>> `group_levels` to be indexed, the index is invalidated and we would
> have
> >> to
> >>> build it from scratch again. Later we could look at doing some work
> >> around
> >>> that so that isn=E2=80=99t the case.
> >>>
> >>> This design will result in a behaviour change. Previously with reduce
> if
> >>> you set `group_level=3D2`. It will return all results with
> `group_level=3D2`
> >>> and below. E.g  reduce key/values of the following:
> >>>
> >>> ```
> >>> # group =3D true
> >>> ("key":1,"value":2},
> >>> {"key":2,"value":2},
> >>> {"key":3,"value":2},
> >>> {"key":[1,1],"value":1},
> >>> {"key":[1,2,6],"value":1},
> >>> {"key":[2,1],"value":1},
> >>> {"key":[2,3,6],"value":1},
> >>> {"key":[3,1],"value":1},
> >>> {"key":[3,1,5],"value":1},
> >>> {"key":[3,4,5],"value":1}
> >>> ```
> >>>
> >>> Then doing a query group_level=3D2 returns:
> >>>
> >>> ```
> >>> # group_level =3D 2
> >>> {"rows":[
> >>> {"key":1,"value":2},
> >>> {"key":2,"value":2},
> >>> {"key":3,"value":2},
> >>> {"key":[1,1],"value":1},
> >>> {"key":[1,2],"value":1},
> >>> {"key":[2,1],"value":1},
> >>> {"key":[2,3],"value":1},
> >>> {"key":[3,1],"value":2},
> >>> {"key":[3,4],"value":1}
> >>> ]}
> >>> ```
> >>>
> >>> I want to **CHANGE** this behaviour, so if a query specifies
> >>> `group_level=3D2` then **only** `group_level=3D2` returns would be
> returned.
> >>> E.g from the example above the results would be:
> >>>
> >>> ```
> >>> # group_level =3D 2
> >>> {"rows":[
> >>> {"key":[1,1],"value":1},
> >>> {"key":[1,2],"value":1},
> >>> {"key":[2,1],"value":1},
> >>> {"key":[2,3],"value":1},
> >>> {"key":[3,1],"value":2},
> >>> {"key":[3,4],"value":1}
> >>> ]}
> >>> ```
> >>>
> >>>
> >>> ## Group_level=3D0
> >>> `Group_level=3D0` queries would work as follows:
> >>> 1. `group_level=3D0` without startkey/endkey and then the group_level=
=3D0
> >> index
> >>> is used
> >>> 2. For a `group_level=3D0` with a startkey/endkey or where
> `group_level=3D0`
> >> is
> >>> not indexed, the query will look for the smallest `group_level` and u=
se
> >>> that to calculate the `group_level=3D0` result
> >>> 3. `group_level=3D0` indexes with a startkey/endkey could timeout and=
 be
> >> slow
> >>> in some cases because we having to do quite a lot of aggregation when
> >>> reading keys. But I don=E2=80=99t think that is much different from h=
ow it is
> >> done
> >>> now.
> >>>
> >>> ## Group=3Dtrue
> >>> We will always build the `group=3Dtrue` index.
> >>>
> >>> ## Querying non-indexed group_level
> >>> If a query has a `group_level` that is not indexed. We can do two
> things
> >>> here, CouchDB could use the nearest  `group_level` to service the que=
ry
> >> or
> >>> it could return an error that this `group_level` is not available to
> >> query.
> >>> I would like to make this configurable so that an admin can choose ho=
w
> >>> reduce indexes behave.
> >>>
> >>> ## Supported Builtin Reduces
> >>> Initially, we would support reduces that can be updated by calculatin=
g
> a
> >>> delta change and applying it to all the group_levels. That means we c=
an
> >>> support `_sum` and `_count` quite easily. Initially, we won=E2=80=99t=
 implement
> >>> `max` and `min`. However, I would like to add them as soon after.
> >>>
> >>> I would also later like to add support for `_stats` reducer. The best
> >>> option I can think of is breaking up each field in `_stats` into its
> own
> >>> k/v row in FDB.
> >>>
> >>> I=E2=80=99m not sure how to handle the `_approx_count_distinct`. At t=
he moment
> I
> >>> don=E2=80=99t understand the algorithm well enough to know if we coul=
d support
> >> it.
> >>>
> >>> ## Custom Reduces
> >>> Custom reduces will not be supported. This is because it will be tric=
ky
> >> to
> >>> implement with this design and would not be very performant. Later if
> >> there
> >>> is a demand for this I guess we could look at doing it but with the
> >> caveat
> >>> that performance will be pretty bad. My view on custom reduces are th=
at
> >> in
> >>> 99% of cases they are a bad choice so I would prefer we don=E2=80=99t=
 support
> >> them.
> >>>
> >>> ## Replication
> >>> Some basic replication rules:
> >>> 1. If a design document is replicated across from an old database and
> >> does
> >>> not have the group_level option set. All `group_levels` will be
> >> supported.
> >>>
> >>> 2. If a design document is replicated across with a custom reduce or =
a
> >>> reduce we don=E2=80=99t support we will return an error when that red=
uce index
> is
> >>> queried. I=E2=80=99m thinking we would still build the map part of th=
e index.
> >>>
> >>> ## limits
> >>> Reduce keys will have the same limits as map keys which are 8KB.
> >>>
> >>> Hopefully that all makes sense, let me know if I need to explain a
> point
> >>> further and I would love to hear your thoughts on this.
> >>>
> >>
> >
>

--0000000000008ce8ad05a8e51ac2--