From dev-return-49432-archive-asf-public=cust-asf.ponee.io@couchdb.apache.org Thu Jun 25 09:20:32 2020 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [207.244.88.153]) by mx-eu-01.ponee.io (Postfix) with SMTP id 67E1C180181 for ; Thu, 25 Jun 2020 11:20:32 +0200 (CEST) Received: (qmail 45083 invoked by uid 500); 25 Jun 2020 09:20:31 -0000 Mailing-List: contact dev-help@couchdb.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@couchdb.apache.org Delivered-To: mailing list dev@couchdb.apache.org Received: (qmail 45071 invoked by uid 99); 25 Jun 2020 09:20:31 -0000 Received: from Unknown (HELO mailrelay1-lw-us.apache.org) (10.10.3.159) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 25 Jun 2020 09:20:31 +0000 Received: from mail-lj1-f179.google.com (mail-lj1-f179.google.com [209.85.208.179]) by mailrelay1-lw-us.apache.org (ASF Mail Server at mailrelay1-lw-us.apache.org) with ESMTPSA id EB8E140839 for ; Thu, 25 Jun 2020 09:20:30 +0000 (UTC) Received: by mail-lj1-f179.google.com with SMTP id 9so5685383ljc.8 for ; Thu, 25 Jun 2020 02:20:30 -0700 (PDT) X-Gm-Message-State: AOAM5320zHtnEhBGV+SR2Dx8f19+3jTP7v3AJzUWT/PfnVv1VTMqM6xo 5KAjA4L3YYyC74VOhc/jpE6XwzTrGNFCryYW4VAy/Q== X-Google-Smtp-Source: ABdhPJwmOuD7nigk+w+e17IzWdowzxCximvKWuvt/gwmp9turKeft22yjhaihPhYEI35VXqVNT0uXqrQ5ver1bFAFxQ= X-Received: by 2002:a2e:9c97:: with SMTP id x23mr15968251lji.36.1593076829961; Thu, 25 Jun 2020 02:20:29 -0700 (PDT) MIME-Version: 1.0 References: <33b82b5d-0cb8-2b75-1122-14f3f7b5915c@apache.org> In-Reply-To: From: Garren Smith Date: Thu, 25 Jun 2020 11:20:18 +0200 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [DISCUSS] New Reduce design for FDB To: CouchDB Developers Content-Type: multipart/alternative; boundary="0000000000008ce8ad05a8e51ac2" --0000000000008ce8ad05a8e51ac2 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Thanks for the example. The new design wouldn't support that. The idea of supporting multiple group_levels for query at once is interesting. On Wed, Jun 24, 2020 at 8:00 PM Joan Touzet wrote: > > > On 2020-06-24 1:32 p.m., Garren Smith wrote: > > On Wed, Jun 24, 2020 at 6:47 PM Joan Touzet wrote: > > > >> Hi Garren, > >> > >> If the "options" field is left out, what is the default behaviour? > >> > > > > All group_levels will be indexed. I imagine this is what most CouchDB > uses > > will want. > > Great! > > > > > > >> Is there no way to specify multiple group_levels to get results that > >> match the original CouchDB behaviour? Your changed behaviour would be > >> acceptable if I could do something like `?group_level=3D2,3,4,5`. > >> > > > > I imagine we could, it would make the code a lot more complex. What is > the > > reason for that? > > I find the fact that we return multiple group_levels for a set > group_level > > very confusing. To me it feels like > > the reason we return extra group_levels is because of how b-tree's work > > rather than it being a useful thing for a user. > > This is the canonical example (and the previous 2-3 slides) > > > https://speakerdeck.com/wohali/10-common-misconceptions-about-apache-couc= hdb?slide=3D25 > > There are ways to do this with your approach, but they'll require > retooling. > > > > > > >> -Joan > >> > >> On 24/06/2020 08:03, Garren Smith wrote: > >>> Quick Note I have a gist markdown version of this that might be easie= r > to > >>> read > >> https://gist.github.com/garrensmith/1ad1176e007af9c389301b1b6b00f180 > >>> > >>> Hi Everyone, > >>> > >>> The team at Cloudant have been relooking at Reduce indexes for CouchD= B > on > >>> FDB and we want to simply what we had initially planned and change so= me > >> of > >>> the reduce behaviour compared to CouchDB 3.x > >>> > >>> Our initial design was to use a skip list. However this hasn=E2=80=99= t proven > to > >> be > >>> particularly useful approach. It would take very long to update and I > >> can=E2=80=99t > >>> find a good algorithm to query the skip list effectively. > >>> > >>> So instead I would like to propose a much simpler reduce > implementation. > >> I > >>> would like to use this as the base for reduce and we can look at addi= ng > >>> more functionality later if we need to. > >>> > >>> For the new reduce design, instead of creating a skip list, we will > >> instead > >>> create group_level indexes for a key. For example say we have the > >> following > >>> keys we want to add to a reduce index: > >>> > >>> ``` > >>> ([2019, 6, 1] , 1) > >>> ([2019, 6, 20] , 1) > >>> ([2019, 7, 3] , 1) > >>> ``` > >>> > >>> We would then create the following group_level indexes: > >>> > >>> ``` > >>> Level 0: > >>> (null, 3) > >>> > >>> Level=3D1: > >>> ([2019], 3) > >>> > >>> Level 2: > >>> ([2019,6], 2) > >>> ([2019, 7] , 1) > >>> > >>> Level3: > >>> ([2019, 6, 1,] , 1) > >>> ([2019, 6, 20,] , 1) > >>> ([2019, 7, 3,] , 1) > >>> ``` > >>> > >>> All of these group_level indexes would form part of the reduce index > and > >>> would be updated at the same time. We don=E2=80=99t need to know the = actual > >>> `group_levels` ahead of time as we would take any key we need to inde= x > >> look > >>> at its length and add it to the group_levels it would belong to. > >>> > >>> Another nice optimization we can do with this is when a user creates = a > >> view > >>> they can specify the number of group levels to index e.g: > >>> > >>> ``` > >>> { > >>> _id: _design/my-ddoc > >>> views: { > >>> one: { > >>> map: function (doc) {emit(doc.val, 1)}, > >>> reduce: "_sum" > >>> }, > >>> > >>> two: { > >>> map: function (doc) {emit(doc.age, 1)}, > >>> reduce: "_count" > >>> } > >>> }, > >>> > >>> options: {group_levels: [1,3,5]} > >>> } > >>> ``` > >>> This gives the user the ability to trade off index build speed, stora= ge > >>> overhead and performance. > >>> > >>> One caveat of that, for now, is if a user changes the number of > >>> `group_levels` to be indexed, the index is invalidated and we would > have > >> to > >>> build it from scratch again. Later we could look at doing some work > >> around > >>> that so that isn=E2=80=99t the case. > >>> > >>> This design will result in a behaviour change. Previously with reduce > if > >>> you set `group_level=3D2`. It will return all results with > `group_level=3D2` > >>> and below. E.g reduce key/values of the following: > >>> > >>> ``` > >>> # group =3D true > >>> ("key":1,"value":2}, > >>> {"key":2,"value":2}, > >>> {"key":3,"value":2}, > >>> {"key":[1,1],"value":1}, > >>> {"key":[1,2,6],"value":1}, > >>> {"key":[2,1],"value":1}, > >>> {"key":[2,3,6],"value":1}, > >>> {"key":[3,1],"value":1}, > >>> {"key":[3,1,5],"value":1}, > >>> {"key":[3,4,5],"value":1} > >>> ``` > >>> > >>> Then doing a query group_level=3D2 returns: > >>> > >>> ``` > >>> # group_level =3D 2 > >>> {"rows":[ > >>> {"key":1,"value":2}, > >>> {"key":2,"value":2}, > >>> {"key":3,"value":2}, > >>> {"key":[1,1],"value":1}, > >>> {"key":[1,2],"value":1}, > >>> {"key":[2,1],"value":1}, > >>> {"key":[2,3],"value":1}, > >>> {"key":[3,1],"value":2}, > >>> {"key":[3,4],"value":1} > >>> ]} > >>> ``` > >>> > >>> I want to **CHANGE** this behaviour, so if a query specifies > >>> `group_level=3D2` then **only** `group_level=3D2` returns would be > returned. > >>> E.g from the example above the results would be: > >>> > >>> ``` > >>> # group_level =3D 2 > >>> {"rows":[ > >>> {"key":[1,1],"value":1}, > >>> {"key":[1,2],"value":1}, > >>> {"key":[2,1],"value":1}, > >>> {"key":[2,3],"value":1}, > >>> {"key":[3,1],"value":2}, > >>> {"key":[3,4],"value":1} > >>> ]} > >>> ``` > >>> > >>> > >>> ## Group_level=3D0 > >>> `Group_level=3D0` queries would work as follows: > >>> 1. `group_level=3D0` without startkey/endkey and then the group_level= =3D0 > >> index > >>> is used > >>> 2. For a `group_level=3D0` with a startkey/endkey or where > `group_level=3D0` > >> is > >>> not indexed, the query will look for the smallest `group_level` and u= se > >>> that to calculate the `group_level=3D0` result > >>> 3. `group_level=3D0` indexes with a startkey/endkey could timeout and= be > >> slow > >>> in some cases because we having to do quite a lot of aggregation when > >>> reading keys. But I don=E2=80=99t think that is much different from h= ow it is > >> done > >>> now. > >>> > >>> ## Group=3Dtrue > >>> We will always build the `group=3Dtrue` index. > >>> > >>> ## Querying non-indexed group_level > >>> If a query has a `group_level` that is not indexed. We can do two > things > >>> here, CouchDB could use the nearest `group_level` to service the que= ry > >> or > >>> it could return an error that this `group_level` is not available to > >> query. > >>> I would like to make this configurable so that an admin can choose ho= w > >>> reduce indexes behave. > >>> > >>> ## Supported Builtin Reduces > >>> Initially, we would support reduces that can be updated by calculatin= g > a > >>> delta change and applying it to all the group_levels. That means we c= an > >>> support `_sum` and `_count` quite easily. Initially, we won=E2=80=99t= implement > >>> `max` and `min`. However, I would like to add them as soon after. > >>> > >>> I would also later like to add support for `_stats` reducer. The best > >>> option I can think of is breaking up each field in `_stats` into its > own > >>> k/v row in FDB. > >>> > >>> I=E2=80=99m not sure how to handle the `_approx_count_distinct`. At t= he moment > I > >>> don=E2=80=99t understand the algorithm well enough to know if we coul= d support > >> it. > >>> > >>> ## Custom Reduces > >>> Custom reduces will not be supported. This is because it will be tric= ky > >> to > >>> implement with this design and would not be very performant. Later if > >> there > >>> is a demand for this I guess we could look at doing it but with the > >> caveat > >>> that performance will be pretty bad. My view on custom reduces are th= at > >> in > >>> 99% of cases they are a bad choice so I would prefer we don=E2=80=99t= support > >> them. > >>> > >>> ## Replication > >>> Some basic replication rules: > >>> 1. If a design document is replicated across from an old database and > >> does > >>> not have the group_level option set. All `group_levels` will be > >> supported. > >>> > >>> 2. If a design document is replicated across with a custom reduce or = a > >>> reduce we don=E2=80=99t support we will return an error when that red= uce index > is > >>> queried. I=E2=80=99m thinking we would still build the map part of th= e index. > >>> > >>> ## limits > >>> Reduce keys will have the same limits as map keys which are 8KB. > >>> > >>> Hopefully that all makes sense, let me know if I need to explain a > point > >>> further and I would love to hear your thoughts on this. > >>> > >> > > > --0000000000008ce8ad05a8e51ac2--