couchdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Joan Touzet <woh...@apache.org>
Subject Re: [DISCUSS] New Reduce design for FDB
Date Wed, 24 Jun 2020 18:00:31 GMT


On 2020-06-24 1:32 p.m., Garren Smith wrote:
> On Wed, Jun 24, 2020 at 6:47 PM Joan Touzet <wohali@apache.org> wrote:
> 
>> Hi Garren,
>>
>> If the "options" field is left out, what is the default behaviour?
>>
> 
> All group_levels will be indexed. I imagine this is what most CouchDB uses
> will want.

Great!

> 
> 
>> Is there no way to specify multiple group_levels to get results that
>> match the original CouchDB behaviour? Your changed behaviour would be
>> acceptable if I could do something like `?group_level=2,3,4,5`.
>>
> 
> I imagine we could, it would make the code a lot more complex. What is the
> reason for that?
> I find the fact that we return multiple group_levels for a set group_level
> very confusing. To me it feels like
> the reason we return extra group_levels is because of how b-tree's work
> rather than it being a useful thing for a user.

This is the canonical example (and the previous 2-3 slides)

https://speakerdeck.com/wohali/10-common-misconceptions-about-apache-couchdb?slide=25

There are ways to do this with your approach, but they'll require retooling.

> 
> 
>> -Joan
>>
>> On 24/06/2020 08:03, Garren Smith wrote:
>>> Quick Note I have a gist markdown version of this that might be easier to
>>> read
>> https://gist.github.com/garrensmith/1ad1176e007af9c389301b1b6b00f180
>>>
>>> Hi Everyone,
>>>
>>> The team at Cloudant have been relooking at Reduce indexes for CouchDB on
>>> FDB and we want to simply what we had initially planned and change some
>> of
>>> the reduce behaviour compared to CouchDB 3.x
>>>
>>> Our initial design was to use a skip list. However this hasn’t proven to
>> be
>>> particularly useful approach. It would take very long to update and I
>> can’t
>>> find a good algorithm to query the skip list effectively.
>>>
>>> So instead I would like to propose a much simpler reduce implementation.
>> I
>>> would like to use this as the base for reduce and we can look at adding
>>> more functionality later if we need to.
>>>
>>> For the new reduce design, instead of creating a skip list, we will
>> instead
>>> create group_level indexes for a key. For example say we have the
>> following
>>> keys we want to add to a reduce index:
>>>
>>> ```
>>> ([2019, 6, 1] , 1)
>>> ([2019, 6, 20] , 1)
>>> ([2019, 7, 3] , 1)
>>> ```
>>>
>>> We would then create the following group_level indexes:
>>>
>>> ```
>>> Level 0:
>>> (null, 3)
>>>
>>> Level=1:
>>> ([2019], 3)
>>>
>>> Level 2:
>>> ([2019,6], 2)
>>> ([2019, 7] , 1)
>>>
>>> Level3:
>>> ([2019, 6, 1,] , 1)
>>> ([2019, 6, 20,] , 1)
>>> ([2019, 7, 3,] , 1)
>>> ```
>>>
>>> All of these group_level indexes would form part of the reduce index and
>>> would be updated at the same time. We don’t need to know the actual
>>> `group_levels` ahead of time as we would take any key we need to index
>> look
>>> at its length and add it to the group_levels it would belong to.
>>>
>>> Another nice optimization we can do with this is when a user creates a
>> view
>>> they can specify the number of group levels to index e.g:
>>>
>>> ```
>>> {
>>> _id: _design/my-ddoc
>>>     views: {
>>>        one: {
>>>          map: function (doc) {emit(doc.val, 1)},
>>>          reduce: "_sum"
>>>        },
>>>
>>>        two: {
>>>          map: function (doc) {emit(doc.age, 1)},
>>>             reduce: "_count"
>>>        }
>>>      },
>>>
>>>      options: {group_levels: [1,3,5]}
>>> }
>>> ```
>>> This gives the user the ability to trade off index build speed, storage
>>> overhead and performance.
>>>
>>> One caveat of that, for now, is if a user changes the number of
>>> `group_levels` to be indexed, the index is invalidated and we would have
>> to
>>> build it from scratch again. Later we could look at doing some work
>> around
>>> that so that isn’t the case.
>>>
>>> This design will result in a behaviour change. Previously with reduce if
>>> you set `group_level=2`. It will return all results with `group_level=2`
>>> and below. E.g  reduce key/values of the following:
>>>
>>> ```
>>> # group = true
>>> ("key":1,"value":2},
>>> {"key":2,"value":2},
>>> {"key":3,"value":2},
>>> {"key":[1,1],"value":1},
>>> {"key":[1,2,6],"value":1},
>>> {"key":[2,1],"value":1},
>>> {"key":[2,3,6],"value":1},
>>> {"key":[3,1],"value":1},
>>> {"key":[3,1,5],"value":1},
>>> {"key":[3,4,5],"value":1}
>>> ```
>>>
>>> Then doing a query group_level=2 returns:
>>>
>>> ```
>>> # group_level = 2
>>> {"rows":[
>>> {"key":1,"value":2},
>>> {"key":2,"value":2},
>>> {"key":3,"value":2},
>>> {"key":[1,1],"value":1},
>>> {"key":[1,2],"value":1},
>>> {"key":[2,1],"value":1},
>>> {"key":[2,3],"value":1},
>>> {"key":[3,1],"value":2},
>>> {"key":[3,4],"value":1}
>>> ]}
>>> ```
>>>
>>> I want to **CHANGE** this behaviour, so if a query specifies
>>> `group_level=2` then **only** `group_level=2` returns would be returned.
>>> E.g from the example above the results would be:
>>>
>>> ```
>>> # group_level = 2
>>> {"rows":[
>>> {"key":[1,1],"value":1},
>>> {"key":[1,2],"value":1},
>>> {"key":[2,1],"value":1},
>>> {"key":[2,3],"value":1},
>>> {"key":[3,1],"value":2},
>>> {"key":[3,4],"value":1}
>>> ]}
>>> ```
>>>
>>>
>>> ## Group_level=0
>>> `Group_level=0` queries would work as follows:
>>> 1. `group_level=0` without startkey/endkey and then the group_level=0
>> index
>>> is used
>>> 2. For a `group_level=0` with a startkey/endkey or where `group_level=0`
>> is
>>> not indexed, the query will look for the smallest `group_level` and use
>>> that to calculate the `group_level=0` result
>>> 3. `group_level=0` indexes with a startkey/endkey could timeout and be
>> slow
>>> in some cases because we having to do quite a lot of aggregation when
>>> reading keys. But I don’t think that is much different from how it is
>> done
>>> now.
>>>
>>> ## Group=true
>>> We will always build the `group=true` index.
>>>
>>> ## Querying non-indexed group_level
>>> If a query has a `group_level` that is not indexed. We can do two things
>>> here, CouchDB could use the nearest  `group_level` to service the query
>> or
>>> it could return an error that this `group_level` is not available to
>> query.
>>> I would like to make this configurable so that an admin can choose how
>>> reduce indexes behave.
>>>
>>> ## Supported Builtin Reduces
>>> Initially, we would support reduces that can be updated by calculating a
>>> delta change and applying it to all the group_levels. That means we can
>>> support `_sum` and `_count` quite easily. Initially, we won’t implement
>>> `max` and `min`. However, I would like to add them as soon after.
>>>
>>> I would also later like to add support for `_stats` reducer. The best
>>> option I can think of is breaking up each field in `_stats` into its own
>>> k/v row in FDB.
>>>
>>> I’m not sure how to handle the `_approx_count_distinct`. At the moment I
>>> don’t understand the algorithm well enough to know if we could support
>> it.
>>>
>>> ## Custom Reduces
>>> Custom reduces will not be supported. This is because it will be tricky
>> to
>>> implement with this design and would not be very performant. Later if
>> there
>>> is a demand for this I guess we could look at doing it but with the
>> caveat
>>> that performance will be pretty bad. My view on custom reduces are that
>> in
>>> 99% of cases they are a bad choice so I would prefer we don’t support
>> them.
>>>
>>> ## Replication
>>> Some basic replication rules:
>>> 1. If a design document is replicated across from an old database and
>> does
>>> not have the group_level option set. All `group_levels` will be
>> supported.
>>>
>>> 2. If a design document is replicated across with a custom reduce or a
>>> reduce we don’t support we will return an error when that reduce index is
>>> queried. I’m thinking we would still build the map part of the index.
>>>
>>> ## limits
>>> Reduce keys will have the same limits as map keys which are 8KB.
>>>
>>> Hopefully that all makes sense, let me know if I need to explain a point
>>> further and I would love to hear your thoughts on this.
>>>
>>
> 

Mime
View raw message