couchdb-dev mailing list archives

From Garren Smith <gar...@apache.org>
Subject Re: [DISCUSS] Built-in reduce indexes
Date Wed, 23 Oct 2019 13:50:41 GMT
Hi,

I've updated this RFC
https://github.com/apache/couchdb-documentation/pull/441 with an
implementation of the skiplist algorithm using FDB.

Cheers
Garren

On Tue, Sep 24, 2019 at 4:01 PM Garren Smith <garren@apache.org> wrote:

> Hi All,
>
> Digging up this thread again just to let you know I wrote an RFC on
> implementing reduce functions on top of FoundationDB
> https://github.com/apache/couchdb-documentation/pull/441
>
> Cheers
> Garren
>
> On Thu, Apr 25, 2019 at 6:05 PM Adam Kocoloski <kocolosk@apache.org>
> wrote:
>
>> Hiya Garren, thanks for starting this one. A few comments inline:
>>
>> > On Apr 23, 2019, at 11:38 AM, Garren Smith <garren@apache.org> wrote:
>> >
>> > Hi All,
>> >
>> > Following on from the map discussion, I want to start the discussion on
>> > built-in reduce indexes.
>> >
>> > ## Builtin reduces
>> > Builtin reduces are definitely the easier of the two reduce options to
>> > reason about and design. The one factor to keep in mind for reduces is
>> > that we need to be able to reduce at different group levels. So a data
>> > model for that would look like this:
>> >
>> > {?DATABASE, ?VIEWS, ?VIEW_SIGNATURE, ?VIEWS, <view_id>, ?REDUCE,
>> > <group_level>, <group_key>, <_reduce_function_name>} -> <aggregate_value>
>>
>> - I think this should be <view_signature>, not ?VIEW_SIGNATURE — you’re
>> intending for the actual checksum to go in that element of the tuple,
>> right? I wonder about using a Directory in that element of the tuple;
>> adding the full signature to every key will blow up the index substantially.
>>
>> - I don’t think the second ?VIEWS is needed — you already scoped to
>> ?VIEWS above it.
>>
>> - Why is the <_reduce_function_name> after the <group_key>? If someone
>> defines multiple reduce functions on a single view and then wants to do a
>> range query on the grouped result this will mean reading and discarding the
>> output from the other reduce function. It seems to me that it would make
>> more sense to put this directly after ?REDUCE.
>>
>> - You could consider having specific constants for each builtin reduce
>> function, e.g. ?SUM, instead of the full “_sum” byte string in
>> <_reduce_function_name>, which would save several bytes per key.
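>>
>> To make that ordering concrete, here is a rough sketch of packing such a
>> key with the FDB tuple layer in Python. The constants and helper names
>> are made up for illustration (not the actual layer code), and a Directory
>> prefix would presumably stand behind <view_prefix>:
>>
>> import fdb
>> fdb.api_version(620)
>>
>> # Hypothetical stand-ins for the ?REDUCE / ?SUM style markers; the real
>> # values are up to the layer.
>> REDUCE = 2
>> SUM = 0
>>
>> def reduce_key(view_prefix, view_id, reduce_fun, group_level, group_key):
>>     # The reduce function id sits directly after the ?REDUCE marker, so a
>>     # range read at one group level never crosses another function's rows.
>>     return view_prefix + fdb.tuple.pack(
>>         (view_id, REDUCE, reduce_fun, group_level) + tuple(group_key))
>>
>> # e.g. the _sum row for group_level=2 and key ["2019", "10"]:
>> # reduce_key(view_prefix, 1, SUM, 2, ["2019", "10"])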
>>
>> >
>> > Most of that is similar to the map data model. Where it differs is from
>> > the ?REDUCE subspace onward: we add the group_level (from 1 -> the
>> > number of keys emitted in the map function), then the group key used in
>> > the reduce, then the reduce function name, e.g. _sum or _count, and then
>> > we store the aggregated value as the FDB value.
>>
>> I think we need a better definition of <group_level> here. It's not
>> really the number of keys emitted from the map function; it's the number
>> of elements of an array key emitted by the map function that are used to
>> determine the degree of aggregation.
>>
>> Also, before we even get to group_level we should describe how reduce
>> functions work with keys that are not arrays. In the current API this is
>> group=true (separate aggregations for each set of KV pairs that share the
>> same key) or group=false (one aggregated result for the entire view).
>> Internally in the current codebase we map these two settings to
>> group_level=exact and group_level=0, respectively. Basically, we need a way
>> to represent “exact” in the <group_level>.
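>>
>> For illustration only, the group/group_level parameters could be
>> normalised along these lines (Python; the names and the sentinel value
>> are made up):
>>
>> GROUP_EXACT = 255  # arbitrary sentinel meaning "aggregate per exact key"
>>
>> def normalize_group_level(group, group_level):
>>     # group=true  -> separate aggregation per distinct key ("exact")
>>     # group=false -> one aggregation for the whole view (level 0)
>>     if group_level is None:
>>         return GROUP_EXACT if group else 0
>>     return group_level
>>
>> def truncate_key(key, group_level):
>>     # Only array keys are truncated to the requested number of elements.
>>     if group_level == GROUP_EXACT or not isinstance(key, list):
>>         return key
>>     return key[:group_level]
>>
>> # truncate_key(["2019", "10", "23"], 1) -> ["2019"]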
>>
>> > ### Index management
>> >
>> > To update the reduce indexes, we will rely on the `id_index` and the
>> > `update_seq` defined in the map discussion. Then, to apply changes, we
>> > calculate the change to an aggregate value for the keys at the highest
>> > group level and apply that change to all the group levels below it
>> > using FDB's atomic operations [1].
>>
>> This is a really important point and worth calling out more explicitly.
>> One reason most of the builtins are significantly simpler than the custom
>> functions is because we know exactly how the aggregate values change when
>> we get the map output for each updated doc in isolation. We don’t need to
>> go retrieve all the rest of the map output that happens to share the same
>> map key. Moreover, the change to the aggregate value can be applied using
>> atomic operations so we don’t need to do anything special to avoid txn
>> conflicts for views where lots of documents emit the same map key.
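>>
>> As a concrete illustration (Python against the fdb bindings, with a
>> made-up key_for_level helper building the reduce row key for a given
>> group level), applying a _sum delta for one updated doc might look
>> roughly like this; the delta is computed once and then applied at every
>> group level with an atomic ADD, so no reads and no conflicts on hot keys:
>>
>> import struct
>> import fdb
>> fdb.api_version(620)
>>
>> def le(n):
>>     # FDB atomic ADD expects a little-endian encoded integer operand.
>>     return struct.pack('<q', n)
>>
>> @fdb.transactional
>> def apply_sum_delta(tr, key_for_level, map_key, delta):
>>     # Level len(map_key) stands in for group_level=exact in this sketch.
>>     for level in range(len(map_key) + 1):
>>         tr.add(key_for_level(level, map_key[:level]), le(delta))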
>>
>> > ### Reducer functions
>> >
>> > FDB's atomic functions support all the built-in reduce functions that
>> > CouchDB supports, so we can use those as part of our reduce function.
>> > For the `_stats` reduce function, we will have to split that across
>> > multiple key values, so its data model will have an extra element in
>> > the key to record which stat field it is for the _stats reducer:
>> >
>> > {?DATABASE, ?VIEWS, ?VIEW_SIGNATURE, ?VIEWS, <view_id>, ?REDUCE,
>> > <group_level>, <group_key>, <_reduce_function_name>, <_stat_field>} ->
>> > <aggregate_value>
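>> >
>> > As a rough sketch (Python, with a made-up stat_key helper that returns
>> > the row for one _stats field at one group level), the _stats update
>> > could write each field as its own row, using atomic ADD for
>> > sum/count/sumsqr and atomic MIN/MAX for min/max:
>> >
>> > import struct
>> > import fdb
>> > fdb.api_version(620)
>> >
>> > def le(n):
>> >     return struct.pack('<q', n)
>> >
>> > @fdb.transactional
>> > def apply_stats(tr, stat_key, group_key, value, removed=False):
>> >     sign = -1 if removed else 1
>> >     for level in range(len(group_key) + 1):
>> >         k = group_key[:level]
>> >         tr.add(stat_key(level, k, b'sum'), le(sign * value))
>> >         tr.add(stat_key(level, k, b'count'), le(sign))
>> >         tr.add(stat_key(level, k, b'sumsqr'), le(sign * value * value))
>> >         if not removed:
>> >             # MIN/MAX compare little-endian unsigned ints, so this
>> >             # sketch assumes non-negative values.
>> >             tr.min(stat_key(level, k, b'min'), le(value))
>> >             tr.max(stat_key(level, k, b'max'), le(value))
>> >         # Removing a doc can't undo min/max atomically; that would need
>> >         # a read-and-recompute step not shown here.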
>> >
>> > We do have some problems with `_approx_count_distinct` because it does
>> > not support removing keys from the filter. So we have three options:
>> >
>> > 1. Ignore key removal entirely in the filter, since this is just an
>> > estimate
>> > 2. Implement a real COUNT DISTINCT function, which we can do because
>> > we're not trying to merge results from different local shards
>> > 3. Don't support it going forward
>>
>> There’s actually a fourth option here, which is to maintain
>> _approx_count_distinct using the future machinery for custom reduce
>> functions, generating a new filter from the raw map output at the lowest
>> level when keys are removed. I don’t have an informed opinion yet about
>> that approach.
>>
>> > ### Group_level=0
>> >
>> > One trickier situation is if a user does a group_level=0 query with a
>> > key range; this would require us to do some client-level aggregation.
>> > We would have to get the aggregate values for a `group_level=1` query
>> > over the supplied key range and then aggregate those values together.
>>
>> This isn't unique to group_level=0; it could happen for any
>> group_level!=exact. It can also require descending more than one
>> group_level depending on the precision of the supplied key range. For
>> example, with a date-based key like [YYYY, MM, DD], someone could ask for
>> yearly aggregations with group_level=1 but supply a startkey and endkey
>> with specific dates rather than months.
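>>
>> For what it's worth, the client-side step could be as simple as folding
>> the rows from one group level finer than the one requested, along these
>> lines (Python sketch with a hypothetical key_for_level helper; real
>> boundary handling, and descending more than one level, would need more
>> care):
>>
>> import struct
>> import fdb
>> fdb.api_version(620)
>>
>> @fdb.transactional
>> def sum_range(tr, key_for_level, level, start_key, end_key):
>>     # Read the already-aggregated _sum rows one level down and combine
>>     # them on the client.
>>     total = 0
>>     for kv in tr.get_range(key_for_level(level + 1, start_key),
>>                            key_for_level(level + 1, end_key)):
>>         total += struct.unpack('<q', kv.value)[0]
>>     return total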
>>
>> Good start! Cheers, Adam
>>
>>
