couchdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Wout Mertens <wout.mert...@gmail.com>
Subject Re: Proposal: Review DBs
Date Tue, 28 Apr 2009 14:41:11 GMT

On Apr 28, 2009, at 3:46 PM, Adam Kocoloski wrote:

> I'm not an expert on the btree code, but I think this statement
>
>> So whenever a reduce call results in a new value for a b-tree node,  
>> AND that node is the upper node of a subtree that is completely  
>> part of a group key, that group key needs to be marked for  
>> recalculation.
>
>
> misses some updates.  I thought things looked something like
>
>          root
>            |
>     ---------------
>    kp1           kp2
>     |             |
> ---------     ---------
> foo foo bar   bar bar bar
>
> where foo and bar are emitted keys from a map (values not show for  
> the sake of brevity).  kp1 and kp2 hold reduce(foo,foo,bar) and  
> reduce(bar,bar,bar) respectively, and root holds rereduce(kp1,kp2).

That is my understanding as well.

> When we request a view with group=true, Couch is smart enough to use  
> the reduction stored in kp2 directly, but it has to call  
> reduce(foo,foo) and reduce(bar) on the fly for the nodes underneath  
> kp1 (and then rereduce the bar reductions from kp1 and kp2 to get  
> the result for bar).  If I interpreted Wout's statement correctly,  
> it ignores the case where any of the nodes under kp1 change, since  
> kp1 is not "the upper node of a subtree that is completely part of a  
> group key".

In this case, both foos are the upper node of a subtree that is  
completely part of a group key. bar isn't.

Same graph with the group keys indicated:

          root
            |
     ------BAR------
    kp1           kp2
     |             |
--FOO----     ---------
foo foo bar   bar bar bar

So whenever the first bar changes, kp1 needs to be calculated and then  
BAR is marked as needing updating, since kp1 is a top node under BAR.

FOO is only marked for updating when either of the foos change.

> I think the problem of tracking which group keys to update can be  
> made simpler.  We really only need to see the key updates coming out  
> of the incremental map.  Couch knows the lists of keys to add and  
> keys to remove at that point; the set of all unique keys in those  
> two lists (the "changeset") is the set of group keys that would need  
> to be updated in the Review DB. Here I'm using "update" in the  
> general sense of add/remove/change; the way I see it, we could just  
> query the view for all the keys in the changeset.  If the MR view  
> has no results for a given key, that obviously means delete the  
> associated document from the Review DB.

That is a very good idea! It's not as efficient as marking group keys  
like I proposed (maps need not necessarily change reduce values), but  
it's a lot easier to code.

> group_levels are a straightforward extension -- we just check what  
> group_level we'll be using in the Review DB when we calculate the  
> changeset.

Exactly.

So the algorithm becomes:

- When updating a view, keep track of all mentioned keys in the  
previous and current map() output, keep only group_level key parts.

- After updating the reduce() results, for each of the marked group  
keys:
- If a group key gets removed:
- look up doc with key=group key in review db. If exists:
   - delete doc
- If a group key gets added:
- look up doc with key=group key in review db. If exists:
   - set doc.value to the row value
- else
   - create doc with id=group key in string form, key=group key,  
value=value
- If a group key gets updated:
- look up doc with key=group key in review db. If exists:
   - set doc.value to the row value
- else
   - create doc with id=group key in string form, key=group key,  
value=value

Good thinking!

Wout.
Mime
View raw message