Jan,

Thanks for replying. Just to be clear (and I'm 90% sure you see my point as it was intended), the functionality I'm asking for is, I'm pretty sure, outside the scope of the Google paper and Damien's papers on map / reduce / combine. In that sense, combine() was a poor choice of words on my part; I just didn't know what else to call it. Maybe aggregate() is better. Essentially the aggregate() function would in all likelihood be the same as reduce(), except the "key" parameter would be invalid and couldn't be used in the function.

The reduce function in Damien's and Google's papers restricts computation to a single key. My main point is: that is a good thing, but since you have this b-tree index and an associative function that can take its own output as input, why not do both? Why not do what reduce does now, but additionally keep applying the function all the way up the tree, so that you could extremely rapidly compute arbitrary aggregations over any range of keys? That's why I suggested a new function to do this additional aggregation, since the key itself shouldn't be an argument of the new function. It would be used to compute arbitrary aggregations of already-reduced data.

Here is an additional example. Assume the b-tree index has two layers: a root node and 3 leaf nodes. The root node would have 3 entries, each storing the output of this new function applied to all of the entries in its child node. To answer an aggregate query, you'd have to compute an aggregate on the fly for the fractional b-tree nodes at the start and end conditions, but for all of the nodes in between you could use the pre-computed aggregates, then compute a final aggregate of the results. The aggregate function itself would generally be exactly the same as the reduce() function.

Continuing with the simplistic hours-per-day reduction from my original example (but with better data), the b-tree index might look like this:
Interior and root node entries: {child keys are < this value, aggregate for child node, pointer to child node}
Leaf node entries: {key, value}

Root:
  2008-01-05, 41, Leaf 1
  2008-01-08, 24, Leaf 2
  null,       43, Leaf 3

Leaf 1:
  2008-01-01, 10
  2008-01-02, 14
  2008-01-03, 14
  2008-01-04, 3

Leaf 2:
  2008-01-05, 3
  2008-01-06, 12
  2008-01-07, 9

Leaf 3:
  2008-01-08, 1
  2008-01-09, 16
  2008-01-10, 17
  2008-01-11, 9

If we wanted the aggregate of 2008-01-02 to 2008-01-10:

aggregate(
  aggregate(2008-01-02 to 2008-01-04 of Leaf 1),
  (all of Leaf 2; the answer is pre-computed in the parent node, i.e. the root),
  aggregate(2008-01-08 to 2008-01-10 of Leaf 3)
)

Only the beginning and end of the range would potentially have to be computed on the fly, plus a final aggregate over everything. I believe it would be a rapid operation and could be done on the fly, but I could be wrong.

The problem with doing this client side is that the client doesn't have access to the b-tree index itself. You could work around this by creating multiple map / reduce indexes at different granularities to similar effect (i.e., have monthly, yearly, and decade-level views), but it seems a bit awkward.

I realize my suggestion here might shoot down this idea altogether :) but if this can be done in the index itself efficiently, I think there is a real win here.

Thanks,
-Jason

On Wed, Apr 30, 2008 at 4:06 AM, Jan Lehnardt wrote:
>
> On Apr 30, 2008, at 04:36, Jason Eggleston wrote:
>
> > Hello,
> >
> > I recently tried explaining the power of map / reduce to someone,
> > using a simple example of adding up hours spent on a project over a
> > period of time.
> >
> > The output of the mapping function would be:
> >
> > key, value
> >
> > 4/28/2008, 1
> > 4/28/2008, 4
> > 4/29/2008, 7
> > 4/29/2008, 3
> > ...
> >
> > The output of the reduce function would be:
> >
> > key, value
> >
> > 4/28/2008, 5
> > 4/29/2008, 10
> > ...
> > etc.
> >
> > Shouldn't I then be able to issue a query for an arbitrary summation
> > of some sequential portion of the output of reduce? For example, a
> > sum of all of the hours between two dates / keys? It's the same
> > reduction function, except applied across multiple keys; it could be
> > annotated in the b-tree index the same way the per-key reduction is
> > done now (or will be done).
> >
> > /db/_view/sum_hours/count?start_combine=4/28/2008&end_combine=4/29/2009
> >
> > If it isn't safe to use the reduce function as the combine function,
> > it could be explicit:
> >
> > "views": {
> >   "all": "function (doc) {...snip...}",
> >   "count": {
> >     "map": "function (doc) {...snip...}",
> >     "reduce": "function (tag, counts) {...snip...}",
> >     "combine": "function (counts) {...snip...}"
> >   }
> > }
> >
> > It seems very powerful to me, if this functionality were available.
> >
>
> If I understand Damien's "whitepaper(s)" correctly, the implementation
> of our reduce would be functionally equivalent to the combine you
> ask for. What I haven't seen is how to run reduce again on its own
> output. Damien?
>
> See http://damienkatz.net/2008/02/incremental_map.html and
> http://damienkatz.net/2008/02/incremental_map_1.html and the
> comments for details.
>
> Cheers
> Jan
> --
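[Editor's note: to make the range-aggregation idea in the message above concrete, here is a toy sketch in plain JavaScript. This is not CouchDB code and none of these names come from the thread; sum() stands in for the associative aggregate function. It builds the two-level tree from the example, caches each leaf's aggregate in the root, and answers a range query by re-scanning only the fractional first and last leaves, using the cached aggregates for every fully covered leaf in between. The final aggregate(parts) step is the "reduce applied to its own output" operation Jan asks about, which is safe here because the function is associative and ignores keys.]

```javascript
// Hypothetical sketch of the proposal: a two-level tree where each root
// entry caches the aggregate of its child leaf. Not CouchDB's API.
const aggregate = (values) => values.reduce((a, b) => a + b, 0);

// Leaves hold [key, value] pairs from the example data.
const leaves = [
  [["2008-01-01", 10], ["2008-01-02", 14], ["2008-01-03", 14], ["2008-01-04", 3]],
  [["2008-01-05", 3],  ["2008-01-06", 12], ["2008-01-07", 9]],
  [["2008-01-08", 1],  ["2008-01-09", 16], ["2008-01-10", 17], ["2008-01-11", 9]],
];

// Root entries: the pre-computed aggregate of each whole child node.
const root = leaves.map((leaf) => aggregate(leaf.map(([, v]) => v)));

// Range aggregate: recompute only the fractional first and last leaves;
// use the cached root aggregate for every fully covered leaf in between.
function rangeAggregate(startKey, endKey) {
  const parts = [];
  leaves.forEach((leaf, i) => {
    const first = leaf[0][0];
    const last = leaf[leaf.length - 1][0];
    if (last < startKey || first > endKey) return; // node outside the range
    if (first >= startKey && last <= endKey) {
      parts.push(root[i]); // fully covered: answer pre-computed in the parent
    } else {
      parts.push(aggregate(
        leaf.filter(([k]) => k >= startKey && k <= endKey).map(([, v]) => v)));
    }
  });
  return aggregate(parts); // final aggregate over the partial results
}

console.log(rangeAggregate("2008-01-02", "2008-01-10")); // 89
```

In this sketch, a query spanning n fully covered leaves costs O(1) per leaf instead of a scan, which is where the "real win" in the message would come from; ISO-formatted date strings compare correctly with < and >, which keeps the example simple.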