couchdb-user mailing list archives

From Chris Anderson <jch...@apache.org>
Subject Re: Incremental map/reduce
Date Sat, 31 Jan 2009 21:42:54 GMT
On Sat, Jan 31, 2009 at 1:00 PM, Brian Candler <B.Candler@pobox.com> wrote:
> On Fri, Jan 30, 2009 at 10:32:15AM -0800, Chris Anderson wrote:
>> Once you understand how normal reduce queries (with group=false) work,
>> e.g. those that return a single reduction value for whatever key-range
>> you specify, group_level queries are no more complex. Group_level
>> queries are essentially a macro, which runs one normal (group=false)
>> reduce query automatically for each interval in a set of intervals
>> defined by the level.
>
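> (A sketch of that macro behaviour, assuming a hypothetical view
> sum_by_date whose map function emits array keys such as
> ["2009","01","30"]; the view name and the numbers below are invented
> for illustration. Each row is one group=false reduction over one
> interval of keys:)
>
> $ kurl 'http://localhost:5984/maptest/_view/test/sum_by_date?group_level=1'
> {"rows":[{"key":["2009"],"value":45}]}
> $ kurl 'http://localhost:5984/maptest/_view/test/sum_by_date?group_level=2'
> {"rows":[{"key":["2009","01"],"value":30},{"key":["2009","02"],"value":15}]}
>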
> Ah - it was new to me that map/reduce queries with group=false could run
> over arbitrary key intervals:
>
> $ kurl 'http://localhost:5984/maptest/_view/test/sum'
> {"rows":[{"key":null,"value":45}]}
> $ kurl 'http://localhost:5984/maptest/_view/test/sum?startkey="doc3"'
> {"rows":[{"key":null,"value":42}]}
> $ kurl 'http://localhost:5984/maptest/_view/test/sum?startkey="doc3"&endkey="doc5"'
> {"rows":[{"key":null,"value":33}]}
>
> This means that CouchDB *must* be performing the reduce part of the query
> on demand, as opposed to keeping precomputed values stored the way it
> does for the map part.
>
> In SQL terms, this is like "count(*)" doing an index scan, rather than
> having the answer precomputed in a materialised view. And suddenly the
> various forms of reduce make much more sense.
>
> However, at http://damienkatz.net/2008/02/incremental_map.html it says:
>
> "This requirement of reduce functions allows CouchDB to store off
> intermediated reductions directly into inner nodes of btree indexes, and the
> view index updates and retrievals will have logarithmic cost. It also allows
> the indexes to be spread across machines and reduced at query time with
> logarithmic cost."
>
> Is storing the reductions a planned future feature, rather than a
> description of how it works today?

It's how it works today. The reason we see a small cost with each
reduce query is that the intermediate reduction values are cached
according to the btree structure, not according to the query
params. So unless your range happens to exactly match the keys
underneath a given inner node (and probably, at this point, even if it
does) you'll end up running at least one JavaScript reduction per
reduce query.
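
For illustration (I'm guessing at the view definition here; yours may
differ), a sum view like the ones you queried would have a reduce
function along these lines. The rereduce argument is what lets CouchDB
hand the function previously stored inner-node reductions rather than
raw mapped values:

function (keys, values, rereduce) {
  // rereduce == false: values are raw values emitted by the map
  //                    function for some contiguous range of keys
  // rereduce == true:  values are intermediate reductions already
  //                    stored in inner btree nodes, being combined
  //                    at query time to answer an arbitrary range
  return sum(values); // sum() is a helper provided by the JavaScript
                      // view server; summing previous sums gives the
                      // same result as summing the raw values
}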

Thanks for looking after the wiki!

Chris

-- 
Chris Anderson
http://jchris.mfdz.com
