incubator-couchdb-user mailing list archives

From Adam Wolff <awo...@gmail.com>
Subject Re: reduce/rereduce confusion
Date Wed, 21 Jan 2009 16:56:53 GMT
Ok great. Thanks for the clarification. So, as far as the implementation
goes, if I ask for a key range from a view with a reduce function, is the
value recomputed for that key range every time, or is it cached somehow? The
detailed answer will probably make my eyes glaze over; I'm just trying to
understand if it's constant time, or it's a complex algorithm but cached, or
whatever.
And then, just to state it explicitly, since reduce can be called with
arbitrary key ranges, is there any meaning to the way the key ranges get
broken up? Are there any guarantees about where the boundaries in the key
ranges will be?

Finally, why isn't my reduce function always run a final time with
rereduce=true to produce final, consolidated output? Does rereduce sometimes
run at query time?

Thanks again,
A
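
A minimal sketch of the calling pattern these questions are about, written as a standalone JavaScript simulation rather than CouchDB's actual implementation. The names sumReduce and simulateView, the chunk size, and the chunking strategy are assumptions made purely for illustration; nothing in this thread says where CouchDB places its chunk boundaries. Under those assumptions, when all mapped rows fit in a single chunk, one call with rereduce=false already yields the final value, which would be consistent with the log output described in the quoted messages below.

```javascript
// Hypothetical sum reduce using the (keys, values, rereduce) signature
// discussed in this thread.
function sumReduce(keys, values, rereduce) {
  // For a plain sum the two cases look identical: values holds mapped
  // values when rereduce is false, and earlier reduce outputs when true.
  return values.reduce(function (total, v) { return total + v; }, 0);
}

// Hypothetical simulation: reduce each chunk of rows, then combine the
// partial results with rereduce=true only if there is more than one.
function simulateView(rows, chunkSize, reduceFn) {
  var calls = [];     // records the rereduce flag of every call, in order
  var partials = [];  // one reduce output per chunk
  for (var i = 0; i < rows.length; i += chunkSize) {
    var chunk = rows.slice(i, i + chunkSize);
    var keys = chunk.map(function (r) { return r.key; });
    var values = chunk.map(function (r) { return r.value; });
    calls.push(false);
    partials.push(reduceFn(keys, values, false));
  }
  if (partials.length <= 1) {
    // Zero or one chunk: there is nothing to combine, so no rereduce call.
    return { result: partials.length ? partials[0] : null, calls: calls };
  }
  calls.push(true);
  return { result: reduceFn(null, partials, true), calls: calls };
}

var rows = [
  { key: "a", value: 1 }, { key: "a", value: 2 },
  { key: "b", value: 3 }, { key: "c", value: 4 }
];
var chunked = simulateView(rows, 2, sumReduce);  // two chunks, then a rereduce
var single = simulateView(rows, 10, sumReduce);  // one chunk, no rereduce
console.log(chunked.result, chunked.calls);
console.log(single.result, single.calls);
```

Both runs produce the same total; only the sequence of rereduce flags differs, which is why a reduce function has to give the same answer however its input is split.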

On Tue, Jan 20, 2009 at 9:13 PM, Chris Anderson <jchris@apache.org> wrote:

> On Tue, Jan 20, 2009 at 8:05 PM, Adam Wolff <awolff@gmail.com> wrote:
> > After looking at this more, let me restate. I would totally get all of this
> > if the signature of reduce was:
> >     reduce: function(key, values, rereduce)
> >
> > What I don't get is: why does reduce get called with an arbitrarily long
> > list of keys? I thought reduce was precisely for reducing all of the mapped
> > inputs that are indexed under the *same* key. I think if I can get that, the
> > rest will come clear.
>
> The thing that makes CouchDB's reduce different from, say, the Hadoop
> implementation, is that it does not group by key at computation time.
>
> Instead, a reduce function should aim to return a single value for the
> entire view, e.g. 15,346, which could be the total number of posts in
> your view.
>
> CouchDB allows you to query for reduction values for any arbitrary key
> range very efficiently. So depending on your key structure, if you
> want the total number of posts by jchris in January, you could ask for
> reduce for all keys between
>
> ["jchris",[2009,0]] and ["jchris",[2009,0,{}]]
>
> and get a result of, say, 14.
>
> For details about specifying start and end keys see
> http://wiki.apache.org/couchdb/View_collation
>
> The group=true and group_level parameters may seem confusing at first,
> but once you understand that they are just macros for running a series
> of reduce queries (where CouchDB will pick key ranges for you), they
> aren't so mysterious.
>
> >
> > Thanks again,
> > A
> >
> > On Tue, Jan 20, 2009 at 7:52 PM, Adam Wolff <awolff@gmail.com> wrote:
> >
> >> Thanks for the reply!
> >> I'd seen all of this, though I re-read the Wikipedia entry carefully.
> >> Damien's blog entries don't appear to match the APIs in the version I'm
> >> running, which is 0.8.1.
> >> The Wikipedia entry suggests that reduce is called only with values that
> >> match a single key. Using the log() function in CouchDB, I can see that's
> >> not the case for its reduce function -- it's called with multiple different
> >> keys, though it does appear that the input values are *ordered* by matching
> >> keys.
> >>
> >> Anyway, I totally get how re-reduce (or "combine") works in conventional
> >> map/reduce, but I'm hazy on the details w/r/t CouchDB. I'm starting to
> >> understand the answer to #1, but I'm really unclear on #2 (how/why rereduce
> >> is run).
> >>
> >> Thanks again,
> >> A
> >>
> >>
> >> On Tue, Jan 20, 2009 at 6:50 PM, Jeff Hinrichs - DM&T <dundeemt@gmail.com> wrote:
> >>
> >>> On Tue, Jan 20, 2009 at 7:47 PM, Adam Wolff <awolff@gmail.com> wrote:
> >>> > Hi everyone,
> >>> > I'm really excited about CouchDB and I've started playing with
> >>> > it. I get all of it, except for reduce, and especially re-reduce.
> >>> >
> >>> > My first question is: how does CouchDB maintain all the separate output for
> >>> > a given key from the map function? I mean: given a simple reduce that just
> >>> > sums results, how does couch maintain separate results for each possible
> >>> > key/key range that can be given as input to that view?
> >>> >
> >>> > My second question: when and why does rereduce get called? Is this simply to
> >>> > allow the server to chunk the processing, or is there semantic meaning to
> >>> > it? I had assumed the former -- it's just a way of limiting the size of the
> >>> > input to the reduce function -- but then this really confused me: if I log
> >>> > each time my reduce function gets called, I see that the last time it's
> >>> > called, it's with rereduce=false. How is this possible? Don't all the
> >>> > results have to be funneled through rereduce to produce a single result
> >>> > value?
> >>> >
> >>> > Any help here would be much appreciated. If there's a resource on the web I
> >>> > should look at, please send it my way. Thanks!
> >>> >
> >>> > A
> >>> Being that I just went through the learning process on reduce, I'll
> >>> point you here:
> >>> http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views
> >>> "Reduce Functions"
> >>>
> >>> As a good place to start.
> >>> Also, the mailing list is an excellent resource:
> >>>
> >>> http://mail-archives.apache.org/mod_mbox/couchdb-user/200901.mbox/%3c61B374C7-34D7-45C3-9F8B-F11EFD77303D@apache.org%3e
> >>>
> >>> along with:
> >>> http://en.wikipedia.org/wiki/MapReduce
> >>> http://labs.google.com/papers/mapreduce.html
> >>> and
> >>> http://damienkatz.net/2008/02/incremental_map.html
> >>>
> >>> Regards,
> >>>
> >>> Jeff
> >>>
> >>
> >>
> >
>
>
>
> --
> Chris Anderson
> http://jchris.mfdz.com
>
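
The key-range request in Chris's quoted message can be sketched as a query string. startkey and endkey are CouchDB's view query parameters, and their values are JSON. The helper name buildRangeQuery is hypothetical, and the view URL path is deliberately left out because it depends on the CouchDB version and the design document in use.

```javascript
// Hypothetical helper: JSON-encode both keys and URL-escape them for use as
// the startkey and endkey view query parameters.
function buildRangeQuery(startkey, endkey) {
  return "startkey=" + encodeURIComponent(JSON.stringify(startkey)) +
         "&endkey=" + encodeURIComponent(JSON.stringify(endkey));
}

// Key range from the quoted message: posts by jchris in January 2009
// (month 0), with {} marking the upper end of the range.
var query = buildRangeQuery(["jchris", [2009, 0]], ["jchris", [2009, 0, {}]]);
console.log(query);
```

Appending this string to a view URL would ask for the reduce value over just that key range, which in the quoted example came out to 14.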
