incubator-couchdb-user mailing list archives

From Robert Newson <robert.new...@gmail.com>
Subject Re: two view questions: group=true, inverted indices
Date Sun, 07 Feb 2010 23:30:35 GMT
1) It's reduce(keys, values, rereduce). The function should be called
with 1 or more values for the same key, which you can then reduce to a
summary value. It's called 'reduce' because the result must be smaller
than the input. Building a result as large as the input (in fact, as
large as the sum of the inputs) isn't really what map/reduce is for.
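A minimal sketch of a reduce that stays small, assuming all you want per key is a row count (similar in spirit to CouchDB's built-in `_count` reduce). The name `countReduce` is hypothetical; in a design doc only the function itself is stored in the view's "reduce" field. The point is the `rereduce` flag: on the first pass `values` are raw map values and `keys` holds [key, docid] pairs, while on a rereduce `keys` is null and `values` are earlier partial counts, so they must be summed rather than counted again.

```javascript
// Hypothetical counting reduce. Output is always a single number,
// so it never grows with the input.
function countReduce(keys, values, rereduce) {
  if (rereduce) {
    // values are partial counts from earlier reduce calls: add them up
    var total = 0;
    for (var i = 0; i < values.length; i++) {
      total += values[i];
    }
    return total;
  }
  // First pass: one value per emitted map row, so just count them
  return values.length;
}

// Local simulation: two first-pass reductions, then a rereduce of their results.
var partA = countReduce([["couchdb", "doc1"], ["couchdb", "doc2"]], [null, null], false);
var partB = countReduce([["couchdb", "doc3"]], [null], false);
var combined = countReduce(null, [partA, partB], true);
console.log(partA, partB, combined); // 2 1 3
```

Because this function never inspects `keys`, it gives the same answer however the rows happen to be batched into reduce calls.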

2) In your example, just remove the reduce method altogether for a
simplistic "lookup by word" index. If you query it with ?key=<word>
then you'll get a lot of rows back, one per document with that word in
it.
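A sketch of that map-only "lookup by word" index, assuming whitespace tokenization and lowercasing (both are illustrative choices, not something CouchDB does for you). It emits each distinct word once per document, so a ?key= query yields one row per matching document rather than one per occurrence. The `emit` stub, the sample docs, and the names `mapByWord`, `_design/index` and `by_word` are hypothetical and exist only to run the map locally; inside CouchDB the view server supplies `emit` and fills in each row's id.

```javascript
// Map function in ES5 style. No reduce is defined for this view.
var mapByWord = function (doc) {
  if (typeof doc.text !== "string") return;   // skip docs without text
  var seen = {};                              // words already emitted for this doc
  var words = doc.text.toLowerCase().split(/\s+/);
  for (var i = 0; i < words.length; i++) {
    var w = words[i];
    if (w && !Object.prototype.hasOwnProperty.call(seen, w)) {
      seen[w] = true;
      emit(w, null);  // the row id already identifies the document
    }
  }
};

// Local harness only: collect rows roughly the way the view server would.
var rows = [];
var currentId = null;
function emit(key, value) {
  rows.push({ id: currentId, key: key, value: value });
}

var docs = [
  { _id: "doc1", text: "relax relax with CouchDB" },
  { _id: "doc2", text: "CouchDB views" }
];
docs.forEach(function (doc) {
  currentId = doc._id;
  mapByWord(doc);
});

// Rough equivalent of GET /db/_design/index/_view/by_word?key="couchdb"
// (the key parameter is JSON-encoded, hence the quotes)
var hits = rows
  .filter(function (r) { return r.key === "couchdb"; })
  .map(function (r) { return r.id; });
console.log(hits); // [ 'doc1', 'doc2' ]
```

Adding include_docs=true to the query returns the matching documents alongside the rows, so no reduce is needed to get from a word to its documents.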

I should defend couchdb-lucene a little on principle and just say that
it's fun, perhaps inelegant, but actually quite fast and a more
appropriate means to do full-text search than a couchdb view (which is
why I wrote it).

B.

On Sun, Feb 7, 2010 at 11:15 PM, Harold Cooper <harold@mit.edu> wrote:
> Hi there,
>
> I'm new to CouchDB and have two questions about the use of mapreduce
> in views.
>
> 1.
> As far as I can tell, even when I pass group=true to a view,
> reduce(keys, values) is still passed different keys,
> e.g. keys = [["a", "551a50e574ccd439af28428db2401ab4"],
> ["b", "94d13f9e969786c6d653555a7e94f61e"]].
>
> Isn't the whole point of group=true that this shouldn't happen?
>
>
> 2.
> When I've read about mapreduce before, a classic example use is
> constructing an inverted index. But if I make a view like:
> {
> map: "function(doc) {
>  var words = doc.text.split(' ');
>  for (var i in words) {
>    emit(words[i], [doc._id]);
>  }
> }",
> reduce: "function(keys, values) {
>  // concatenate the lists of docIds together:
>  return Array.prototype.concat.apply([], values);
> }"
> }
> then couchdb complains that the reduce result is growing too fast.
>
> I did read that this is the way things are, but it's too bad because
> it would be a perfect application of mapreduce, and the only other
> text search option I've heard of is couchdb-lucene which doesn't
> sound nearly as fun/elegant.
>
> Is there another way to approach this?
> Should I just not reduce and end up with one row per word-occurrence?
>
> Thanks for any help,
> and sorry if this has been covered before; I did try to search around first.
> --
> Harold
>
