incubator-couchdb-user mailing list archives

From Harold Cooper <hrld...@gmail.com>
Subject Re: two view questions: group=true, inverted indices
Date Sun, 07 Feb 2010 23:39:03 GMT
Haha, thanks for the info. I'm sure couchdb-lucene is the best way to go for
full text search; I should've simply said that I think mapreduce can be
"fun" and "elegant" when it fits really well, but I look forward to trying
out couchdb-lucene and I expect I'll enjoy using it as well.

As for question 1, I think Paul's answer is what I was looking for, so now I
understand where those calls were coming from.

Thanks for the quick and helpful replies!
--
H


On Sun, Feb 7, 2010 at 6:30 PM, Robert Newson <robert.newson@gmail.com> wrote:

> 1) it's reduce(key, values, rereduce). The method should be called
> with 1 or more values for the same key, which you can then reduce to a
> summary value. It's called 'reduce' because the result must be smaller
> than the input. Building a result as large as the input (in fact, as
> large as the sum of the inputs) isn't really what map/reduce is for.
>
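Point 1 above says a reduce should turn its values into a summary smaller than its input. Here is a minimal sketch of such a reduce, assuming the goal is to count how many rows were emitted per word. It returns a single number however many values it receives. The name `countReduce` is hypothetical (in a design document it would be the anonymous `reduce` function). On the first pass `keys` is an array of [key, docId] pairs; on a rereduce CouchDB passes `keys` as null and `values` holds the outputs of earlier reduce calls.

```javascript
// Reduce that always shrinks its input: the output is a single number.
// keys:     array of [key, docId] pairs on the first pass, null on rereduce
// values:   emitted map values (first pass) or earlier reduce outputs (rereduce)
function countReduce(keys, values, rereduce) {
  if (rereduce) {
    // values are partial counts from earlier calls; add them up
    var total = 0;
    for (var i = 0; i < values.length; i++) {
      total += values[i];
    }
    return total;
  }
  // first pass: one emitted row per value, so just count them
  return values.length;
}

// Simulated use: two first-pass calls over separate chunks of rows, then a rereduce.
var partA = countReduce([["quick", "doc1"], ["quick", "doc3"]], [null, null], false);
var partB = countReduce([["quick", "doc7"]], [null], false);
var total = countReduce(null, [partA, partB], true);
console.log(total); // 3
```

Because the result never grows with the input, CouchDB can store these partial counts in its index and combine them cheaply. Newer CouchDB releases also ship built-in reduces such as `_count` and `_sum` that cover this case natively.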
> 2) In your example, just remove the reduce method altogether for a
> simplistic "lookup by word" index. If you query it with ?key=<word>
> then you'll get a lot of rows back, one per document with that word in
> it.
>
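Point 2 above suggests dropping the reduce and querying the map output by key. Below is a minimal sketch of that map-only inverted index that runs outside CouchDB. The `emit` stub and the `lookup` helper are hypothetical stand-ins for the view server and for a query such as `GET /db/_design/ddoc/_view/by_word?key="quick"` (the database, design-document and view names are placeholders; the `key` value is JSON-encoded). It emits `null` as the value because every view row already carries the `id` of the document that emitted it, and it emits each distinct word only once per document so the result is one row per document rather than one per occurrence. Lowercasing and splitting on any whitespace are assumptions, not part of the original view.

```javascript
// Hypothetical stub for CouchDB's emit(): records each row with the current doc id.
var rows = [];
var currentDocId = null;
function emit(key, value) {
  rows.push({ id: currentDocId, key: key, value: value });
}

// Map-only inverted index: one row per distinct (word, document) pair.
function map(doc) {
  if (typeof doc.text !== "string") return; // skip docs without text
  var seen = Object.create(null); // no prototype, so words like "constructor" are safe
  var words = doc.text.toLowerCase().split(/\s+/);
  for (var i = 0; i < words.length; i++) {
    var word = words[i];
    if (word && !seen[word]) {
      seen[word] = true;
      emit(word, null); // the row's id already names the document
    }
  }
}

// Index a few sample documents.
var docs = [
  { _id: "doc1", text: "the quick brown fox" },
  { _id: "doc2", text: "The lazy dog" },
  { _id: "doc3", text: "a quick quick test" }
];
docs.forEach(function (doc) {
  currentDocId = doc._id;
  map(doc);
});

// Hypothetical stand-in for ?key="word": ids of all rows with that key.
function lookup(word) {
  return rows
    .filter(function (r) { return r.key === word; })
    .map(function (r) { return r.id; });
}

console.log(lookup("quick")); // [ 'doc1', 'doc3' ]
```

The index grows with the number of distinct words per document, but no single reduce value has to hold a whole posting list, which is what triggered the "growing too fast" complaint.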
> I should defend couchdb-lucene a little on principle and just say that
> it's fun, perhaps inelegant, but actually quite fast and a more
> appropriate means to do full-text search than a couchdb view (which is
> why I wrote it).
>
> B.
>
> On Sun, Feb 7, 2010 at 11:15 PM, Harold Cooper <harold@mit.edu> wrote:
> > Hi there,
> >
> > I'm new to CouchDB and have two questions about the use of mapreduce
> > in views.
> >
> > 1.
> > As far as I can tell, even when I pass group=true to a view,
> > reduce(keys, values) is still passed different keys,
> > e.g. keys = [["a", "551a50e574ccd439af28428db2401ab4"],
> > ["b", "94d13f9e969786c6d653555a7e94f61e"]].
> >
> > Isn't the whole point of group=true that this shouldn't happen?
> >
> >
> > 2.
> > When I've read about mapreduce before, a classic example use is
> > constructing an inverted index. But if I make a view like:
> > {
> > map: "function(doc) {
> >  var words = doc.text.split(' ');
> >  for (var i = 0; i < words.length; i++) {
> >    emit(words[i], [doc._id]);
> >  }
> > }",
> > reduce: "function(keys, values) {
> >  // concatenate the lists of docIds together:
> >  return Array.prototype.concat.apply([], values);
> > }"
> > }
> > then couchdb complains that the reduce result is growing too fast.
> >
> > I did read that this is the way things are, but it's too bad because
> > it would be a perfect application of mapreduce, and the only other
> > text search option I've heard of is couchdb-lucene which doesn't
> > sound nearly as fun/elegant.
> >
> > Is there another way to approach this?
> > Should I just not reduce and end up with one row per word-occurrence?
> >
> > Thanks for any help,
> > and sorry if this has been covered before, I did try to search around first.
> > --
> > Harold
> >
>
