From: Robert Newson
To: user@couchdb.apache.org
Date: Sun, 7 Feb 2010 23:30:35 +0000
Message-ID: <46aeb24f1002071530p55d370bem8da4683451697df1@mail.gmail.com>
Subject: Re: two view questions: group=true, inverted indices

1) It's reduce(key, values, rereduce). The method should be called with
1 or more values for the same key, which you can then reduce to a
summary value. It's called 'reduce' because the result must be smaller
than the input. Building a result as large as the input (in fact, as
large as the sum of the inputs) isn't really what map/reduce is for.

2) In your example, just remove the reduce method altogether for a
simplistic "lookup by word" index. If you query it with ?key= then
you'll get a lot of rows back, one per document with that word in it.

I should defend couchdb-lucene a little on principle and just say that
it's fun, perhaps inelegant, but actually quite fast and a more
appropriate means to do full-text search than a couchdb view (which is
why I wrote it).

B.

On Sun, Feb 7, 2010 at 11:15 PM, Harold Cooper wrote:
> Hi there,
>
> I'm new to CouchDB and have two questions about the use of mapreduce
> in views.
>
> 1.
> As far as I can tell, even when I pass group=true to a view,
> reduce(keys, values) is still passed different keys,
> e.g. keys = [["a", "551a50e574ccd439af28428db2401ab4"],
> ["b", "94d13f9e969786c6d653555a7e94f61e"]].
>
> Isn't the whole point of group=true that this shouldn't happen?
>
>
> 2.
> When I've read about mapreduce before, a classic example use is
> constructing an inverted index. But if I make a view like:
> {
> map: "function(doc) {
>  var words = doc.text.split(' ');
>  for (var i in words) {
>    emit(words[i], [doc._id]);
>  }
> }",
> reduce: "function(keys, values) {
>  // concatenate the lists of docIds together:
>  return Array.prototype.concat.apply([], values);
> }"
> }
> then couchdb complains that the reduce result is growing too fast.
>
> I did read that this is the way things are, but it's too bad because
> it would be a perfect application of mapreduce, and the only other
> text search option I've heard of is couchdb-lucene which doesn't
> sound nearly as fun/elegant.
>
> Is there another way to approach this?
> Should I just not reduce and end up with one row per word-occurrence?
>
> Thanks for any help,
> and sorry if this has been covered before, I did try to search around first.
> --
> Harold
>
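
A rough sketch of the two points in the reply, for illustration only. It assumes a map function that emits one row per word occurrence and no list-building reduce. The `emit` parameter and the `buildView`, `queryByKey` and `countReduce` names are hypothetical stand-ins so the example runs outside CouchDB. In a real design document `emit` is a global, and CouchDB itself does the sorting and the `?key=` lookup.

```javascript
// Map step: one row per word occurrence. The value is null because each
// view row already carries the id of the document that produced it.
function mapWords(doc, emit) {
  const words = doc.text.split(/\s+/).filter(Boolean);
  for (const word of words) {
    emit(word, null);
  }
}

// Hypothetical in-memory stand-in for a view index (not CouchDB's API):
// run the map over every document and keep the rows sorted by key.
function buildView(docs) {
  const rows = [];
  for (const doc of docs) {
    mapWords(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return rows;
}

// Stand-in for querying the view by a single key:
// returns every row emitted for that word.
function queryByKey(rows, key) {
  return rows.filter((row) => row.key === key);
}

// A reduce whose result stays small: a count instead of a growing list.
// On rereduce the values are partial counts, so they are summed.
function countReduce(keys, values, rereduce) {
  return rereduce ? values.reduce((a, b) => a + b, 0) : values.length;
}

// Made-up sample documents, purely for the demonstration.
const docs = [
  { _id: "a", text: "the quick fox" },
  { _id: "b", text: "the lazy dog" },
];
const rows = buildView(docs);
const hits = queryByKey(rows, "the").map((row) => row.id);
console.log(hits);
```

The inverted index is then just the sorted rows: looking up a word is a key query, and the list of document ids is read from the rows rather than assembled inside reduce.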