Subject: Re: two view questions: group=true, inverted indices
From: Robert Dionne
Date: Mon, 8 Feb 2010 06:31:04 -0500
To: user@couchdb.apache.org

On Feb 7, 2010, at 6:29 PM, Paul Davis wrote:

> On Sun, Feb 7, 2010 at 6:15 PM, Harold Cooper wrote:
>> Hi there,
>>
>> I'm new to CouchDB and have two questions about the use of mapreduce
>> in views.
>>
>> 1.
>> As far as I can tell, even when I pass group=true to a view,
>> reduce(keys, values) is still passed different keys,
>> e.g. keys = [["a", "551a50e574ccd439af28428db2401ab4"],
>> ["b", "94d13f9e969786c6d653555a7e94f61e"]].
>>
>
> Even when you query with group=true, the ungrouped reduction is still
> calculated. Generally you should be able to just ignore such things.
>
>> Isn't the whole point of group=true that this shouldn't happen?
>>
>>
>> 2.
>> When I've read about mapreduce before, a classic example use is
>> constructing an inverted index. But if I make a view like:
>> {
>>   map: "function(doc) {
>>     var words = doc.text.split(' ');
>>     for (var i in words) {
>>       emit(words[i], [doc._id]);
>>     }
>>   }",
>>   reduce: "function(keys, values) {
>>     // concatenate the lists of docIds together:
>>     return Array.prototype.concat.apply([], values);
>>   }"
>> }
>> then couchdb complains that the reduce result is growing too fast.
>>
>> I did read that this is the way things are, but it's too bad because
>> it would be a perfect application of mapreduce, and the only other
>> text search option I've heard of is couchdb-lucene which doesn't
>> sound nearly as fun/elegant.
>>
>> Is there another way to approach this?
>> Should I just not reduce and end up with one row per word-occurrence?
>
> CouchDB Map/Reduce isn't like Google Map/Reduce. It's much more like
> the old school map/reduce pattern that expects to be calculating a
> single reduction value. The CouchDB internals make doing things like
> inverted indices hard. The 'proper' way would be to do as you say and
> return a single row per key with only some intermediary values handled
> by reductions.
>
> Also, while couchdb-lucene may not present near as much fun, it's got
> quite a bit to it. Full-Text indexing is hard. Many examples show it
> as nothing more than an inverted index, but that's hiding 95% of the
> knowledge on information retrieval and scoring algorithms that are in
> Lucene. And there's the integration with Tika to do things like
> attachment indexing. I quite dislike Java but I've come to accept that
> there really isn't much competition that's compatible with CouchDB's
> document model.
>

I think it does have challenges, and couchdb-lucene offers a good
solution for most use cases; plus it's mature and well known. But at
some point, perhaps post 1.0, I think a native full-text indexing (FTI)
implementation would add a lot of value to CouchDB, if only by removing
the dependency on Java.

> HTH,
> Paul Davis
>
>> Thanks for any help,
>> and sorry if this has been covered before, I did try to search around first.
>> --
>> Harold
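A rough model of Paul's point about group=true: besides the per-key results, CouchDB keeps reductions for chunks of the index whose rows can span several keys, so reduce is sometimes handed mixed keys. The sketch below is a simplified simulation in plain JavaScript (runnable under Node), not CouchDB's actual B-tree code; the rows, doc ids and the helper reduceRows are hypothetical. It uses the reduce(keys, values, rereduce) signature CouchDB passes to reduce functions, where keys is an array of [key, docid] pairs on the first pass and null on a rereduce.

```javascript
// Simplified model (an assumption, not CouchDB's B-tree code): reduce is run
// over a chunk of rows even when those rows carry different keys.

// A count reduce using the reduce(keys, values, rereduce) signature.
function countReduce(keys, values, rereduce) {
  if (rereduce) {
    // values are earlier reduce outputs (partial counts); keys is null here
    return values.reduce((sum, n) => sum + n, 0);
  }
  // keys is an array of [key, docId] pairs; they need not share one key
  return values.length;
}

// Hypothetical map output, sorted by key as a view index would be.
const rows = [
  { key: "a", id: "doc1", value: null },
  { key: "a", id: "doc2", value: null },
  { key: "b", id: "doc3", value: null },
  { key: "c", id: "doc1", value: null },
];

// Hypothetical helper: run a first-pass reduce over a slice of rows.
function reduceRows(slice) {
  const keys = slice.map((r) => [r.key, r.id]);
  const values = slice.map((r) => r.value);
  return countReduce(keys, values, false);
}

// Ungrouped reduction for the whole chunk: its keys are a, b and c.
const nodeReduction = reduceRows(rows);

// What group=true asks for: one reduction per distinct key.
const grouped = {};
for (const key of new Set(rows.map((r) => r.key))) {
  grouped[key] = reduceRows(rows.filter((r) => r.key === key));
}

// Merging two partial results, the way a rereduce combines chunks.
const partials = [reduceRows(rows.slice(0, 2)), reduceRows(rows.slice(2))];
const total = countReduce(null, partials, true);

console.log(nodeReduction, grouped, total);
```

The practical upshot, as Paul says, is that a reduce function should give a correct answer for whatever mix of keys it receives. A count or sum does; a group=true query still only returns the per-key values.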
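A minimal sketch of the approach Harold proposes and Paul endorses: drop the concatenating reduce, emit one row per word occurrence, and let each row's doc id carry the document reference. The reduce only counts, so its output never grows. The emit stand-in, the sample documents and the helpers compareRows, lookup, docsContaining and groupedCounts are hypothetical scaffolding so the snippet runs under Node. In a real design document only the map and reduce bodies would appear, and newer CouchDB releases also offer a built-in _count reduce for the same job. Lowercasing and splitting on whitespace runs are assumptions; Harold's view used split(' ').

```javascript
// Hypothetical stand-in for the view server: emit() records each row along
// with the id of the document currently being mapped.
const index = [];
let currentDocId = null;
function emit(key, value) {
  index.push({ key: key, id: currentDocId, value: value });
}

// Map: one row per word occurrence, value null (the doc id rides along).
const map = function (doc) {
  const words = doc.text.toLowerCase().split(/\s+/);
  for (let i = 0; i < words.length; i++) {
    if (words[i] !== "") emit(words[i], null);
  }
};

// Reduce: a plain count, so the result stays one number per key.
const reduce = function (keys, values, rereduce) {
  if (rereduce) return values.reduce((sum, n) => sum + n, 0);
  return values.length;
};

// Hypothetical documents.
const docs = [
  { _id: "doc1", text: "the quick fox" },
  { _id: "doc2", text: "the lazy dog" },
  { _id: "doc3", text: "quick quick dog" },
];

for (const doc of docs) {
  currentDocId = doc._id;
  map(doc);
}

// Sort by key, then doc id (plain string order, not CouchDB collation).
function compareRows(a, b) {
  if (a.key !== b.key) return a.key < b.key ? -1 : 1;
  if (a.id !== b.id) return a.id < b.id ? -1 : 1;
  return 0;
}
index.sort(compareRows);

// Roughly key="word" with reduce=false: the doc id of every occurrence.
function lookup(word) {
  return index.filter((r) => r.key === word).map((r) => r.id);
}

// Distinct documents containing the word, de-duplicated on the client.
function docsContaining(word) {
  return [...new Set(lookup(word))];
}

// Roughly group=true: the reduce applied to each key's rows.
function groupedCounts() {
  const out = {};
  for (const key of new Set(index.map((r) => r.key))) {
    const rows = index.filter((r) => r.key === key);
    out[key] = reduce(rows.map((r) => [r.key, r.id]), rows.map((r) => r.value), false);
  }
  return out;
}

console.log(docsContaining("quick"), groupedCounts());
```

The trade-off is the one Harold anticipated: one index row per occurrence instead of one per word, with duplicate doc ids removed by the client. That is also as far as this goes; ranking and scoring, which Paul points out make up most of a real full-text engine, are not attempted here.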