Date: Thu, 2 Dec 2010 11:46:37 +0000
Subject: Re: Map/Reduce Question
From: Robert Newson
To: user@couchdb.apache.org

The simplest means to dedupe this is:

function(keys, values, rereduce) {
  return values[0];
}

This assumes that all values for the same key are identical, but I
think that's what you're saying.

B.

On Thu, Dec 2, 2010 at 12:43 AM, Matthew Woodward wrote:
> I'm catching on to the map bit of map/reduce decently, but now that I need
> to reduce something I'm having some issues, so I'm hoping someone can steer
> me in the right direction.
>
> I have a view that outputs a key, and then an array as the value using the
> following map function:
> {
>     "viewname": {
>         "map":"function(doc) {
>             var value;
>             if (doc.foo != '' && (doc.bar != '' || doc.baz != '')) {
>                 value = [doc.bar, doc.baz];
>                 emit(doc.foo, value);
>             }
>         }"
>     }
> }
>
> To explain that a bit--basically in my documents foo (which is my key for
> this map function) might be a zero-length string, in which case I don't want
> to output that document's information.
> Additionally if bar and baz are both
> zero-length strings I don't want the document included in that case either,
> but if either bar or baz has a value then I want to include it. And then my
> value is an array of bar and baz. This is all working great.
>
> The issue is that I have numerous duplicates in my output, i.e. where foo,
> bar, and baz have the same values as another document. This is to be
> expected in the documents themselves so there's no issue with the data.
>
> For the purposes of this view, however, I only want to output unique results
> for each value of foo (my key).
>
> To use a concrete example, let's say currently using the map function above
> I'm getting this output:
> {"total_rows":3800,"offset":0,"rows":[
> {"id":"guid1","key":"key1","value":["value1", "value2"]},
> {"id":"guid2","key":"key1","value":["value1", "value2"]},
> {"id":"guid3","key":"key2","value":["value1", "value2"]},
> {"id":"guid4","key":"key2","value":["value1", "value2"]},
> ... etc. ...
> ]}
>
> What I need to wind up with is this:
> {"total_rows":3800,"offset":0,"rows":[
> {"id":"guid1","key":"key1","value":["value1", "value2"]},
> {"id":"guid3","key":"key2","value":["value1", "value2"]},
> ... etc. ...
> ]}
>
> In other words, if the key and value are identical across records I want to
> only output one result, but if the value is the same as another document
> *but the key is different*, then I do want to include it in the output. Hope
> I'm explaining that clearly.
>
> Happy to clarify further, and really appreciate any suggestions anyone has.
>
> Thanks!
> --
> Matthew Woodward
> matt@mattwoodward.com
> http://blog.mattwoodward.com
> identi.ca / Twitter: @mpwoodward
>
> Please do not send me proprietary file formats such as Word, PowerPoint,
> etc. as attachments.
> http://www.gnu.org/philosophy/no-word-attachments.html
>
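
For reference, here is a minimal sketch of how the reduce suggested in the reply behaves when the view is queried with ?group=true. It runs locally (for example in Node.js) rather than inside CouchDB. The names groupedReduce and mapRows are hypothetical and exist only for this illustration, and the grouping loop is a simplified stand-in for what CouchDB does internally, not its actual implementation.

```javascript
// Reduce function from the reply: keep the first value seen for a key.
// It gives the same answer on the rereduce pass, because the inputs there
// are themselves first values taken from earlier reduce calls.
function reduce(keys, values, rereduce) {
  return values[0];
}

// Hypothetical local stand-in for querying the view with ?group=true:
// collect the map rows of each distinct key, then reduce each group
// down to a single row.
function groupedReduce(rows) {
  const groups = new Map();
  for (const row of rows) {
    if (!groups.has(row.key)) {
      groups.set(row.key, []);
    }
    groups.get(row.key).push(row);
  }

  const out = [];
  for (const [key, members] of groups) {
    // This reduce ignores its keys argument, so a simple list is enough here.
    const keys = members.map((r) => r.key);
    const values = members.map((r) => r.value);
    out.push({ key: key, value: reduce(keys, values, false) });
  }
  return out;
}

// The four example rows from the question.
const mapRows = [
  { id: "guid1", key: "key1", value: ["value1", "value2"] },
  { id: "guid2", key: "key1", value: ["value1", "value2"] },
  { id: "guid3", key: "key2", value: ["value1", "value2"] },
  { id: "guid4", key: "key2", value: ["value1", "value2"] },
];

const result = groupedReduce(mapRows);
console.log(JSON.stringify(result));
// prints [{"key":"key1","value":["value1","value2"]},{"key":"key2","value":["value1","value2"]}]
```

Note that reduced rows carry only "key" and "value", so the "id" and "total_rows" fields shown in the desired output will not be present. Without group=true the view returns a single reduced row for the whole key range instead of one row per key.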