From: Matthieu Rakotojaona
To: user
Subject: Re: CouchDB: Group results by unique values
Date: Sat, 27 Jul 2013 13:01:00 +0200

This is not how you use CouchDB views to query your data. CouchDB views use the buzz-compliant map-reduce logic to give you what you are looking for. There are plenty of resources out there, but here's a very basic way to put it.

Consider you have these 5 documents:

{"id": "doc1", "fruit": "banana"}
{"id": "doc2"}
{"id": "doc3"}
{"id": "doc4", "fruit": "banana"}
{"id": "doc5", "fruit": "coconut"}

These are 5 random documents in your database, with no schema at all, no expected keys/values, no nothing.

First step is to map your documents to something you are interested in. You walk through all your documents and emit a key (and a value) for each document you want to work with. This key is used to index your documents for this view, and this view only: you're not doing anything to the original doc, you're just working in a parallel workspace where you arrange your docs differently.
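To make the map step concrete, here is a minimal sketch you can run outside CouchDB. It assumes a stand-in `emit` that just collects rows (inside a real view, CouchDB provides `emit` itself and also attaches the doc id to each row); the names `rows`, `mapFruit` and `docs` are made up for illustration.

```javascript
// Stand-in for CouchDB's emit(); in a real view the server provides it.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Map function as it would appear in a view, named here so it can be called.
const mapFruit = function (doc) {
  // Docs without a fruit emit nothing and are simply absent from the view.
  if (doc.fruit) {
    emit(doc.fruit, null);
  }
};

// Five schema-less sample docs: doc1 and doc4 are bananas, doc5 is a coconut.
const docs = [
  { _id: "doc1", fruit: "banana" },
  { _id: "doc2" },
  { _id: "doc3" },
  { _id: "doc4", fruit: "banana" },
  { _id: "doc5", fruit: "coconut" }
];

docs.forEach(function (doc) {
  mapFruit(doc);
});

console.log(rows.map(function (r) { return r.key; }));
// prints [ 'banana', 'banana', 'coconut' ]
```

In an actual view you would keep only the anonymous `function (doc) { ... }` part; everything else here is scaffolding to run it locally.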
In your example, the key would be the fruit each doc has:

{"id": "doc1", "fruit": "banana"}  -> {"_id": "doc1", "key": "banana"}
{"id": "doc2"}                     ->
{"id": "doc3"}                     ->
{"id": "doc4", "fruit": "banana"}  -> {"_id": "doc4", "key": "banana"}
{"id": "doc5", "fruit": "coconut"} -> {"_id": "doc5", "key": "coconut"}

Note that doc2 and doc3 don't emit anything, since you're not interested in them. Also note that there is an _id field in the data you emit. CouchDB adds it automatically; you don't have to do anything for this to happen (nor can you prevent it). Also note that each key/value emitted by a doc refers to that doc only, and to nothing else outside of it.

Second step is to reduce the emitted values to the "summary" you are interested in. In your example, you want to know how many of each fruit you have; the result will be 2 for "banana" and 1 for "coconut". Here's a way you would write it (untested):

```
function (keys, values, rereduce) {
  if (rereduce) {
    // values holds counts returned by earlier reduce calls: add them up
    return sum(values);
  } else {
    // keys holds one entry per emitted row: count them
    return keys.length;
  }
}
```

For all the details about what this function does and what this rereduce thing is, please read the wiki:

https://wiki.apache.org/couchdb/Introduction_to_CouchDB_views

To put it shortly, this function counts the emitted values that share the same key, and sums the partial counts. In the end you get the number of each fruit in your db (query the view with group=true to get one row per key rather than a single total).

Seeing how common this function is, it's available as a built-in function. Just use "_count" as the reduce function and the result will be the same (except it will run faster).

I hope I've been clear enough for you to grasp the general idea. Use the temp views in Futon to play around and get to know it better, because it sure isn't natural, but it sure is powerful. Oh, and the docs too, of course.

-- 
Matthieu Rakotojaona
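
P.S. If you want to see the reduce step run outside CouchDB, here is a rough sketch. It assumes a stand-in `sum` (CouchDB provides the real one to view functions) and a made-up helper `groupedReduce` that only imitates what a group=true query does; it is not how CouchDB implements it. `countReduce` is the reduce function from the message, given a name so it can be called.

```javascript
// Stand-in for the sum() helper that CouchDB provides to view functions.
function sum(values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

// The reduce function from the message, named so it can be called here.
const countReduce = function (keys, values, rereduce) {
  if (rereduce) {
    return sum(values);
  } else {
    return keys.length;
  }
};

// Rows as the map step would produce them for the sample docs.
const rows = [
  { id: "doc1", key: "banana", value: null },
  { id: "doc4", key: "banana", value: null },
  { id: "doc5", key: "coconut", value: null }
];

// Hypothetical helper imitating group=true: reduce each key's rows separately.
function groupedReduce(allRows, reduce) {
  const groups = {};
  allRows.forEach(function (row) {
    (groups[row.key] = groups[row.key] || []).push(row);
  });
  const result = {};
  Object.keys(groups).forEach(function (key) {
    const group = groups[key];
    // On a first pass, keys are [key, docId] pairs and rereduce is false.
    const keys = group.map(function (r) { return [r.key, r.id]; });
    const values = group.map(function (r) { return r.value; });
    result[key] = reduce(keys, values, false);
  });
  return result;
}

const counts = groupedReduce(rows, countReduce);

// A rereduce pass combines partial counts; keys is null in that case.
const total = countReduce(null, [counts.banana, counts.coconut], true);

console.log(counts, total);
```

The per-key counts come out as 2 for "banana" and 1 for "coconut", and the rereduce pass adds them up to 3, which is the single total you get when you query without grouping.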