From: Matthieu Rakotojaona
Date: Mon, 15 Oct 2012 19:58:35 +0200
Subject: Re: Doubt on map/reduce and "joins" by id
To: user@couchdb.apache.org, anddimario@gmail.com

Hello,

The wiki has a page about reduce functions:
http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views#Reduce_Functions

Here are a few more notes on how to use them. You may already be familiar with some of this, but I thought it could help other passers-by too:

1/ reduce takes 3 arguments: keys (an array), values (an array) and rereduce (a boolean).

2/ reduce will be called many times over the life of a database, and those calls fall into 2 groups:

   1/ "First" it is called on map output. In this case:
      * rereduce = false
      * keys is an array of [key, id] pairs, where key is the key you emitted in the map and id is the id of the doc that emitted it
      * values is an array of the values you emitted in the map
      keys and values are correlated by index: keys[7] and values[7] hold related information (i.e. they come from the same emitted row).

   2/ "Then" it is called on reduce output. In this case:
      * rereduce = true
      * keys = null
      * values is an array of the objects returned by previous calls to the reduce function

   This second case has a fundamental consequence: the output of a reduce _must_ be usable as the input of another reduce.

   "First" and "Then" are misleading, because you never know _when_ reduce will be called. But you know _how_, and that is enough.

3/ Distinguishing between rereduce true and false can be a pain, because you usually have to think about 2 different data structures (one is the map output, the other is the reduce output).
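
To make the two cases concrete, here is a minimal sketch of a reduce function that branches on rereduce. It assumes a map that emits plain numbers and a reduce that keeps a running count and sum; that use case, and the sample keys, ids and values in the simulation at the bottom, are made up for illustration. Only the (keys, values, rereduce) signature comes from CouchDB.

```javascript
// Reduce function with the (keys, values, rereduce) signature described above.
// Assumption for this sketch: the map emitted plain numbers as values.
function reduce(keys, values, rereduce) {
  const result = { count: 0, sum: 0 };
  if (rereduce) {
    // values holds objects returned by earlier calls to this same function;
    // keys is null in this case.
    values.forEach(function (partial) {
      result.count += partial.count;
      result.sum += partial.sum;
    });
  } else {
    // values holds what the map emitted; keys[i] is the [key, id] pair
    // belonging to values[i].
    values.forEach(function (value) {
      result.count += 1;
      result.sum += value;
    });
  }
  return result;
}

// Hand-written simulation of one possible call sequence: two reduce calls
// over map output, then one rereduce over their results.
const partialA = reduce([["a", "doc1"], ["a", "doc2"]], [10, 20], false);
const partialB = reduce([["a", "doc3"]], [30], false);
const total = reduce(null, [partialA, partialB], true);
console.log(total);
```

Both branches return the same { count, sum } shape, which is what lets the output of one call be fed into another.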

There might be views where you don't care much about the map-only output and only want the map+reduce output. In that case, try emitting from your map structures that are already usable by the reduce function.

I had a "similar" case recently; what I did can be found at https://github.com/rakoo/pfeed/tree/master/pfeed-couch/views/feeds-stats. Basically:

* I have 2 types of docs in my db: "feed" and "entry". A "feed" has a unique id and a unique title; an "entry" has a feed_id and a state (isRead or isUnread). One feed can have any number of entries.

* I wanted to count the number of entries in each state for a feed.

* I settled on this kind of structure as the target output:

    {
      "title": <title>,
      "isUnread": <some number>,
      "isRead": <some other number>
    }

  of which I would have one for each feed id.

* Obviously, no single doc can produce this whole structure in a map, so I cannot emit it complete from every doc and just merge the results. Instead I emit the same structure with only the fields I know:

  For a feed:
    { "title": <title> }

  For an entry:
    { "isUnread": <1 if it is unread or...>, "isRead": <... 1 if it is read> }

  both with the feed id as the key, so I can query the view reduced and grouped exactly, and get one structure per feed id.

* That was the "map" part. The "reduce" part now only has to take all those structures and merge them:
  * "merge" the titles, which should all be the same (remember: I query the view reduced and grouped, so the feed id is the same, thus the title is the same)
  * "merge" the state counts by just adding them

* BUT this view has to be grouped _exactly_, which means I get one stats structure per feed id. If I don't group exactly, the output is nonsense, since titles are merged with no way of knowing which one ends up last.

In conclusion, I would say that CouchDB's map/reduce views call for a total rethinking of how you organize your data.
I started with this data model because that is how I would naturally have laid out my docs, but it means I have to do some kind of join. Depending on your application's needs, you might want to put the relational model aside and arrange your data differently, so that retrieving the data your application wants (which is, in the end, all that matters) is easy, even if the data structure is harder to understand at first sight (which matters only at prototyping/debugging time, and that should be negligible compared to usage time).

-- 
Matthieu RAKOTOJAONA
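
For reference, the feeds-stats idea described above could be sketched roughly as follows. This is not the code from the linked repository: the doc field names (type, feed_id, state, title), the function names feedStatsMap and feedStatsReduce, and the sample docs are all assumptions made for illustration. The emit() defined here is a stand-in for the one CouchDB provides, and the loop at the bottom only imitates what querying the view reduced and grouped exactly would do.

```javascript
// Stand-in for CouchDB's emit(): collects the rows the map produces.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Map: emit a partial stats structure, keyed by feed id.
// Field names (type, feed_id, state, title) are assumed for this sketch.
function feedStatsMap(doc) {
  if (doc.type === "feed") {
    emit(doc._id, { title: doc.title });
  } else if (doc.type === "entry") {
    emit(doc.feed_id, {
      isUnread: doc.state === "isUnread" ? 1 : 0,
      isRead: doc.state === "isRead" ? 1 : 0
    });
  }
}

// Reduce: map output and reduce output share one shape, so the same merge
// works whether rereduce is true or false. keys is not used.
function feedStatsReduce(keys, values, rereduce) {
  const stats = { title: null, isUnread: 0, isRead: 0 };
  values.forEach(function (v) {
    if (v.title) stats.title = v.title;   // titles agree within one feed id
    stats.isUnread += v.isUnread || 0;    // missing counters count as 0
    stats.isRead += v.isRead || 0;
  });
  return stats;
}

// Made-up sample docs: one feed with three entries.
const docs = [
  { _id: "feed-1", type: "feed", title: "Example feed" },
  { _id: "e1", type: "entry", feed_id: "feed-1", state: "isUnread" },
  { _id: "e2", type: "entry", feed_id: "feed-1", state: "isRead" },
  { _id: "e3", type: "entry", feed_id: "feed-1", state: "isUnread" }
];
docs.forEach(feedStatsMap);

// Imitate exact grouping: collect the emitted values per key.
const valuesByKey = {};
rows.forEach(function (row) {
  (valuesByKey[row.key] = valuesByKey[row.key] || []).push(row.value);
});

// For each key, reduce two halves separately, then rereduce the results,
// to exercise both kinds of call. keys is passed as null because this
// reduce ignores it.
const statsByFeed = {};
Object.keys(valuesByKey).forEach(function (key) {
  const values = valuesByKey[key];
  const middle = Math.ceil(values.length / 2);
  const first = feedStatsReduce(null, values.slice(0, middle), false);
  const second = feedStatsReduce(null, values.slice(middle), false);
  statsByFeed[key] = feedStatsReduce(null, [first, second], true);
});
console.log(statsByFeed);
```

Without exact grouping, values from different feeds would end up in the same reduce call and the title field would be whichever one happened to be merged last, which is the nonsense output mentioned above.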