incubator-couchdb-user mailing list archives

From Hank Knight <hknight...@gmail.com>
Subject Re: CouchDB: Group results by unique values
Date Sun, 28 Jul 2013 13:52:10 GMT
Thanks, Matthieu.

Your explanation makes more sense than the pages of documentation I read.
Now I actually understand it.

On Sat, Jul 27, 2013 at 8:01 AM, Matthieu Rakotojaona
<matthieu.rakotojaona@gmail.com> wrote:
> This is not how you use CouchDB views to query your data. CouchDB views
> use the buzz-compliant map-reduce logic to give you what you are looking
> for. There are plenty of resources out there, but here's a very basic
> way to put it.
>
> Consider you have these 5 documents:
>
> {"id": "doc1", "fruit": "banana"}
> {"id": "doc2"}
> {"id": "doc3"}
> {"id": "doc4", "fruit": "banana"}
> {"id": "doc5", "fruit": "coconut"}
>
> These are 5 random documents in your database, with no schema at all, no
> expected keys/values, no nothing.
>
> First step is to map your documents to something you are interested in.
> You are going to walk through all your documents and emit a key (and a
> value) for each document you want to work with, and this key will be
> used to index your documents in regard to this view (and this view only;
> you're not doing anything to the original doc, you're just moving in
> some parallel workspace where you rearrange your docs differently).
>
> In your example, the key would be the fruit each doc has:
>
>
> {"id": "doc1", "fruit": "banana"}   -> {"id": "doc1", "key": "banana"}
> {"id": "doc2"}                      ->
> {"id": "doc3"}                      ->
> {"id": "doc4", "fruit": "banana"}   -> {"id": "doc4", "key": "banana"}
> {"id": "doc5", "fruit": "coconut"}  -> {"id": "doc5", "key": "coconut"}
>
> Note that doc2 and doc3 don't emit anything, since you're not interested
> in them. Also note that each emitted row carries an id field naming the
> doc that emitted it. This is done automatically by CouchDB, you don't
> have to do anything for this to happen (nor can you prevent it). Also
> note that each key/value emitted by a doc refers to that doc only, and
> to nothing else outside of it.
>
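The map step described above can be sketched as a CouchDB map function. This is a minimal sketch, not something run against a real server: the `emit` stub, the `rows` array, and the driver loop are stand-ins added so the snippet runs on its own in Node.js, and the name `mapFruit` is hypothetical. Inside CouchDB, `emit` is provided for you and only the body of `mapFruit` would go into the view. Real CouchDB documents also keep their identifier in `_id`; the sample documents here use `id` as in the example.

```javascript
// The five sample documents from the example above.
const docs = [
  { id: "doc1", fruit: "banana" },
  { id: "doc2" },
  { id: "doc3" },
  { id: "doc4", fruit: "banana" },
  { id: "doc5", fruit: "coconut" },
];

// Stand-in for the view index (assumption: CouchDB builds this for you).
const rows = [];
let currentDoc = null;

// Stand-in for CouchDB's emit(key, value); it records the emitting doc's id.
function emit(key, value) {
  rows.push({ id: currentDoc.id, key: key, value: value });
}

// The map function itself: emit the fruit as the key, skip docs without one.
function mapFruit(doc) {
  if (doc.fruit) {
    emit(doc.fruit, null);
  }
}

// Driver loop: run the map function over every document.
for (const doc of docs) {
  currentDoc = doc;
  mapFruit(doc);
}

// CouchDB keeps view rows sorted by key.
rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
console.log(rows);
```

The value is `null` here because counting only needs the keys; a view that needed more than a count could emit something else as the value.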
> Second step is to reduce the emitted values to the "summary" you are
> interested in. In your example, you want to know how many of each fruit
> you have; the result will be 2 for "banana" and 1 for "coconut". Here's
> a way you would write it (untested):
>
> ```
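> function (keys, values, rereduce) {
>   if (rereduce) {
>     // values holds partial counts from earlier reduce calls: add them up
>     return sum(values);
>   } else {
>     // keys holds one entry per emitted row: count them
>     return keys.length;
>   }
> }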
> ```
>
> For all the details about what this function does, and what this
> rereduce thing is, please read the wiki:
> https://wiki.apache.org/couchdb/Introduction_to_CouchDB_views
>
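To get a feel for the rereduce flag without a server, the reduce function above can be called by hand. This is a rough sketch under stated assumptions: `sum` is a local stand-in for the helper CouchDB provides, `countReduce` is a hypothetical name for the otherwise anonymous function, and the way the rows are split into two batches is made up for illustration (CouchDB decides the batching itself).

```javascript
// Stand-in for CouchDB's sum() helper (assumption: the server provides it).
function sum(values) {
  return values.reduce((total, n) => total + n, 0);
}

// The reduce function from the message above, given a name so it can be called.
function countReduce(keys, values, rereduce) {
  if (rereduce) {
    return sum(values);
  } else {
    return keys.length;
  }
}

// Rows emitted for the key "banana", as [key, docId] pairs plus their values.
const bananaKeys = [["banana", "doc1"], ["banana", "doc4"]];
const bananaValues = [null, null];

// Single pass: all rows for the key are handed to reduce at once.
const direct = countReduce(bananaKeys, bananaValues, false);

// Two passes: each batch is reduced on its own, then the partial results
// are combined with rereduce = true (keys is null in that call).
const partials = [
  countReduce([bananaKeys[0]], [bananaValues[0]], false),
  countReduce([bananaKeys[1]], [bananaValues[1]], false),
];
const combined = countReduce(null, partials, true);

console.log(direct, combined);
```

Both paths have to give the same answer, which is why the rereduce branch adds up the partial counts instead of counting them.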
> To put it shortly, this function will count all the emitted values that
> have the same key, and sum the partial counts. Query the view with
> group=true and you get the number of each fruit in your db. Seeing how
> common this function is, it's available as a built-in function. Just use
> "_count" as the reduce function and the result will be the same (except
> it will run faster).
>
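Put together, the view could live in a design document along these lines. This is only a sketch: the database name `fruitdb`, the design document name `fruits`, and the view name `by_fruit` are all hypothetical, and the snippet just builds and prints the JSON and the query path rather than talking to a server.

```javascript
// Hypothetical names: database "fruitdb", design doc "fruits", view "by_fruit".
const designDoc = {
  _id: "_design/fruits",
  views: {
    by_fruit: {
      // View functions are stored as strings inside the design document.
      map: "function (doc) { if (doc.fruit) { emit(doc.fruit, null); } }",
      // Built-in reducer, used in place of a hand-written counting function.
      reduce: "_count",
    },
  },
};

// group=true asks for one reduced row per distinct key.
const viewPath = "/fruitdb/_design/fruits/_view/by_fruit?group=true";

console.log(JSON.stringify(designDoc, null, 2));
console.log("GET " + viewPath);
```

For the five sample documents, a grouped query like this should come back with one row per fruit, roughly `{"rows":[{"key":"banana","value":2},{"key":"coconut","value":1}]}`. Without `group=true` the reduce collapses everything into a single row holding the total count.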
> I hope I've been clear enough for you to grasp the general idea. Use the
> temp views in Futon to play around and get to know it better, because it
> sure isn't natural, but it sure is powerful. Oh, and the docs too, of
> course.
>
> --
> Matthieu Rakotojaona
