couchdb-user mailing list archives

From Matthieu Rakotojaona <matthieu.rakotoja...@gmail.com>
Subject Re: CouchDB: Group results by unique values
Date Sat, 27 Jul 2013 11:01:00 GMT
This is not how you use CouchDB views to query your data. CouchDB views
use the buzzword-compliant map-reduce logic to give you what you are
looking for. There are plenty of resources out there, but here's a very
basic way to put it.

Consider you have these 5 documents:

{"id": "doc1", "fruit": "banana"}
{"id": "doc2"}
{"id": "doc3"}
{"id": "doc4", "fruit": "banana"}
{"id": "doc5", "fruit": "coconut"}

These are 5 random documents in your database, with no schema at all, no
expected keys/values, no nothing.

The first step is to map your documents to something you are interested
in. You walk through all your documents and emit a key (and a value,
which can be null) for each document you want to work with. That key is
used to index your documents in this view, and in this view only: you're
not changing the original docs, you're just building a parallel
workspace where they are arranged differently.

In your example, the key would be the fruit each doc has:

{"id": "doc1", "fruit": "banana"}   -> {"id": "doc1", "key": "banana"}
{"id": "doc2"}                      ->
{"id": "doc3"}                      ->
{"id": "doc4", "fruit": "banana"}   -> {"id": "doc4", "key": "banana"}
{"id": "doc5", "fruit": "coconut"}  -> {"id": "doc5", "key": "coconut"}

Note that doc2 and doc3 don't emit anything, since you're not interested
in them. Also note that each emitted row has an "id" field holding the
ID of the document that emitted it. CouchDB adds this automatically; you
don't have to do anything for it to happen (nor can you prevent it).
Finally, each key/value emitted by a doc refers to that doc only, and to
nothing outside of it.
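For reference, a map function producing the rows above could look like
the one below (untested against a real server). Inside CouchDB, the
JavaScript view server provides `emit` for you; to make the sketch
runnable on its own, it defines a local `emit` and a hypothetical
`buildView` helper that only mimic what CouchDB does: attach the
document ID to every row and keep rows sorted by key, with ties broken
by document ID. Two assumptions: the documents keep their ID in
CouchDB's `_id` field, and plain JavaScript string comparison stands in
for CouchDB's key collation (they agree for simple lowercase strings
like these, not in general).

```javascript
// Local stand-in for CouchDB's view server, for experiments only.
let rows = [];
let currentDocId = null;

function emit(key, value) {
  // CouchDB attaches the emitting document's ID to every row by itself.
  rows.push({ id: currentDocId, key: key, value: value });
}

// The map function as you would store it in a design document:
// only docs that have a "fruit" field emit a row.
const mapFruit = function (doc) {
  if (doc.fruit) {
    emit(doc.fruit, null);
  }
};

// Hypothetical helper: run the map over every doc, then order the rows
// by key and, for equal keys, by document ID.
function buildView(docs, mapFn) {
  rows = [];
  for (const doc of docs) {
    currentDocId = doc._id;
    mapFn(doc);
  }
  return rows.sort((a, b) =>
    a.key < b.key ? -1 : a.key > b.key ? 1 : a.id < b.id ? -1 : a.id > b.id ? 1 : 0
  );
}

const docs = [
  { _id: "doc1", fruit: "banana" },
  { _id: "doc2" },
  { _id: "doc3" },
  { _id: "doc4", fruit: "banana" },
  { _id: "doc5", fruit: "coconut" },
];

console.log(buildView(docs, mapFruit));
```

doc2 and doc3 never reach `emit`, so the view holds three rows: two for
"banana" (doc1, doc4) and one for "coconut" (doc5).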

The second step is to reduce the emitted rows to the "summary" you are
interested in. In your example, you want to know how many of each fruit
you have; the result should be 2 for "banana" and 1 for "coconut".
Here's one way to write it (untested):

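```
function (keys, values, rereduce) {
  if (rereduce) {
    // Rereduce: values are counts returned by earlier calls; add them up.
    // sum() is a helper provided by CouchDB's JavaScript view server.
    return sum(values);
  } else {
    // First pass: keys holds one [key, docId] pair per emitted row,
    // so its length is the number of rows in this batch.
    return keys.length;
  }
}
```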
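To see what the two branches do, here is the same function called by
hand. On a first pass CouchDB passes `keys` as an array of
`[key, docId]` pairs; on a rereduce it passes `keys` as null and
`values` holds results of earlier calls. Since `sum` normally comes from
CouchDB's view server, the sketch defines a local equivalent. How the
rows get split into batches is decided by CouchDB internally; the split
below is made up purely for illustration.

```javascript
// Local equivalent of the sum() helper from CouchDB's JS view server.
function sum(values) {
  return values.reduce((total, v) => total + v, 0);
}

// The reduce function from above, bound to a name so we can call it.
const countReduce = function (keys, values, rereduce) {
  if (rereduce) {
    // values are counts from earlier calls: add them up.
    return sum(values);
  } else {
    // keys holds one [key, docId] pair per emitted row: count them.
    return keys.length;
  }
};

// Hypothetical split of the two "banana" rows into separate batches.
const batch1 = countReduce([["banana", "doc1"]], [null], false);
const batch2 = countReduce([["banana", "doc4"]], [null], false);

// Partial results are then combined with rereduce = true (keys is null).
const bananas = countReduce(null, [batch1, batch2], true);
console.log(bananas); // 2
```

This is why the rereduce branch cannot just count its inputs again: it
receives counts, not rows, so it has to add them.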

For all the details about what this function does and what this rereduce
thing is, please read the wiki:
https://wiki.apache.org/couchdb/Introduction_to_CouchDB_views

In short, this function counts the emitted rows. When you query the view
with group=true, CouchDB applies it separately to each distinct key, so
you end up with the number of each fruit in your db; without grouping you
get a single total for all rows (3 here). Seeing how common this function
is, it's available as a built-in: just use "_count" as the reduce
function and the result will be the same (except it will run faster).
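Putting the pieces together: the design document holds the map function
(as a string) plus "_count" as the reduce, and to get one row per fruit
you query the view with `group=true`; without it, CouchDB reduces all
rows to a single total. The sketch below is a local simulation only. The
database name `mydb`, design document `fruits` and view `by_fruit` are
hypothetical, and `queryView` merely imitates the results CouchDB
returns for a `_count` view, starting from the rows of the example.

```javascript
// Hypothetical design document; map functions are stored as strings.
const designDoc = {
  _id: "_design/fruits",
  views: {
    by_fruit: {
      map: "function (doc) { if (doc.fruit) { emit(doc.fruit, null); } }",
      reduce: "_count",
    },
  },
};

// Request you would send to get per-fruit counts (hypothetical names).
const url = "/mydb/_design/fruits/_view/by_fruit?group=true";

// Rows as the map step emitted them, already sorted by key.
const mapRows = [
  { id: "doc1", key: "banana", value: null },
  { id: "doc4", key: "banana", value: null },
  { id: "doc5", key: "coconut", value: null },
];

// Imitates a _count view: one total without grouping,
// one row per distinct key with group=true.
function queryView(rows, group) {
  if (!group) {
    return [{ key: null, value: rows.length }];
  }
  const result = [];
  for (const row of rows) {
    const last = result[result.length - 1];
    if (last && last.key === row.key) {
      last.value += 1; // rows are sorted, so equal keys are adjacent
    } else {
      result.push({ key: row.key, value: 1 });
    }
  }
  return result;
}

console.log(JSON.stringify(designDoc, null, 2));
console.log(url);
console.log(queryView(mapRows, false));
console.log(queryView(mapRows, true));
```

Without grouping the single row has value 3; with group=true you get
"banana" with 2 and "coconut" with 1.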

I hope I've been clear enough for you to grasp the general idea. Use the
temp views in Futon to play around and get to know it better, because it
sure isn't natural, but it sure is powerful. Oh, and the docs too, of
course.

-- 
Matthieu Rakotojaona
