incubator-couchdb-user mailing list archives

From Randall Leeds <randall.le...@gmail.com>
Subject Re: how to count the number of unique values
Date Tue, 12 Oct 2010 19:42:57 GMT
Something like the following, maybe?

function(doc) {
  //as before, emit every tag
}

function(keys, values, rereduce) {
  if(!rereduce) {
    //return the following, in an object or list to unpack:
    // 1) the first and last tag in keys (or the same tag twice, if
    //    all are equal or length == 1)
    // 2) a count of unique tags (not including and not equal to the
    //    first and/or last)
    // This assumes keys is sorted (which I think is safe), but if not
    // you can always sort them yourself
  } else {
    var total = //sum of all inner unique counts from !rereduce step
    var merged = //merge all the [first, last] lists
    if(merged.length != 2) {
      //Run the !rereduce clause (maybe put this in a function) on merged
      //Add the resulting inner count to total, set merged = [first, last] result.
    }
    //Return the new inner total and new [first, last]
  }
}

I'm pretty sure I've messed things up horribly somewhere in there, but
the basic idea feels right. Explore it.
A word of caution, though: this algorithm assumes that the key ranges
of the independent initial reductions do not overlap. I believe this
is a safe assumption when running on a single CouchDB node. However,
it will not hold if something like Lounge or BigCouch is rereducing
results from many shards with overlapping tag ranges.
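The outline above can be turned into something runnable outside CouchDB for experimenting. This is a minimal sketch, not a tested view: it assumes CouchDB passes `keys` as `[tag, docid]` pairs in view-collation order on the first pass, and passes rereduce `values` in key order (the single-node case). The names `uniqueTagsReduce` and `countUnique`, the doc ids `a` and `b`, and the way rows are split into chunks are all hypothetical; in a design document the reduce would be an anonymous function and CouchDB decides the chunking. Each partial result is `{first, last, inner}`, and the final count is `inner` plus one or two for the boundary tags.

```javascript
// Hypothetical name; in a design document this is the anonymous reduce.
// First pass: `keys` is a list of [tag, docid] pairs.
// Rereduce: `keys` is null and `values` holds earlier partial results.
// ASSUMPTION: tags arrive in view-collation order (equal tags adjacent)
// and rereduce values arrive in key order, as on a single node.
function uniqueTagsReduce(keys, values, rereduce) {
  // Collapse an ordered list of tags into {first, last, inner}, where
  // inner counts the unique tags strictly between first and last.
  function summarize(tags, inner) {
    var unique = [];
    for (var i = 0; i < tags.length; i++) {
      // Only compares neighbours, so duplicates must be adjacent.
      if (i === 0 || tags[i] !== tags[i - 1]) {
        unique.push(tags[i]);
      }
    }
    return {
      first: unique[0],
      last: unique[unique.length - 1],
      inner: inner + Math.max(unique.length - 2, 0)
    };
  }

  if (!rereduce) {
    return summarize(keys.map(function (k) { return k[0]; }), 0);
  }

  // Sum the inner counts and merge every partial [first, last] pair;
  // boundary tags that end up between the new first and last are
  // added to inner by summarize().
  var total = 0;
  var ends = [];
  for (var j = 0; j < values.length; j++) {
    total += values[j].inner;
    ends.push(values[j].first, values[j].last);
  }
  return summarize(ends, total);
}

// Hypothetical helper: turn the final {first, last, inner} into a count.
function countUnique(result) {
  return result.inner + (result.first === result.last ? 1 : 2);
}

// Single-node simulation over the two example posts. The split into two
// chunks is arbitrary; CouchDB decides the real grouping of rows.
var chunkA = uniqueTagsReduce([["bar", "posts/1"], ["foo", "posts/1"]], [null, null], false);
var chunkB = uniqueTagsReduce([["foo", "posts/2"], ["foobar", "posts/2"]], [null, null], false);
var singleNode = uniqueTagsReduce(null, [chunkA, chunkB], true);
console.log(countUnique(singleNode)); // 3

// Overlapping ranges, the case the caveat warns about: "foo" is an
// interior tag of both partial results (hypothetical doc ids "a", "b").
var shard1 = uniqueTagsReduce([["bar", "a"], ["foo", "a"], ["foobar", "a"]], [null, null, null], false);
var shard2 = uniqueTagsReduce([["baz", "b"], ["foo", "b"], ["qux", "b"]], [null, null, null], false);
var sharded = uniqueTagsReduce(null, [shard1, shard2], true);
console.log(countUnique(sharded)); // 6, although only 5 distinct tags exist
```

In the second run `foo` is hidden inside each shard's `inner` count, and the merge step only deduplicates first and last tags, so it is counted twice.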

If anyone can think of a better, more general algorithm that avoids
building a huge dictionary in the reduce phase, I think it'd be
interesting to see, and probably useful to more people than just
Anand. :)

-Randall

On Tue, Oct 12, 2010 at 08:00, Anand Chitipothu <anandology@gmail.com> wrote:
> 2010/10/12 Michael Zedeler <michael@zedeler.dk>:
>>  On 2010-10-12 15:09, Anand Chitipothu wrote:
>>>
>>> Is it possible to count the number of unique values by writing a couchdb
>>> view?
>>>
>>> Consider the following 2 docs.
>>>
>>> {
>>>     "_id": "posts/1",
>>>     "title": "post 1",
>>>     "tags": ["foo", "bar"]
>>> }
>>>
>>> {
>>>     "_id": "posts/2",
>>>     "title": "post 2",
>>>     "tags": ["foo", "foobar"]
>>> }
>>>
>>> Is it possible to find that there are 3 tags?
>>
>> Yes. Just write a map function that emits all tags it finds (not checked and
>> probably wrong):
>>
>> function(doc) {
>>    // for..in over an array yields indices, so loop by index instead
>>    for (var i = 0; i < doc.tags.length; i++) {
>>        emit(doc.tags[i], null);
>>    }
>> }
>>
>> In the reduce-function, just use _count
>> (http://wiki.apache.org/couchdb/Built-In_Reduce_Functions).
>
> That gives counts of each tag, not the total number of unique tags.
>
> The above reduce function with group_level=1 will give:
>
> {"rows":[
> {"key":"bar","value":1},
> {"key":"foo","value":2},
> {"key":"foobar","value":1}
> ]}
>
> And without any group_level it will return 4, which is the total
> number of tag occurrences.
>
> {"rows":[
> {"key":null,"value":4}
> ]}
>
> Is there any way to find the number of *unique* tags?
>
> Anand
>
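A simpler client-side alternative, sketched here under the assumption that the set of distinct tags is small enough to transfer: each row of the grouped (`group_level=1`) response quoted above corresponds to exactly one unique tag, so counting rows gives the answer. The snippet parses that sample response instead of making a real HTTP request; `body` is just the quoted data.

```javascript
// Sample response body from the grouped query quoted above (one row per tag).
var body = '{"rows":[' +
  '{"key":"bar","value":1},' +
  '{"key":"foo","value":2},' +
  '{"key":"foobar","value":1}]}';

// Each row is one distinct tag, so the row count is the unique-tag count.
var rows = JSON.parse(body).rows;
var uniqueTagCount = rows.length;

// The values still hold per-tag occurrence counts; summing them gives the
// ungrouped total, matching the reduce result without grouping.
var totalOccurrences = rows.reduce(function (sum, row) { return sum + row.value; }, 0);

console.log(uniqueTagCount, totalOccurrences); // 3 4
```

The trade-off is that every unique tag travels over the wire, so for a very large tag set a reduce-side approach like the one in the reply may be preferable.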
