couchdb-user mailing list archives

From Jan Lehnardt <...@apache.org>
Subject Re: Create a view with only unique records
Date Mon, 14 Apr 2008 15:19:34 GMT

On Apr 14, 2008, at 02:34, Ralf Nieuwenhuijsen wrote:
> Well, that doesn't really apply. I am not looking for a way to create
> unique documents.
> I'm looking for a way to get a view with only unique documents.
>
> Imagine some portion of all the documents having the key 'adres'.
> Then I want a list of unique addresses: a view with only the adres
> keys for documents that have them, and then only unique entries.
>
> It seems that currently I can solve this problem in two ways:
> - creating a separate address document that stores an array of all
> unique addresses. But without any sane default merging behavior, this
> breaks at replication.
> - creating a separate document for _each_ address using PUT, with the
> md5 of the address as the doc-id. This seems like an enormous waste of
> space, especially since I will be doing this with almost every key in
> every document.
>
> In the future this should be doable with the reduce/combinator
> behavior, I expect. But even there, I think the suggested approach is
> too limiting. The reducer is going to return one JSON object. I would
> rather have it emit (key, value) and use the default view operations on
> it for things like pagination.
>
> Using the above example, and assuming the reducer is implemented: how
> do I get the X most used addresses? With the suggested implementation
> the value of X needs to be hard-coded, whereas using emit(key, value)
> in the reducer as well would allow for pagination.

I might be totally off here, but the reduce function does return
ordinary key-value pairs for the view, one row per key:

map: /* keyed on the address: a unique _id would never repeat a key */
function(doc) {
   if (doc.adres) {
     emit(doc.adres, 1);
   }
}

produces:

abc | 1
abc | 1
def | 1
xyz | 1
yyy | 1
yyy | 1
yyy | 1

for fictional key values.
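To see where the repeated keys come from, the map phase can be simulated outside CouchDB in a few lines of Node. This is a sketch under assumptions: the sample documents are made up, the address is assumed to live in an `adres` field as in Ralf's example, and `runMap` with its `emit` stand-in is a hypothetical helper, not part of CouchDB. Rows are keyed on the address, because a unique `_id` never repeats and so could not produce the duplicate keys in the table.

```javascript
// Hypothetical sample documents; only some carry an 'adres' field.
const docs = [
  { _id: "d1", adres: "abc" },
  { _id: "d2", adres: "yyy" },
  { _id: "d3", adres: "abc" },
  { _id: "d4", adres: "def" },
  { _id: "d5" },                 // no address, so it emits nothing
  { _id: "d6", adres: "yyy" },
  { _id: "d7", adres: "xyz" },
  { _id: "d8", adres: "yyy" },
];

// Hypothetical helper: run a map function over every document and
// collect the rows it emits, then order them by key like a view.
function runMap(mapFn, docs) {
  const rows = [];
  for (const doc of docs) {
    // Stand-in for the view server's emit(): one row per call.
    const emit = (key, value) => rows.push({ id: doc._id, key: key, value: value });
    mapFn(doc, emit);
  }
  // CouchDB uses Unicode collation; plain string comparison gives the
  // same order for these lowercase ASCII keys. The sort is stable.
  rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return rows;
}

// The map function from the message, with emit passed in for the shim.
function map(doc, emit) {
  if (doc.adres) {
    emit(doc.adres, 1);
  }
}

const mapRows = runMap(map, docs);
for (const row of mapRows) {
  console.log(row.key + " | " + row.value);
}
```

Printing the rows gives the same key sequence as the table, one row per document that has an address.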

reduce:
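function(keys, values, rereduce) {
   // add up the 1s emitted per row; in a rereduce pass the values are
   // partial sums, which add up the same way, so no special case is needed
   var sum = 0;
   for (var i = 0; i < values.length; i++) {
     sum += values[i];
   }

   return sum;
}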

produces:

abc | 2
def | 1
xyz | 1
yyy | 3

as the output of the view, which can be paginated just as easily as the
list that map alone produces. This gives you a count for every address,
but not yet a list sorted by count. I've got to think about that one a
bit more.
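The reduce half can be sketched the same way. The plain-JavaScript simulation below, with hypothetical helper names and the table's rows hard-coded, does three things. It collapses equal keys into one reduced row each, which is roughly what released CouchDB versions return when a reduce view is queried with `group=true` (without grouping, the whole view reduces to a single total). It checks that the summing reducer gives the same answer when CouchDB rereduces partial results. And it tries one possible client-side workaround for the missing sorted list: ordering the grouped rows by count and paging with skip/limit-style arguments, so that X is not hard-coded.

```javascript
// Sorted map output from the first table: one row per document, value 1.
const mapRows = [
  { key: "abc", value: 1 }, { key: "abc", value: 1 },
  { key: "def", value: 1 }, { key: "xyz", value: 1 },
  { key: "yyy", value: 1 }, { key: "yyy", value: 1 }, { key: "yyy", value: 1 },
];

// The summing reducer from the message.
function reduce(keys, values, rereduce) {
  let sum = 0;
  for (let i = 0; i < values.length; i++) {
    sum += values[i];
  }
  return sum;
}

// Hypothetical helper: collapse runs of equal keys into one reduced row
// each, roughly what a grouped query returns. Keys are passed as null
// because this reducer never looks at them.
function groupReduce(rows, reduceFn) {
  const out = [];
  let i = 0;
  while (i < rows.length) {
    const key = rows[i].key;
    const values = [];
    while (i < rows.length && rows[i].key === key) {
      values.push(rows[i].value);
      i++;
    }
    out.push({ key: key, value: reduceFn(null, values, false) });
  }
  return out;
}

const grouped = groupReduce(mapRows, reduce);

// Rereduce safety: summing partial sums gives the same total as summing
// the raw values, so the reducer needs no separate rereduce branch.
const direct = reduce(null, [1, 1, 1], false);
const viaRereduce = reduce(null, [reduce(null, [1, 1], false), reduce(null, [1], false)], true);

// Hypothetical client-side helper for the sorted list: order grouped rows
// by count (highest first, ties by key) and return one page of them.
function topAddresses(rows, skip, limit) {
  const sorted = rows.slice().sort((a, b) =>
    b.value - a.value || (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return sorted.slice(skip, skip + limit);
}

console.log(grouped.map(r => r.key + " | " + r.value).join("\n"));
console.log(direct, viaRereduce);                              // 3 3
console.log(topAddresses(grouped, 0, 2).map(r => r.key));      // yyy, then abc
```

The client-side sort has to fetch every grouped row first, so it suits modest numbers of distinct addresses rather than standing in for a real index. Released CouchDB also ships built-in reducers such as `_sum` that do the same work as the hand-written one.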

Cheers
Jan
--


> Greetings,
> Ralf
>
> 2008/4/13, Chris Anderson <jchris@mfdz.com>:
>>
>> Ralf,
>>
>> If you use an algorithm to generate a deterministic _id for records
>> before PUT-ing them to CouchDB, you can ensure that each unique record
>> only appears once in the database. This discussion might be relevant
>> for you:
>> for you:
>>
>>
>> http://mail-archives.apache.org/mod_mbox/incubator-couchdb-user/200803.mbox/%3Ce282921e0803161811s4a98a946tc061be37766c7618@mail.gmail.com%3E
>>
>> Chris
>>
>>
>> On Sat, Apr 12, 2008 at 8:43 PM, Ralf Nieuwenhuijsen
>> <ralf.nieuwenhuijsen@gmail.com> wrote:
>>> Is it possible to create a view with only unique-records?
>>> I assume it would be possible using the future reduce/combinator
>>> behavior?
>>>
>>> What is the planned time frame for the reduce behavior?
>>>
>>> Greetings,
>>> Ralf
>>>
>>
>>
>>
>>
>> --
>> Chris Anderson
>> http://jchris.mfdz.com
>>
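Chris's suggestion, which is also Ralf's second workaround, derives the document `_id` from the content itself. Below is a minimal Node sketch, assuming the MD5 hex digest of the address is used as the `_id` as Ralf describes; `addressDocId` is a hypothetical helper, and the trim/lower-case normalization is an added assumption. Because one address always maps to one `_id`, a second PUT of the same address targets the existing document, which CouchDB rejects with a conflict unless the current revision is supplied, instead of creating a duplicate.

```javascript
const crypto = require("crypto");

// Hypothetical helper: derive a stable document _id from an address.
// Trimming and lower-casing first is an assumption, so that trivially
// different spellings of one address map to the same document.
function addressDocId(address) {
  const normalized = address.trim().toLowerCase();
  return crypto.createHash("md5").update(normalized, "utf8").digest("hex");
}

const a = addressDocId("Main Street 1");
const b = addressDocId("  main street 1 ");
const c = addressDocId("Main Street 2");

console.log(a === b);  // true: same address, same _id
console.log(a === c);  // false: different address, different _id
console.log(a.length); // 32 hex characters
```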

