couchdb-user mailing list archives

From "Ralf Nieuwenhuijsen" <ralf.nieuwenhuij...@gmail.com>
Subject Re: Create a view with only unique records
Date Mon, 14 Apr 2008 20:48:04 GMT
I was under the assumption that the reduce functionality hasn't been
implemented yet.
Are there architectural reasons why the future implementation shouldn't do
it exactly as you specified?

Greetings,
Ralf

2008/4/14, Jan Lehnardt <jan@apache.org>:
>
>
> On Apr 14, 2008, at 17:19, Jan Lehnardt wrote:
>
> >
> > On Apr 14, 2008, at 02:34, Ralf Nieuwenhuijsen wrote:
> >
> > > Well, that doesn't really apply. I am not looking for a way to create
> > > unique documents.
> > > I'm looking for a way to get a view with only unique documents.
> > >
> > > Imagine some portion of all the documents having the key 'adres'.
> > > Then I want a list of unique addresses: a view with only the adres keys
> > > of the documents that have one, and only unique entries.
> > >
> > > It seems I can currently solve this problem in two ways:
> > > - creating a separate adres document that stores an array of all unique
> > >   addresses. But without any sane default merging behavior, this breaks
> > >   at replication.
> > > - creating a separate document for _each_ address, using PUT with the MD5
> > >   of the address as the doc id. This seems like an enormous waste of
> > >   space, especially since I will be doing this with almost every key in
> > >   every document.
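A minimal sketch of the second workaround, assuming Node.js and its built-in `crypto` module. The helpers `addressDocId` and `addressDoc` are hypothetical names used only for illustration. The id is the hex MD5 of the address, so writing the same address twice targets the same document. Note that no normalization is done, so differently cased or spaced addresses get different ids.

```javascript
// Hypothetical helpers for the "one document per address" workaround.
// Assumes Node.js; the document id is the hex MD5 digest of the address.
const crypto = require("crypto");

function addressDocId(address) {
  // Hex-encoded MD5 of the UTF-8 bytes of the address string.
  return crypto.createHash("md5").update(address, "utf8").digest("hex");
}

function addressDoc(address) {
  // Body that would be sent with PUT /<db>/<id>. A second PUT to an
  // existing id without the current _rev is rejected as a conflict,
  // so each distinct address ends up stored at most once.
  return { _id: addressDocId(address), adres: address };
}

const first = addressDoc("Main Street 1");
const second = addressDoc("Main Street 1");
console.log(first._id === second._id); // true
console.log(addressDocId("main street 1") === first._id); // false: no normalization
```

The space cost Ralf mentions comes from storing one extra document per distinct value, for every key you want deduplicated.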
> > >
> > > In the future this should be doable with the reduce/combinator behavior,
> > > I expect. But even there, I think the suggested approach is too
> > > limiting. The reducer is going to return one JSON object. I would rather
> > > have it emit (key, value) pairs and use the default view operations on
> > > them for things like pagination.
> > >
> > > Using the above example and assuming the reducer is implemented: how do
> > > I get the X most used addresses? With the suggested implementation the
> > > value of X needs to be hard-coded, whereas using emit(key, value) in the
> > > reducer as well would allow for pagination.
> > >
> >
> > I might be totally off here, but the reduce function actually returns
> > only one key-value pair per distinct key in the view:
> >
> > map:
> > function(doc) {
> >   // key each row on the document's address; documents without one emit nothing
> >   if (doc.adres) {
> >     emit(doc.adres, 1);
> >   }
> > }
> >
> > produces:
> >
> > abc | 1
> > abc | 1
> > def | 1
> > xyz | 1
> > yyy | 1
> > yyy | 1
> > yyy | 1
> >
> > for fictional address values.
> >
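> > reduce:
> > function(keys, values, rereduce) {
> >   // sum the 1s emitted by map; also correct when re-reducing partial sums
> >   var sum = 0;
> >   for (var i = 0; i < values.length; i++) {
> >     sum += values[i];
> >   }
> >   return sum;
> > }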
> >
> > produces:
> >
> > abc | 2
> > def | 1
> > xyz | 1
> > yyy | 3
> >
> > as the output of the view, which can be paginated just as easily as the
> > list that map alone produces. This gives you a count for every address,
> > but not yet a list sorted by count. I've got to think about that one a
> > bit more.
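To make the map-then-reduce row shapes concrete, here is a small local simulation in plain JavaScript. It is not CouchDB's view engine: it passes `emit` as an argument instead of providing it as a global, sorts keys with plain string comparison rather than CouchDB's collation, and the sample documents and `runGroupedView` helper are hypothetical. The map keys rows on the `adres` field from Ralf's example. The `keys` argument handed to reduce holds `[key, docId]` pairs, which is an assumption about the shape CouchDB uses.

```javascript
// Local simulation of a grouped map/reduce view, for illustration only.

// Map: one row per document that has an address.
function map(doc, emit) {
  if (doc.adres) {
    emit(doc.adres, 1);
  }
}

// Reduce: sum the values. Addition is associative, so the same function
// works on raw map values and on previously reduced partial sums.
function reduce(keys, values, rereduce) {
  var sum = 0;
  for (var i = 0; i < values.length; i++) {
    sum += values[i];
  }
  return sum;
}

function runGroupedView(docs) {
  // Map phase: collect [key, docId, value] rows.
  const rows = [];
  for (const doc of docs) {
    map(doc, (key, value) => rows.push([key, doc._id, value]));
  }
  // Sort by key (plain string order, not CouchDB's collation).
  rows.sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));

  // Reduce phase: collapse each run of equal keys into one output row.
  const result = [];
  let i = 0;
  while (i < rows.length) {
    const key = rows[i][0];
    const keys = [];
    const values = [];
    while (i < rows.length && rows[i][0] === key) {
      keys.push([rows[i][0], rows[i][1]]);
      values.push(rows[i][2]);
      i++;
    }
    result.push({ key: key, value: reduce(keys, values, false) });
  }
  return result;
}

const docs = [
  { _id: "d1", adres: "abc" },
  { _id: "d2", adres: "abc" },
  { _id: "d3", adres: "def" },
  { _id: "d4", adres: "xyz" },
  { _id: "d5", adres: "yyy" },
  { _id: "d6", adres: "yyy" },
  { _id: "d7", adres: "yyy" },
  { _id: "d8" }, // no address, so map emits nothing for it
];

for (const row of runGroupedView(docs)) {
  console.log(row.key + " | " + row.value);
}
// abc | 2
// def | 1
// xyz | 1
// yyy | 3
```

The output is sorted by key, not by count. That is why the "most used addresses" question is still open at this point.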
> >
>
> I checked back with Damien and we can't do that now. You'd need to collate
> that reduce result in your application or use Lucene or some other
> technology to do that for you.
>
> Cheers
> Jan
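Collating the reduce result in the application, as suggested in the reply above, could look roughly like the following sketch. It assumes the grouped rows have already been fetched as `{ key, value }` objects; `topAddresses` and its `limit`/`skip` parameters are hypothetical. Ties in the count are broken by key so that pages are stable.

```javascript
// Hypothetical application-side collation of grouped reduce output:
// sort by count (descending), then page through the result.
function topAddresses(rows, limit, skip = 0) {
  return rows
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => b.value - a.value || (a.key < b.key ? -1 : a.key > b.key ? 1 : 0))
    .slice(skip, skip + limit);
}

const rows = [
  { key: "abc", value: 2 },
  { key: "def", value: 1 },
  { key: "xyz", value: 1 },
  { key: "yyy", value: 3 },
];

console.log(topAddresses(rows, 2).map((r) => r.key).join(", "));    // yyy, abc
console.log(topAddresses(rows, 2, 2).map((r) => r.key).join(", ")); // def, xyz
```

This holds every distinct key in memory, which is fine for a modest number of addresses. For large key sets, an external index such as Lucene, as mentioned above, is the alternative.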
