couchdb-user mailing list archives

From Ross Bates <rba...@gmail.com>
Subject Re: getting unique set of document id's
Date Tue, 07 Jul 2009 18:08:14 GMT
Thank you for the follow-up, Brian. After looking at your examples I think I
see where I was unclear about how I planned to use the POST {"keys":
["foo", "bar"]} statement.

What you and Paul suggested is the fastest route to getting documents that
are tagged with both foo AND bar, but for my search I am hoping to get foo
AND/OR bar.

The reason I was including the doc id in the emit was that I was trying to
play with the reduce to see if I could get it to "reduce" it to a unique
set.

I can use just the map() function and then make the result unique on the
client, but from a performance standpoint I imagine the CouchDB view already
knows what those unique ids are. I could be totally wrong there, though.
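A minimal sketch of the client-side approach, assuming the response shape CouchDB uses for views (a `rows` array whose entries carry `id`, `key` and `value`) and a multi-key fetch made with reduce=false. The function name `uniqueDocIds` and the sample rows are hypothetical.

```javascript
// Collect the distinct document ids from a CouchDB view response.
// Every row already carries the id of the document that emitted it,
// so this works even when the map function emits null as the value.
function uniqueDocIds(viewResponse) {
  const seen = new Set();
  for (const row of viewResponse.rows) {
    seen.add(row.id); // a Set silently ignores repeats
  }
  // A Set iterates in insertion order, so ids keep their first-seen order.
  return Array.from(seen);
}

// Hypothetical result of POST {"keys": ["foo", "bar"]} with reduce=false.
const response = {
  rows: [
    { id: "doc1", key: "foo", value: null },
    { id: "doc2", key: "foo", value: null },
    { id: "doc1", key: "bar", value: null },
    { id: "doc3", key: "bar", value: null },
  ],
};

console.log(uniqueDocIds(response)); // [ 'doc1', 'doc2', 'doc3' ]
```

The view itself still returns one row per (document, tag) pair; the de-duplication happens entirely on the client.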

The original foo/bar example is simple, but extending it out would prove to
be powerful for set based analysis (database marketing, customer
segmentation). To be able to pass {"keys": ["foo", "bar","baz","boo","xyz"]}
to a view and get back a set of docs that matched *one or more* of the keys
would be fantastic.
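One way to get that kind of set-based result from a single multi-key fetch (reduce=false) is to group the returned rows by document id on the client and count how many distinct requested keys each document matched. This is a sketch under that assumption; `keysByDoc`, `docsMatching` and the sample rows are hypothetical names and data.

```javascript
// For each document, record which of the requested keys it matched.
// Using a Set per doc means a doc tagged "foo" twice still counts once.
function keysByDoc(rows) {
  const byDoc = new Map();
  for (const row of rows) {
    if (!byDoc.has(row.id)) byDoc.set(row.id, new Set());
    byDoc.get(row.id).add(row.key);
  }
  return byDoc;
}

// Ids of docs matching at least `minMatches` distinct requested keys:
// minMatches = 1 is "one or more" (OR); minMatches = keys.length is AND.
function docsMatching(rows, minMatches) {
  const ids = [];
  for (const [id, matched] of keysByDoc(rows)) {
    if (matched.size >= minMatches) ids.push(id);
  }
  return ids;
}

// Hypothetical rows for POST {"keys": ["foo", "bar", "baz"]}, reduce=false.
const rows = [
  { id: "doc1", key: "foo", value: null },
  { id: "doc2", key: "foo", value: null },
  { id: "doc1", key: "bar", value: null },
  { id: "doc3", key: "bar", value: null },
  { id: "doc1", key: "baz", value: null },
  { id: "doc3", key: "baz", value: null },
];

console.log(docsMatching(rows, 1)); // [ 'doc1', 'doc2', 'doc3' ]
console.log(docsMatching(rows, 2)); // [ 'doc1', 'doc3' ]
console.log(docsMatching(rows, 3)); // [ 'doc1' ]
```

The threshold parameter covers OR, AND and "at least n of k" segments with the same rows, at the cost of transferring every matching row for every requested key.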

-Ross


On Tue, Jul 7, 2009 at 4:05 AM, Brian Candler <B.Candler@pobox.com> wrote:

> On Sun, Jul 05, 2009 at 03:18:33PM -0500, Ross Bates wrote:
> > Hi Paul - thank you for the pointers. Something I'm unclear on though...
> > using a sum in the reduce returns something like this for all the tags:
> >
> > foo, 3
> > bar, 5
> > baz, 7
> >
> > When I use the multi-key fetch against the view it doesn't return
> > specific docids for each tag, just a subset of tags and their counts.
> >
> > POST {"keys": ["foo", "bar"]}
> >
> > foo, 3
> > bar, 5
> >
> > How can I get access to the list of docids which make up the total?
>
> You query the view again with reduce=false, which turns it back into a
> regular map view.
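As an illustration, a small helper that builds such a query URL. It assumes CouchDB's convention that view query parameters are JSON values (so a string key is sent as "foo" with the quotes), and the database, design document and view names used below are hypothetical.

```javascript
// Build a CouchDB view URL. Each parameter value is JSON-encoded,
// because CouchDB parses view query parameters such as key, reduce
// and include_docs as JSON (key="foo", reduce=false, include_docs=true).
function viewUrl(db, ddoc, view, params) {
  const query = new URLSearchParams();
  for (const [name, value] of Object.entries(params)) {
    query.set(name, JSON.stringify(value));
  }
  const path = "/" + encodeURIComponent(db) +
    "/_design/" + encodeURIComponent(ddoc) +
    "/_view/" + encodeURIComponent(view);
  const qs = query.toString();
  return qs ? path + "?" + qs : path;
}

// Hypothetical names; reduce=false skips the reduce function.
console.log(viewUrl("mydb", "tags", "by_tag", { key: "foo", reduce: false }));
// /mydb/_design/tags/_view/by_tag?key=%22foo%22&reduce=false
```

JSON-encoding every value suits key-like and boolean options; plain string options (for example `stale`) would need to be passed without the extra quotes.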
>
> The idea is you'd query in this case only for key "foo", because this
> returns the smallest number of documents, and then filter the result set
> client-side to documents which also include key "bar".
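A sketch of that client-side step, assuming the view was queried with key="foo", reduce=false and include_docs=true, so that each row carries the full document in `row.doc` (the field CouchDB adds for include_docs). The helper name `filterByAllTags` and the sample rows are hypothetical.

```javascript
// Given rows from a single-key query (e.g. key="foo") made with
// include_docs=true, keep only the documents whose tags also contain
// every one of the other required tags (client-side AND).
function filterByAllTags(rows, requiredTags) {
  return rows
    .map((row) => row.doc)
    .filter((doc) => doc && Array.isArray(doc.tags) &&
      requiredTags.every((tag) => doc.tags.includes(tag)));
}

// Hypothetical rows for key="foo"; doc2 lacks "bar" and is dropped.
const rows = [
  { id: "doc1", key: "foo", value: null, doc: { _id: "doc1", tags: ["foo", "bar"] } },
  { id: "doc2", key: "foo", value: null, doc: { _id: "doc2", tags: ["foo"] } },
  { id: "doc4", key: "foo", value: null, doc: { _id: "doc4", tags: ["bar", "foo", "baz"] } },
];

console.log(filterByAllTags(rows, ["bar"]).map((doc) => doc._id)); // [ 'doc1', 'doc4' ]
```

If a document lists the same tag twice, the map function emits it twice under that key, so the document would appear twice here; de-duplicate by id if that matters.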
>
> You can read the view using include_docs=true to get the whole docs, or in
> your view you can emit the parts of doc you're interested in as the value.
>
> Your original code was:
>
> function(doc) {
>   for (var i in doc.tags) {
>     emit(doc.tags[i], doc._id);
>   }
> }
>
> Note that emitting doc._id is not useful (every emitted K/V row already
> includes the document's id), so you could have
>
> // Query with include_docs=true
> function(doc) {
>   for (var i in doc.tags) {
>     emit(doc.tags[i], null);
>   }
> }
>
> Or:
>
> // Index is bigger but faster to read
> function(doc) {
>   for (var i in doc.tags) {
>     emit(doc.tags[i], doc);
>   }
> }
>
> Or:
>
> // Just emit the pieces you need when processing the view
> function(doc) {
>   for (var i in doc.tags) {
>     emit(doc.tags[i], {tags: doc.tags});
>   }
> }
>
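To see why the id need not be emitted, a map function like the ones above can be run outside CouchDB with a stand-in `emit`. This is a hypothetical local harness, not CouchDB's view server: `runMap`, `byTag` and the sample docs are illustrative, and it only mimics two behaviours, namely that each emitted row is tagged with the id of the document being mapped and that rows come back sorted by key.

```javascript
// Hypothetical stand-in for CouchDB's view server, for local experiments.
// emit() records a row and attaches the id of the doc currently being mapped.
let currentId = null;
let rows = [];

function emit(key, value) {
  rows.push({ id: currentId, key: key, value: value });
}

function runMap(mapFn, docs) {
  rows = [];
  for (const doc of docs) {
    currentId = doc._id;
    mapFn(doc);
  }
  // Sort by key, then id. Plain string comparison is a simplification;
  // CouchDB itself uses Unicode collation for view keys.
  return rows.sort((a, b) =>
    a.key < b.key ? -1 : a.key > b.key ? 1 :
    a.id < b.id ? -1 : a.id > b.id ? 1 : 0);
}

// Same shape as the include_docs variant: emits null as the value.
function byTag(doc) {
  for (var i in doc.tags) {
    emit(doc.tags[i], null);
  }
}

const docs = [
  { _id: "doc1", tags: ["foo", "bar"] },
  { _id: "doc2", tags: ["foo"] },
  { _id: "doc3", tags: ["bar", "baz"] },
];

// Every row carries an id even though the emitted value is null.
console.log(runMap(byTag, docs));
```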
