couchdb-user mailing list archives

From Robert Newson <robert.new...@gmail.com>
Subject Re: getting unique set of document id's
Date Tue, 07 Jul 2009 18:53:38 GMT
Your goal is achievable with couchdb-lucene
(http://github.com/rnewson/couchdb-lucene), fwiw.

That is, you would add all of the tags for each document to a
full-text view with:

{
  "_id": "_design/lucene",

  "fulltext": {
    "tags": {
      "index": "function(doc) { var ret = new Document(); ret.add(doc.tags); return ret; }"
    }
  }
}


and query with something like:

http://localhost:5984/dbname/_fti/lucene/tags?q="key:foo AND key:bar"
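The query above packs the Lucene expression into the q parameter. Because the goal here is to match one or more tags, a small helper that joins the tags with OR (or AND) and URL-encodes the result may be handy. This is a hypothetical sketch: buildTagQueryUrl is not part of couchdb-lucene, and the host, database name, design document lucene, index tags and field name key are simply copied from the example above, so check them against what your index function actually writes. It leaves out the double quotes around the query, since Lucene's query syntax reads a quoted string as a phrase, and it assumes tags are plain words without Lucene special characters.

```javascript
// Hypothetical helper (not part of couchdb-lucene): builds a full-text
// query URL for the "tags" index in the "lucene" design document.
// Assumes tags are plain words with no Lucene special characters.
function buildTagQueryUrl(base, db, tags, operator) {
  // "OR" matches documents with any of the tags; "AND" requires all of them.
  const q = tags.map((tag) => "key:" + tag).join(" " + operator + " ");
  // encodeURIComponent escapes the spaces and colons in the query string.
  return (
    base + "/" + encodeURIComponent(db) + "/_fti/lucene/tags" +
    "?q=" + encodeURIComponent(q) + "&include_docs=true"
  );
}

console.log(buildTagQueryUrl("http://localhost:5984", "dbname", ["foo", "bar"], "OR"));
// http://localhost:5984/dbname/_fti/lucene/tags?q=key%3Afoo%20OR%20key%3Abar&include_docs=true
```

Passing "AND" instead of "OR" gives the stricter query from the original example; passing a longer list such as ["foo", "bar", "baz", "boo", "xyz"] covers the wider set-based case.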

couchdb-lucene supports include_docs=true too, so it might be a
one-stop shop solution for you, assuming you can bear the indignity of
Java.

B.

On Tue, Jul 7, 2009 at 7:08 PM, Ross Bates <rbates@gmail.com> wrote:
> Thank you for the follow-up, Brian. After looking at your examples I think I
> understand where I wasn't clear in how I planned to use the POST {"keys":
> ["foo", "bar"]} statement.
>
> What you and Paul suggested is the fastest route to getting documents that
> are tagged with both foo AND bar, but for my search I am hoping to get foo
> AND/OR bar.
>
> The reason I was including the doc id in the emit was that I was trying to
> play with the reduce to see if I could get it to reduce the results to a
> unique set.
>
> I can use just the map() function then make the result unique on the client,
> but from a performance standpoint I imagine the couchdb view already knows
> what those unique ids are. I could be totally wrong there, though.
>
> The original foo/bar example is simple, but extending it out would prove to
> be powerful for set based analysis (database marketing, customer
> segmentation). To be able to pass {"keys": ["foo", "bar","baz","boo","xyz"]}
> to a view and get back a set of docs that matched *one or more* of the keys
> would be fantastic.
>
> -Ross
>
>
> On Tue, Jul 7, 2009 at 4:05 AM, Brian Candler <B.Candler@pobox.com> wrote:
>
>> On Sun, Jul 05, 2009 at 03:18:33PM -0500, Ross Bates wrote:
>> > Hi Paul - thank you for the pointers. Something I'm unclear on though...
>> > using a sum in the reduce returns something like this for all the tags:
>> >
>> > foo, 3
>> > bar, 5
>> > baz, 7
>> >
>> > When I use the multi-key fetch against the view it doesn't return
>> > specific docids for each tag, just a subset of tags and their counts.
>> >
>> > POST {"keys": ["foo", "bar"]}
>> >
>> > foo, 3
>> > bar, 5
>> >
>> > How can I get access to the list of docids that make up the total?
>>
>> You query the view again with reduce=false, which turns it back into a
>> regular map view.
>>
>> The idea is you'd query in this case only for key "foo", because it
>> returns the fewest documents (3, versus 5 for "bar"), and then filter the
>> result set client-side to documents which also include key "bar".
>>
>> You can read the view using include_docs=true to get the whole docs, or in
>> your view you can emit the parts of doc you're interested in as the value.
>>
>> Your original code was:
>>
>> function(doc) {
>>   for(i in doc.tags) {
>>     emit(doc.tags[i], doc._id);
>>   }
>> }
>>
>> Note that emitting doc._id is not useful (every emitted key/value row already
>> includes the id of the document that produced it), so you could have:
>>
>> // Query with include_docs=true
>> function(doc) {
>>   if (!doc.tags) return;  // skip documents without tags
>>   for (var i = 0; i < doc.tags.length; i++) {
>>     emit(doc.tags[i], null);
>>   }
>> }
>>
>> Or:
>>
>> // Index is bigger but faster to read
>> function(doc) {
>>   if (!doc.tags) return;  // skip documents without tags
>>   for (var i = 0; i < doc.tags.length; i++) {
>>     emit(doc.tags[i], doc);
>>   }
>> }
>>
>> Or:
>>
>> // Just emit the pieces you need when processing the view
>> function(doc) {
>>   if (!doc.tags) return;  // skip documents without tags
>>   for (var i = 0; i < doc.tags.length; i++) {
>>     emit(doc.tags[i], {tags: doc.tags});
>>   }
>> }
>>
>
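Both strategies discussed in this thread can be done on the client over the rows of a reduce=false view query: Brian's AND approach (fetch only the rarest tag, then filter for the others) and Ross's OR case (POST several keys and keep each matching document once). The following is a minimal sketch, not part of CouchDB: the function names unionIds, rarestTag and intersectIds are hypothetical, and it assumes the usual view row shape where each row has id, key and value, plus doc when include_docs=true.

```javascript
// Hypothetical client-side helpers for the tag view discussed above.
// `rows` is the "rows" array of a reduce=false view response; every row
// already carries the source document id as row.id.

// OR: unique ids of documents tagged with at least one requested key,
// e.g. from POST {"keys": ["foo", "bar"]} with reduce=false.
// A document tagged with both keys appears once per key, so dedupe.
function unionIds(rows) {
  const seen = new Set();
  const ids = [];
  for (const row of rows) {
    if (!seen.has(row.id)) {
      seen.add(row.id);
      ids.push(row.id);
    }
  }
  return ids;
}

// Pick the tag with the fewest documents from non-empty per-tag counts
// such as [{key: "foo", value: 3}, {key: "bar", value: 5}].
function rarestTag(countRows) {
  return countRows.reduce((min, r) => (r.value < min.value ? r : min)).key;
}

// AND: given the rows for the rarest tag only, keep the ids of documents
// whose tags include every required tag. Reads tags from row.doc
// (include_docs=true) or from row.value (view emits doc or {tags: ...}).
function intersectIds(rows, requiredTags) {
  const matching = rows.filter((row) => {
    const source = row.doc || row.value || {};
    const tags = source.tags || [];
    return requiredTags.every((t) => tags.includes(t));
  });
  return unionIds(matching);  // dedupe in case a doc repeats a tag
}

// Example rows, shaped like a multi-key reduce=false response.
const rows = [
  { id: "a", key: "foo", value: { tags: ["foo", "bar"] } },
  { id: "b", key: "foo", value: { tags: ["foo"] } },
  { id: "a", key: "bar", value: { tags: ["foo", "bar"] } },
  { id: "c", key: "bar", value: { tags: ["bar", "baz"] } },
];

unionIds(rows);  // returns ["a", "b", "c"]
rarestTag([{ key: "foo", value: 3 }, { key: "bar", value: 5 }]);  // returns "foo"
intersectIds(rows.filter((r) => r.key === "foo"), ["foo", "bar"]);  // returns ["a"]
```

The OR case costs one multi-key request plus a linear dedupe; the AND case keeps the transfer small by only reading the rows for the rarest tag, which is the point of Brian's suggestion.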
