incubator-couchdb-user mailing list archives

From Nitin Borwankar <ni...@borwankar.com>
Subject Re: multiple key word count query problem
Date Mon, 20 Jul 2009 03:28:51 GMT
Tommy Chheng wrote:
> so for keys with two or more parameters, only the first parameter can 
> be used for range selection? the 2nd and remaining keys can only be 
> used for grouping/sorting?
>
> the problem with having two views:
> If i had two views, one for [word, doc] => count and [doc, word] => 
> count; it would be re-doing the same word counting function twice.
>
> I'm gonna try to compute the docs word counts and store the results in 
> database itself.

Yes, but the advantage of letting the db do it is that indexes (views) 
are updated incrementally and dynamically whenever a new doc is added.
To get that functionality from your approach you would have to invoke 
the view explicitly via a REST call every time you or someone else added 
a new doc. Then you would have to update all your stored counts, or do 
some diffing to find out which ones had changed. If you expect your 
document store to keep growing, this could create performance issues. 
However, if you have a static data store, your approach may be fine.

I suspect the db can do all this more efficiently for you, though. So 
unless you are severely disk space constrained you may want to just have 
the two views.
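
A minimal sketch of what the second map function might look like, 
assuming the same document fields as your original view. The emit() stub 
and the sample document are hypothetical and exist only so the snippet 
runs outside CouchDB; the type check on the abstract is an extra guard, 
not something your original view has.

```javascript
// In CouchDB, emit() is supplied by the view server. This stub only
// collects rows so the sketch can run on its own.
var rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Same word count as the original view, but keyed by [doc id, word]
// so that a range query can select on the document id.
function mapByDocWord(doc) {
  // The typeof check is an added guard for docs without an abstract.
  if (doc['couchrest-type'] == 'NsfGrant' && typeof doc['abstract'] == 'string') {
    doc['abstract'].split(/\W+/).forEach(function (word) {
      if (word.length > 1) emit([doc['_id'], word], 1);
    });
  }
}

// Hypothetical sample document, for illustration only.
mapByDocWord({
  _id: '0808605',
  'couchrest-type': 'NsfGrant',
  abstract: 'the grant funds the lab'
});
```

With the same reduce function, a query of the form 
startkey=["0808605"]&endkey=["0808605",{}]&group_level=2 against such a 
view should return the per-word counts for that one document.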

Nitin Borwankar


(P.S. I see some NSF-related text in there. I am also working on an 
NSF-funded project and using Couch; I'd be happy to exchange notes 
offline as well, if you want.)
> thanks,
> tommy
>
> On Jul 19, 2009, at 7:16 PM, Paul Davis wrote:
>
>> On Sun, Jul 19, 2009 at 9:14 PM, Tommy Chheng<tommy.chheng@gmail.com> 
>> wrote:
>>> I have a simple word count view defined as:
>>> --------
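>>> // Map: emit one row per word occurrence, keyed by [word, doc id].
>>> function(doc) {
>>>   if (doc['couchrest-type'] == 'NsfGrant') {
>>>     // Split the abstract on runs of non-word characters.
>>>     var words = doc['abstract'].split(/\W+/);
>>>     words.forEach(function(word) {
>>>       // Skip empty strings and single-character tokens.
>>>       if (word.length > 1) emit([word, doc['_id']], 1);
>>>     });
>>>   }
>>> }
>>>
>>> // Reduce: sum the 1s to get a count; the same sum also works on rereduce.
>>> function(keys, values, rereduce) {
>>>   return sum(values);
>>> }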
>>> --------
>>> where the key's first parameter is the word and the 2nd parameter is the
>>> document_id.
>>>
>>> so i can do a query like this to get all the documents with the word
>>> "the" correctly.
>>> http://localhost:5984/nsf_grants/_design/NsfGrant/_view/by_word_doc_count?startkey=["the"]&endkey=["the",{}]&group_level=2
>>>
>>> I'm having trouble doing queries on the 2nd parameter, how can i find all
>>> the words in a particular document?
>>> I tried
>>> http://localhost:5984/nsf_grants/_design/NsfGrant/_view/by_word_doc_count?key=[null,"0808605"]&group_level=2
>>>
>>> which gives nothing (thinking that null would match all words)
>>> and
>>> http://localhost:5984/nsf_grants/_design/NsfGrant/_view/by_word_doc_count?startkey=[null,"0808605"]&endkey=[{},"0808605"]&group_level=2
>>>
>>> which gives all results. Why is this?
>>>
>>> Thanks,
>>> Tommy
>>>
>>
>> Querying a view is asking for a slice of a sorted list. Start and end
>> keys delimit the range of rows returned. The solution to your problem
>> is to create a second view so you can query by docid.
>>
>> Paul Davis
>
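
The "slice of a sorted list" point from the quoted thread can be 
illustrated with a simplified model of view collation. This is only a 
sketch, not CouchDB's implementation: it assumes the documented type 
order (null, booleans, numbers, strings, arrays, objects) and 
element-by-element comparison of array keys, and it compares strings by 
plain code-unit order rather than ICU rules. The helper names typeRank, 
collate and inRange and the sample keys are hypothetical.

```javascript
// Rank of each JSON type in the simplified collation order.
function typeRank(v) {
  if (v === null) return 0;
  if (v === false) return 1;
  if (v === true) return 2;
  if (typeof v === 'number') return 3;
  if (typeof v === 'string') return 4;
  if (Array.isArray(v)) return 5;
  return 6; // objects sort last
}

// Returns a negative number, zero, or a positive number.
function collate(a, b) {
  var ra = typeRank(a), rb = typeRank(b);
  if (ra !== rb) return ra < rb ? -1 : 1;
  if (ra === 3 || ra === 4) return a < b ? -1 : (a > b ? 1 : 0);
  if (ra === 5) {
    // Arrays compare element by element; the first difference decides.
    for (var i = 0; i < Math.min(a.length, b.length); i++) {
      var c = collate(a[i], b[i]);
      if (c !== 0) return c;
    }
    return a.length - b.length; // a shorter prefix sorts first
  }
  return 0; // equal scalars; object-vs-object ordering is not modeled
}

// True when startkey <= key <= endkey in collation order.
function inRange(key, startkey, endkey) {
  return collate(startkey, key) <= 0 && collate(key, endkey) <= 0;
}

// Hypothetical emitted keys in [word, docid] form.
var keys = [
  ['grant', '0808605'], ['grant', '0900001'],
  ['the', '0808605'], ['the', '0900001']
];

// startkey=[null,"0808605"]&endkey=[{},"0808605"]: the first element
// already decides every comparison, so the doc id is never consulted.
var attempted = keys.filter(function (k) {
  return inRange(k, [null, '0808605'], [{}, '0808605']);
});

// key=[null,"0808605"]: an exact match, and no emitted key starts with null.
var exact = keys.filter(function (k) {
  return collate(k, [null, '0808605']) === 0;
});

// startkey=["the"]&endkey=["the",{}]: a contiguous slice for one word.
var byWord = keys.filter(function (k) {
  return inRange(k, ['the'], ['the', {}]);
});
```

In this model the null-to-{} range covers every key regardless of the 
doc id, the exact-key query matches nothing, and only ranges that fix 
the first element select a contiguous slice, which is why a view keyed 
by doc id first is needed to query by document.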

