incubator-couchdb-user mailing list archives

From Patrick Barnes <mrtr...@gmail.com>
Subject Re: Mapping multiple entries in an array field? (like tags)
Date Mon, 23 May 2011 03:35:59 GMT
With your original idea of emitting a row for each combination of tags, 
you may still end up with an unworkable number of records in your view.

Say that for each document you emit every combination of its tags, with 
each combination in alphabetical order. Capping the number of tags per 
combination at 3, for example, will help limit the size of the view.

With 5-10 tags per document as your typical case, you would get from
(5 choose 1)+(5 choose 2)+(5 choose 3) = 5+10+10 = 25
to
(10 choose 1)+(10 choose 2)+(10 choose 3) = 10+45+120 = 175
view records PER document.
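
To check those counts, here is a rough sketch of how the combinations 
could be generated. The helper name tagCombinations and the maxSize 
parameter are my own, not anything built into CouchDB, and it assumes a 
document's tags contain no duplicates.

```javascript
// Hypothetical helper: return every combination of 1..maxSize tags,
// with the tags inside each combination in alphabetical order.
function tagCombinations(tags, maxSize) {
  var sorted = tags.slice().sort();
  var result = [];

  // Grow `combo` with tags taken from index `start` onwards, so each
  // combination is produced exactly once and stays sorted.
  function extend(start, combo) {
    if (combo.length > 0) result.push(combo);
    if (combo.length === maxSize) return;
    for (var i = start; i < sorted.length; i++) {
      extend(i + 1, combo.concat(sorted[i]));
    }
  }

  extend(0, []);
  return result;
}

// Inside a map function this might be used roughly as:
//   function(doc) {
//     tagCombinations(doc.tags || [], 3).forEach(function(combo) {
//       emit(combo, null);
//     });
//   }
```

Calling tagCombinations on 5 tags with maxSize 3 gives 25 combinations, 
and on 10 tags gives 175, matching the figures above.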

And what exactly do you need to reduce? Just tag and tag intersection 
counts?

---------------------------------------------------------------------

It might be better to use couchdb-lucene; you can have an index function 
that is just:

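function(doc) {
     // Skip documents that have no tags.
     if (!doc.tags || !doc.tags.length) return null;
     var ret = new Document();
     // Plain indexed loop: for...in on an array can pick up extra keys.
     for (var i = 0; i < doc.tags.length; i++) {
        // Index each tag as one exact (un-analyzed) term.
        ret.add(doc.tags[i], {"index":"not_analyzed"});
     }
     return ret;
}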

Then a query like:
http://localhost:5984/db/_fti/_design/db/tags?q=tag1&include_docs=true
   gets you all the docs with tag1.

For intersections:
http://localhost:5984/db/_fti/_design/db/tags?q=tag1%20AND%20tag2&include_docs=true
   gets you all the docs with both tag1 and tag2 (the operator must be 
   uppercase AND, not and).

And, to get the number of docs that *would* be returned from any query 
without having to count them:
http://localhost:5984/db/_fti/_design/db/tags?q=tag1&limit=1
and read the 'total_rows' element out of the response.

You could even run more complex queries if necessary, e.g. "tag1 AND 
(tag2 OR tag3)".
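
If you end up building these URLs in code, something along these lines 
might do. It is only a sketch: tagQueryUrl, allOf and the option names 
includeDocs/countOnly are made up for illustration, and it does not 
escape tags that contain spaces or Lucene special characters.

```javascript
// Hypothetical helper: join tags into a Lucene intersection query.
function allOf(tags) {
  return tags.join(" AND ");
}

// Hypothetical helper: build a query URL for the _fti index shown above.
// options.includeDocs adds include_docs=true; options.countOnly adds
// limit=1 so that only 'total_rows' needs to be read from the response.
function tagQueryUrl(baseUrl, luceneQuery, options) {
  var params = ["q=" + encodeURIComponent(luceneQuery)];
  if (options && options.includeDocs) params.push("include_docs=true");
  if (options && options.countOnly) params.push("limit=1");
  return baseUrl + "?" + params.join("&");
}

var base = "http://localhost:5984/db/_fti/_design/db/tags";
var intersectionUrl = tagQueryUrl(base, allOf(["tag1", "tag2"]), { includeDocs: true });
var countUrl = tagQueryUrl(base, "tag1", { countOnly: true });
```

Here intersectionUrl and countUrl come out the same as the intersection 
and limit=1 URLs written out by hand above.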

The only thing I can see that you might miss with this approach is the 
ability to get the list of available tags. You could combine the above 
with a view that emits single tags and has '_count' as the reduce, then 
query it with reduce=true&group=true to fetch that primary list.
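
A minimal sketch of that single-tag view follows. The design document 
id "_design/tags", the view name "by_tag" and the variable names are 
placeholders of mine; pick whatever fits your database.

```javascript
// Map function: emit one row per tag. The value is unused because the
// built-in _count reduce only counts rows.
var mapTags = function(doc) {
  if (!doc.tags || !doc.tags.length) return;
  for (var i = 0; i < doc.tags.length; i++) {
    emit(doc.tags[i], null);
  }
};

// Hypothetical design document holding the view. CouchDB stores the map
// function as a string; "_count" is the built-in reduce.
var designDoc = {
  _id: "_design/tags",
  views: {
    by_tag: {
      map: mapTags.toString(),
      reduce: "_count"
    }
  }
};
```

With those placeholder names, a request to 
/db/_design/tags/_view/by_tag?reduce=true&group=true would return one row 
per distinct tag, with the number of documents carrying it as the value.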

-Patrick

On 23/05/2011 11:57 AM, He Shiming wrote:
> @Mark, hmm... eventually, I'm expecting the number of docs to be in
> the millions. Most of them will be tagged, and the number of tags will
> be in the thousands. Many "hot" tags will return a lot of documents.
> Unlike the single tag situation, which I can put a limit on. Finding
> intersections requires the full list.
>
> I'm trying to use a schedule to pre-fetch content for all the tags.
> But even under this circumstance, I'm looking for a lightweight way.
>
> So far I came to think that my original idea wasn't so bad. Because
> each document will only have like 5 or 10 tags max. The number of
> possible combinations isn't that huge. I think this way, there will be
> less calculation than fetching the full list several times at client
> side. The plus side would be, as long as it's still map/reduce, I can
> use bigcouch to scale the calculation easily.
>
> On Mon, May 23, 2011 at 9:17 AM, Mark Hahn<mark@boutiquing.com>  wrote:
>>
>> I'm using the method of emitting a result for each tag in the document and
>> I'm not seeing any huge client calculation.  I just get the list of doc ids
>> for each tag requested and do an intersection of the results.  Not a big
>> deal.  It isn't as if I have to load all the docs.
>>
>>
>>
>> --
>> Mark Hahn
>> Website Manager
>> mark@boutiquing.com
>> 949-229-1012
>>
>
>
>
