incubator-couchdb-user mailing list archives

From Patrick Barnes <mrtr...@gmail.com>
Subject Re: Mapping multiple entries in an array field? (like tags)
Date Tue, 24 May 2011 00:08:34 GMT
On 23/05/2011 6:08 PM, He Shiming wrote:
> @Patrick. The number of combinations isn't that scary, given tags are sorted.
>
> For a 2-tag query, 2-tag documents will have 1 combination only, and 3
> for 3-tag docs. 45 combinations for 10-tag docs.

How do you get 45? '10 choose 3' is 120.
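
For reference, the two figures can be reproduced with a short Python sketch. It suggests that 45 corresponds to '10 choose 2' (a 2-tag query against a 10-tag doc), while 120 is the count if 3-tag combinations are emitted. The tag names below are hypothetical placeholders, not taken from any real document.

```python
from itertools import combinations
from math import comb

# Hypothetical 10-tag document; the tag names are made up for illustration.
tags = sorted(["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"])

# Every sorted 2-tag and 3-tag combination a map function could emit.
pairs = list(combinations(tags, 2))
triples = list(combinations(tags, 3))

# The enumerated counts agree with the closed-form binomial coefficients.
assert len(pairs) == comb(10, 2)
assert len(triples) == comb(10, 3)

print(len(pairs), len(triples))    # 45 120
print(comb(2, 2), comb(3, 2))      # 1 3  (2-tag and 3-tag docs, 2-tag query)
```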

Also, don't you expect to have to match a smaller number of tags? If you 
only emit 3-tag combinations, how would you match a 2-tag intersection? 
With startkey and endkey, you could slightly reduce the number of rows 
required, but this would only work for tags that always get sorted to 
the front (e.g. the intersection of 'a' and 'b', but not 'b' and 'f').
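
The startkey/endkey limitation can be illustrated with a small simulation. This is only a sketch in Python, not CouchDB itself: the view is modelled as a sorted list of (key, doc id) rows, tuples stand in for array keys, and `HIGH` is an assumed stand-in for a high sentinel in the endkey. The names `build_index` and `prefix_query` and the sample docs are hypothetical.

```python
from bisect import bisect_left, bisect_right
from itertools import combinations

HIGH = "\uffff"  # assumed high sentinel, sorts after any ordinary tag here


def build_index(docs, k=3):
    """Emit every sorted k-tag combination per doc and sort the rows by key."""
    rows = []
    for doc_id, tags in docs.items():
        for combo in combinations(sorted(tags), k):
            rows.append((combo, doc_id))
    rows.sort()
    return rows


def prefix_query(rows, prefix):
    """Range scan from startkey=prefix to endkey=prefix + (HIGH,)."""
    keys = [key for key, _ in rows]
    lo = bisect_left(keys, tuple(prefix))
    hi = bisect_right(keys, tuple(prefix) + (HIGH,))
    return sorted({doc_id for _, doc_id in rows[lo:hi]})


# Hypothetical documents; all three contain both 'b' and 'f'.
docs = {
    "doc1": ["a", "b", "f"],
    "doc2": ["a", "b", "c", "f"],
    "doc3": ["b", "f", "g"],
}
rows = build_index(docs)

print(prefix_query(rows, ["a", "b"]))  # ['doc1', 'doc2']
print(prefix_query(rows, ["b", "f"]))  # ['doc3']
```

In this sketch the ['b', 'f'] range only reaches keys that begin with 'b', 'f', so doc1 and doc2 are missed even though they carry both tags: their only keys containing both start with 'a'.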

> However, based on what you've said, I'm under the impression that
> calling ``emit`` this many times per doc is a bad idea. I'm not
> familiar with the underlying mechanisms of Lucene. But are you saying
> emitting just several dozen times per doc will definitely be much
> slower than Lucene?

I'm just saying that if you have millions of documents, you'll have a 
view with more than a hundred million rows in it - so it will take a 
long time to generate, and take up a lot more disk space than the 
equivalent Lucene index.
As a rough estimate, I'd say generation would be much slower than with 
Lucene. Retrieval might be a little bit faster, but queries like 
intersections are something Lucene is built for.
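
As a back-of-the-envelope check on the row count, here is a sketch that assumes every document has exactly 10 tags and that one row is emitted per combination. Both are simplifying assumptions, and `estimated_rows` is a hypothetical helper.

```python
from math import comb


def estimated_rows(num_docs, tags_per_doc, combo_size):
    """Rows in the view if every doc emits one row per tag combination."""
    return num_docs * comb(tags_per_doc, combo_size)


# One million docs, 10 tags each (assumed uniform for simplicity).
print(estimated_rows(1_000_000, 10, 3))  # 120000000
print(estimated_rows(1_000_000, 10, 2))  # 45000000
```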

-Patrick
