couchdb-dev mailing list archives

From Mark Hammond <skippy.hamm...@gmail.com>
Subject Re: View Filter
Date Fri, 15 May 2009 01:25:01 GMT
On 15/05/2009 4:47 AM, Brian Candler wrote:
> On Thu, May 14, 2009 at 09:53:14AM -0500, Zachary Zolton wrote:
>> (1) people who are storing large documents in CouchDB but not indexing them
>> at all (I guess this is possible, e.g. if the doc ids are well-known or
>> stored in other documents, but this isn't the most common way of working)
>
> The proposal would exclude a document from *all* views in a particular
> design doc. So you're only going to get a benefit from this if you have a
> large number of documents (or a number of large documents) which are not
> required to be indexed in any view in that design doc.

Yep - and that is the point.  Consider Jan's example, which filtered on
doc['type'].  If a database had (say) 10 possible values of 'type', then
a filter that only cares about a single type only needs to look at
roughly 1 in 10 of those documents.

Taking this to its extreme, we tested Jan's patch on a view which
matches very few documents in a large database.  Rebuilding that view
with a filter was 18 times faster than without the filter.  We put this
down to the fact that the filter managed to avoid the json encode/decode
step for the vast majority of the docs in the database.  IOW, on my test
database, 6 minutes are spent before the filters can actually do anything
(ie, that is just the json processing), whereas using the filter to
avoid that json step brings it down to 20 seconds.
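
To make the mechanism concrete, here is a toy model in javascript.  It is
only a sketch under assumptions: buildView, mapFn and filterFn are
hypothetical names, the JSON round trip stands in for the encode/decode
between the database and the view server, and none of it is the actual
code path of CouchDB or of Jan's patch.

```javascript
// Toy model (hypothetical, not CouchDB's implementation): count how many
// docs pay for the JSON round trip with and without a cheap pre-filter.
function buildView(docs, mapFn, filterFn) {
  const rows = [];
  let encoded = 0; // docs that went through the JSON encode/decode step
  for (const doc of docs) {
    // With a filter, non-matching docs are skipped before any JSON work.
    if (filterFn && !filterFn(doc)) continue;
    // Stand-in for encoding on the database side and decoding in the view server.
    const received = JSON.parse(JSON.stringify(doc));
    encoded++;
    mapFn(received, (key, value) => rows.push({ key, value }));
  }
  return { rows, encoded };
}

// Assumed test data: 1000 docs spread evenly over 10 values of 'type'.
const docs = [];
for (let i = 0; i < 1000; i++) {
  docs.push({ _id: 'doc' + i, type: 'type' + (i % 10), body: 'x'.repeat(100) });
}

// The view only cares about a single type.
const mapFn = (doc, emit) => {
  if (doc.type === 'type3') emit(doc._id, null);
};
const filterFn = (doc) => doc.type === 'type3';

const unfiltered = buildView(docs, mapFn, null);
const filtered = buildView(docs, mapFn, filterFn);

console.log('unfiltered:', unfiltered.rows.length, 'rows,', unfiltered.encoded, 'docs encoded');
console.log('filtered:  ', filtered.rows.length, 'rows,', filtered.encoded, 'docs encoded');
```

Both runs produce the same rows; the only difference is how many docs pay
for the json step, which is where we believe the time goes.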

So while not everyone will be able to see such significant speedups, 
many may find it extremely useful.

> And it's reasonable, given that (as I understand it) each document is
> already only passed once to the view server, in order to be indexed by all
> the views in that design document.

I agree there is lots that can and should be done to speed up views that
do indeed care about most of the docs - such views spend relatively less
time in the json encode step and more time in the interpreter.  As an
experiment, I "ported" one of our views that does look at most of the
docs from javascript to erlangview, and the performance increase was far
more modest (maybe 20%).  I suspect the javascript interpreter is faster
than erlang, so there may be a level of view complexity where using
javascript *increases* view performance over erlang, even when factoring
in the json processing...

Cheers,

Mark
