incubator-couchdb-dev mailing list archives

From Jan Lehnardt <...@apache.org>
Subject Re: View Filter
Date Fri, 15 May 2009 10:09:12 GMT

On 15 May 2009, at 09:38, Brian Candler wrote:

> On Fri, May 15, 2009 at 11:25:01AM +1000, Mark Hammond wrote:
>>> The proposal would exclude a document from *all* views in a particular
>>> design doc. So you're only going to get a benefit from this if you have a
>>> large number of documents (or a number of large documents) which are not
>>> required to be indexed in any view in that design doc.
>>
>> Yep - and that is the point.  Consider Jan's example, where it was
>> filtering on doc['type'].  If a database had (say) 10 potential values
>> of 'type', then all filters that only care about a single type will only
>> care about 1 in 10 of those documents.
>
> Sure, as long as *none* of the views in that design document care about a
> significant proportion of the documents.
>
> It's unusual that people will have docs which are completely unindexed, so I
> think this patch mainly helps in the case where the user has 10 separate
> design documents, each of which is only interested in documents of one type.
>
> Of course, that's a perfectly legitimate way of using CouchDB, and I don't
> oppose this change at all.
>
> It might be possible to make the feature more general though. For example,
> suppose each view had its own filter, and the erlang server took the *union*
> of those filters to work out which documents to send. Then, when sending a
> document, it sent a list of which views to process it with. This could be
> used to simplify the view code by removing the doc.type test, whilst getting
> the performance benefit automatically.

Like I said in the original mail, this wouldn't be possible without a
major rewrite of the view server, and I'd rather not do that in light of
other, more important changes.

Cheers
Jan
--


>
> Example:
>
>  views:{
>    view1:{
>      filter:[{type:"foo"}],
>      map:...
>    },
>    view2:{
>      filter:[{type:"foo"},{type:"bar"}],
>      map:...
>    }
>  }
>
> When a document of type foo is sent, it would be sent to the view engine
> with a list ["view1","view2"] of the views to be invoked on it. A document
> of type bar would have ["view2"]. A document of type baz would not be sent
> at all.
>
> But maybe this is too complicated, and going further down this route ends up
> with an erlang view server anyway.
>
>> Taking this to its extreme, we tested Jan's patch on a view which
>> matches very few documents in a large database.  Rebuilding that view
>> with a filter was 18 times faster than without the filter.  We put this
>> down to the fact that the filter managed to avoid the json encode/decode
>> step for the vast majority of the docs in the database.
>
> You also avoided sending the docs over the socket and waiting for the
> response. So maybe latency is also part of the problem. Depends whether the
> view server interface does any sort of pipelining of requests.
>
> Regards,
>
> Brian.
>
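
For reference, the per-view filter idea quoted above could be sketched
roughly as follows. This is only an illustration of the dispatch logic, not
anything that exists in CouchDB: the names matchesPattern, viewsFor and
dispatch are hypothetical, the map functions are left out, and it assumes a
filter is a list of patterns where a document matches if all keys of at
least one pattern are equal to the document's values.

```javascript
// Hypothetical sketch; none of these names are part of CouchDB.
// Per-view filters as in the quoted example (map functions omitted).
const views = {
  view1: { filter: [{ type: "foo" }] },
  view2: { filter: [{ type: "foo" }, { type: "bar" }] },
};

// A pattern matches when every key in it equals the document's value.
function matchesPattern(doc, pattern) {
  return Object.keys(pattern).every((key) => doc[key] === pattern[key]);
}

// Names of the views whose filter accepts this document.
function viewsFor(doc, views) {
  return Object.keys(views).filter((name) =>
    views[name].filter.some((pattern) => matchesPattern(doc, pattern))
  );
}

// Taking the union of the filters: a document that no view wants is
// skipped (null); otherwise it is sent with the list of views to run.
function dispatch(doc, views) {
  const targets = viewsFor(doc, views);
  return targets.length === 0 ? null : { doc: doc, views: targets };
}
```

With these definitions, a document of type foo is dispatched with both view
names, a document of type bar only with view2, and a document of type baz
yields null, so it would never be serialised or sent to the view server.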

