incubator-couchdb-dev mailing list archives

From Brian Candler <B.Cand...@pobox.com>
Subject Re: View Filter
Date Fri, 15 May 2009 07:38:12 GMT
On Fri, May 15, 2009 at 11:25:01AM +1000, Mark Hammond wrote:
>> The proposal would exclude a document from *all* views in a particular
>> design doc. So you're only going to get a benefit from this if you have a
>> large number of documents (or a number of large documents) which are not
>> required to be indexed in any view in that design doc.
>
> Yep - and that is the point.  Consider Jan's example, where it was  
> filtering on doc['type'].  If a database had (say) 10 potential values  
> of 'type', then all filters that only care about a single type will only  
> care about 1 in 10 of those documents.

Sure, as long as *none* of the views in that design document care about a
significant proportion of the documents.

It's unusual that people will have docs which are completely unindexed, so I
think this patch mainly helps in the case where the user has 10 separate
design documents, each of which is only interested in documents of one type.

Of course, that's a perfectly legitimate way of using CouchDB, and I don't
oppose this change at all.

It might be possible to make the feature more general, though. For example,
suppose each view had its own filter, and the erlang server took the *union*
of those filters to work out which documents to send. Then, when sending a
document, it would also send a list of the views to process it with. This
could be used to simplify the view code by removing the doc.type test, whilst
getting the performance benefit automatically.

Example:

  views:{
    view1:{
      filter:[{type:"foo"}],
      map:...
    },
    view2:{
      filter:[{type:"foo"},{type:"bar"}],
      map:...
    }
  }

When a document of type foo is sent, it would be sent to the view engine
with a list ["view1","view2"] of the views to be invoked on it. A document
of type bar would have ["view2"]. A document of type baz would not be sent
at all.
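
A rough sketch of that dispatch logic, written in JavaScript purely for
illustration (in the proposal it would live in the erlang server). The
function names clauseMatches, viewsFor and dispatch are made up, and I'm
assuming that a filter is a list of alternatives and that a clause matches
when every key in it equals the corresponding field of the document:

```javascript
// Hypothetical design-doc fragment mirroring the example above.
// The map functions are elided in the example, so they are left out here.
const views = {
  view1: { filter: [{ type: "foo" }] },
  view2: { filter: [{ type: "foo" }, { type: "bar" }] }
};

// Assumption: a clause matches when every key in it equals the
// corresponding field of the document.
function clauseMatches(clause, doc) {
  return Object.keys(clause).every((key) => doc[key] === clause[key]);
}

// Assumption: a view's filter is a list of alternatives, so the view
// wants the document if any one clause matches.
function viewsFor(doc, views) {
  return Object.keys(views).filter((name) =>
    views[name].filter.some((clause) => clauseMatches(clause, doc))
  );
}

// The union of the filters: a document is sent only if at least one view
// wants it, together with the names of the views to invoke on it.
function dispatch(doc, views) {
  const targets = viewsFor(doc, views);
  return targets.length > 0 ? { doc: doc, views: targets } : null;
}
```

With this, dispatch({type:"foo"}, views) carries both view names,
dispatch({type:"bar"}, views) carries only "view2", and
dispatch({type:"baz"}, views) returns null, i.e. the document is never sent.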

But maybe this is too complicated, and going further down this route ends up
with an erlang view server anyway.

> Taking this to its extreme, we tested Jan's patch on a view which  
> matches very few documents in a large database.  Rebuilding that view  
> with a filter was 18 times faster than without the filter.  We put this  
> down to the fact the filter managed to avoid the json encode/decode step  
> for the vast majority of the docs in the database.

You also avoided sending the docs over the socket and waiting for the
response, so latency may be part of the problem too. It depends on whether
the view server interface does any sort of pipelining of requests.

Regards,

Brian.
