couchdb-dev mailing list archives

From Zachary Zolton <zachary.zol...@gmail.com>
Subject Re: View Filter
Date Fri, 15 May 2009 02:59:57 GMT
Drat... I may have just come from a place where knowing how to keep
my doc types in separate databases (and being able to speed up the
map-reduce churn of a reduce-with-group query with view
filters) would have saved me a TON of work!

Urgh... At worst, I'll put it in my blog...  :^(

On Thu, May 14, 2009 at 8:25 PM, Mark Hammond <skippy.hammond@gmail.com> wrote:
> On 15/05/2009 4:47 AM, Brian Candler wrote:
>>
>> On Thu, May 14, 2009 at 09:53:14AM -0500, Zachary Zolton wrote:
>>>
>>> (1) people who are storing large documents in CouchDB but not indexing
>>> them
>>> at all (I guess this is possible, e.g. if the doc ids are well-known or
>>> stored in other documents, but this isn't the most common way of working)
>>
>> The proposal would exclude a document from *all* views in a particular
>> design doc. So you're only going to get a benefit from this if you have a
>> large number of documents (or a number of large documents) which are not
>> required to be indexed in any view in that design doc.
>
> Yep - and that is the point.  Consider Jan's example, where it was filtering
> on doc['type'].  If a database had (say) 10 potential values of 'type', then
> all filters that only care about a single type will only care about 1 in 10
> of those documents.
>
> Taking this to its extreme, we tested Jan's patch on a view which matches
> very few documents in a large database.  Rebuilding that view with a filter
> was 18 times faster than without the filter.  We put this down to the fact
> the filter managed to avoid the json encode/decode step for the vast
> majority of the docs in the database.  IOW, on my test database, 6 minutes
> is spent before the filters can actually do anything (ie, that is just the
> json processing), whereas using the filter to avoid that json step brings it
> down to 20 seconds.
>
> So while not everyone will be able to see such significant speedups, many
> may find it extremely useful.
>
>> And it's reasonable, given that (as I understand it) each document is
>> already only passed once to the view server, in order to be indexed by all
>> the views in that design document.
>
> I agree there is lots that can and should be done to speed up views that do
> indeed care about most of the docs - such views spend less time relatively
> in the json encode step and more time in the interpreter.  As an experiment,
> I "ported" one of our views that does look at most of the docs from
> javascript to erlangview, and the performance increase was far more modest
> (20% maybe).  I suspect the javascript interpreter is faster than erlang, so
> I suspect that there will be a level of view complexity where using
> javascript *increases* view performance over erlang, even when factoring in
> the json processing...
>
> Cheers,
>
> Mark
>
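
To make Mark's point concrete, here is a rough Python sketch of why a filter on doc['type'] can skip most of the JSON work. It is only a toy model under stated assumptions, not Jan's patch and not CouchDB's actual code path: the function names, the generated documents and the 10 evenly distributed 'type' values are all made up for illustration. It counts how many documents get serialized for a (simulated) view server with and without a filter applied first.

```python
import json


def make_docs(n, n_types=10):
    """Build n hypothetical documents spread evenly over n_types 'type' values."""
    return [
        {"_id": f"doc-{i}", "type": f"type-{i % n_types}", "body": "x" * 100}
        for i in range(n)
    ]


def index_without_filter(docs, wanted_type):
    """Serialize every document, then let the map step decide what to emit."""
    encoded = 0
    emitted = []
    for doc in docs:
        payload = json.dumps(doc)      # encode cost is paid for every document
        encoded += 1
        received = json.loads(payload) # decode cost on the view-server side
        if received.get("type") == wanted_type:
            emitted.append(received["_id"])
    return emitted, encoded


def index_with_filter(docs, wanted_type):
    """Check the type first, so only matching documents are serialized."""
    encoded = 0
    emitted = []
    for doc in docs:
        if doc.get("type") != wanted_type:
            continue                   # skipped before any JSON work happens
        payload = json.dumps(doc)
        encoded += 1
        received = json.loads(payload)
        emitted.append(received["_id"])
    return emitted, encoded


if __name__ == "__main__":
    docs = make_docs(1000)
    unfiltered_ids, unfiltered_encoded = index_without_filter(docs, "type-3")
    filtered_ids, filtered_encoded = index_with_filter(docs, "type-3")
    print(unfiltered_ids == filtered_ids, unfiltered_encoded, filtered_encoded)
    # The figures quoted above: 6 minutes versus 20 seconds.
    print((6 * 60) / 20)
```

Both functions emit the same ids; only the number of encode/decode round trips differs. With 10 evenly spread types the filtered version serializes one document in ten, which matches the "1 in 10" reasoning above. The 18x figure is simply the quoted 6 minutes divided by the quoted 20 seconds, for a view that matched far fewer documents than that.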
