incubator-couchdb-user mailing list archives

From Paul Davis <paul.joseph.da...@gmail.com>
Subject Re: Suggestions on View performance optimization/improvement
Date Wed, 01 Apr 2009 17:43:50 GMT
On Wed, Apr 1, 2009 at 1:31 PM, Jason Smith <jhs@proven-corporation.com> wrote:
> I'd be very interested to know the performance impact of that optimization
> as well.  What is the overhead or bottleneck with large view values?
>  Estimating 100 bytes per key/value pair within each of the million
> documents, that's 2GB of raw data, which should write to a laptop disk
> within 2 minutes.
>
> I'm wondering whether it matters how large the view values are, since they
> would seem not to be involved in the view processing very much--only written
> to disk in the order defined by the keys.
>
> Of course, that goes against the common wisdom that the fastest thing to do
> is emit(key, null); but that could impact the application significantly
> since you have to query again for the documents.  (I'm unsure whether
> include_docs has a performance penalty either.)
>
> I guess what I'm asking is, why does the value side of views impact
> performance so greatly?
>

Other than the extra disk I/O, there are also the necessary
term_to_binary calls, which add overhead. I have no idea how much that
factors in, though. For include_docs, you're adding an N log(N) cost to
reading from the view, but I haven't the slightest idea how that might
translate to wall-clock time. It'd be an interesting thing to measure,
though: run include_docs for M rows, with M ranging from 1 to 1M or so,
and then fit the math to get an approximation of how long it takes to
read through the tree. I'm almost tempted, but work is calling.
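
To make the shape of that cost concrete, here is a rough sketch of a
lookup-count model. It assumes each included document costs one
root-to-leaf descent of a B-tree over all N documents. The function name
`estimatedNodeVisits` and the branching factor of 100 are hypothetical
placeholders, not CouchDB internals, and the result says nothing about
wall-clock time.

```javascript
// Sketch only: counts B-tree node visits under a simplified model.
// Assumption: one root-to-leaf descent per included document.
// `branching` is an invented fan-out, not a value taken from CouchDB.
function estimatedNodeVisits(m, n, branching) {
  // Tree depth = number of levels needed to address n documents.
  let depth = 0;
  let capacity = 1;
  while (capacity < n) {
    capacity *= branching;
    depth += 1;
  }
  depth = Math.max(1, depth);
  // m rows read, one descent each.
  return m * depth;
}

const totalDocs = 1000000; // the million documents from the thread
const branching = 100;     // hypothetical fan-out

// M ranging from 1 to 1M, as suggested above.
for (const m of [1, 10, 100, 1000, 10000, 100000, 1000000]) {
  console.log(m, estimatedNodeVisits(m, totalDocs, branching));
}
```

Under this model the extra work grows linearly in the number of rows
read, with a logarithmic factor in the database size. Real timings would
still need to be measured.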

That said, the best way to find out would be to measure. I don't think
I've seen numbers on this yet, so anything you can show would
definitely be valuable information, especially if you can demonstrate
the trade-offs between emit(key, doc) and emit(key, null) with
include_docs=true.
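
As a starting point for such a comparison, the two map functions can be
run side by side outside CouchDB. The sketch below assumes Node.js and
passes `emit` as a parameter so it runs standalone (in CouchDB, `emit`
is a global and the map function takes only `doc`). The helpers
`collectRows`, `makeDoc`, and `jsonBytes` and the field names are
hypothetical. JSON size is only a proxy: CouchDB stores rows through
term_to_binary, so on-disk sizes will differ.

```javascript
// Sketch only: a stand-in for the view server, used to compare row sizes.

// The two map functions discussed in the thread.
function mapWithDoc(doc, emit) { emit(doc.field, doc); }
function mapWithNull(doc, emit) { emit(doc.field, null); }

// Run a map function over docs and collect the emitted [key, value] rows.
function collectRows(mapFn, docs) {
  const rows = [];
  for (const doc of docs) {
    mapFn(doc, (key, value) => rows.push([key, value]));
  }
  return rows;
}

// Build a document with 20 fields of roughly 100 bytes per key/value
// pair, mirroring the estimate quoted in the thread (made-up data).
function makeDoc(i) {
  const doc = { _id: "doc-" + i, field: "key-" + i };
  for (let f = 0; f < 18; f++) {
    doc["f" + f] = "x".repeat(90);
  }
  return doc;
}

// Total size of the rows when serialized as JSON (a proxy, see above).
function jsonBytes(rows) {
  return rows.reduce(
    (sum, row) => sum + Buffer.byteLength(JSON.stringify(row)), 0);
}

const docs = Array.from({ length: 1000 }, (_, i) => makeDoc(i));
const bytesWithDoc = jsonBytes(collectRows(mapWithDoc, docs));
const bytesWithNull = jsonBytes(collectRows(mapWithNull, docs));
console.log({ bytesWithDoc, bytesWithNull });
```

This only shows how much more data the emit(key, doc) variant hands to
the indexer per row. It does not measure index build time or the read
cost of include_docs=true.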

HTH,
Paul Davis

> kowsik wrote:
>>
>> I would highly recommend that you do emit(doc.field, null) so that the
>> key space doesn't get unwieldy and large. Since the id of the document
>> is part of the map results, you can always fetch it using
>> include_docs=true.
>>
>> K.
>>
>> On Wed, Apr 1, 2009 at 10:12 AM, Manjunath Somashekhar
>> <manjunath_somashekhar@yahoo.com> wrote:
>>>
>>> hi All,
>>>
>>> We have been using couchdb (built out of trunk) for prototyping an idea
>>> and would like to thank and congratulate you folks for a simple and usable
>>> schema free db.
>>>
>>> We plan to store a few million documents in couchdb and we would like to
>>> create a couple of views to fetch the data appropriately. We have inserted a
>>> million documents (each containing about 20 fields). We are
>>> indexing/creating a view on a particular field of the document. The map
>>> function of the view is a simple, straightforward emit (emit(doc.field, doc)).
>>> It takes about 90 mins to build the required B-Tree index the first time.
>>> All the subsequent queries are performing extremely well (millisecond
>>> responses). Can anything be done to reduce the 90 mins taken to build the
>>> required B-Tree index the first time?
>>>
>>> Environment details:
>>> Couchdb - 0.9.0a757326
>>> Erlang - 5.6.5
>>> Linux kernel - 2.6.24-23-generic #1 SMP Mon Jan 26 00:13:11 UTC 2009 i686
>>> GNU/Linux
>>> Ubuntu distribution
>>> Centrino Dual core, 4GB RAM laptop
>>>
>>> Thanks
>>> Manju
>>>
>>>
>>>
>>>
>
> --
> Jason Smith
> Proven Corporation
> Bangkok, Thailand
> http://www.proven-corporation.com
>
