couchdb-user mailing list archives

From "Brad King" <brk...@gmail.com>
Subject Re: view index build time
Date Tue, 08 Jul 2008 13:53:50 GMT
Following up on this. After moving to real hardware, my view index build
time for the same data set dropped from 25 minutes to 6 minutes, so the VM
definitely was a factor. If there are any other optimizations I can make,
I'd love to know what they are. Thanks.
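
For reference, the figures in this thread (300k documents, 25 minutes on the VM, 6 minutes on real hardware) can be turned into per-document rates with a few lines of Python. This is only a back-of-the-envelope sketch; the function name `throughput` is made up for illustration, and it assumes the whole build time is spent evenly across documents.

```python
def throughput(doc_count, minutes):
    """Return (docs per second, milliseconds per doc) for a full index build."""
    seconds = minutes * 60
    docs_per_sec = doc_count / seconds
    ms_per_doc = 1000 * seconds / doc_count
    return docs_per_sec, ms_per_doc


DOCS = 300_000  # data set size mentioned in the thread

for label, minutes in [("VM", 25), ("real hardware", 6)]:
    rate, ms = throughput(DOCS, minutes)
    print(f"{label}: {rate:.0f} docs/sec, {ms:.1f} ms/doc")
```

By this estimate the VM run works out to about 200 docs/sec (5 ms per document) and the real-hardware run to roughly 830 docs/sec (1.2 ms per document).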

On Thu, Jul 3, 2008 at 9:35 AM, Brad King <brking@gmail.com> wrote:
> That would be fantastic, but it sounds like other users are seeing
> performance similar to what I see. When you say tuning and
> optimizations, are you talking about code changes in future versions
> of couchdb or parameters we can change now? VM is definitely a
> variable. I probably should try this out on real hardware too and
> compare.
>
> On Wed, Jul 2, 2008 at 7:30 PM, Damien Katz <damienkatz@gmail.com> wrote:
>> This sounds really slow, like something's wrong. 25 minutes to process 300k
>> means ~200 docs/sec, or each document takes 5 ms. That's a really long time
>> CPU-wise.
>>
>> Assuming it's not another VM bug, we should be able to get that down
>> to under a minute with some tuning, and probably closer to 10 secs after
>> serious optimizations.
>>
>> -Damien
>>
>>
>> On Jul 2, 2008, at 6:28 PM, Chris Anderson wrote:
>>
>>> On Wed, Jul 2, 2008 at 3:08 PM, Paul Davis <paul.joseph.davis@gmail.com>
>>> wrote:
>>>>
>>>> I'd have to go back and double check, but off the top of my head 25
>>>> min for 300K docs seems about like what I was getting. I.e., not orders
>>>> of magnitude slower or anything.
>>>
>>> In my experience, views generate about 1/2 as fast as that, if not
>>> more slowly. My views are often quite complex with a lot of internal
>>> looping and multiple emits, so that probably explains it. In short,
>>> the times you're reporting seem reasonable.
>>>
>>> The bottleneck (based on my extremely unscientific use of top) doesn't
>>> seem to be the view server, but rather CouchDB's beam process, which,
>>> as I understand it, is busy sorting the results as they come back from
>>> the view server. So the quickest route to parallelizing this may be to
>>> manually partition your data across CouchDB instances, generate the
>>> views, and query them in parallel, merging the results in your
>>> application.
>>>
>>> I don't actually plan to do all that work until my insert rate
>>> eclipses CouchDB's view generation speed. :)
>>>
>>> Once upon a time there was a feature to return the available results
>>> of a view, even while generation is still occurring. The feature has
>>> fallen by the wayside, and it would be non-trivial to turn it back on,
>>> according to Damien on IRC. Maybe if it would be useful to enough
>>> people, we'll see it again.
>>>
>>> --
>>> Chris Anderson
>>> http://jchris.mfdz.com
>>
>>
>
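
The manual partitioning idea Chris describes above (split the data across CouchDB instances, query each view in parallel, merge in the application) could look roughly like the sketch below. Everything specific here is an assumption for illustration: the instance names, the in-memory `FAKE_INSTANCES` data, and the `fetch_rows` helper stand in for real HTTP requests to each instance's view. It also assumes each instance returns its rows already sorted by key, and that the keys are plain strings that Python orders the same way the view does, which is not guaranteed for CouchDB's collation in general.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from operator import itemgetter

# Stand-in data: rows as each hypothetical instance would return them,
# already sorted by key within the instance.
FAKE_INSTANCES = {
    "node-a": [
        {"id": "d1", "key": "apple", "value": 1},
        {"id": "d4", "key": "pear", "value": 1},
    ],
    "node-b": [
        {"id": "d2", "key": "banana", "value": 1},
        {"id": "d3", "key": "cherry", "value": 1},
    ],
}


def fetch_rows(instance):
    """Hypothetical fetch; a real version would query the view over HTTP."""
    return FAKE_INSTANCES[instance]


def query_partitions(instances):
    """Query every instance concurrently, then merge the sorted row lists."""
    with ThreadPoolExecutor(max_workers=len(instances)) as pool:
        per_instance = list(pool.map(fetch_rows, instances))
    # heapq.merge performs a k-way merge of inputs that are already sorted.
    return list(heapq.merge(*per_instance, key=itemgetter("key")))


merged = query_partitions(["node-a", "node-b"])
for row in merged:
    print(row["key"], row["id"])
```

The merge step is cheap because each partition is already sorted, so the application only interleaves rows rather than re-sorting them. Reduce views would additionally need their per-partition values combined, which this sketch does not attempt.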
