incubator-couchdb-user mailing list archives

From "Joseph Liu" <>
Subject Re: view index build time
Date Sat, 12 Jul 2008 04:21:25 GMT

Late to the discussion but here's my 2 cents:

Depending on your virtualization software, disk accesses can suck. On
a "hosted" hypervisor, you have to rely on the host to schedule your
disk accesses. Disk I/O is scheduled in the guest, potentially goes
through an emulation layer in the hypervisor, and is then scheduled
again in the host. Furthermore, there can be significant latency when
switching between the host and the guest. If the disk accesses are
small and random, this can cause the slowdown you are observing.
Finally, your guest is not always scheduled in, since to the host it
is just another process, so the guest gets less CPU time than it
would on real hardware, which increases the total wall-clock time of
the computation.

I'm not saying that virtualization sucks, as it has many important uses
(e.g. VMotion), and some of these issues may be mitigated with proper
paravirtualization, but in the end you should still run benchmarks to
see if your workload is suited for the hypervisor you are considering.
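
As a starting point for such a benchmark, here is a minimal Python
sketch that times small random writes, each forced to disk with fsync.
It is only an illustration of the "small and random" access pattern
described above, not a model of what CouchDB actually does; the
function names, file size, block size and write count are all
assumptions chosen for the example. Run it once in the guest and once
on real hardware and compare the averages.

```python
import os
import random
import tempfile
import time


def time_random_writes(path, file_size=4 * 1024 * 1024, block_size=4096,
                       count=200, seed=0):
    """Return the average seconds per small random write followed by fsync."""
    rng = random.Random(seed)          # fixed seed so runs are comparable
    block = b"x" * block_size
    blocks_in_file = file_size // block_size
    with open(path, "wb") as f:
        f.truncate(file_size)          # size the file up front
        start = time.perf_counter()
        for _ in range(count):
            # Jump to a random block-aligned offset and write one block.
            f.seek(rng.randrange(blocks_in_file) * block_size)
            f.write(block)
            f.flush()
            os.fsync(f.fileno())       # force the write through to the disk
        elapsed = time.perf_counter() - start
    return elapsed / count


def run_benchmark(directory=None, **kwargs):
    """Run the timing loop on a temporary file in `directory`, then clean up."""
    fd, path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        return time_random_writes(path, **kwargs)
    finally:
        os.remove(path)


if __name__ == "__main__":
    avg = run_benchmark()
    print(f"average fsynced 4 KiB write: {avg * 1000:.3f} ms")
```

Pass `directory=` to point the test at the disk you care about. The
absolute numbers mean little on their own; the ratio between guest and
host is the interesting part.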

On Tue, Jul 8, 2008 at 6:53 AM, Brad King <> wrote:
> Following up on this. After moving to real hardware my view index time
> for the same data set dropped from 25 minutes to 6 minutes, so the VM
> definitely was a factor. If there are any other optimizations I can make
> I'd love to know what they are. Thanks.
> On Thu, Jul 3, 2008 at 9:35 AM, Brad King <> wrote:
>> That would be fantastic, but it sounds like other users are seeing
>> performance similar to what I see. When you say tuning and
>> optimizations, are you talking about code changes in future versions
>> of couchdb or parameters we can change now? VM is definitely a
>> variable. I probably should try this out on real hardware too and
>> compare.
>> On Wed, Jul 2, 2008 at 7:30 PM, Damien Katz <> wrote:
>>> This sounds really slow, like something's wrong. 25 minutes to process 300k
>>> means ~200 docs/sec, or each document takes ~5 ms. That's a really long time
>>> CPU wise.
>>> Assuming it's not another VM bug, we should be able to get that down
>>> to under a minute with some tuning, and probably closer to 10 secs after
>>> serious optimizations.
>>> -Damien
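
Working the thread's figures through (300k documents, 25 minutes on
the VM, 6 minutes on real hardware, and the "under a minute" and
"10 secs" targets) gives the rates below. This is plain arithmetic in
Python; the helper name `throughput` is just for illustration.

```python
def throughput(docs, seconds):
    """Return (documents per second, milliseconds per document)."""
    per_second = docs / seconds
    return per_second, 1000.0 / per_second


DOCS = 300_000

vm_rate, vm_ms = throughput(DOCS, 25 * 60)   # 25 minutes on the VM
hw_rate, hw_ms = throughput(DOCS, 6 * 60)    # 6 minutes on real hardware
minute_rate, _ = throughput(DOCS, 60)        # "under a minute" target
ten_sec_rate, _ = throughput(DOCS, 10)       # "closer to 10 secs" target

print(f"VM:            {vm_rate:.0f} docs/sec, {vm_ms:.1f} ms/doc")
print(f"real hardware: {hw_rate:.0f} docs/sec, {hw_ms:.1f} ms/doc")
print(f"1 minute:      {minute_rate:.0f} docs/sec")
print(f"10 seconds:    {ten_sec_rate:.0f} docs/sec")
```

So 25 minutes works out to 200 docs/sec (5 ms per document), 6 minutes
to roughly 833 docs/sec (1.2 ms per document), and the two targets to
5,000 and 30,000 docs/sec respectively.
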
>>> On Jul 2, 2008, at 6:28 PM, Chris Anderson wrote:
>>>> On Wed, Jul 2, 2008 at 3:08 PM, Paul Davis <>
>>>> wrote:
>>>>> I'd have to go back and double check, but off the top of my head 25
>>>>> min for 300K docs seems about like what I was getting. I.e., not orders
>>>>> of magnitude slower or anything.
>>>> In my experience, views generate about 1/2 as fast as that, if not
>>>> more slowly. My views are often quite complex with a lot of internal
>>>> looping and multiple emits, so that probably explains it. In short,
>>>> the times you're reporting seem reasonable.
>>>> The bottleneck (based on my extremely unscientific use of top) doesn't
>>>> seem to be the view server, but rather CouchDB's beam process, which,
>>>> as I understand it, is busy sorting the results as they come back from
>>>> the view server. So the quickest route to parallelizing this may be to
>>>> manually partition your data across CouchDB instances, generate the
>>>> views, and query them in parallel, merging the results in your
>>>> application.
>>>> I don't actually plan to do all that work until my insert rate
>>>> eclipses CouchDB's view generation speed. :)
>>>> Once upon a time there was a feature to return the available results
>>>> of a view, even while generation is still occurring. The feature has
>>>> fallen by the wayside, and it would be non-trivial to turn it back on,
>>>> according to Damien on IRC. Maybe if it would be useful to enough
>>>> people, we'll see it again.
>>>> --
>>>> Chris Anderson
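
The "partition, query in parallel, merge in your application" idea
from Chris's message could look roughly like the Python sketch below.
It is an illustration under assumptions, not anything CouchDB
provides: `fetch_rows` is a hypothetical callable you would write to
query one instance's view and return its rows already sorted by key,
the instance names are made up, and the merge uses Python's ordering,
which only matches CouchDB's view collation for simple keys of a
single type.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def query_partitions(instances, fetch_rows):
    """Call fetch_rows(instance) for every instance in parallel.

    fetch_rows is a hypothetical callable returning that instance's view
    rows as a list of {"key": ..., "value": ...} dicts, sorted by key.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(instances))) as pool:
        return list(pool.map(fetch_rows, instances))


def merge_rows(row_lists):
    """Merge several key-sorted row lists into one key-sorted list."""
    return list(heapq.merge(*row_lists, key=lambda row: row["key"]))


# Stand-in data so the sketch runs without any CouchDB instance.
fake_views = {
    "node-a": [{"key": "apple", "value": 1}, {"key": "cherry", "value": 3}],
    "node-b": [{"key": "banana", "value": 2}, {"key": "date", "value": 4}],
}

merged = merge_rows(query_partitions(list(fake_views), fake_views.__getitem__))
print([row["key"] for row in merged])
# prints ['apple', 'banana', 'cherry', 'date']
```

Because each partition's rows arrive already sorted, the merge is a
single linear pass rather than a full re-sort in the application.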
