incubator-couchdb-user mailing list archives

From Chris Stockton <chrisstockto...@gmail.com>
Subject Re: A simple comparative benchmark and look at include_docs performance.
Date Tue, 20 Apr 2010 19:51:23 GMT
Hello,

On Tue, Apr 20, 2010 at 10:58 AM, Adam Kocoloski <kocolosk@apache.org> wrote:
> Hi Chris, for the type of access pattern in your benchmark I generally
> recommend to use emit(doc.model, doc) and avoid include_docs=true.
> include_docs introduces an extra lookup back in the DB for every row of
> your view.  If you emit the document into the view index the index will
> get large but streaming requests such as yours can be accomplished with
> a minimum of disk IO.

We have tried this approach and it was indeed faster; however, we wound
up with what I remember to be a view file of over 19 GB. For a 300 MB
database this trade-off did not seem reasonable. Although disk is
cheap in many cases, we found the bloat to be unacceptable. Do you
know of a way to limit the size of the view when including the doc?
Additionally, may I ask if include_docs=true has potential room for
optimization?
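
For reference, the trade-off discussed above can be sketched in plain
JavaScript. This is only an illustration, not CouchDB code: runMap is a
hypothetical stand-in for the view server, emit is passed in as an
argument (in a real design document it is a global), and the title and
body fields are made up. The third map function, which emits only
selected fields, is an assumption about one possible middle ground and
was not suggested in this thread.

```javascript
// Hypothetical stand-in for the view server: runs a map function over
// the documents and collects every emitted row.
function runMap(mapFn, docs) {
  const rows = [];
  const emit = (key, value) => rows.push({ key, value });
  for (const doc of docs) mapFn(doc, emit);
  return rows;
}

// Made-up sample documents; only doc.model comes from the thread.
const docs = [
  { _id: "a", model: "post", title: "Hello", body: "x".repeat(1000) },
  { _id: "b", model: "user", title: "Chris", body: "y".repeat(1000) },
];

// Variant 1: whole document stored in the index (large index, no
// extra lookup per row when reading).
const mapWholeDoc = (doc, emit) => emit(doc.model, doc);

// Variant 2: key only; the client would query with include_docs=true,
// which costs a lookup back into the database for every row.
const mapKeyOnly = (doc, emit) => emit(doc.model, null);

// Variant 3 (assumption): emit only the fields the reader needs.
const mapSomeFields = (doc, emit) => emit(doc.model, { title: doc.title });

// Rough proxy for index size: length of the serialized rows.
const indexBytes = (rows) => JSON.stringify(rows).length;

const wholeDocSize = indexBytes(runMap(mapWholeDoc, docs));
const keyOnlySize = indexBytes(runMap(mapKeyOnly, docs));
const someFieldsSize = indexBytes(runMap(mapSomeFields, docs));

console.log({ wholeDocSize, someFieldsSize, keyOnlySize });
```

The serialized length is only a rough proxy; the on-disk view file
also carries B-tree overhead, so the real ratios will differ.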

> On the other hand, your sar report shows negligible iowait, so perhaps
> that's not your immediate problem.  It may be the case that you're
> CPU-limited in the (pure Erlang) JSON encoder, although I would've
> expected JSON encoding CPU usage to scale with network traffic.

It would surprise me if 13 MB of JSON encoding could cause such spikes
in CPU. I also expected network traffic to scale with our CPU usage.
Have you seen issues in this area before? My first thought would be
that the encoding stage is one of the lighter parts of the request,
given the simple nature of JSON.

> You might try running eprof while you do this test.  It's quite
> heavyweight and will slow your system down.  If you start couchdb with
> the -i flag you can get an Erlang shell and execute
> <snip>

This was good information and I will look into profiling with Erlang.
May I ask if any effort is currently being put into performance and
optimization for CouchDB? I am also very interested in any reads on
large-scale CouchDB deployments that are not so high-level (i.e.
hardware specs, use cases, etc.).

Kind Regards,

-Chris
