couchdb-dev mailing list archives

From: Brian Candler <B.Cand...@pobox.com>
Subject: Re: View Performance (was Re: The 1.0 Thread)
Date: Thu, 02 Jul 2009 11:24:55 GMT
On Thu, Jul 02, 2009 at 09:06:35AM +0100, Brian Candler wrote:
> On Wed, Jul 01, 2009 at 11:01:54AM +0200, Chris Anderson wrote:
> > The slow performance you describe seems out of the ordinary.
> 
> That description matches exactly what I see. Admittedly the hardware I'm
> using is low-end (most development is on a 1.2GHz P4 laptop), but whilst I
> can insert ~600 docs per second via bulk_docs, I can only index maybe 10
> docs per second, emitting about 100 K/V pairs per second.

I've now measured it properly, using the real app data and views, and
actually it's not nearly as bad as that. There are:

- 3,877 docs
- 6 views
- 45,256 k/v pairs are emitted across all views
- total time to index: 80.7 seconds

This equates to:
* indexing ~48 docs per second (3,877 / 80.7)
* inserting ~561 k/v pairs per second (45,256 / 80.7)

I've also tried sending the views individually to temp_view:

           #emits   map only (s)   map+reduce (s)
           ------   ------------   --------------
view 1:         5           3.82              N/A
view 2:     17608          25.75              N/A
view 3:      3877           5.81             7.00
view 4:         3           4.42              N/A
view 5:     18934          27.44            36.91
view 6:      5829          15.29            20.35

A null temp_view map - "function(doc){}" - takes 3.83 seconds. That is
effectively the fixed cost of just iterating through the docs and sending
them to the view server, which puts a theoretical ceiling of about 1,000
docs per second (3,877 / 3.83).
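For reference, each temp_view timing above is just a timed POST of the map
source to _temp_view. A minimal Ruby version of that looks something like
this (host, port and the "myapp" database name are placeholders, not the
real setup):

  require 'net/http'
  require 'json'
  require 'benchmark'

  # Time a temporary view build and return the elapsed seconds.
  # Sketch only: assumes CouchDB listening on localhost:5984.
  def time_temp_view(db, map_src, reduce_src = nil)
    body = { 'map' => map_src }
    body['reduce'] = reduce_src if reduce_src
    req = Net::HTTP::Post.new("/#{db}/_temp_view",
                              'Content-Type' => 'application/json')
    req.body = body.to_json
    Benchmark.realtime do
      Net::HTTP.start('127.0.0.1', 5984) { |http| http.request(req) }
    end
  end

  # The null map measures the per-doc iteration and serialisation
  # overhead on its own.
  puts time_temp_view('myapp', 'function(doc){}')

Subtracting that null-map time from each row above gives the cost of the
map (and reduce) work by itself.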

This also shows that temp_view timings can be used to predict the
performance of permanent views quite accurately. If I add up the individual
temp_view times (map+reduce where available) and subtract N-1 times the
iteration overhead - because each doc is only sent to the view server once
when the permanent views are built together - I get:

  (3.82 + 25.75 + 7.00 + 4.42 + 36.91 + 20.35) - (5 * 3.83) = 79.10

which is close to the 80.7 seconds measured above.

I've now started writing some Ruby code to benchmark different aspects of
view generation: sending larger and smaller docs, emitting more or fewer
keys from the same doc, using more or fewer views, and running with and
without reduce functions. So far these all appear to behave sensibly and
linearly. I'll publish it shortly.
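The general shape of that harness is something like the following - a
simplified sketch rather than the real code, with the database name,
document layout and parameter values made up for illustration:

  require 'net/http'
  require 'json'
  require 'benchmark'

  HOST = '127.0.0.1'
  PORT = 5984
  DB   = 'view_bench'   # throwaway benchmark database

  def request(req)
    Net::HTTP.start(HOST, PORT) { |http| http.request(req) }
  end

  def json_post(path, body)
    req = Net::HTTP::Post.new(path, 'Content-Type' => 'application/json')
    req.body = body.to_json
    request(req)
  end

  # Recreate the database and load n_docs synthetic docs via _bulk_docs.
  def load_docs(n_docs, keys_per_doc)
    request(Net::HTTP::Delete.new("/#{DB}"))
    request(Net::HTTP::Put.new("/#{DB}"))
    docs = (1..n_docs).map do |i|
      { '_id'  => "doc-#{i}",
        'tags' => (1..keys_per_doc).map { |t| "tag-#{t}" } }
    end
    json_post("/#{DB}/_bulk_docs", 'docs' => docs)
  end

  # Time a temp_view whose map emits one k/v pair per element of
  # doc.tags, i.e. keys_per_doc emits per document.
  def time_emits
    map = 'function(doc) {
             for (var i = 0; i < doc.tags.length; i++) emit(doc.tags[i], null);
           }'
    Benchmark.realtime { json_post("/#{DB}/_temp_view", 'map' => map) }
  end

  load_docs(1000, 12)               # e.g. 1,000 docs, 12 keys per doc
  printf("%.2f seconds\n", time_emits)

Varying the number of docs, the doc size and the keys emitted per doc
independently should show which of them the indexing time really scales
with.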

Of course, the more keys you emit, the slower it becomes. Views 2, 3 and 5
above achieve 600-700 keys per second (view 6 has some more complex map
processing). The trouble is that this application emits 11-12 keys per
document on average, and that's what brings me down to ~48 docs per second.

Considering the work which is being done, this is probably not unreasonable,
and nothing stands out as an obvious bottleneck so far. But having apps
which emit 10x more keys than docs is pretty normal, I think - compare an
equivalent SQL app, which might join across two or three tables and have
two or three indexes per table.

The interesting part of benchmarking will be working out how much of the
K/V rate limit is due to Javascript generating the JSON and Erlang parsing
it, versus how much time Erlang spends updating the btrees. Probably I need
to get to grips with erlview to make this comparison.

But so far it's clear that the context switches between Erlang and
Javascript are likely to be insignificant, since that cost is included
within the 3.83 second overhead - which also covers all of the
Erlang->JSON->Javascript conversion.

Regards,

Brian.
