couchdb-user mailing list archives

From Adam Kocoloski <kocol...@apache.org>
Subject Re: A simple comparative benchmark and look at include_docs performance.
Date Tue, 20 Apr 2010 17:58:29 GMT
Hi Chris, for the type of access pattern in your benchmark I generally recommend using emit(doc.model,
doc) and avoiding include_docs=true.  include_docs introduces an extra lookup back into the DB
for every row of your view.  If you emit the document into the view index, the index will grow
large, but streaming requests such as yours can be served with a minimum of disk IO.
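To make the suggestion concrete, here is a minimal sketch of such a map function. The emit() stub, the map() wrapper name, and the sample document are illustrative assumptions for running it standalone; inside CouchDB the view server provides emit() and you would ship only the function body.

```javascript
// Hypothetical standalone harness: CouchDB normally provides emit(),
// so we stub it here to exercise the map function outside the server.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Emit the whole document as the value. Queries keyed on model are then
// answered entirely from the view index, with no per-row doc lookup
// (the cost include_docs=true pays on every returned row).
function map(doc) {
  if (doc.model) {
    emit(doc.model, doc);
  }
}

// Sample document shaped like the test data in Chris's mail.
map({ _id: "abc", type: "model", model: "m-001", data: { C9636: "2006" } });
console.log(rows[0].key);        // the model value, "m-001"
console.log(rows[0].value._id);  // the full doc rides along: "abc"
```

The trade-off is index size: each document is stored again inside the view B-tree, roughly doubling on-disk usage for this database, in exchange for sequential reads at query time.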

On the other hand, your sar report shows negligible iowait, so perhaps that's not your immediate
problem.  It may be the case that you're CPU-limited in the (pure Erlang) JSON encoder, although
I would've expected JSON encoding CPU usage to scale with network traffic.

You might try running eprof while you do this test.  It's quite heavyweight and will slow
your system down.  If you start couchdb with the -i flag you can get an Erlang shell and execute

1> eprof:start().
2> eprof:start_profiling(processes()).
... do your test ...
3> eprof:stop_profiling().
4> eprof:total_analyse().

You'll get a list of function calls sorted by the amount of wallclock time they take.  Step
2 in that sequence might fail with an error about being unable to trace a particular process;
short of explicitly listing the processes you want to trace, you can do something like

eprof:start_profiling(processes() -- [pid(0,123,456)])

to prevent the process with id <0.123.456> from being profiled.  Others might have a
more elegant solution.  Best,

Adam

On Apr 20, 2010, at 1:36 PM, Chris Stockton wrote:

> Hello,
> 
> I chose to do some benchmarking/profiling on view calls to try to
> understand the slowdown we are experiencing. We are trying to better
> understand the numbers that CouchDB is putting out so we may tune and
> optimize it. include_docs is a big subject of our curiosity, as it
> causes a 5x slowdown. I am interested in any reading or documentation
> on scaling CouchDB, hardware tips, or any good discussion on
> performance. I am also curious whether any effort has been put into
> native profiling within Erlang, as my method of measurement was aimed
> at system-level diagnostics rather than code level; I do not know
> Erlang very well and I hear the learning curve is steep.
> 
> Our setup:
> We are making calls from an isolated load-test environment, from a
> web server to a CouchDB server. I am doing system-level profiling.
> The web-server machine has a 100 Mbps link to the CouchDB server,
> accessible via a single network hop. Both machines have identical
> hardware: the CPU is an 8-core Xeon 2.13 GHz, with 50 GB of memory
> and a disk array capable of 330 MB/s of disk reads, roughly 7 GB/s
> from cache. The machines were completely idle between requests. I
> chose sar, so please do not expect any pretty pictures =) I can
> provide the raw output files on request to be graphed, however. The
> sar command used was something like: sar -n ALL -P ALL -urRwWbBd -p.
> 
> Our couchdb:
> The database holds 170 thousand documents in total, amounting to
> 301.2 MB of disk usage, currently with a single view, getByModel, a
> very simple example-level view.
> 
> Here is what the view we benchmarked looks like, getByModel:
> function(doc) {
>  if(doc.model) {
>    emit(doc.model, null);
>  } else {
>    emit(null, null);
>  }
> }
> 
> Here is an example of a document; this is test data with no real meaning:
> {
>   "_id": "<ID>",
>   "_rev": "<REV>",
>   "data": {
>       "C365": "GORGEOUS BINGO",
>       "C6263": "A Action-Packed Display of a Sumo Wrestler And a Car
> who must Overcome a Waitress in A Baloon Factory",
>       "C9636": "2006",
>       "C12589": "4",
>       "C15835": "2.99",
>       "C1842": "108",
>       "C21963": "26.99",
>       "C24935": "R",
>       "C27126": "Deleted Scenes,Behind the Scenes",
>       "C30950": "Children"
>   },
>   "type": "model",
>   "model": "<UNIQID>"
> }
> 
> 
> Calling our view returns 13.7 MB of data, taking roughly 7.7 seconds
> on average, giving us a throughput of about 1.7 MB/s. This is a
> reasonable amount, even if we desire slightly more. Below is the
> gathered system data throughout the lifespan of the call, with a
> half-second buffer at the beginning and end of the request. A couple
> of things to notice are the very large amount of CPU usage and a
> reasonable (in comparison to our next test) amount of network
> throughput.
> 
>  04:59:50 PM   cswch/s
>  Average:     12130.19
> 
>  04:59:50 PM       CPU     %user     %nice   %system   %iowait    %steal     %idle
>  Average:          all     11.55      0.00      6.67      1.87      0.00     79.91
>  Average:            0     12.87      0.00      2.33      0.00      0.00     84.79
>  Average:            1     14.44      0.00     32.33     15.00      0.00     38.22
>  Average:            2      7.43      0.00      2.55      0.00      0.00     90.02
>  Average:            3     11.10      0.00      5.77      0.00      0.00     83.13
>  Average:            4     13.01      0.00      1.89      0.00      0.00     85.09
>  Average:            5     26.50      0.00      7.10      0.00      0.00     66.41
>  Average:            6      6.89      0.00      1.22      0.00      0.00     91.89
>  Average:            7      0.22      0.00      0.44      0.00      0.00     99.33
> 
>  04:59:50 PM       tps      rtps      wtps   bread/s   bwrtn/s
>  Average:     14599.33      0.00  14599.33      0.00 223554.27
> 
>  04:59:50 PM   frmpg/s   bufpg/s   campg/s
>  Average:       -10.77      0.44     -9.99
> 
>  Average:        IFACE   rxpck/s   txpck/s    rxbyt/s    txbyt/s   rxcmp/s   txcmp/s  rxmcst/s
>  Average:         eth0   7241.51  19118.98  506945.06 2871571.70      0.00      0.00      0.00
> 
>  Average:        IFACE   rxerr/s   txerr/s    coll/s  rxdrop/s  txdrop/s  txcarr/s  rxfram/s  rxfifo/s  txfifo/s
>  Average:         eth0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
> 
>  04:59:57 PM       DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
>  Average:         sda2   2642.29      0.00  74518.09     28.20      0.27      0.10      0.10     26.80
> 
>  04:59:50 PM    call/s retrans/s    read/s   write/s  access/s  getatt/s
>  Average:         0.00      0.00      0.00      0.00      0.00      0.00
> 
>  04:59:50 PM   scall/s badcall/s  packet/s     udp/s     tcp/s     hit/s    miss/s   sread/s  swrite/s saccess/s sgetatt/s
>  Average:         0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
> 
>  04:59:50 PM  pgpgin/s pgpgout/s   fault/s  majflt/s
>  Average:         0.00  37266.59   6216.32      0.00
> 
>  04:59:50 PM kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
>  Average:       724271  48709493     98.53    324688  46261891   8388432       168      0.00         0
> 
> 
> Below is the same view, but with include_docs=true. We see we are
> still extremely hungry for CPU, but notice an extreme drop in our
> network throughput. Our download speed dropped to roughly 400 KB/s.
> 
>  05:03:47 PM   cswch/s
>  Average:      2185.17
> 
>  05:03:47 PM       CPU     %user     %nice   %system   %iowait    %steal     %idle
>  Average:          all     11.76      0.00      1.62      0.00      0.00     86.61
>  Average:            0      9.89      0.00      0.50      0.00      0.00     89.60
>  Average:            1      0.15      0.00      0.45      0.00      0.00     99.39
>  Average:            2      9.32      0.00      0.45      0.00      0.00     90.23
>  Average:            3      7.65      0.00      0.94      0.00      0.00     91.41
>  Average:            4     44.35      0.00      2.36      0.00      0.00     53.30
>  Average:            5      5.78      0.00      7.33      0.00      0.00     86.90
>  Average:            6     16.90      0.00      0.79      0.00      0.00     82.31
>  Average:            7      0.06      0.00      0.16      0.00      0.00     99.77
> 
>  05:03:47 PM  pswpin/s pswpout/s
>  Average:         0.00      0.00
> 
>  05:03:47 PM       tps      rtps      wtps   bread/s   bwrtn/s
>  Average:        25.42      0.00     25.42      0.00    363.22
> 
>  05:03:47 PM   frmpg/s   bufpg/s   campg/s
>  Average:         0.30      0.38      1.35
> 
>  05:03:47 PM     IFACE   rxpck/s   txpck/s   rxbyt/s    txbyt/s   rxcmp/s   txcmp/s  rxmcst/s
>  Average:         eth0   1401.63   1403.10  98196.90  722379.00      0.00      0.00      0.00
> 
>  05:03:47 PM     IFACE   rxerr/s   txerr/s    coll/s  rxdrop/s  txdrop/s  txcarr/s  rxfram/s  rxfifo/s  txfifo/s
>  Average:         eth0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
> 
>  05:03:47 PM       DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
>  Average:         sda2      5.14      0.00    121.07     23.55      0.00      0.26      0.05      0.02
> 
>  05:03:47 PM    call/s retrans/s    read/s   write/s  access/s  getatt/s
>  Average:         0.00      0.00      0.00      0.00      0.00      0.00
> 
>  05:03:47 PM   scall/s badcall/s  packet/s     udp/s     tcp/s     hit/s    miss/s   sread/s  swrite/s saccess/s
>  Average:         0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
> 
>  05:03:47 PM  pgpgin/s pgpgout/s   fault/s  majflt/s
>  Average:         0.00     60.54     68.39      0.00
> 
>  05:03:47 PM kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
>  Average:       730009  48703755     98.52    324807  46262164   8388432       168      0.00         0
> 
> 
> Now this test is 4 of the above requests in parallel; this one
> worries us the most. We skyrocket to 190% CPU utilization across all
> cores, but it seems the total throughput is shared among all the
> requests, with only about 100 KB/s more available in total than a
> single request. I.e., above we put out 400 KB/s; we are now at 125
> KB/s per connection, and this saturation persists and increases at
> 8, 16, etc. connections.
> 
>  05:10:56 PM   cswch/s
>  Average:     55929.10
> 
>  05:10:56 PM       CPU     %user     %nice   %system   %iowait    %steal     %idle
>  Average:          all     21.32      0.00      4.49      0.00      0.00     74.18
>  Average:            0     25.81      0.00      2.65      0.00      0.00     71.54
>  Average:            1      9.74      0.00      4.31      0.02      0.00     85.93
>  Average:            2     22.06      0.00      2.18      0.00      0.00     75.76
>  Average:            3     22.78      0.00      7.32      0.00      0.00     69.90
>  Average:            4     22.13      0.00      2.08      0.00      0.00     75.79
>  Average:            5     22.93      0.00      7.88      0.00      0.00     69.19
>  Average:            6     22.59      0.00      2.20      0.00      0.00     75.21
>  Average:            7     22.56      0.00      7.33      0.00      0.00     70.11
> 
>  05:10:56 PM  pswpin/s pswpout/s
>  Average:         0.00      0.00
> 
>  05:10:56 PM       tps      rtps      wtps   bread/s   bwrtn/s
>  Average:        -1.98      0.37      1.37
> 
>  05:10:56 PM     IFACE   rxpck/s   txpck/s    rxbyt/s    txbyt/s   rxcmp/s   txcmp/s  rxmcst/s
>  Average:         eth0   1642.34   1648.99  115044.53  847697.31      0.00      0.00      0.00
> 
>  05:10:56 PM     IFACE   rxerr/s   txerr/s    coll/s  rxdrop/s  txdrop/s  txcarr/s  rxfram/s  rxfifo/s  txfifo/s
>  Average:         eth0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
> 
>  05:10:56 PM       DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
>  Average:         sda2      5.14      0.00    120.68     23.47      0.00      0.31      0.15      0.08
> 
>  05:10:56 PM    call/s retrans/s    read/s   write/s  access/s  getatt/s
>  Average:         0.00      0.00      0.00      0.00      0.00      0.00
> 
>  05:10:56 PM   scall/s badcall/s  packet/s     udp/s     tcp/s     hit/s    miss/s   sread/s  swrite/s saccess/s sgetatt/s
>  Average:         0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
> 
>  05:10:56 PM  pgpgin/s pgpgout/s   fault/s  majflt/s
>  Average:         0.00     60.34     77.00      0.00
> 
>  05:17:38 PM kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
>  Average:       726990  48706774     98.53    325232  46263715   8388432       168      0.00         0
> 
> 
> I am open to offering other statistics or benchmarks if such
> information is requested.
> 
> Kind Regards,
> 
> -Chris

