incubator-couchdb-dev mailing list archives

From Filipe David Manana <fdman...@apache.org>
Subject Re: Please report your indexing speed
Date Mon, 05 Mar 2012 07:41:43 GMT
On Sun, Mar 4, 2012 at 9:45 AM, Bob Dionne <dionne@dionne-associates.com> wrote:
> yes, I was surprised by the 30% claim as my numbers showed it only
> getting back to where we were with 1.1.x
>
> I think Bob's suggestion to get to the root code change that caused
> this regression is important as it will help us assess all the other
> cases this testing hasn't even touched yet

The explanation I gave in the 1.2.0 second round vote identifies the
reason: depending on timings, the updater collects smaller batches of
map results, which makes each btree update less efficient and also
increases the number of btree updates. The patch addresses this by
queuing map results in batches instead of one by one. Jan's tests and
mine are evidence that this holds in practice and not just in theory.
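
As a rough illustration of the batching argument, here is a toy model
in Python. It is not the couch_view_updater code: the function names,
the "writer drains the queue after every N enqueue operations" timing
assumption, and the numbers are all made up for the sketch. It only
shows how the size of what gets queued changes the number and size of
the batches the writer ends up applying to the btree.

```python
from collections import deque


def drain(queue, batch_sizes):
    """Take everything currently queued and record it as one btree update."""
    batch = []
    while queue:
        batch.extend(queue.popleft())
    if batch:
        batch_sizes.append(len(batch))


def simulate(total_results, results_per_enqueue, enqueues_per_drain):
    """Return the size of each simulated btree update.

    Hypothetical model: the mapper enqueues `results_per_enqueue` map
    results at a time, and the writer wakes up after every
    `enqueues_per_drain` enqueue operations to drain the queue.
    """
    queue = deque()
    batch_sizes = []
    produced = 0
    enqueues = 0
    while produced < total_results:
        n = min(results_per_enqueue, total_results - produced)
        queue.append(list(range(produced, produced + n)))
        produced += n
        enqueues += 1
        if enqueues % enqueues_per_drain == 0:
            drain(queue, batch_sizes)
    drain(queue, batch_sizes)  # flush whatever is left at the end
    return batch_sizes


one_by_one = simulate(1000, results_per_enqueue=1, enqueues_per_drain=2)
batched = simulate(1000, results_per_enqueue=100, enqueues_per_drain=2)

print("one by one:", len(one_by_one), "updates, largest", max(one_by_one))
print("batched:   ", len(batched), "updates, largest", max(batched))
```

With the same assumed writer timing, queuing one result at a time
gives many tiny btree updates, while queuing batches gives few large
ones for the same total number of map results.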

The original main goal of COUCHDB-1186 was to make the indexing of
views that emit reasonably large (or complex in structure) map values
more efficient.
Here's an example using Jason's slow_couchdb script with wow.tpl and
the map function "function(doc) {emit([doc.type, doc.category],
doc);}":

1.1.x:

fdmanana 07:04:12 ~/git/hub/slow_couchdb (master)> docs=200000 batch=5000 ./bench.sh wow.tpl
Server: CouchDB/1.1.2a785d32f-git (Erlang OTP/R14B03)
{"couchdb":"Welcome","version":"1.1.2a785d32f-git"}

[INFO] Created DB named `db1'
[INFO] Uploaded 5000 documents via _bulk_docs
(....)
[INFO] Uploaded 5000 documents via _bulk_docs
Building view.
{"total_rows":200000,"offset":0,"rows":[
{"id":"00144af5-9f07-448e-af88-026674e3e3d0","key":["dwarf","assassin"],"value":{"_id":"00144af5-9f07-448e-af88-026674e3e3d0","_rev":"1-785fbf5e641f3d10fa083501ad82a9fe","data3":"Vl6BftQEWY6imvNs0FasOj2CrPCptP70z5d","ratio":1.6,"integers":[48028,3170,54066,95547,70643,23763,25804,33180,89061,35274,48244,91792,37936,11855],"category":"assassin","nested":{"dict":{"3XGVdTTF":31490,"SDxKa54e":40,"XIzUloRH":7,"5Mj9F4bp":192,"1sXfjgYf":1203,"XP5YSqhX":25461,"QJr0Xhxn":9941},"string1":"3Q4tvmhHwKvedKiRnoL6xUz","string2":"dWI1mrrAypRh","values":[33712,57371,88567,88361,70873,6327,17326,91004,41840,86257],"string3":"i7OGysnXvynz41VMQJ","coords":[{"x":65350.46,"y":103881.18},{"x":24180.14,"y":8474.9},{"x":88326.66,"y":43151.76},{"x":120199.77,"y":102270.29},{"x":191924.18,"y":74479.75}]},"level":21,"type":"dwarf","data1":"Vpkplo80LshlcjBE0ySJNNpfgDy2bu8byWrmZ44B","data2":"GnyNbos75Wxm1C5MLdOeXNniHamBjld70vHqoJnEtnlfekuPXJ"}}
]}

real	2m49.227s
user	0m0.006s
sys	0m0.005s


1.2.x:

fdmanana 07:13:30 ~/git/hub/slow_couchdb (master)> docs=200000 batch=5000 ./bench.sh wow.tpl
Server: CouchDB/1.2.0 (Erlang OTP/R14B03)
{"couchdb":"Welcome","version":"1.2.0"}

[INFO] Created DB named `db1'
[INFO] Uploaded 5000 documents via _bulk_docs
(....)
[INFO] Uploaded 5000 documents via _bulk_docs
Building view.
{"total_rows":200000,"offset":0,"rows":[
{"id":"0005cd07-49f4-4a99-b506-acef948f2acc","key":["dwarf","assassin"],"value":{"_id":"0005cd07-49f4-4a99-b506-acef948f2acc","_rev":"1-4b418e69618bf11124a03e1a3845f071","data3":"T0W2JBUET9yzRXHfUqcUBwFhYGKh14YFVxk","ratio":1.6999999999999999556,"integers":[25658,7573,47779,43217,49586,57992,13549,90984,45253,49560,1643,64085,38381,62544],"category":"assassin","nested":{"dict":{"oWhW4jJ6":199,"EPSVtKtS":5638,"8WpzvD5x":73714,"stD9Ynfh":8924,"0qh5Nc1g":5994,"pBa5vJyy":18,"s25oAkRc":165270},"string1":"fNNHb8lxtcy7GpwSU3yRyaC","string2":"rilbiZM7yAaK","values":[49632,93665,73258,75675,4229,15742,16965,76825,22049,79829],"string3":"IwX09SiOLMSSyxffMB","coords":[{"x":179620.45000000001164,"y":11539.989999999999782},{"x":68483.820000000006985,"y":110559.19999999999709},{"x":67197.940000000002328,"y":96702.210000000006403},{"x":25469.869999999998981,"y":79049.490000000005239},{"x":157059.89999999999418,"y":34963.410000000003492}]},"level":6,"type":"dwarf","data1":"njpz38JSfz00p2Lc2Jv0dON7UfTljRgz0J2B7w7K","data2":"4hpsT2szDrssbUCHEirTzHOIhSxTd83i1FO5aNXgoGAfO2srH1"}}
]}

real	1m51.989s
user	0m0.006s
sys	0m0.004s


1.2.x + patch:

fdmanana 07:29:11 ~/git/hub/slow_couchdb (master)> docs=200000 batch=5000 ./bench.sh wow.tpl
Server: CouchDB/1.2.0 (Erlang OTP/R14B03)
{"couchdb":"Welcome","version":"1.2.0"}

[INFO] Created DB named `db1'
[INFO] Uploaded 5000 documents via _bulk_docs
(....)
[INFO] Uploaded 5000 documents via _bulk_docs
Building view.
{"total_rows":200000,"offset":0,"rows":[
{"id":"0005cd07-49f4-4a99-b506-acef948f2acc","key":["dwarf","assassin"],"value":{"_id":"0005cd07-49f4-4a99-b506-acef948f2acc","_rev":"1-4b418e69618bf11124a03e1a3845f071","data3":"T0W2JBUET9yzRXHfUqcUBwFhYGKh14YFVxk","ratio":1.6999999999999999556,"integers":[25658,7573,47779,43217,49586,57992,13549,90984,45253,49560,1643,64085,38381,62544],"category":"assassin","nested":{"dict":{"oWhW4jJ6":199,"EPSVtKtS":5638,"8WpzvD5x":73714,"stD9Ynfh":8924,"0qh5Nc1g":5994,"pBa5vJyy":18,"s25oAkRc":165270},"string1":"fNNHb8lxtcy7GpwSU3yRyaC","string2":"rilbiZM7yAaK","values":[49632,93665,73258,75675,4229,15742,16965,76825,22049,79829],"string3":"IwX09SiOLMSSyxffMB","coords":[{"x":179620.45000000001164,"y":11539.989999999999782},{"x":68483.820000000006985,"y":110559.19999999999709},{"x":67197.940000000002328,"y":96702.210000000006403},{"x":25469.869999999998981,"y":79049.490000000005239},{"x":157059.89999999999418,"y":34963.410000000003492}]},"level":6,"type":"dwarf","data1":"njpz38JSfz00p2Lc2Jv0dON7UfTljRgz0J2B7w7K","data2":"4hpsT2szDrssbUCHEirTzHOIhSxTd83i1FO5aNXgoGAfO2srH1"}}
]}

real	1m45.573s
user	0m0.006s
sys	0m0.004s


Unless someone comes up with scenarios where 1.2.x with the patch is
significantly slower than 1.1.x, I think we should close this and move
to release 1.2.0.
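
For reference, the three "real" timings from the runs above can be
put on a common scale with a few lines of Python. This is only
arithmetic on the numbers already posted; the helper names are
hypothetical and not part of the slow_couchdb script.

```python
import re


def parse_real(text):
    """Convert a `time`-style value such as '2m49.227s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognised time format: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def reduction_percent(baseline, other):
    """Percentage of wall-clock time saved relative to `baseline`."""
    return (baseline - other) / baseline * 100


# "real" values copied from the three benchmark runs in this message.
timings = {
    "1.1.x": "2m49.227s",
    "1.2.x": "1m51.989s",
    "1.2.x + patch": "1m45.573s",
}
seconds = {name: parse_real(value) for name, value in timings.items()}
baseline = seconds["1.1.x"]

for name, secs in seconds.items():
    saved = reduction_percent(baseline, secs)
    print(f"{name}: {secs:.3f}s ({saved:.1f}% less time than 1.1.x)")
```

For this particular test, 1.2.x takes roughly a third less wall-clock
time than 1.1.x, and the patch shaves a few more seconds off that.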

Thanks all for the testing.

>
> On Mar 3, 2012, at 5:25 PM, Bob Dionne wrote:
>
>> I ran some tests, using Bob's latest script. The first versus the
>> second clearly show the regression. The third is the 1.2.x with the
>> patch to couch_os_process reverted and it seems to have no impact.
>> The last has Filipe's latest patch to couch_view_updater discussed
>> in the other thread and it brings the performance back to the 1.1.x
>> level.
>>
>> I'd say that patch is a +1
>>
>> 1.2.x
>> real  3m3.093s
>> user  0m0.028s
>> sys   0m0.008s
>> ==================
>> 1.1.x
>> real  2m16.609s
>> user  0m0.026s
>> sys   0m0.007s
>> =================
>> 1.2.x with patch to couch_os_process reverted
>> real  3m7.012s
>> user  0m0.029s
>> sys   0m0.008s
>> =================
>> 1.2.x with Filipe's latest patch to couch_view_updater
>> real  2m11.038s
>> user  0m0.028s
>> sys   0m0.007s
>> On Feb 28, 2012, at 8:17 AM, Jason Smith wrote:
>>
>>> Forgive the clean new thread. Hopefully it will not remain so.
>>>
>>> If you can, would you please clone https://github.com/jhs/slow_couchdb
>>>
>>> And build whatever Erlangs and CouchDB checkouts you see fit, and run
>>> the test. For example:
>>>
>>>   docs=500000 ./bench.sh small_doc.tpl
>>>
>>> That should run the test and, God willing, upload the results to a
>>> couch in the cloud. We should be able to use that information to
>>> identify who you are, whether you are on SSD, what Erlang and Couch
>>> build, and how fast it ran. Modulo bugs.
>>
>



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."
