Subject: Re: Please report your indexing speed
From: Jan Lehnardt <jan@apache.org>
Date: Sun, 4 Mar 2012 18:28:08 +0100
To: dev@couchdb.apache.org


On Mar 4, 2012, at 13:03 , Bob Dionne wrote:

> Great Jan, so this confirms my back-of-the-envelope test using Bob's
> script and Filipe's results. The patch is definitely helpful.
>
> I was wondering why no one had looked at test/bench; perhaps this more
> rigorous approach could provide the basis for a comprehensive
> performance tool.

Good call!

I'd really like our current efforts to morph into a situation where we
can `make perf` and get a bunch of good results to compare to other
builds' `make perf`. Down the road, though, I think we need to write
Erlang tools to do that, so Windows users can run them without too much
trouble. (We could also bundle whatever scripting environment or C-based
binaries with the builds, but since we already ship Erlang, we might as
well use it :)

Cheers
Jan
-- 


>
> On Mar 4, 2012, at 4:24 AM, Jan Lehnardt wrote:
>
>> Hey all,
>>
>> I made another run with a bit of a different scenario.
>>
>>
>> # The Scenario
>>
>> I used a modified benchbulk.sh for inserting data (because it is an
>> order of magnitude faster than the other methods we had). I added a
>> command-line parameter to specify the size of a single document in
>> bytes (this was previously hardcoded in the script). Note that this
>> script creates docs with incrementing IDs, which is btree-friendly.
>>
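
For readers without the gist at hand, the data shape described above can be sketched roughly like this. It is a Python illustration, not the modified benchbulk.sh itself (which is a shell script): the function name make_bulk_payload, the "filler" field and the ten-digit zero padding are all assumptions. The only CouchDB-specific part is the {"docs": [...]} body that POST /{db}/_bulk_docs accepts. How the real script counts a document's size in bytes may differ; here it is simply the length of the filler string.

```python
import json


def make_bulk_payload(start, count, doc_size):
    """Build a _bulk_docs request body with incrementing, zero-padded IDs.

    Zero padding keeps the string IDs in ascending sort order, so new
    documents arrive in key order.
    Assumption: doc_size is the length of one filler string field,
    not the size of the whole JSON document.
    """
    docs = [
        {"_id": "%010d" % i, "filler": "x" * doc_size}
        for i in range(start, start + count)
    ]
    return json.dumps({"docs": docs})


# Example: a tiny batch of three 10-byte documents.
body = make_bulk_payload(start=0, count=3, doc_size=10)
```

The resulting string would be sent as the body of a POST to the database's _bulk_docs endpoint with Content-Type: application/json.
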
>> I added a new script, benchview.sh, which is basically the lower part
>> of Robert Newson's script. It creates a single view and queries it,
>> measuring the execution time of curl.
>>
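
The measurement idea (time one view request from start to finish) might look like the sketch below. This is not benchview.sh, which times curl from the shell; the helper names and the server, database and view names in the usage comment are hypothetical. The URL layout /{db}/_design/{ddoc}/_view/{view} is CouchDB's standard view path.

```python
import time
import urllib.request


def view_url(base, db, ddoc, view):
    """Return the URL of a CouchDB view (all names are placeholders)."""
    return "%s/%s/_design/%s/_view/%s" % (base.rstrip("/"), db, ddoc, view)


def timed(fn, *args):
    """Call fn(*args) and return (elapsed_seconds, result)."""
    started = time.perf_counter()
    result = fn(*args)
    return time.perf_counter() - started, result


def fetch(url):
    """Read the whole response body and return its length in bytes."""
    with urllib.request.urlopen(url) as response:
        return len(response.read())


# Usage against a running server (hypothetical names):
#   url = view_url("http://127.0.0.1:5984", "bench", "bench", "by_id")
#   seconds, size = timed(fetch, url)
```

Because CouchDB builds a view index when the view is first queried, the first request against a freshly loaded database includes the indexing time, which is the number of interest in this thread.
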
>> And a third, matrix.sh (yay), that runs the different configurations
>> on my system.
>>
>> See https://gist.github.com/1971611 for the scripts.
>>
>> I ran ./benchbulk.sh $size && ./benchview.sh for the following
>> combinations, all on Mac OS X 10.7.3, Erlang R15B, SpiderMonkey 1.8.5:
>>
>> - Doc sizes 10, 100, 1000 bytes
>> - CouchDB 1.1.1, 1.2.x (as of last night), 1.2.x-filipe (as of last
>>   night + Filipe's patch from earlier in the thread)
>> - On an SSD and on a 5400rpm internal drive.
>>
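
To make the size of that matrix concrete: three doc sizes, three builds and two drives give 18 configurations, and at three runs each (see the next paragraph) that is 54 runs. A small Python sketch of the enumeration follows; the variable names and the iteration order are assumptions and need not match what matrix.sh does.

```python
from itertools import product

DOC_SIZES = [10, 100, 1000]                  # bytes per document
BUILDS = ["1.1.1", "1.2.x", "1.2.x-filipe"]  # CouchDB builds under test
DRIVES = ["ssd", "5400rpm"]                  # storage the database lives on
RUNS_PER_TEST = 3                            # each test is repeated


def build_matrix():
    """Return every (drive, build, doc_size) combination as a list."""
    return list(product(DRIVES, BUILDS, DOC_SIZES))


matrix = build_matrix()
total_runs = len(matrix) * RUNS_PER_TEST

for drive, build, size in matrix:
    print("%-8s %-13s %5d bytes" % (drive, build, size))
print("%d configurations, %d runs" % (len(matrix), total_runs))
```
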
>> I ran each individual test three times and took the average to
>> compare numbers. The full report (see below) includes each individual
>> run's numbers.
>>
>> (The gist includes the raw output data from matrix.sh for the 5400rpm
>> run; for the SSD, I don't have the original numbers anymore. I'm happy
>> to re-run this if you want that data as well.)
>>
>> # The Numbers
>>
>> See
>> https://docs.google.com/spreadsheet/ccc?key=0AhESVUYnc_sQdDJ1Ry1KMTQ5enBDY0s1dHk2UVEzMHc
>> for the full data set. It'd be great to get a second pair of eyes to
>> make sure I didn't make any mistakes.
>>
>> See the "Grouped Data" sheet for comparisons.
>>
>> tl;dr: 1.2.x is about 30% slower and 1.2.x-filipe is about 30% faster
>> than 1.1.1 in the scenario above.
>>
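
For anyone double-checking the spreadsheet, the comparison boils down to averaging the three runs per configuration and expressing the difference relative to the 1.1.1 baseline. A minimal Python sketch, assuming "slower" and "faster" refer to elapsed time; the spreadsheet may define them differently (by throughput, say). The timings below are made up to illustrate the arithmetic and are NOT the measured data.

```python
from statistics import mean


def relative_change(baseline_runs, candidate_runs):
    """Percent change of the candidate's mean time versus the baseline's.

    Positive means the candidate took longer (slower); negative means
    it took less time (faster).
    """
    baseline = mean(baseline_runs)
    candidate = mean(candidate_runs)
    return (candidate - baseline) / baseline * 100.0


# Made-up timings in seconds, three runs each; NOT the measured data.
runs_111 = [100.0, 98.0, 102.0]
runs_12x = [131.0, 129.0, 130.0]
runs_filipe = [70.0, 71.0, 69.0]

print("1.2.x vs 1.1.1:        %+.1f%%" % relative_change(runs_111, runs_12x))
print("1.2.x-filipe vs 1.1.1: %+.1f%%" % relative_change(runs_111, runs_filipe))
```

With only three runs per test, it is also worth looking at the spread of the individual runs before reading much into small differences.
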
>>
>> # Conclusion
>>
>> +1 to include Filipe's patch into 1.2.x.
>>
>>
>>
>> I'd love any feedback on methods, calculations and whatnot :)
>>
>> Also, I can run more variations if you like, e.g. other Erlang or
>> SpiderMonkey versions; just let me know.
>>
>>
>> Cheers
>> Jan
>> -- 
>>
>> On Feb 28, 2012, at 14:17 , Jason Smith wrote:
>>
>>> Forgive the clean new thread. Hopefully it will not remain so.
>>>
>>> If you can, would you please clone https://github.com/jhs/slow_couchdb
>>>
>>> And build whatever Erlangs and CouchDB checkouts you see fit, and run
>>> the test. For example:
>>>
>>>  docs=500000 ./bench.sh small_doc.tpl
>>>
>>> That should run the test and, God willing, upload the results to a
>>> couch in the cloud. We should be able to use that information to
>>> identify who you are, whether you are on SSD, what Erlang and Couch
>>> build, and how fast it ran. Modulo bugs.
>>
>