From: Jan Lehnardt
To: dev@couchdb.apache.org
Subject: Re: Please report your indexing speed
Date: Sun, 4 Mar 2012 18:40:15 +0100
Message-Id: <4969999D-A6C0-469A-9120-D4C5CC2526F1@apache.org>
In-Reply-To: <11B987B0-8C27-4C68-8DA7-7C56488702C9@apache.org>

On Mar 4, 2012, at 18:24 , Jan Lehnardt wrote:

> I updated the google doc with results from an EC2 cc1.4xlarge instance
> (details are in the spreadsheet)
>
> This is on EBS and Ubuntu 11.04/64.
>
> The results are a bit different from the previous machine, but that
> isn't at all unexpected.
>
> tl;dr: for small docs (10bytes, 100bytes) 1.2.x-filipe beats 1.2.x and
> 1.1.1, for large docs (1000bytes), 1.2.x beats 1.2.x-filipe (6%
> difference).

Hah, I re-read through the results to make sure this is correct and I
found a mistake. A copy and paste formula error accounted for bigger
improvements of 1.2.x-filipe. This affects all my previous results.

The good thing is 1.2.x-filipe is still faster, across the board, than
1.1.1 and 1.2.x. Still significantly, but not *as* much as about 30% in
all but one case.

The tl;dr for the EC2 run can now be changed to: 1.2.x-filipe beats
1.1.1 and 1.2.x for all docs, it's just that for large docs (1000bytes),
1.2.x is faster than 1.1.1. But 1.2.x-filipe is even faster.

> So far, across the board, 1.2.x-filipe is ~16% faster (stdev 9%)
> than 1.1.1 for view builds.

If you have any more hardware I could run this on, I'm happy to help
with the setup, it isn't hard :)

Cheers
Jan
--

>
> This still makes me want to include Filipe's patch into 1.2.x.
>
> Cheers
> Jan
> --
>
> On Mar 4, 2012, at 10:24 , Jan Lehnardt wrote:
>
>> Hey all,
>>
>> I made another run with a bit of a different scenario.
>>
>>
>> # The Scenario
>>
>> I used a modified benchbulk.sh for inserting data (because it is an
>> order of magnitude faster than the other methods we had). I added a
>> command line parameter to specify the size of a single document in bytes
>> (this was previously hardcoded in the script). Note that this script
>> creates docs in a btree-friendly incrementing ID way.
>>
>> I added a new script benchview.sh which is basically the lower part
>> of Robert Newson's script. It creates a single view and queries it,
>> measuring execution time of curl.
>>
>> And a third matrix.sh (yay) that would run, on my system, different
>> configurations.
>>
>> See https://gist.github.com/1971611 for the scripts.
>>
>> I ran ./benchbulk $size && ./benchview.sh for the following
>> combinations, all on Mac OS X 10.7.3, Erlang R15B, SpiderMonkey 1.8.5:
>>
>> - Doc sizes 10, 100, 1000 bytes
>> - CouchDB 1.1.1, 1.2.x (as of last night), 1.2.x-filipe (as of last
>>   night + Filipe's patch from earlier in the thread)
>> - On an SSD and on a 5400rpm internal drive.
>>
>> I ran each individual test three times and took the average to
>> compare numbers. The full report (see below) includes each individual
>> run's numbers.
>>
>> (The gist includes the raw output data from matrix.sh for the 5400rpm
>> run; for the SSDs, I don't have the original numbers anymore. I'm happy
>> to re-run this, if you want that data as well.)
>>
>> # The Numbers
>>
>> See
>> https://docs.google.com/spreadsheet/ccc?key=0AhESVUYnc_sQdDJ1Ry1KMTQ5enBDY0s1dHk2UVEzMHc
>> for the full data set. It'd be great to get a second
>> pair of eyes to make sure I didn't make any mistakes.
>>
>> See the "Grouped Data" sheet for comparisons.
>>
>> tl;dr: 1.2.x is about 30% slower and 1.2.x-filipe is about 30% faster
>> than 1.1.1 in the scenario above.
>>
>>
>> # Conclusion
>>
>> +1 to include Filipe's patch into 1.2.x.
>>
>>
>>
>> I'd love any feedback on methods, calculations and whatnot :)
>>
>> Also, I can run more variations, if you like, other Erlang or
>> SpiderMonkey versions e.g., just let me know.
>>
>>
>> Cheers
>> Jan
>> --
>>
>> On Feb 28, 2012, at 14:17 , Jason Smith wrote:
>>
>>> Forgive the clean new thread. Hopefully it will not remain so.
>>>
>>> If you can, would you please clone
>>> https://github.com/jhs/slow_couchdb
>>>
>>> And build whatever Erlangs and CouchDB checkouts you see fit, and run
>>> the test. For example:
>>>
>>>    docs=500000 ./bench.sh small_doc.tpl
>>>
>>> That should run the test and, God willing, upload the results to a
>>> couch in the cloud. We should be able to use that information to
>>> identify who you are, whether you are on SSD, what Erlang and Couch
>>> build, and how fast it ran. Modulo bugs.
>>
>
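
The comparison described in this thread (three runs per configuration, averaged, then a relative speedup reported with a standard deviation across configurations) can be sketched roughly as below. This is only an illustration in Python, not the spreadsheet's actual formulas: the function names are hypothetical, the timings are made-up placeholders rather than numbers from the spreadsheet, and "X% faster" is assumed here to mean the percentage reduction in view build time relative to the baseline.

```python
from statistics import mean, stdev


def percent_faster(baseline_secs, candidate_secs):
    """Percent reduction in build time of the candidate relative to the baseline.

    Assumption: "faster" means less wall-clock time; other definitions
    (e.g. a throughput ratio) would give different percentages.
    """
    return (baseline_secs - candidate_secs) / baseline_secs * 100.0


def summarize(runs):
    """Average the repeated runs of each (version, doc_size) configuration."""
    return {key: mean(times) for key, times in runs.items()}


def compare(averages, baseline, candidate):
    """Mean and sample standard deviation of the speedup over all doc sizes."""
    sizes = sorted(size for (version, size) in averages if version == baseline)
    gains = [
        percent_faster(averages[(baseline, size)], averages[(candidate, size)])
        for size in sizes
    ]
    return mean(gains), stdev(gains)


# Placeholder timings in seconds, three runs each. NOT measured data.
example_runs = {
    ("1.1.1", 10): [100.0, 102.0, 98.0],
    ("1.2.x-filipe", 10): [80.0, 82.0, 78.0],
    ("1.1.1", 100): [200.0, 198.0, 202.0],
    ("1.2.x-filipe", 100): [170.0, 172.0, 168.0],
    ("1.1.1", 1000): [400.0, 404.0, 396.0],
    ("1.2.x-filipe", 1000): [360.0, 358.0, 362.0],
}

if __name__ == "__main__":
    avg_gain, spread = compare(summarize(example_runs), "1.1.1", "1.2.x-filipe")
    print(f"1.2.x-filipe vs 1.1.1: {avg_gain:.1f}% faster (stdev {spread:.1f}%)")
```

Keeping the per-run numbers and deriving the averages and percentages in one place makes a copy-and-paste formula slip of the kind mentioned above easier to spot, since every comparison goes through the same function.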