Subject: Re: Comparison of MongoDB & CouchDB: MongoDB seems better on insert
From: Sebastian Cohnen
Date: Tue, 21 Dec 2010 00:01:10 +0100
To: user@couchdb.apache.org
Message-Id: <3F821DE2-2DC5-47A8-B7F6-CE8B0B131693@googlemail.com>

On 20.12.2010, at 23:24, Paul Davis wrote:

> On Mon, Dec 20, 2010 at 5:20 PM, Sebastian Cohnen wrote:
>> question inside :)
>>
>> On 20.12.2010, at 23:02, Jan Lehnardt wrote:
>>
>>> Hi,
>>>
>>> On 20 Dec 2010, at 22:32, Chenini, Mohamed wrote:
>>>
>>>> Hi,
>>>>
>>>> I found this info on the net at http://www.slideshare.net/danglbl/schemaless-databases
>>>> [...]
>>>> Does anyone know if this was verified?
>>>
>>> I think the author's comment on slide 35 sums it up pretty nicely:
>>>
>>> "Of course this is just one (lame) test."
>>>
>>> Coming up with good numbers is hard, which means that people with easy ways to make them come up with bad ones.
>>>
>>> I've written about the difficulties of benchmarking databases on my blog:
>>>
>>> http://jan.prima.de/~jan/plok/archives/175-Benchmarks-You-are-Doing-it-Wrong.html
>>> http://jan.prima.de/~jan/plok/archives/176-Caveats-of-Evaluating-Databases.html
>>>
>>> They should give you a few pointers on why this is hard.
>>>
>>> --
>>>
>>> To the point: CouchDB generally performs best with concurrent load. In the case of loading data into CouchDB, bulk requests* will speed up things again. To push CouchDB to a write limit, you want to use concurrent bulk requests (specific numbers will depend on your data and hardware).
>>
>> Does this really speed up things? I've tried this approach (concurrent bulk inserts) with small/big docs and small/big bulk chunk sizes: the difference was not significant. I thought this was reasonable, since writes are serialized anyway. The setup was one box generating documents, building the bulks, keeping them in memory, and bulk inserting batches of complete docs (incl. simple monotonically increasing ints as doc ids) into another node. Delayed commit was off.
>>
>
> I think delayed commit would need to be on there otherwise you'll be
> hitting fsync barriers for every bulk docs call which are serialized
> by the updater. Theoretically the speedups would come from letting the
> kernel manage the file buffers and what not.

delayed_commit was off because I needed to test insertion of lots of data (more than what would fit nicely into memory). I wanted to figure out whether normal bulk vs. concurrent bulks has an impact on insert performance. The difference was, as I said, not significantly better or worse. Btw: I didn't saturate the disks (mid-range SSDs), since couch was eating up the CPU (3GHz Core 2 Duo). This was some time ago; maybe this is more disk-bound now.
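
For anyone who wants to repeat the comparison, here is a minimal sketch of the kind of test described above, not the script that was actually used. It pre-builds batches in memory and sends them to the _bulk_docs endpoint either one at a time (workers=1) or concurrently. The function names, the zero-padded id format and the database URL in the usage note are assumptions made for illustration.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from itertools import count, islice


def make_batches(total_docs, batch_size, make_body):
    """Yield _bulk_docs payloads whose doc ids increase monotonically.

    Ids are zero-padded strings so that they also sort in insertion order
    (an assumption; the original test only mentions increasing ints).
    """
    ids = count(1)
    remaining = total_docs
    while remaining > 0:
        n = min(batch_size, remaining)
        docs = [dict(make_body(i), _id=f"{i:010d}") for i in islice(ids, n)]
        remaining -= n
        yield {"docs": docs}


def post_batch(db_url, payload):
    """POST one batch to <db_url>/_bulk_docs and return the HTTP status."""
    req = urllib.request.Request(
        db_url.rstrip("/") + "/_bulk_docs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def load(db_url, batches, workers=1, post=post_batch):
    """Send all batches using `workers` threads; workers=1 is plain bulk.

    Executor.map submits every batch up front, so all batches are held
    in memory, as in the setup described above.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda batch: post(db_url, batch), batches))
```

To compare the two modes, time load("http://localhost:5984/testdb", batches, workers=1) against the same call with workers=4 (for example with time.perf_counter), using a fresh database for each run. The URL and the worker count are placeholders, and the result will depend on document size, batch size, hardware and the delayed commit setting.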
>
>>>
>>> * http://wiki.apache.org/couchdb/HTTP_Bulk_Document_API
>>>
>>> Unfortunately this means that these one-off benchmarks don't show any good numbers for CouchDB, yet fortunately this shows easily that these one-off benchmarks don't really reflect common real-world usage and should be discouraged.
>>>
>>> Hope that helps, let us know if you have any more questions :)
>>>
>>> Cheers
>>> Jan