Subject: Re: batch=ok for bulk_docs and single doc implementation concerns
From: Adam Kocoloski
Date: Wed, 14 Apr 2010 08:23:45 -0400
To: dev@couchdb.apache.org
Message-Id: <7D9407D6-77B5-4AA1-9955-8E1E66424BC8@apache.org>

On Apr 14, 2010, at 7:59 AM, Matt Goodall wrote:

> Hi,
>
> Over in couchdb-python land someone wanted to use batch=ok when
> creating and updating documents, so we added support.
>
> I was semi-surprised to notice that _bulk_docs does not support
> batch=ok. I realise _bulk_docs essentially is a batch update but a
> _bulk_docs batch=ok would presumably allow CouchDB to buffer more in
> memory before writing to disk. What are your thoughts?

It's probably of limited utility. If you're already batching on the
client side, you can achieve the same effect by sending in a larger
batch. I'm not opposed to it per se, I just don't think it will help
with throughput all that much.

>
> Now, this buffering is where the "implementation concerns" come in.
> According to the wiki:
>
> "There is a query option batch=ok which can be used to achieve higher
> throughput at the cost of lower guarantees. When a PUT (or a document
> POST as described below) is sent using this option, it is not
> immediately written to disk. Instead it is stored in memory on a
> per-user basis for a second or so (or the number of docs in memory
> reaches a certain point). After the threshold has passed, the docs are
> committed to disk."
>
> However, unless I'm missing something (quite likely ;-)), there is no
> "stored in memory on a per-user basis" or any check for when "the
> number of docs in memory reaches a certain point". All it seems to do
> is spawn a new process so the update happens when the Erlang scheduler
> gets around to it. In fact, I don't see any reference to the
> batch_save_interval and batch_save_size configuration options in the
> code.

The wiki describes the 0.10 implementation of batch=ok.
In 0.11, batch mode takes advantage of the fact that couch_db_updater
always merges all waiting updates to a DB into a single write, and so
doesn't bother with the separate set of supervised processes
accumulating documents. In effect the 0.11 batch=ok is "I'm not going
to wait around, but save this as soon as you get a chance".

This does change the performance characteristics quite a bit; in
particular, when the underlying disk is fast the new batch=ok behavior
will result in significantly larger uncompacted databases.

> Shouldn't batch=ok send the doc off to some background process that
> accumulates docs until either the batch interval or size threshold has
> been reached? This would also ensure that batch=ok updates are handled
> in the order they arrive, although I'm not sure if that matters given
> that the user has basically said they don't care if it succeeds or not
> by using batch=ok.

I think the document updates are still handled in the order in which
they were received.

>
> - Matt

Best, Adam
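
For readers of the archive, the 0.11 behaviour described above (every
waiting update is merged into one write, and the caller does not wait)
can be illustrated with a toy Python model. This is only a sketch under
stated assumptions: MergingUpdater, submit, run_once and simulate are
hypothetical names invented for the illustration, not CouchDB or
couchdb-python APIs, and the "disk speed" is modelled simply as how many
submissions arrive between two writes.

```python
from collections import deque


class MergingUpdater:
    """Toy stand-in (hypothetical) for an updater that merges waiting updates."""

    def __init__(self):
        self.waiting = deque()  # updates accepted but not yet written
        self.writes = []        # one list of docs per simulated disk write

    def submit(self, doc):
        # batch=ok style: queue the doc and return without waiting for the write.
        self.waiting.append(doc)
        return "accepted"

    def run_once(self):
        # Merge everything waiting right now into a single write, in arrival order.
        if not self.waiting:
            return 0
        batch = list(self.waiting)
        self.waiting.clear()
        self.writes.append(batch)
        return len(batch)


def simulate(docs, submits_per_write):
    """Let the updater run once after every `submits_per_write` submissions."""
    updater = MergingUpdater()
    for count, doc in enumerate(docs, start=1):
        updater.submit(doc)
        if count % submits_per_write == 0:
            updater.run_once()
    updater.run_once()  # write whatever is still waiting
    return updater.writes


docs = list(range(10))
fast_disk = simulate(docs, 1)  # updater keeps up: one write per document
slow_disk = simulate(docs, 4)  # updater lags: documents pile up and are merged
print(len(fast_disk), len(slow_disk))
```

In this model a fast updater performs one write per document, while a
slower one merges several documents per write. That is at least
consistent with the remark above that a fast disk leads to larger
uncompacted databases, on the assumption that each separate write adds
its own overhead to the file. Arrival order is preserved in both cases.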