From: Adam Kocoloski
To: user@couchdb.apache.org
Subject: Re: view building batch sizes
Date: Thu, 9 Dec 2010 11:31:25 -0500

On Dec 9, 2010, at 10:49 AM, Paul Davis wrote:

> On Thu, Dec 9, 2010 at 10:47 AM, Jan Lehnardt wrote:
>>
>> On 9 Dec 2010, at 15:37, Paul Davis wrote:
>>
>>> On Thu, Dec 9, 2010 at 7:51 AM, Jan Lehnardt wrote:
>>>> Hi Huw,
>>>>
>>>> On 9 Dec 2010, at 13:32, Huw Selley wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I read on http://guide.couchdb.org/draft/performance.html that
>>>>>
>>>>> "Views load a batch of updates from disk, pass them through the view engine, and then write the view rows out. Each batch is a few hundred documents, so the writer can take advantage of the bulk efficiencies we see in the next section."
>>>>>
>>>>> Is there a method to change the batch size? I would like to try to measure the impact of using smaller and larger batches.
>>>>
>>>> Thanks for helping to profile things. You may want to take this to
>>>> dev@couchdb.apache.org as it is the development-related mailing list.
>>>>
>>>> For tuning these values, see src/couchdb/couch_view_updater.erl
>>>>
>>>> The `update()` function has these lines:
>>>>
>>>> {ok, MapQueue} = couch_work_queue:new(100000, 500),
>>>> {ok, WriteQueue} = couch_work_queue:new(100000, 500),
>>>>
>>>> They set up one queue for mapping and one for writing. The parameters are
>>>>
>>>> couch_work_queue:new(MaxSize, MaxItems)
>>>>
>>>> If either maximum is hit, the queue is deemed full.
>>>>
>>>> Note: This is from about 30 seconds of looking at the source, so I
>>>> might miss a subtlety or three.
>>>>
>>>> Cheers
>>>> Jan
>>>> --
>>>>
>>>
>>> The only real subtlety is that we don't wait for a minimum amount to
>>> be inserted into the queue. Playing with larger or smaller queues on
>>> either side might be an interesting bit. Also, for testing it might
>>> not be a bad idea to add config settings for these values.
>>
>>
>> Good thinking, I made a patch:
>>
>> https://github.com/janl/couchdb/commit/547691a9f4b9895086f2763af84e1cc459e4d72c
>>
>> Branch:
>>
>> https://github.com/janl/couchdb/tree/config-view-batches
>>
>> "Compiles for me".
>>
>> To make this proper, we probably want to move the lookups into
>> couch_view_group:init/3 and pass the values down, but it should
>> be ok as is.
>>
>
> Probably not worth it. ets lookups like that are quick, and as a
> percentage of a view build are going to be fairly inconsequential.
> Though, you could make a case about coupling.

This issue was initially raised in https://issues.apache.org/jira/browse/COUCHDB-700 . I'm not opposed to making the queue sizes configurable, but I think the more important fix by far is to be able to configure a minimum number of items in the work unit sent to the reducer.

Cheers, Adam
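
For readers following the thread, the queue behaviour discussed above can be modelled roughly as below. This is only a toy sketch in Python, not CouchDB's Erlang couch_work_queue: the class WorkQueue, its methods, and the min_items parameter are hypothetical names made up for illustration. It assumes the queue counts as full once either maximum is reached, that a consumer normally takes whatever is queued without waiting for a minimum, and it shows what a configurable minimum batch could look like.

```python
from collections import deque


class WorkQueue:
    """Toy model of a bounded work queue (hypothetical, not couch_work_queue)."""

    def __init__(self, max_size, max_items):
        self.max_size = max_size    # cap on the summed size of queued items
        self.max_items = max_items  # cap on the number of queued items
        self.items = deque()
        self.size = 0
        self.closed = False

    def is_full(self):
        # Full as soon as either maximum is reached.
        return self.size >= self.max_size or len(self.items) >= self.max_items

    def enqueue(self, item, item_size):
        # This model rejects the item when full; a real producer would block.
        # The item that crosses max_size is still admitted.
        if self.is_full():
            return False
        self.items.append((item, item_size))
        self.size += item_size
        return True

    def close(self):
        # Signal that the producer is done, so leftovers may be drained.
        self.closed = True

    def dequeue(self, min_items=1):
        # Hand over everything queued once at least min_items are waiting.
        # With min_items=1 a batch may hold a single item (no minimum).
        # After close() the remainder is returned regardless of min_items.
        if not self.items:
            return []
        if len(self.items) < min_items and not self.closed:
            return []
        batch = [item for item, _ in self.items]
        self.items.clear()
        self.size = 0
        return batch


if __name__ == "__main__":
    queue = WorkQueue(max_size=100000, max_items=500)
    queue.enqueue("doc-1", 2048)
    print(queue.dequeue())  # a one-item batch: no minimum is enforced
```

With min_items left at 1 the consumer can end up processing very small batches, which is the subtlety Paul points out; passing a larger min_items holds work back until a fuller batch is available or the producer has finished.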