Subject: Re: Hitting the reduce overflow boundary
From: J Chris Anderson
Date: Fri, 5 Mar 2010 08:58:02 -0800
To: dev@couchdb.apache.org

On Mar 5, 2010, at 8:23 AM, Dirkjan Ochtman wrote:

> On Fri, Mar 5, 2010 at 12:17, Dirkjan Ochtman wrote:
>> I would really like to have someone from the dev team speak up on this
>> one, since I'd kind of like to re-enable the reduce_limit option, but
>> I don't think this view should be classified as overflowing.
>
> I happily found Adam in IRC, who explained this to me:
>
> 17:05 <+kocolosk> djc: so the current reduce_limit calculation is in main.js
> 17:06 <+kocolosk> the JSONified reduction needs to be less than 200 bytes, and
>                   it needs to be less than half of the size of the input map
>                   values
> 17:06 <+kocolosk> you could try tweaking those to see which condition you're
>                   failing
>
> The way I see it, the way the reduce phase should work is that the
> result from a collection of documents should be smaller or not much
> larger than the largest single object in the input set.
> This way, you'll prevent the unbounded growth that you want to prevent.
> Such a rule should also work on slightly larger inputs, because that
> should just be a larger constant, not exponential growth.
>
> So I see two problems with the current rule:
>
> - it has a fixed limit at 200 bytes, which isn't very reasonable because
>   a larger size doesn't mean there's unbounded growth going on
> - it assumes that all the values in the input map have about equal
>   size, which isn't really a requirement
>
> Am I crazy, or would a scheme like I proposed above be an improvement?

Definitely. A patch to make the reduce_overflow_threshold configurable
(with a default of 200 bytes) would be a major improvement and not hard
to do.

Chris

>
> Cheers,
>
> Dirkjan
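
For anyone trying to see which condition a view is failing, here is a
rough Python sketch of the two rules discussed in this thread. It is not
the code from main.js. The first function follows the IRC paraphrase
literally (the actual check may combine the two conditions differently),
with the threshold exposed as a parameter in the spirit of the
reduce_overflow_threshold suggestion. The second is one possible reading
of Dirkjan's proposal. The function names and the `slack` factor are
hypothetical, chosen only for illustration.

```python
import json

# Default taken from the thread; everything else here is an assumption.
DEFAULT_REDUCE_OVERFLOW_THRESHOLD = 200  # bytes


def json_size(value):
    """Size in bytes of the JSON encoding of `value`."""
    return len(json.dumps(value).encode("utf-8"))


def described_rule_ok(reduction, map_values,
                      threshold=DEFAULT_REDUCE_OVERFLOW_THRESHOLD):
    """Rule as paraphrased on IRC: the JSONified reduction must be under
    `threshold` bytes and under half the size of the input map values."""
    reduced = json_size(reduction)
    return reduced < threshold and reduced * 2 < json_size(map_values)


def proposed_rule_ok(reduction, map_values, slack=2.0):
    """One reading of the proposal: the reduction may be at most `slack`
    times the largest single input value. `slack` is hypothetical."""
    largest = max(json_size(v) for v in map_values)
    return json_size(reduction) <= slack * largest


# Large but bounded: the result is about the size of one input value.
big_values = [{"stats": "x" * 300}, {"stats": "y" * 300}]
big_result = {"stats": "z" * 300}

# Unbounded growth: the result concatenates every input value.
chunks = ["a" * 50] * 10
concatenated = "".join(chunks)
```

With these inputs, the large-but-bounded result is rejected by the
described rule (it exceeds 200 bytes) but accepted by the proposed one,
while the concatenated result is rejected by both.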