Date: Mon, 11 May 2009 16:07:43 -0400
Subject: Re: Patch to couch_btree:chunkify
From: Paul Davis
To: dev@couchdb.apache.org

On Mon, May 11, 2009 at 3:57 PM, Adam Kocoloski wrote:
> I'd like to see some more concrete numbers on the performance difference
> between the two versions.  I wasn't able to reproduce Chris' 10%+ speedup
> using hovercraft:lightning; in fact, the two versions seem to be compatible
> within measurement variance.
>

Hmm. My first question is whether everyone is using the same parameters
for inserting documents. I'm pretty sure that this patch isn't going to
make too much of a difference unless the size of a _bulk_save is big
enough to make the term_to_binary calls noticeable. Other than that, the
only things I could point at for differences would be Erlang VM version
or hardware.

> I tried messing around with fprof for a while today, and if anything it
> indicates that the original version might actually be faster (though I find
> that hard to believe).  Anyway, I think we should get in the habit of having
> some quantitative, reproducible way of evaluating performance-related
> patches.
>

We definitely need some standard benchmarks to make sure we're not
getting performance regressions elsewhere.

> +1 for Bob's suggestion of stripping out Bt from the arguments, though.
>

Sounds good.

> Adam
>
> On May 11, 2009, at 3:28 PM, Damien Katz wrote:
>
>> +1 for committing.
>>
>> -Damien
>>
>>
>> On May 10, 2009, at 9:49 PM, Paul Davis wrote:
>>
>>> Chris reminded me that I had an optimization patch lying around for
>>> couch_btree:chunkify and his tests show that it gets a bit of a speed
>>> increase when running some tests with hovercraft. The basic outline of
>>> what I did was to swap a call like term_to_binary([ListOfTuples]) to a
>>> sequence of ListOfSizes = lists:map(term_to_binary, ListOfTuples),
>>> Size = sum(ListOfSizes), and then when we go through the list of
>>> tuples to split them into chunks I use the precalculated sizes.
>>>
>>> Anyway, I just wanted to run it across the list before I commit it in
>>> case anyone sees anything subtle I might be missing.
>>>
>>> chunkify(_Bt, []) ->
>>>     [];
>>> chunkify(Bt, InList) ->
>>>     ToSize = fun(X) -> size(term_to_binary(X)) end,
>>>     SizeList = lists:map(ToSize, InList),
>>>     TotalSize = lists:sum(SizeList),
>>>     case TotalSize of
>>>     Size when Size > ?CHUNK_THRESHOLD ->
>>>         NumberOfChunksLikely = ((Size div ?CHUNK_THRESHOLD) + 1),
>>>         ChunkThreshold = Size div NumberOfChunksLikely,
>>>         chunkify(Bt, InList, SizeList, ChunkThreshold, [], 0, []);
>>>     _Else ->
>>>         [InList]
>>>     end.
>>>
>>> chunkify(_Bt, [], [], _Threshold, [], 0, Chunks) ->
>>>     lists:reverse(Chunks);
>>> chunkify(_Bt, [], [], _Threshold, OutAcc, _OutAccSize, Chunks) ->
>>>     lists:reverse([lists:reverse(OutAcc) | Chunks]);
>>> chunkify(Bt, [InElement | RestInList], [InSize | RestSizes], Threshold,
>>>         OutAcc, OutAccSize, Chunks) ->
>>>     case InSize of
>>>     InSize when (InSize + OutAccSize) > Threshold andalso OutAcc /= [] ->
>>>         chunkify(Bt, RestInList, RestSizes, Threshold, [], 0,
>>>             [lists:reverse([InElement | OutAcc]) | Chunks]);
>>>     InSize ->
>>>         chunkify(Bt, RestInList, RestSizes, Threshold, [InElement | OutAcc],
>>>             OutAccSize + InSize, Chunks)
>>>     end.
>>
>
>
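
As a starting point for the reproducible measurement discussed in this
thread, here is a rough sketch of a timing harness for the size calculation
alone. Everything in it is an assumption for illustration: the module name
chunk_size_bench, the sample data, and in particular sizes_twice/1, which is
only one reading of the pre-patch behaviour (the old code is not shown in
this thread). It does not exercise couch_btree itself, so treat the numbers
as a sanity check rather than a verdict on the patch.

```erlang
-module(chunk_size_bench).
-export([run/1]).

%% Hypothetical sample data, not taken from couch_btree: N key/value
%% tuples standing in for the entries of a btree node.
sample(N) ->
    [{I, {<<"doc">>, lists:seq(1, 20)}} || I <- lists:seq(1, N)].

%% Assumed pre-patch shape: serialize the whole list to get the total
%% size, then serialize every element again to get per-element sizes.
sizes_twice(List) ->
    Total = byte_size(term_to_binary(List)),
    Sizes = [byte_size(term_to_binary(X)) || X <- List],
    {Total, Sizes}.

%% Shape described in the patch: serialize each element once and sum
%% the precalculated sizes.
sizes_once(List) ->
    Sizes = [byte_size(term_to_binary(X)) || X <- List],
    {lists:sum(Sizes), Sizes}.

%% Run Fun(List) Reps times and return the mean wall-clock time per
%% call in microseconds (always a float, because of the `/` operator).
mean_micros(Fun, List, Reps) ->
    {Micros, _} = timer:tc(fun() ->
        lists:foreach(fun(_) -> Fun(List) end, lists:seq(1, Reps))
    end),
    Micros / Reps.

%% Time both variants on the same N-element list.
run(N) ->
    List = sample(N),
    Reps = 100,
    [{twice, mean_micros(fun sizes_twice/1, List, Reps)},
     {once, mean_micros(fun sizes_once/1, List, Reps)}].
```

To try it, compile the module in an Erlang shell with c(chunk_size_bench).
and call chunk_size_bench:run(N) for a few values of N. Varying N is the
point: it shows whether the difference only appears once the list is large,
which is the question raised above about _bulk_save sizes.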