From: Johan Corveleyn
Date: Sun, 2 Jan 2011 22:04:39 +0100
Subject: Re: My take on the diff-optimizations-bytes branch
To: Stefan Fuhrmann
Cc: Subversion Development
In-Reply-To: <4D1FA9E2.1070000@alice-dsl.de>

On Sat, Jan 1, 2011 at 11:25 PM, Stefan Fuhrmann wrote:
> Hi Johan,
>
> Thursday night I did something stupid and had a look at how
> svn blame could be made faster based on the HEAD code in
> your branch.
>
> One night and most of the following day later, I think I made it
> a few percent faster now. Some of the code I committed directly
> to /trunk and you need to pull these changes into your branch
> to compile the attached patch.
>
> Feel free to test it, take it apart and morph it any way you like --
> I know the patch isn't very pretty in the first place. I tested the
> patch on x64 LINUX and would like to hear whether it at least
> somewhat improved performance on your system for your
> svn blame config.xml use-case.
>
> -- Stefan^2.
>
> [[[
> Speed-up of datasource_open() and its sub-routines
> by a series of optimizations:
>
> * allocate the file[] array on stack instead of heap
>   (all members become directly addressable through
>   the stack pointer because all static sub-functions
>   will actually be in-lined)
> * minor re-arrangements in arithmetic expressions to
>   maximize reuse of results (e.g.
>   in INCREMENT_POINTERS)
> * move hot spots to separate functions and provide a
>   specialized version for file_len == 2
>   (many loops collapse to a single execution, other
>   values can be hard-coded, etc.)
> * use separate loops to process data within a chunk
>   so we don't need to call INCREMENT_POINTERS & friends
>   that frequently
> * scan / compare strings in machine-word granularity
>   instead of bytes
> ]]]

Hi Stefan,

Thanks for taking a look. I really appreciate it. When I saw your
first couple of commits, to the adler32 stuff, I already thought:
hmmm, he's up to something :-). And after I saw your change to
eol.c#svn_eol__find_eol_start, reading per machine-word, I was
thinking: hey, I could really use something like that in the
prefix/suffix scanning. Nice ... :-)

(I had some trouble applying your patch. It no longer applies
cleanly to HEAD of the branch, but most hunks applied correctly and
I could fix up the rest manually, so no problem.)

However, first tests are not so great. In fact, it's about 25%
slower (when blaming my settings.xml file with 2200 revisions, it's
spending about 90 seconds in diff, vs. around 72 with HEAD of
diff-optimizations-bytes).

Analyzing further, I can see it's spending significantly less time
in prefix/suffix scanning, but more time in
token.c#svn_diff__get_tokens (which is where the original algorithm
gets the tokens/lines and inserts them into the "token tree"). This
tells me it's not working correctly: either prefix/suffix scanning
stops too soon, so there's much more left for the regular
algorithm, or it's just not functioning correctly.

Looking at the result of the blame operation, it seems it's the
latter. The final result of the blame is not correct anymore.

I'll try to look some more into it, to see what's going wrong. Maybe
you can also see it with a simple diff of a large file ...
(for a good example in svn's own codebase, you could try
subversion/tests/cmdline/merge-tests.py (pretty large, and almost
700 revisions)).

Cheers,
-- 
Johan
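
P.S. To make the "machine-word granularity" idea from the log message concrete, here is a rough sketch of how a word-at-a-time prefix scan could look. This is not the code from the patch or from the branch: the function names common_prefix_len() and line_aligned_prefix_len() are made up for illustration, and backing up to the last newline is only my assumption about what a line-based diff needs from the skipped prefix.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the patch): return the length of the
   common prefix of A and B, looking at no more than LEN bytes. */
static size_t
common_prefix_len(const char *a, const char *b, size_t len)
{
  size_t i = 0;

  /* Fast path: compare one machine word at a time.  memcpy() sidesteps
     alignment and aliasing problems; compilers typically reduce it to a
     plain load. */
  while (len - i >= sizeof(uintptr_t))
    {
      uintptr_t wa, wb;

      memcpy(&wa, a + i, sizeof(wa));
      memcpy(&wb, b + i, sizeof(wb));
      if (wa != wb)
        break;                  /* the mismatch is inside this word */
      i += sizeof(uintptr_t);
    }

  /* Slow path: finish byte by byte to find the exact mismatch. */
  while (i < len && a[i] == b[i])
    i++;

  return i;
}

/* Hypothetical helper: shrink a raw common-prefix length so that it
   ends right after the last '\n', i.e. only whole identical lines are
   skipped (assumption: the remaining diff works on complete lines). */
static size_t
line_aligned_prefix_len(const char *a, size_t prefix_len)
{
  while (prefix_len > 0 && a[prefix_len - 1] != '\n')
    prefix_len--;

  return prefix_len;
}
```

A cheap sanity check for something like this is to run it next to a plain byte-wise loop on the same buffers and compare the two lengths; any difference means the fast path is skipping too much or too little.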