Subject: Re: [scm] Server load from git-svn vs. normal svn clients
From: Santiago Gala
To: infrastructure-dev@apache.org
Date: Mon, 05 May 2008 19:18:08 +0200
Message-Id: <1210007888.13744.71.camel@marlow>
In-Reply-To: <5c902b9e0805050908r203d8c36o38fd2e886dae2044@mail.gmail.com>
References: <510143ac0805050345v58dd906aj4b29bb878e2cee26@mail.gmail.com> <5c902b9e0805050908r203d8c36o38fd2e886dae2044@mail.gmail.com>

On Mon, 05-05-2008 at 09:08 -0700, Justin Erenkrantz wrote:
> On Mon, May 5, 2008 at 3:45 AM, Jukka Zitting wrote:
> > Thus, even though the startup cost for git-svn is high, I believe that
> > over time the average server load is not so different from normal svn
> > clients.
>
> I think it'd be worth quantifying this as I don't buy it.
>

We can try to quantify it. Obviously, repositories generated once and
later distributed would get the extra benefit of multiplying the
savings while doing the cloning only once.

> The cost of replaying all of those commits is ridiculously expensive

I remember you telling me in 2006 that my repeated attempts to clone
portals/bridges were unnoticeable in terms of load. I think I still have
the IRC logs somewhere. I wonder what has changed since then.
From undetectable to ridiculously expensive.

> (in both CPU and network) that I believe it's almost certain that

I guess it depends on the depth of history. When I cloned shindig I
don't think it had more than 50 commits, so "all of those" is not really
a big number. I have been updating incrementally since then, and for
people cloning my repo, git-svn can reconstruct the svn part of it using
the special tag at the bottom without touching the subversion server
(needs checking).

Now, every svn log I do needs to walk down the whole history of commits
which, for instance, is now 408 or so:

sgala@marlow ~/newcode/git-shindig3 (master)$ time sh -c "git log | grep -E '^commit ' | wc -l"
408

real 0m0.874s
user 0m0.847s
sys 0m0.021s

So every svn log is touching roughly 8x the number of commits in the
original import. I'm not sure how expensive svn log is. Ditto for svn
blame or "svn diff -r":

sgala@marlow ~/newcode/shindig $ time sh -c "svn log | grep -E '^r[0-9]+' | wc -l"
408

real 0m5.614s
user 0m0.147s
sys 0m0.044s

(BTW, from the user's point of view it is the real time that is
important: having to wait almost 6 seconds for a log makes it difficult
to use.)

Being extremely naive and assuming that the client-side work of a git
log is equivalent to the server-side work of an svn log, each user
using git-svn would be saving, for each log operation, 0.7 seconds of
CPU time. But I don't really know how efficient the server is at
handling those operations, or how much CPU/IO it has, etc.

> you'd never recoup the initial costs unless it is a project that you
> work on 7x24. -- justin

As the number of commits accumulates, so do the savings from touching
the server only for incremental updates and not for certain other
operations. So if the initial import is either shared or done early in
the history of the project, I can see how the maths might work. But we
would need to know how expensive a one-time checkout of all revisions in
a project is to be able to estimate.
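
The break-even reasoning above can be sketched as a small calculation. This is only an illustration, not a measurement: the function name `break_even_ops` is hypothetical, the 0.7 s per-operation saving is the naive figure derived from the timings above, and the 600 s import cost is a made-up placeholder for exactly the number this message says is still unknown.

```python
import math


def break_even_ops(import_cost_s: float, saving_per_op_s: float, sharers: int = 1) -> int:
    """Operations (log, blame, ...) each user must run locally before the
    one-time import cost on the server is recouped.

    import_cost_s   -- server CPU seconds spent on the initial git-svn import
                       (hypothetical; not measured in this thread)
    saving_per_op_s -- server CPU seconds saved per operation served locally
    sharers         -- number of users sharing one import instead of each cloning
    """
    if import_cost_s < 0 or saving_per_op_s <= 0 or sharers < 1:
        raise ValueError("costs must be non-negative, saving positive, sharers >= 1")
    # One import is amortised over all sharers, so each user only has to
    # "pay back" import_cost_s / sharers through saved operations.
    return math.ceil(import_cost_s / (saving_per_op_s * sharers))


if __name__ == "__main__":
    # 0.7 s is the naive per-log saving; 600 s is an assumed import cost.
    for users in (1, 10):
        print(users, break_even_ops(600.0, 0.7, users))
```

With these assumed figures a single user would need 858 log operations to break even, while ten users sharing one import would need 86 each. The real answer depends on the measured import cost and on how often such commands are actually run.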
We would also need statistics on the usage of those commands (log,
blame, etc.) to be certain, both from the command-line client and as
part of libraries (subclipse & co.).

Regards
--
Santiago Gala
http://memojo.com/~sgala/blog/