www-infrastructure-dev mailing list archives

From Santiago Gala <santiago.g...@gmail.com>
Subject Re: [scm] Server load from git-svn vs. normal svn clients
Date Mon, 05 May 2008 17:18:08 GMT
On Mon, 2008-05-05 at 09:08 -0700, Justin Erenkrantz wrote:
> On Mon, May 5, 2008 at 3:45 AM, Jukka Zitting <jukka.zitting@gmail.com> wrote:
> >  Thus, even though the startup cost for git-svn is high, I believe that
> >  over time the average server load is not so different from normal svn
> >  clients.
> I think it'd be worth quantifying this as I don't buy it.

We can try to quantify it. Obviously, a repository cloned once and then
redistributed would multiply the savings while incurring the cloning
cost only once.

> The cost of replaying all of those commits is ridiculously expensive

I remember you telling me in 2006 that my repeated attempts to clone
portals/bridges were unnoticeable in terms of load. I think I still have
the IRC logs somewhere. I wonder what has changed since then. From
undetectable to ridiculously expensive.

> (in both CPU and network) that I believe it's almost certain that

I guess it depends on the depth of the history. When I cloned shindig I
don't think it had more than 50 commits, so "all of those" is not really
a big number. I have been updating incrementally since then, and for
people cloning my repo, git-svn can reconstruct the svn side of it from
the special git-svn-id tag at the bottom of each commit message without
touching the Subversion server (needs check).
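To make the "special tag" concrete: git-svn appends a git-svn-id trailer to each imported commit message, recording the Subversion URL and revision. A minimal sketch of how that trailer can be read back, using a throwaway repository with a hand-written trailer (the URL and uuid below are placeholders, not a real repository):

```shell
set -e
repo=$(mktemp -d); cd "$repo"
git init -q
git config user.email demo@example.org
git config user.name Demo
echo hi > f; git add f
# Simulate the trailer git-svn writes at the bottom of each commit message.
git commit -q -m "import r408" \
  -m "git-svn-id: https://svn.example.org/repos/project/trunk@408 abcd-uuid"
# Extract the Subversion revision recorded in the trailer.
rev=$(git log -1 --format=%B | sed -n 's/.*git-svn-id: .*@\([0-9]*\) .*/\1/p')
echo "$rev"   # prints 408
```

This is the mapping data git-svn keeps locally, which is why many history operations need no round trip to the server.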

Now, every svn log I do needs to walk down the whole history of commits,
which is now 408 or so:

sgala@marlow ~/newcode/git-shindig3 (master)$ time sh -c "git log | grep -E '^commit ' | wc -l"

real	0m0.874s
user	0m0.847s
sys	0m0.021s
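As an aside, the commit count can be obtained in one step with `git rev-list --count`, avoiding the grep/wc pipeline. A self-contained sketch on a throwaway repository (in a real clone you would just run the last line in the working tree):

```shell
set -e
repo=$(mktemp -d); cd "$repo"
git init -q
git config user.email demo@example.org
git config user.name Demo
# Create a few commits so there is history to count.
for i in 1 2 3; do
  echo "$i" > file.txt
  git add file.txt
  git commit -q -m "commit $i"
done
# Count commits reachable from HEAD in a single git invocation.
n=$(git rev-list --count HEAD)
echo "$n"   # prints 3
```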

So every svn log touches 8x the number of commits of the original
import. I'm not sure how expensive svn log is on the server; ditto for
svn blame or "svn diff -r".

sgala@marlow ~/newcode/shindig $ time sh -c "svn log | grep -E '^r[0-9]+' | wc -l"

real	0m5.614s
user	0m0.147s
sys	0m0.044s

(BTW, from the user's point of view it is the real time that matters:
having to wait almost 6 seconds for a log makes it difficult to use.)

Being extremely naive and assuming that the client-side cost of a git
log is equivalent to the server-side cost of an svn log, each git-svn
user would save the server roughly 0.7 seconds of CPU time per log
operation. But I don't really know how efficiently the server handles
those operations, what CPU/IO it has, etc.
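One way to frame the trade-off is a break-even count: if the one-time import costs the server C seconds of CPU and each avoided log saves s seconds, git-svn pays off after C/s log operations. A sketch with placeholder numbers (C is an assumption, not a measurement; s is the naive 0.7 s figure above):

```shell
C=120   # assumed server CPU cost of the initial git-svn clone, in seconds
s=0.7   # assumed server CPU saved per avoided `svn log`, in seconds
# Break-even point: number of avoided log operations needed to repay the clone.
out=$(awk -v c="$C" -v s="$s" 'BEGIN { printf "break-even after %d log operations", c/s }')
echo "$out"   # prints: break-even after 171 log operations
```

Measuring the real C (server CPU for a full clone) is exactly the missing number the rest of this thread asks for.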

> you'd never recoup the initial costs unless it is a project that you
> work on 7x24.  -- justin

As the number of commits accumulates, so do the savings from touching
the server only for incremental updates. So if the initial import is
either shared or happens early in the history of the project, I can see
how the maths might work out. But we would need to know how expensive a
one-time checkout of all revisions in a project is to be able to
estimate.

We would also need statistics on the usage of those commands (log,
blame, etc.) to be certain, both from the command-line client and via
libraries (Subclipse & co.).

Santiago Gala
