www-infrastructure-dev mailing list archives

From Santiago Gala <santiago.g...@gmail.com>
Subject Re: Best Practices so far?
Date Sun, 04 May 2008 16:24:31 GMT
On Sun, 04-05-2008 at 16:25 +0200, Sander Striker wrote:
> On Sun, May 4, 2008 at 2:18 PM, Henning Schmiedehausen <hps@intermeta.de> wrote:
> > On Fri, 2008-05-02 at 12:04 -0700, Justin Erenkrantz wrote:
> >  > As I've repeatedly stated, a 'zone' does not offer you anything at
> >  > all.  The fact that you're asking for a zone indicates that you don't
> >  > understand how git is designed to work.  It is meant for you to have a
> >  > local copy of the entire repository.  Having a remote copy sitting on
> >  > another server defeats the purpose of a dSCM.
> >
> >  Any central place with <other SCM> mirror copies would allow
> >
> >  - defined poll intervals from the SVN server, which could minimize
> >  traffic or push this traffic to off-times
> That is correct.
> >  - reduce or remove the need for svn-<something> bridges because
> > there would be a central place to access Apache project source code
> > in another format
> It would *be* the svn-<something> bridge.
> And would make it a new service [in a central place] that then needs to
> be supported.  That's what always happens when it starts being used.
> That said, I shudder at the thought when someone comes up with
> <something-else>.

Not really true; it depends on how it is used. Typically, for git-svn,
each repository is updated independently against the Subversion one. It
behaves as an svn client. So, once this zone "delivers" a git repository,
it can as easily be shut off as kept alive. There is no need to provide a
sustained service beyond the initial clones.

This is why I would actually prefer that access from the IP of p.a.o
skip mod_dontdothat, so that each interested party can produce a clone
there and move it somewhere else later.

> >  - allow interested parties to toy with these tools *without* interfering
> >  with day-to-day SVN operations
> The word "toy" is interesting.  If it is toying around, the impact should
> not even be noticeable.

That is what Justin told me on #asfinfra in 2006, when I started my
experiments. I was making repeated attempts to clone portals-bridges,
getting errors and trying to debug them. This is how I learned that we
were deliberately breaking some Subversion commands (REPORT on the root,
I think), which prevented git-svn from completely cloning any TLP.

In other words, we are not allowing people to toy around, which actually
causes more attempts as people try to route around our svn API breakage.

> >  - allow infra to tighten the access rules on the SVN server for tools
> >  and simultaneously relaxing the access rules for this central place to
> >  allow efficient transfers off the SVN server
> The only thing it needs relaxing for is this "toying" around.  Regular
> clients don't need this.  The reason the access rules are there is
> to a) protect the SVN server against abusive tools, and b) protect
> users who weren't thinking when they typed their
> "svn checkout root/of/project" against themselves... (been there, done that).

Done that too: on one of my first svn checkout attempts I started
getting cocoon with all its branches and tags, which is big.

> >  - give a defined testbed to work *with* infra instead of rogue
> >  experiments from all over the Internet.
> There is nothing wrong with rogue experiments; actually rogue
> experiments are great because they don't need any attention from
> the people maintaining the systems.  That said, when the experiments
> start to become so noticeable that they are seen as abuse, access may
> be restricted.  And from then on it becomes interesting; apparently
> the tool is not well behaved towards our infrastructure, yet the one
> conducting the experiment presumably wants to continue to toy around.

The person conducting the experiment is trying to get around an arbitrary
restriction on our server that prevents the clone from completing. I have
cloned a number of Subversion repositories, whole history with branches
and tags, with no problem: the django one with full history is slightly
larger than a single Subversion working copy, for instance. I have also
cloned roundup, gajim, a number of smallish google-code projects, etc. The
only ones I was never able to clone were the ASF ones, except for
shindig (3 levels below the root), and even there I needed several
attempts to get past errors (I guess mod_dontdothat limits, but I'm not
really sure). I can say for certain that I consistently get errors on
import attempts from the ASF, and *never* got a single one on git-svn
imports from any other project.

> >  Think of this as another way of creating "daily snapshots" of our
> > source tree. Maybe this would ease your mind.
> I'm not sure, but I would think that there is little objection to pointing
> svnsync* to the main repository, for say a project root**.  What is done
> beyond that is not of any concern as it wouldn't be visible.

Can you please elaborate on that? I don't really understand what you
mean. I take it you mean producing a clone of (part of) the ASF
Subversion repo and working from it. This has significant problems.

Importing some ASF projects into git means following moves. For
instance, to quote one that I have already done with git-svn:

To clone portals/bridges/trunk, git-svn would start by getting all
revisions in portals/jetspeed-2/trunk/bridges, and the same for each
tag and branch in the initial location. I'm not sure whether svnsync can
sync this from the master without cloning all of portals. If it can't,
cloning any project moved from the incubator or jakarta to a TLP would
require cloning the whole svn repository, which means a very
unreasonable size and bandwidth requirement.

I have cloned portals/bridges (about 35M svn WC, turned into about the
same size for the git whole history plus WC) easily over my ADSL
connection in a couple of hours. I don't think requiring me to svnsync
/portals first, which is way bigger than this, is a reasonable
requirement. And this is small compared with, say, the requirement to
svnsync / to get any small project that has graduated from /incubator/X
to /NEWTLP or from /jakarta/X to, say, /db/Z/Y.

Subversion is very demanding regarding the size of the repository and
the working copy.

> *) I'd hold off on that until the bugs that prevent it from being used
>   for the svn.eu.apache.org mirror are fixed.


> **) The full root is probably stretching it, but YMMV.  If it doesn't cause
> disproportionate additional strain, why not.

One of the good things about git is that it can very easily get
different revision graphs into and out of the same repository, and merge
between them. So there is absolutely no need to have it as one big chunk.
I think most users would prefer something like shindig, whose whole
history measures just 3.4M of repo + 4M of WC, i.e. 7.3M total (a single
svn checkout is 11M), to having to deal with the whole ASF code
repository. Especially as code can be merged/copied from one project to
another while preserving commit history.

> >  The idea of a zone was born at the dSCM BoF in Amsterdam
> > because a zone allows isolation and control of such an experimental
> > infrastructure with *minimal* impact on the existing systems.
> Except for the system that would host the zone.  Given we're at
> capacity zone-wise, AFAIK, it doesn't seem like a viable option.

I was not there; I could not afford to stay a whole week in Amsterdam to
cover both the hackathon and the planners meeting (the two parts of it I
was interested in). I have always preferred something like lifting the
restrictions for the p.a.o IP, or letting HTTPS-authenticated requests
skip them, as someone suggested.

> > Maybe you or any other infra members should have bothered to be
> > there.
> > Dialog is good and a pillar of the ASF.
> It is.  But to say that infra volunteers didn't bother to come is stretching
> it I think.  Nothing prevented anyone from approaching the people from Infra
> that were present (unfortunate timing notwithstanding); say the day
> after the BoF?
> >  You still seem to believe that this discussion is "intended to kick
> > SVN out". For most participants, it is not (AFAICS). This is "how
> > can we augment the existing infrastructure in new ways, that are
> > outside conventional thinking".
> No.  To make sure there is no lack of understanding about it: the Infra
> team is not willing to maintain more than one SCM.

Cool: the good thing about a dSCM is that there is no need for a team
to maintain it. All that is needed is for the current history to be
cloned initially, one way or another, by some means of working around
the barriers set up on our server, which seem to have been the problem
since my first experiments in 2006 or so.

> >  And from what I have seen, a lot of people like Jukka have invested a
> >  considerable amount of thinking before asking.
> Which is appreciated.
> > I don't think anyone should belittle this.
> I think that with the time invested in answering questions, I wouldn't
> come close to considering it "belittling".

I think belittling does not describe the typical reaction to it either.
There are better words for it.

> Cheers,
> Sander

Santiago Gala
