labs-labs mailing list archives

From Santiago Gala <santiago.g...@gmail.com>
Subject Re: Lab for distributed SCM?
Date Sat, 23 Feb 2008 00:58:22 GMT

On Fri, 22-02-2008 at 12:55 -0800, Roy T. Fielding wrote:
> On Feb 22, 2008, at 11:35 AM, Santiago Gala wrote:
> > I had a number of responses to your comments in previous emails about
> > the distributed tools forcing isolated behavior, which you can imagine
> > I disagree with, but I think it makes no sense to keep on with this
> > kind of discussion. I would just repeat a mantra about forks that I
> > heard (from Sam?): the easier you make forking, the harder it is for it
> > to happen. The same, I think, applies to distribution here: the easier
> > it is to move patches around, the more they will "come" to the central
> > point.
> 
> Umm, no, patches are text files.  I think you mean the easier it is to
> identify and apply patches, the more likely they won't be lost or left
> behind.
> 

I actually mean that the easier it is to keep changes on top of a
"reference" tree public, the more likely the "good" ones will arrive at
the reference tree (or, alternatively, a different tree will become the
reference).

Evolution is about mutation *and* selection. I think of different trees
as organisms undergoing mutation. Selection is the process by which
trees containing "good" changes are selected. The point is that with
distributed development there are many more places to pick up changes
from, and it is much easier.

> > The fact that people find it easier to do clean checkouts to work on
> > different features, commit and delete, does not necessarily call for
> > different workflows. For instance, I used to have two different svn
> > checkouts for this kind of split between small fixes and a longer term
> > "feature". Keeping the changes synced to both was mostly tedious.
> >
> > Most patches I keep from gajim, which is a project I track "from the
> > outside", have been sent upstream and not accepted, for one reason or
> > another. I got a lot accepted, but sometimes people do not have the
> > same perception that I have, or just the same configuration. I *run*
> > the tool (a python jabber client) hours a day with those patches, so
> > I'm confident that they work. What should I do? Trust them and throw
> > the patches away to get a less functional version? What I'm doing now
> > is keeping quilt and merging the patches every time I update svn from
> > them. I'm migrating to git for that task after I tested a git-svn
> > import with the repository.
> 
> That is isolated development.  If you participate within the project,
> instead of outside it, then the presence of everyone's private tree
> makes it very hard for anyone to know what has been accepted, let alone
> test with the same configuration, plan for the same features, and
> generally help each other out with the small stuff.  What happens next
> is that the trees become their own distributions, just like Debian and
> Ubuntu and Redhat are not actually based on Linus' tree.
> 

I guess we disagree on what you call "isolated development", see the
bottom.

Re: gajim, I actually participate in the project, pulling changes,
discussing in the chat room, solving bugs and reporting, etc. I just
don't want to accept any responsibility, as I started there when I was
sick and burned out and don't expend too much energy trying to push
changes. I just offer them and keep my course. :)

The point is I'm not *under discipline*, in the sense of doing whatever
somebody asks me (well, if people ask nicely and I have time...). I am
probably currently the 3rd or 4th developer in terms of contributions, of
4-5 active ones, and have had sustained activity there for a couple of
years. It is not a casual patch. But some changes I strongly believe
should be in are not (yet), so I keep my tree. My changes grow and shrink
over time, as I either get patches in or drop them because an alternate
solution works.

I think the Debian, Ubuntu or Redhat kernels are based on Linus' one. The
one I run now, gentoo-sources-2.6.24-r2, has around 60k of patches in
50M, i.e., around a 1/1000 "mutation rate". This typically grows from
2.6.24-flat on; for instance, 2.6.23-9 contains 180K, because it includes
backports of the maintenance branch in Linus' sources apart from
specific gentoo patches.
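
As a rough check of those figures, here is a minimal sketch of the
arithmetic. It assumes that "k" and "M" are decimal byte units and that
both kernel trees are about 50M; the message states neither, so treat
the names and constants below as illustrative only.

```python
# Rough check of the "mutation rate" figures quoted above.
# Assumptions (not stated in the message): k = 1,000 bytes,
# M = 1,000,000 bytes, and both trees are about 50M in size.
KILO = 1_000
MEGA = 1_000_000


def mutation_rate(patch_bytes: int, tree_bytes: int) -> float:
    """Fraction of the tree size accounted for by vendor patches."""
    return patch_bytes / tree_bytes


# gentoo-sources-2.6.24-r2: about 60k of patches
rate_2_6_24 = mutation_rate(60 * KILO, 50 * MEGA)
# 2.6.23-9: about 180K of patches (tree size assumed to be the same)
rate_2_6_23 = mutation_rate(180 * KILO, 50 * MEGA)

print(f"2.6.24-r2: {rate_2_6_24 * 1000:.1f} per 1000")
print(f"2.6.23-9:  {rate_2_6_23 * 1000:.1f} per 1000")
```

Under those assumptions the two ratios come out at roughly 1.2 and 3.6
per thousand, which is consistent with the "around 1/1000" order of
magnitude.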

There is no long term divergence in the trees. The differences reflect
mostly the "lag" between bug reports, ability to reproduce, quality of
patches, etc. Transient peaks happen, but they typically reduce in one
release or so. So I see vendor trees as "buffers", and very useful ones,
at that.

> >> Likewise, if you intend to analyze the parts of gSCM systems that
> >> don't scale in the hope of finding fixes/workarounds, then by all
> >> means do so.
> >
> > As an example of the kind of experiments I think we can do here: today
> > I did a (couple of) git-svn "clone" of incubator/shindig. It is much
> > smaller than a clean svn checkout, *while having full history*. It is
> > way faster to process because it is smaller, and because the depth of
> > the history is much smaller than that of the whole ASF tree, etc:
> >
> > $ du -sh shindig git-shindig*
> > 6,7M	shindig
> > 2,9M	git-shindig  <- read only, http
> > 2,9M	git-shindig2 <- can commit to svn, https
> >
> > I can navigate and search the whole history, fast, etc.
> >
> > Obviously git-svn is not a mature tool; it has problems (or I am
> > stupid enough to blow it and make it fail silently), and even more
> > problems appear because there are impedance mismatches between the
> > stores of both tools. Also, git is reputedly not exactly well supported
> > under Windows. Not that I care for myself, I have not used Windows in
> > a number of years, but this is a problem for everything except
> > experimental usage.
> >
> >> Just don't assume that the folks who built Subversion (or people like
> >> me, who just did a lot of research on gSEE back in the 90s) are somehow
> >> unaware of the advantages/disadvantages of dSCM.
> >
> > Cool, I would ask you in exchange not to assume that tools that are
> > used to manage several of the most dynamic projects in the world (the
> > linux kernel is one such) with a lot of success, and tools used by
> > companies such as Redhat or Ubuntu for a number of years, are used out
> > of masochism.
> 
> Dude, I already have Hg installed, watched Linus' rant the week
> it was posted, and was well aware of bitkeeper, TeamWare, DARCS, and
> several other long-abandoned dSCM tools years ago (I tried to install
> git, but at the time it didn't work on anything but Linux).  Most of
> the dSCM legacy came directly or indirectly out of Solaris engineering's
> tool groups and ex-Sun employees, which is one of the reasons they are
> so friggin incapable of collaborating on even the simplest of tasks.
> 

I'm actually looking for serious bibliography on source control, and I
have not been able to find reasonable textbooks or even much academic
work. *Good* pointers welcome! 

> When you get to the part where you look at the storage formats and see
> the trade-offs that have been deliberately chosen to benefit isolated
> development at the cost of consensus development, then you will
> understand where I am coming from.  If you can fix that trade-off
> without failing both models, then you'll have bettered both Linus
> and the Subversion team.  I'm not saying it's impossible -- I am just
> saying it is naive to think that the trade-offs weren't chosen
> for a good reason.
> 

I don't get your point about storage formats and trade-offs. Basically
git's format is designed for integrity of content (the name of each
object is its SHA1). I don't really know the low-level details of other
systems, other than knowing that most of them have several different
formats.
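
To make the "name is the SHA1" point concrete, here is a minimal sketch
of how git derives the name of a file ("blob") object in its SHA-1
object format: it hashes a short header followed by the raw content.
The helper name git_blob_id is made up for this illustration; it is not
part of git or of any library.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the object name git assigns to a blob with this content.

    Hypothetical helper for illustration; it mirrors git's SHA-1 object
    naming, which hashes the header "blob <size>\\0" plus the raw bytes.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Identical content always yields the same name; any change yields a new one.
print(git_blob_id(b""))
print(git_blob_id(b"hello\n"))
print(git_blob_id(b"hello\n") == git_blob_id(b"hello!\n"))
```

The printed names should match what `git hash-object` reports for the
same bytes, so a corrupted or altered object no longer matches its own
name.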

You insist on classifying into "isolated" vs "consensus" development. Do
you have any document where those concepts are described? I would
classify the processes much more as "hierarchical" vs.
"evolutionary"/"headless"/"networked", similar to the classic
"cathedral" vs. "bazaar". But I can't see a mapping between your
classification and mine. In both cathedral and bazaar style development
there are consensus-builder and lone-hunter roles. And I think
centralized development favors the cathedral style, not consensus
development.

> Good luck,
> 
> ....Roy
> 

Thanks
Santiago

> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: labs-unsubscribe@labs.apache.org
> For additional commands, e-mail: labs-help@labs.apache.org
> 
-- 
Santiago Gala
http://memojo.com/~sgala/blog/

