Message-ID: <4E9315DB.20704@snakebite.org>
Date: Mon, 10 Oct 2011 11:57:15 -0400
From: Trent Nelson
To: Julian Foad
CC: Stefan Sperling, dev@subversion.apache.org
Subject: Re: Identifying branch roots

On 07-Oct-11 6:59 AM, Julian Foad wrote:
> On Fri, 2011-10-07 at 11:29 +0100, Julian Foad wrote:
>> Stefan Sperling wrote:
>>> julianfoad wrote:
>>>> +/* This property marks a branch root. Branches with the same value of this
>>>> + * property are mergeable. */
>>>> +#define SVN_PROP_BRANCHING_ROOT "svn:ignore" /* ### should be "svn:branching-root" */
>>
>> Hi Stefan. Thanks for picking up on this.
>>
>>> I think your addition of a 'branch root' property is quite a significant
>>> step. Is this really necessary in order to improve the output of
>>> 'svn mergeinfo' or do you have additional steps planned that go beyond
>>> tuning output?
>>
>> Both. I think knowing whether the (requested) merge source and target
>> are branch roots (and indeed branches of the *same* "project" or tree)
>> is important for improving the output and diagnostics of "svn mergeinfo"
>> and "svn merge" commands.
>>
>> It could of course enable other new behaviours relating to branches, and
>> I don't know what those are yet (apart from trivial UI things like
>> answering "is this a branch?").
>>
>> So I'm working on the idea that it would be useful to have branch roots
>> identifiable by some mechanism, so I'll add "some mechanism" (currently
>> this property, but I'm totally open to a different mechanism such as
>> branch points being defined in a config file) and see what useful
>> behaviours I can come up with.
>>
>>> There has been some discussion about adding a property for this
>>> and similar purposes in the past, see
>>> http://svn.haxx.se/dev/archive-2009-09/0156.shtml
>>> (there are probably more threads about this topic)
>>
>> Yes, and it's time to figure out what we can usefully do with such
>> information and then we'll know exactly what branch configuration
>> information we need and what's a good way to store it.
>>
>> I'll reply to the rest in a further email.
>>
>> - Julian

Welp, I'm never going to get a better lead-in than that, so, hi, folks!

Freelance SCM consultant here; used to specialise in ClearQuest, of all things, but my last two gigs ended up revolving around Subversion. Specifically, Subversion merges, in the enterprise, and the, uh, quirks involved.
Each client had different requirements, and thus, the solution I ended up delivering to each one differed a bit. The first solution was neat, and did all kinds of funky ClearQuest integration and merge validation, but the second one is more applicable to this discussion, so I'll describe that first.

In essence, it's a hook framework that attempts to enforce Subversion best practices by blocking* incoming commits if it detects one or more of the following:

(*) Sometimes it'll block, but phrase the error message along the lines of "if you *really* want to do this, re-try your commit with the phrase 'CONFIRM MULTI-ROOT RENAME' somewhere in your commit message".

    TagCopied
    TagRenamed
    TagRemoved
    TagModified
    TagReplaced
    TagSubtreeCopied
    TagSubtreeRenamed
    TagSubtreeRemoved
    TagSubtreeModified
    TagSubtreeReplaced
    MultipleUnknownAndKnownRootsModified
    MixedRootNamesInMultiRootCommit
    MixedRootTypesInMultiRootCommit
    SubversionRepositoryCheckedIn
    MergeinfoAddedToRepositoryRoot
    MergeinfoModifiedOnRepositoryRoot
    SubtreeMergeinfoAdded
    RootMergeinfoRemoved
    DirectoryReplacedDuringMerge
    EmptyMergeinfoCreated
    TagDirectoryCreatedManually
    BranchDirectoryCreatedManually
    BranchRenamedToTrunk
    TrunkRenamedToBranch
    TrunkRenamedToTag
    BranchRenamedToTag
    BranchRenamedOutsideRootBaseDir
    TagSubtreePathRemoved
    RenameAffectsMultipleRoots
    UncleanRenameAffectsMultipleRoots
    MultipleRootsCopied
    UncleanCopy
    FileRemovedFromTag
    CopyKnownRootSubtreeToValidAbsRootPath
    MixedRootsNotClarifiedByExternals
    CopyKnownRootToIncorrectlyNamedRootPath
    CopyKnownRootSubtreeToIncorrectlyNamedRootPath
    RenamedKnownRootToIncorrectlyNamedRootPath
    MixedChangeTypesInMultiRootCommit
    CopyKnownRootToKnownRootSubtree
    UnknownPathCopiedToIncorrectlyNamedNewRootPath
    RenamedKnownRootToKnownRootSubtree
    FileUnchangedAndNoParentCopyOrRename
    DirUnchangedAndNoParentCopyOrRename
    EmptyChangeSet
    CopyKnownRootToUnknownPath
    CopyKnownRootSubtreeToInvalidRootPath
    NewRootCreatedByRenamingUnknownPath
    UnknownPathCopiedToKnownRootSubtree
    NewRootCreatedByCopyingUnknownPath
    PathCopiedFromOutsideRootDuringNonMerge
    UnknownDirReplacedViaCopyDuringNonMerge
    DirReplacedViaCopyDuringNonMerge
    DirectoryReplacedDuringNonMerge
    PreviousPathNotMatchedToPathsInMergeinfo
    PreviousRevDiffersFromParentCopiedFromRev
    PreviousPathDiffersFromParentCopiedFromPath
    PreviousRevDiffersFromParentRenamedFromRev
    PreviousPathDiffersFromParentRenamedFromPath
    KnownRootPathReplacedViaCopy
    BranchesDirShouldBeCreatedManuallyNotCopied
    TagsDirShouldBeCreatedManuallyNotCopied
    CopiedFromPathNotMatchedToPathsInMergeinfo
    InvariantViolatedModifyContainsMismatchedPreviousPath
    InvariantViolatedModifyContainsMismatchedPreviousRev
    InvariantViolatedCopyNewPathInRootsButNotReplace
    MultipleRootsAffectedByRemove
    AbsoluteRootOfRepositoryCopied
    PropertyChangedButOldAndNewValuesAreSame
    CopiedOrRenamedUnknownPathToIncorrectlyNamedNewRootPath
    UnknownPathRenamedViaReplaceToExistingKnownRoot
    UnknownPathCopiedViaReplaceToExistingKnownRoot
    UnknownPathRenamedToKnownRootSubtree
    UnknownPathCopiedToKnownRootSubtree
    KnownRootSubtreeRenamedViaReplaceToExistingKnownRoot
    UncleanRenameOfRootAncestorPath
    RenamedKnownRootViaReplaceToExistingKnownRoot
    RootPathAncestorRenamedViaReplaceToExistingKnownRoot
    RenamedKnownRootViaReplaceToRootAncestorPath
    RenamedKnownRootViaReplaceToRootAncestorPath
    RootPathAncestorRenamedToValidAbsoluteRootPath
    RootPathAncestorRenamedToValidRootPathSubtree
    RootPathAncestorRenamedToKnownRootSubtree
    RootPathAncestorRenamedViaReplaceToRootAncestorPath
    RenamedKnownRootToUnknownPath
    RenamedKnownRootSubtreeToUnknownPath
    RenamedKnownRootSubtreeToValidRootPath
    RenamedKnownRootSubtreeToIncorrectlyNamedRootPath
    UncleanRename
    RenameRelocatedPathOutsideKnownRoot

(There's probably room for another e-mail thread just discussing all of these conditions; let's just say, Subversion repositories in the enterprise rarely look like their usually-well-laid-out open source repository brethren. What was the Blade Runner line? "I've seen things you people wouldn't believe."? ;-) My personal favorite: 'SubversionRepositoryCheckedIn'.)

So, as you can see, most of these conditions involve the concept of a root. Thus, the ability to accurately discern what constitutes a root took up a large portion of my time.

Hard-coding regexes and forcing all repositories to conform to a predefined repository layout worked like a charm for my first client, as I was coming in before they had any Subversion repositories rolled out into production. (Well, sort of.)

That unfortunately wasn't feasible for my second client. They were a *huge* Subversion shop. At the time I came in they had something like 960 production repositories, and I wouldn't be surprised if they were well over 1,000 by now. There was no standard layout between repos, and a lot of repos used non-standard branches/tags/trunks paths, so trying to manage 'root detection' via regexes was a non-starter.

For example, a number of repos had layouts like this:

    /foo/trunk
    /foo/branches/1.0.x
    /foo/branches/bugzilla/1081

i.e. 'bugzilla' was just some random directory they created to hold developer branches related to bugs. A regex approach would have matched 'bugzilla' as the branch root, whereas, in fact, the branch root would have been 1081.

The other non-starter was requiring the admin staff to go in and manually specify what constituted a branch, i.e. setting a 'branch root' property on relevant paths. The overhead that would have been required to do that for ~1,000 repositories (with hundreds, if not thousands, of differently named branches/roots, i.e. not particularly easy to automate reliably) was not acceptable (for many enterprisey reasons mainly surrounding cost).

So, I needed to design the branch detection logic in such a way that it didn't require any hand-holding from the admins or support staff. It took two attempts.

For the first attempt, I played around with the notion of a root *base* directory, i.e. /branches and /tags.
The first thing the framework would do when processing a pre-commit was create a 'RepositoryRoots' class (the framework was written in Python FWIW), which would recurse through the repo up to N levels deep in order to determine the valid root base directories. Except for trunk, which was special, if a directory had subdirectories that were created by copying another path (i.e. how tags or branches are created), then the directory would be considered a root base dir.

That lasted... about a day or two. It was a leaky abstraction at best, and broke when I encountered repos with the more non-standard layouts. (I'm not even sure if I've described it accurately above; but eh, who cares, it's gone now.)

The problem with the regex and base-root-dir discovery approaches was that they were essentially heuristic-based. "This directory features lots of subdirectories that were copies of other paths, therefore, there's a good chance it's a valid root base directory." In most cases, yes, that was a valid assumption, but not always.

The root detection logic was the most critical piece of my solution -- I wasn't getting paid to correctly detect roots 70% of the time in 60% of the repos. It needed to be 100% in 100%.

So, I thought to myself, how can I correctly and autonomously identify a root with 100% accuracy? What one property did valid roots share that I could interrogate? Heck, what even constitutes a root? A branch is a root, so is a tag, so is trunk.

...and then it dawned on me. It seems so simple now, in retrospect:

    In the beginning, there was one root: trunk. Then it was copied
    elsewhere, and became a branch, or maybe a tag. These copies are
    also roots, and copies of them should also be considered roots.

Ah, so simple! I just need to start at revision 0 and work my way up to HEAD, whilst keeping a record of roots I encounter along the way.

And that's pretty much it ;-) Turns out, that approach has worked surprisingly well.
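To make the replay idea concrete, here's a minimal sketch in Python. It is not the framework's actual code: the Change record, the replay() function, the rule that only a hand-made directory named 'trunk' seeds a root, and the four-revision history are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Change:
    """One changed path in a revision (hypothetical record, not an svn API)."""
    action: str                          # 'mkdir', 'copy' or 'delete'
    path: str                            # stored with a trailing slash
    copyfrom_path: Optional[str] = None
    copyfrom_rev: Optional[int] = None


def replay(history):
    """Walk r1..HEAD and return {rev: {root_path: {'created': rev}}}."""
    snapshots = {0: {}}
    roots = {}
    for rev in range(1, max(history) + 1):
        roots = dict(roots)  # fresh dict so earlier snapshots stay intact
        for change in history.get(rev, []):
            if change.action == 'delete':
                # Drop the path and any roots that live underneath it.
                roots = {p: m for p, m in roots.items()
                         if not p.startswith(change.path)}
            elif change.action == 'mkdir':
                # Assumption: only a hand-made 'trunk' directory seeds a root.
                if change.path.rstrip('/').split('/')[-1] == 'trunk':
                    roots[change.path] = {'created': rev}
            elif change.action == 'copy':
                # A copy is a root only if its source was a root at copyfrom_rev.
                source_roots = snapshots.get(change.copyfrom_rev, {})
                if change.copyfrom_path in source_roots:
                    roots[change.path] = {'created': rev}
        snapshots[rev] = roots
    return snapshots


# Made-up history: trunk is created, branched, the branch is removed,
# and a directory with the same name is then created by hand.
history = {
    1: [Change('mkdir', '/trunk/')],
    2: [Change('copy', '/branches/foo/', '/trunk/', 1)],
    3: [Change('delete', '/branches/foo/')],
    4: [Change('mkdir', '/branches/foo/')],
}
snapshots = replay(history)
print(sorted(snapshots[2]))  # ['/branches/foo/', '/trunk/']
print(sorted(snapshots[4]))  # ['/trunk/']
```

Keeping one snapshot per revision is the point of the sketch: whether a path counts as a root depends on which revision you ask about, so a copy source can be checked against the revision it is copied from rather than against HEAD.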
The framework has been in production at the second client's site for nearly a year now. They just run the 'repo analysis' part of the code against new repositories before enabling the hooks, and voilà, they get instant root detection and prevention of some 80-something erroneous conditions.

Here are some techie details about the implementation.

So, the script stores root information in a revision property called 'evn:roots' (set against the root of the repository). The value of evn:roots at any given revision will list all of the known roots in the repo at that revision:

    % svn pg --revprop -r26503 evn:roots svn://client.com/repos/foo
    {'/build/branches/3.0.1/': {'created': 22323},
     '/build/branches/3.0.2/': {'created': 23129},
     '/build/branches/3.1.0/': {'created': 25804},
     '/build/branches/cvs/0.0.1/': {'created': 26389},
     '/build/branches/bugzilla/4144/': {'created': 22121},
     '/build/branches/bugzilla/6952/': {'created': 17661},
     '/build/release//3.0.0/': {'created': 20774},
     '/build/release/paris/3.0.0/': {'created': 20307},
     '/build/release/rome/3.0.1/': {'created': 22473},
     '/build/trunk/': {'created': 2919},
     '/src/trunk/': {'created': 9353},
     ...

The 'created' revision refers to the revision that the root was created in. That's important, 'cause we store special metadata against the root in the revprop for the revision it was created in:

    % svn pg --revprop -r9353 evn:roots svn://client.com/repos/foo
    ...
    '/src/trunk/': {
        'copies': {
            9834: [('/src/branches/2.1/', 9835)],
            9997: [('/src/branches/bugzilla/2800/', 9998)],
            10211: [('/src/branches/bugzilla/3326/', 10212)],
            10252: [('/src/branches/bugzilla/2160/', 10253)],
            10468: [('/src/branches/2.2/', 10469)],
            11148: [('/src/branches/2.3/', 11149)],
            11420: [('/src/branches/bugzilla/3720/', 11421)]},
        'created': 9353,
        'creation_method': 'created'},
    ...

i.e. we store all the subsequent forward-copies of this root (keyed by the revision each copy was taken from, alongside the new path and the revision that created it), as well as details of how it was created (which isn't very interesting in this example, as it's trunk and was created via mkdir, but if it were a branch or tag, it would contain details about where it was copied from).

Let's say I delete /src/trunk in r26504. The entry for it in evn:roots in that revision will be gone, but a note will be made against the r9353 creation revprop to indicate which rev it was deleted in.

The importance of storing data like this becomes apparent when you deal with situations like this:

    *hooks are turned off*
    r2: svn cp ^/trunk ^/branches/foo
    r3: svn rm ^/branches/foo
    r4: svn mkdir ^/branches/foo
    *repo is analysed, evn:roots are set, hooks are turned on*

An attempt to do the following would be blocked, because r4/HEAD of /branches/foo was not created correctly (i.e. wasn't copied from an existing root), and thus, isn't considered a root either:

    svn cp /branches/foo /branches/bar

However, the following *would* work, because /branches/foo *was* a valid root in r2:

    svn cp -r2 /branches/foo /branches/bar

Thoughts?

    Trent.