Date: Wed, 29 Jun 2011 06:17:17 -0400
Subject: Re: Building a single Hg repository (was: An svn question)
From: Greg Stein
To: ooo-dev@incubator.apache.org

On Wed, Jun 29, 2011 at 05:04, Michael Stahl wrote:
> On 29.06.2011 05:27, Greg Stein wrote:
>...
>> One more thing... I cloned one of the CWSs (ab78), and it was 2.8 Gb.
>> My clone of DEV300 is 3.5 Gb. Is the size of that CWS typical? There
>> are about 250 CWSs hosted at OOo. If the average holds, I would need
>> to clone 700 Gb of material down to my system to perform the
>> integration.
>
> i guess your DEV300 includes a working copy, and ab78 does not?
> "du" says 2.4 GB for .hg on ext3 filesystem here.

Nope. I also had a full working copy :-) ... it wasn't until later
that I learned about 'hg clone -U'. I'm a total n00b with Hg. heh.

>> Am I missing something? Is there a better way? etc.
>
> you're doing it wrong :)

Thought so. I jumped onto the #mercurial channel and spoke with a
couple people there. In just that short time, I learned quite a bit.
Specifically, the hardlinks that you mentioned, along with the relink
extension.

> in principle the size of a CWS is on the same order as the master, because
> it's just another HG repository.

Right. If you link them together, which I didn't understand how to do.
(but have now learned)

> but HG supports hardlinks between repositories (in newer versions even on
> win32), so you can "hg clone" the master on the same filesystem and then
> pull in the CWS, and it will be _much_ faster and take _much_ less

Yah. This is awesome, and will make pulling CWSs much quicker. I'll
bake that into our scripts.

> additional space (in fact, less than the useful-only-for-diff "pristine
> source" in a SVN working copy would take).

Um. I see kind of a pot shot at svn here. I'll give you the benefit of
the doubt, rather than get cranky.
The local pristines (beyond just diff) mean that commits can send
deltas, rather than the whole file. And when you're working with 4G
files (oh, wait! Hg can't deal with files that size!) then sending a
delta is very important.

> there is an extension written by my former colleague Bjoern Michaelsen that
> can mirror all the CWSes automatically:
>
> http://mercurial.selenic.com/wiki/BranchmirrorExtension
>
> IIRC all CWSes that actually include changesets not in the master take less
> than 100GB.
> only issue is that Branchmirror does not check "hg incoming" before cloning
> for a CWS, so you end up with some useless repos identical to master.

Cool. I'll take a look at this. Maybe this will be important for our
conversion scripts. I'm still learning while I assemble that stuff.
All this help is awesome, as I really don't know Best Practices for
Hg.

> i'll attach the .hgrc i used; it excludes a lot of CWSes that are marked as
> "integrated" or "deleted" in EIS (which is a database and a web UI to manage
> CWS metadata); these are also automatically deleted on the HG server after
> some time.

I've checked in a list of all the CWSs from the Oracle repository. If
there are some CWSs that we *know* that we don't want, then please
comment them out from that file (and preferably, with a short
explanation why). That will definitely help the overall conversion
process, if we don't have to process a bunch of the CWS repositories.

> oh, just noticed it doesn't include all the l10n repositories.
> i think we need those as well.
> with Branchmirror probably a second config file is required, because l10n is
> a separate master repo.
> (since DEV300m101 a master/CWS consists of 2 repositories, one for all the
> bulky translations, one for the stuff i work on :)

I don't understand this part. DEV300 is the master repo, right? Are
you saying that there is a *separate* repository for the l10n data?
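
For the CWS list mentioned above, here is a rough sketch of how a
conversion script might skip the commented-out entries. The file format
is an assumption on my part (one CWS name per line, '#' starting a
comment); the checked-in list may well look different, and the function
name and the names other than ab78 are made up for illustration.

```python
from typing import Iterable, List


def active_cws_names(lines: Iterable[str]) -> List[str]:
    """Return the CWS names that are not commented out.

    Assumed (hypothetical) format: one CWS name per line; anything
    after a '#' is a comment, e.g. the reason a CWS is skipped.
    """
    names = []
    for raw in lines:
        # Drop the comment part, then surrounding whitespace.
        entry = raw.split("#", 1)[0].strip()
        if entry:  # skip blank and fully commented-out lines
            names.append(entry)
    return names


if __name__ == "__main__":
    sample = [
        "ab78\n",
        "# foo01  integrated, nothing new to pull\n",
        "\n",
        "bar02  # still needed\n",
    ]
    print(active_cws_names(sample))
```

Keeping the reason on the same line as the commented-out name means
the explanation travels with the entry when the list is edited.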

> of course cloning all the CWSes individually is different from what Heiner
> suggested above, but i think it's useful as a backup, and you can experiment
> much better if you have this as an intermediate step and don't have to
> download everything again.

Right. The script that I've started assumes you've cloned all of these
repositories locally. We need to be able to work through this process
as a community. That means developing some scripts so that *everybody*
can replicate what we're going to extract from the Oracle repositories
and import into Apache.

> my totally unsubstantiated guess is that one HG repo with all CWSes pulled
> in would be ~3 GB.

Wow. Cool. I was very worried about total space for these things.
Keeping it to repository-only (eg. "clone -U") and ensuring hardlinks
are used, then yah: space and time should be greatly reduced.

I appreciate the pointers! The problem seems much more approachable.

Cheers,
-g
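
The clone-then-pull approach discussed in this thread could be scripted
roughly as below. This is only a sketch under stated assumptions, not
the actual conversion script: the function name, the directory layout,
and the CWS URL are hypothetical. It only builds the command lines and
does not run them. The one thing it relies on is that a local
'hg clone' on the same filesystem hardlinks the repository data, and
that -U skips the working copy.

```python
from pathlib import Path
from typing import List


def clone_and_pull_commands(
    master: Path, cws_name: str, cws_url: str, work_root: Path
) -> List[List[str]]:
    """Build (but do not run) the hg commands for one CWS.

    master    -- local clone of the master repo (e.g. DEV300)
    cws_url   -- where the CWS repository lives (hypothetical URL)
    work_root -- directory on the SAME filesystem as master, so the
                 local clone can use hardlinks
    """
    target = work_root / cws_name
    return [
        # -U (--noupdate): repository data only, no working copy.
        ["hg", "clone", "-U", str(master), str(target)],
        # Fetch only the changesets the CWS has beyond the master.
        ["hg", "-R", str(target), "pull", cws_url],
    ]


if __name__ == "__main__":
    for cmd in clone_and_pull_commands(
        Path("DEV300"),
        "ab78",
        "https://hg.example.org/cws/ab78",  # placeholder, not a real server
        Path("cws-clones"),
    ):
        print(" ".join(cmd))
```

To actually execute the commands, each list could be handed to
subprocess.run(cmd, check=True); returning them as lists first makes it
easy to review what a run would do before touching any repository.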