incubator-ooo-dev mailing list archives

From Michael Stahl <...@openoffice.org>
Subject Re: fetch-all-cws.sh (was: Building a single Hg repository)
Date Fri, 01 Jul 2011 20:58:24 GMT
On 01.07.2011 13:42, Greg Stein wrote:
> On Wed, Jun 29, 2011 at 05:04, Michael Stahl<mst@openoffice.org>  wrote:
>> ...
>> in principle the size of a CWS is on the same order as the master, because
>> it's just another HG repository.
>>
>> but HG supports hardlinks between repositories (in newer versions even on
>> win32), so you can "hg clone" the master on the same filesystem and then
>> pull in the CWS, and it will be _much_ faster and take _much_ less
>> additional space
>
> This is the approach that I took. Please look at
> tools/dev/fetch-all-cws.sh. Each of these CWS repositories (on Mac OS)
> are consuming 600 Mb *minimum*. I've fetched a dozen, and a couple are
> over 2 Gb each, and another over 1 Gb. And this is with the clone/pull
> technique.

indeed, i get similar numbers.
a clone with hardlinks is 34 MB on my filesystem.
a CWS with a single changeset takes 670 MB.
the reason is that every commit rewrites the two files that store the 
changelog and the manifest; the first write breaks their hardlinks, so each 
CWS gets its own full copies, and together these files are >600 MB in our repo.
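for reference, the clone-then-pull technique looks roughly like this (just a 
sketch; the paths and URL are hypothetical placeholders, and the hg commands 
are guarded so the snippet is harmless if hg or the repos are absent):

```shell
#!/bin/sh
# Sketch of the clone-with-hardlinks technique discussed above.
# Paths/URL are hypothetical placeholders, not the real OOo repos.
MASTER=./ooo-master              # local master clone (hypothetical)
CWS_URL=http://hg.example/cws-x  # CWS repository (hypothetical)
CWS_LOCAL=./cws-x

# Only attempt the hg commands if hg and the master clone exist.
if command -v hg >/dev/null 2>&1 && [ -d "$MASTER/.hg" ]; then
    # Cloning on the same filesystem hardlinks the store: cheap (~34 MB here).
    hg clone --noupdate "$MASTER" "$CWS_LOCAL"
    # Pulling the CWS appends its changesets -- but the first write breaks
    # the hardlinks on the changelog and manifest, hence ~670 MB per CWS.
    hg -R "$CWS_LOCAL" pull "$CWS_URL"
fi
```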

> I don't have enough space on my laptop to do a complete trial run. I'm
> hoping that somebody can figure out how to reduce the disk footprint,
> or determine that we just have to suck it up. And it would be nice to
> understand what that target size will be, for all 250 CWS
> repositories.

i think i wrote earlier that all CWSes as HG repos take ~100 GB, but i now 
believe i remembered wrong: the number was more like ~150 GB.
(i did this originally in 2 steps, and i remembered only the second step...)
(and if it weren't so late now i'd even dig out my external hd and run 
du...)

of course the filesystem used could make a difference here.

but actually i think that a lot of these 250 CWSes will not contain any 
changeset that is not already in the master; a lot of developers create a 
new CWS and then (have to) work on something else for some weeks...

so i have adapted the fetch script to skip empty CWSes.
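the skip logic can be sketched like this (file and repo names are made up 
for illustration; the check relies on "hg incoming" exiting non-zero when 
there is nothing to pull):

```shell
#!/bin/sh
# Sketch of skipping CWSes: filter a cws-list.txt-style file and (where
# hg and the repos actually exist) test whether a CWS has anything to pull.
# All names here are hypothetical.

list_active_cws() {
    # Print only the uncommented, non-blank lines of the list file.
    grep -v '^[[:space:]]*#' "$1" | grep -v '^[[:space:]]*$'
}

cws_is_empty() {
    # usage: cws_is_empty CWS_URL MASTER_CLONE
    # "hg incoming -q" exits 1 when the remote would add no changesets.
    # (Only meaningful where hg and both repos actually exist.)
    ! hg -R "$2" incoming -q "$1" >/dev/null 2>&1
}

# Demo with an inline sample list:
printf '%s\n' 'cws-one' '# cws-cancelled (cancelled in EIS)' 'cws-two' \
    > /tmp/sample-cws-list.txt
list_active_cws /tmp/sample-cws-list.txt
```

the fetch loop then only clones/pulls the CWSes that list_active_cws emits 
and that cws_is_empty does not reject.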

> A possible alternative to pulling N repositories, then combining them
> in a second step, is to attempt to bring them all into a single
> repository, one at a time. This is a little more scary for me, not
> knowing Hg, to understand how restartable and repeatable this process
> will be in the face of errors. Either starting from scratch, or (I
> believe an important feature) if it needs to be resumed after some
> minor failure (eg. network failure).

this would of course take much less space, but then it would be necessary 
to mark the newly pulled head immediately (e.g. with a bookmark or tag) to 
know which CWS it corresponds to.
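a sketch of that marking step, assuming bookmarks are available in the hg 
version used (names and paths are hypothetical, and the hg commands are 
guarded so the snippet is harmless as-is):

```shell
#!/bin/sh
# Sketch: pull one CWS at a time into a combined repo, bookmarking the
# pulled head so each CWS stays identifiable. Hypothetical names/paths.
COMBINED=./ooo-combined
CWS_NAME=cws-x
CWS_URL=http://hg.example/cws-x

if command -v hg >/dev/null 2>&1 && [ -d "$COMBINED/.hg" ]; then
    # hg pull is transactional, so rerunning it after e.g. a network
    # failure simply resumes safely.
    hg -R "$COMBINED" pull "$CWS_URL" &&
        # "tip" is the most recently added changeset, i.e. the head just
        # pulled -- valid only because we pull one CWS at a time.
        hg -R "$COMBINED" bookmark -r tip "cws/$CWS_NAME"
fi
```

since pull aborts or completes atomically, restarting the loop from the 
failed CWS should be safe, which addresses the resumability concern above.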

> We have a script. It is time to make it work.
>
> Michael: you say that some CWS repositories are useless. If so, then
> please update tools/dev/cws-list.txt to comment-out those CWS's with
> some explanation for future readers. No need for us to attempt to
> process them if they're bogus.

i have checked the status in EIS, and it seems that the repos for almost 
all integrated/deleted CWSes have already been automatically removed from 
the server.
i also found a couple in a state "cancelled", which i didn't even know 
existed; it sounds like we don't need those, so i've commented them out.

of course some CWSes contain stuff that's not useful, but i don't know 
which these are :)

-- 
"Fools ignore complexity.  Pragmatists suffer it.  Some can avoid it.
  Geniuses remove it." -- Alan J. Perlis

