subversion-users mailing list archives

From Gert Kello <gert.ke...@gmail.com>
Subject Re: Svn 1.9 repository 20% bigger than svn 1.8 repository
Date Fri, 29 Jan 2016 10:17:23 GMT
> > I have a svn 1.9 repository, created with svnsync, that has ~150000
> > revisions and size about 45 GB.
>
> 300kB/rev is quite large, like >1 MB of changes before
> compression - on average.  Are these office documents,
> large xml / html files or simply many files per commit?
>
>
The content is mixed: a lot of small source code commits, but office
documents and zip archives as well. There are even a few extremely large
commits: the biggest is 3+ GB, one is 800+ MB and one is 500+ MB (going by
revision file size in the db/revs folder).



> > Due to some issues in svn-all-fast-export I
> > wanted to have svn 1.8 version repository so I downgraded it by doing
> > svnadmin (v 1.9) dump /svnadmin (v 1.8) load cycle. I was surprised that
> > the size of v 1.8 repository is "only" 37.5 GB
> > I tried to compare content of db\revs folder: some files are bigger in
> > 1.8 repo, some in 1.9 repo.
>
> For the record: you already said elsewhere in this
> thread that you used 1.8 to create the 1.8 repo and
> 1.9 for the 1.9.  I also assume standard settings
> as in "no fsfs.conf tweaks".
>
>
Correct.
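
The db\revs comparison mentioned above can be done with a small script rather than by eye. This is only a minimal sketch: it assumes both repositories use the same shard layout under db/revs and that neither one is packed (pack files would not line up one to one). The function names are made up for illustration.

```python
import os


def revision_file_sizes(repo_root):
    """Map each file under <repo_root>/db/revs to its size in bytes.

    Keys are paths relative to db/revs, e.g. "0/1" for revision 1 in shard 0.
    """
    revs_dir = os.path.join(repo_root, "db", "revs")
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(revs_dir):
        for name in filenames:
            full = os.path.join(dirpath, name)
            sizes[os.path.relpath(full, revs_dir)] = os.path.getsize(full)
    return sizes


def largest_size_differences(repo_a, repo_b, top=20):
    """Return (path, size in A, size in B, B - A) for files found in both
    repositories whose sizes differ, largest absolute difference first."""
    a = revision_file_sizes(repo_a)
    b = revision_file_sizes(repo_b)
    rows = [(p, a[p], b[p], b[p] - a[p]) for p in set(a) & set(b) if a[p] != b[p]]
    rows.sort(key=lambda row: (-abs(row[3]), row[0]))
    return rows[:top]
```

Calling largest_size_differences() with the 1.8 repository path first and the 1.9 path second lists the revision files that changed size the most, with a positive difference meaning the 1.9 file is bigger.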


> There is a simple way to compare the "content size"
> your repositories.  Run the 1.9 svnfsfs tool on both:
>
> svnfsfs stats -M 1000 /path/to/repo > /some/output/path
>
> It basically reads the whole repository, groups and
> aggregates the item sizes and produces a long report.
> Number of changes and node revision should be more
> or less (exactly?) the same.  If they are, you'll
> be good.
>
> "Representation" size is where the numbers will differ.
> Looking at the differences in detail, you should be able
> to pin down one or two file extensions that account for
> most of the increase.  It would be interesting to learn
> what is special about them ...
>

Yes, the number of changes and the number of node revision records are
identical. The number of representations does differ (1,744,149 @1.8 vs
1,901,312 @1.9). The "nodes total", "directory noderevs" and "file noderevs"
numbers are identical.
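
As a rough cross-check, the figures quoted in this thread can be put side by side. The sketch below uses only numbers from the messages above (the 45 GB figure is approximate), and the helper name is made up for illustration.

```python
def percent_increase(old, new):
    """Relative growth from old to new, in percent."""
    return (new - old) / old * 100.0


# Representation counts reported by svnfsfs stats.
reps_18 = 1_744_149
reps_19 = 1_901_312

# Repository sizes in GB as quoted in the thread (45 GB is "about 45 GB").
size_18_gb = 37.5
size_19_gb = 45.0

extra_reps = reps_19 - reps_18
print(f"extra representations in 1.9: {extra_reps}")  # 157163
print(f"representation count increase: {percent_increase(reps_18, reps_19):.1f}%")  # 9.0%
print(f"repository size increase: {percent_increase(size_18_gb, size_19_gb):.1f}%")  # 20.0%
```

So roughly 9% more representations go together with about 20% more disk space, which fits the picture of a few large files not being de-duplicated.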

The "Largest representations:" section shows that 1.9 has failed to
de-duplicate several files (executables in this case).

The "Extensions by number of representations:" section shows that every
extension has a larger number of representations in the 1.9 repo.

The size of representations has increased most for the .exe and .pdf
extensions: .exe accounts for a 5 GB increase and .pdf for 500 MB. Several
other types show an increase of ~300 MB, and "others" adds +1 GB.
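
For anyone repeating this, the two reports can be diffed automatically instead of being read side by side. The sketch below runs the svnfsfs command suggested earlier in the thread on a repository and produces a plain unified diff of two reports. It assumes svnfsfs (1.9) is on the PATH, makes no assumption about the report layout, and the function names are made up for illustration.

```python
import difflib
import subprocess


def svnfsfs_stats(repo_path, svnfsfs="svnfsfs"):
    """Run `svnfsfs stats -M 1000 <repo>` and return the report as lines."""
    result = subprocess.run(
        [svnfsfs, "stats", "-M", "1000", repo_path],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.splitlines()


def diff_reports(lines_a, lines_b, label_a="1.8", label_b="1.9"):
    """Unified diff of two reports with no context lines, so that only
    the lines that differ between the reports are listed."""
    return list(
        difflib.unified_diff(
            lines_a, lines_b, fromfile=label_a, tofile=label_b, n=0, lineterm=""
        )
    )
```

Passing the output of svnfsfs_stats() for the 1.8 and the 1.9 repository to diff_reports() leaves only the lines where the two reports disagree, which is where the representation counts and sizes show up.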

The dump/load cycle into 1.9 has finished as well; that repo is now 36.2 GB
(smaller than the 1.8 one, which was 37.5 GB). Both 1.9->1.9 and 1.8->1.9
produced almost identical repos when comparing files byte by byte (the only
exception is the UUID file)... which makes me wonder whether I dumped the
same repo twice. Too bad the Windows cmd doesn't retain command history.
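
The byte-by-byte comparison can be reproduced with a short script. This is a minimal sketch, assuming the UUID file is db/uuid as in the usual FSFS layout; the function name is made up for illustration.

```python
import filecmp
import os


def differing_files(repo_a, repo_b, skip=(os.path.join("db", "uuid"),)):
    """Return sorted relative paths of files whose bytes differ between the
    two trees, or that exist in only one of them. Paths in `skip` are ignored."""

    def relative_files(root):
        found = set()
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                found.add(os.path.relpath(os.path.join(dirpath, name), root))
        return found

    files_a = relative_files(repo_a)
    files_b = relative_files(repo_b)
    different = []
    for rel in sorted((files_a | files_b) - set(skip)):
        if rel not in files_a or rel not in files_b:
            different.append(rel)  # present in only one repository
        elif not filecmp.cmp(
            os.path.join(repo_a, rel), os.path.join(repo_b, rel), shallow=False
        ):
            different.append(rel)  # same path, different content
    return different
```

An empty result means the two repositories are identical apart from the skipped UUID file, which is what would be expected if the same dump had been loaded twice.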

Gert
