subversion-dev mailing list archives

From Markus Schaber <m.scha...@3s-software.com>
Subject Re: Compressed Pristines (Summary)
Date Mon, 02 Apr 2012 09:30:38 GMT
Hi, Ashod,

First, thanks for your great summary. I'll throw in just my 2 cents below.

> From: Ashod Nakashian [mailto:ashodnakashian@yahoo.com]
 
> Pristine files currently incur almost 100%[2] overhead both in terms of
> disk footprint and file count in a given WC. Since pristine files is a
> design element of SVN, reducing their inherent overhead should be a
> welcome improvement to SVN from a user's perspective. Due to the nature of
> source files that tend to be small, the footprint of a pristine store (PS)
> is larger on disk than the actual total bytes because of internal
> fragmentation (file-system block-size rounding waste) - see references for
> numbers.

Were any of those tests actually run on a file system that supports something like "block
suballocation", "tail merging" or "tail packing"?

Today, I was rather surprised to find that the pristine subdir of one of our main projects,
which contains 726 MB of data, has an actual disk size of 759 MB. That is an overhead of
about 4.5% (33 MB on top of 726 MB) due to block-size rounding. (According to the Explorer
"Properties" dialog of Win 7 on an NTFS file system.)
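
For anyone who wants to compare their own numbers, here is a rough Python sketch of the naive
rounding model, in which every file occupies whole blocks. The 4096-byte block size and the
helper names are my assumptions, not anything from SVN. On a file system with tail packing
or resident small files, the real overhead should come out lower than this estimate.

```python
import os

def rounded_size(size, block=4096):
    """Bytes occupied on disk if the file is stored in whole blocks."""
    return -(-size // block) * block  # ceiling division, then back to bytes

def rounding_overhead(sizes, block=4096):
    """Relative overhead (0.05 == 5%) of block rounding over the logical size."""
    sizes = list(sizes)
    logical = sum(sizes)
    on_disk = sum(rounded_size(s, block) for s in sizes)
    return (on_disk - logical) / logical if logical else 0.0

def file_sizes(root):
    """Yield the logical size of every file below root (e.g. a pristine dir)."""
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            yield os.path.getsize(os.path.join(dirpath, name))
```

Calling `rounding_overhead(file_sizes(path))` on a pristine directory gives the worst-case
figure to hold against what the OS actually reports.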

AFAICS, "modern" file systems increasingly support that kind of feature[1], so we should at
least think about how much effort we want to throw at the "packing" part of the problem if
it's likely to vanish (or, at least, to be drastically reduced) in the future. My concern
is that storing small pristines in their own SQLite database will also bring some overhead
that may be of a similar magnitude (a few percent), due to SQLite metadata, the necessary
primary key column, and indexing.
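
The SQLite side of that comparison can be estimated with a small experiment. The sketch below
is only illustrative: the table layout (a SHA-1 text key plus one blob per row) is a
hypothetical schema of mine, not the one proposed for SVN, and the result depends on page
size and blob size distribution.

```python
import hashlib
import os
import sqlite3
import tempfile

def sqlite_overhead(blobs):
    """Store one blob per row and return (payload_bytes, db_file_bytes)."""
    # Deduplicate by checksum, as a content-addressed store would.
    unique = {hashlib.sha1(b).hexdigest(): b for b in blobs}
    payload = sum(len(b) for b in unique.values())
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "pristines.db")
        con = sqlite3.connect(path)
        # Hypothetical schema: the TEXT primary key also creates an index.
        con.execute(
            "CREATE TABLE pristine (checksum TEXT PRIMARY KEY, data BLOB NOT NULL)"
        )
        con.executemany("INSERT INTO pristine VALUES (?, ?)", unique.items())
        con.commit()
        con.close()
        db_bytes = os.path.getsize(path)
    return payload, db_bytes
```

The ratio `(db_bytes - payload) / payload` is then directly comparable to the block-rounding
overhead measured on the file system.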

Additionally, the simple and efficient way of storing the pristines in a SQLite database (one
blob per file) also prevents us from exploiting inter-file redundancies during compression,
while adding a packing layer on top of SQLite leads to both high complexity and a large average
blob size, and large blobs are probably handled more efficiently by the FS directly.
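
To illustrate the inter-file redundancy point, here is a minimal sketch that compares
compressing each file on its own with compressing them as one concatenated container. It uses
zlib purely as a stand-in, since the thread has not settled on a compressor, and the function
names are mine.

```python
import zlib

def per_file_size(files, level=6):
    """Total compressed size when each file is compressed separately."""
    return sum(len(zlib.compress(data, level)) for data in files)

def container_size(files, level=6):
    """Compressed size when all files share one compression stream."""
    return len(zlib.compress(b"".join(files), level))
```

For many small files that share boilerplate such as license headers, `container_size` should
come out well below `per_file_size`, because the shared text is stored once and then referenced.
Note that zlib only looks back 32 KB, so the benefit depends on similar files sitting close
together in the container.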

To cut it short: I'll "take" whatever solution emerges, but my gut feeling tells me that we
should use plain files as containers instead of SQLite.

The other aspects (grouping similar files into the same container before compression, applying
a size limit for containers, and storing uncompressible files in uncompressed containers)
are fine as discussed.
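
As a rough sketch of how those three aspects could fit together (not a proposal for the
actual design): the code below groups files by extension as a crude proxy for "similar",
closes a container once a size limit would be exceeded, and routes files that zlib cannot
shrink into uncompressed containers. The 1 MB limit, the 0.9 threshold and all names are
assumptions of mine.

```python
import os
import zlib
from collections import defaultdict

def is_compressible(data, threshold=0.9):
    """True if zlib shrinks the data to below threshold * original size."""
    return len(zlib.compress(data)) < threshold * len(data)

def plan_containers(files, limit=1 << 20):
    """files: iterable of (name, data).

    Returns a list of (extension, compressed, [names]) container plans.
    """
    groups = defaultdict(list)
    for name, data in files:
        if is_compressible(data):
            key = (os.path.splitext(name)[1].lower(), True)
        else:
            key = ("", False)  # all incompressible files share one group
        groups[key].append((name, len(data)))

    containers = []
    for (ext, compressed), members in sorted(groups.items()):
        current, used = [], 0
        for name, size in members:
            # Start a new container when the limit would be exceeded;
            # a single oversized file still gets a container of its own.
            if current and used + size > limit:
                containers.append((ext, compressed, current))
                current, used = [], 0
            current.append(name)
            used += size
        if current:
            containers.append((ext, compressed, current))
    return containers
```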

I'll try to run some statistics using publicly available projects on an NTFS file system,
just for comparison.

Best regards

Markus Schaber

[1]: http://msdn.microsoft.com/en-us/library/windows/desktop/ee681827%28v=vs.85%29.aspx claims
tail packing support for NTFS. http://en.wikipedia.org/wiki/Block_suballocation claims support
for BtrFS, ReiserFS, Reiser4, FreeBSD UFS2. And AFAIR, XFS has a similar feature. Sadly, Ext[2,3,4]
are not on that list yet, but rumors claim that Ext4 is to be replaced by BtrFS in the long
run.

-- 
___________________________
We software Automation.

3S-Smart Software Solutions GmbH
Markus Schaber | Developer
Memminger Str. 151 | 87439 Kempten | Germany | Tel. +49-831-54031-0 | Fax +49-831-54031-50

Email: m.schaber@3s-software.com | Web: http://www.3s-software.com 
CoDeSys internet forum: http://forum.3s-software.com
Download CoDeSys sample projects: http://www.3s-software.com/index.shtml?sample_projects

Managing Directors: Dipl.Inf. Dieter Hess, Dipl.Inf. Manfred Werner | Trade register: Kempten
HRB 6186 | Tax ID No.: DE 167014915 
