subversion-dev mailing list archives

From Ashod Nakashian <ashodnakash...@yahoo.com>
Subject Re: Compressed Pristines (Summary)
Date Wed, 04 Apr 2012 18:38:57 GMT

>________________________________
> From: Mark Phippard <markphip@gmail.com>
>To: Ashod Nakashian <ashodnakashian@yahoo.com> 
>Cc: Daniel Shahaf <danielsh@elego.de>; Markus Schaber <m.schaber@3s-software.com>;
> "julianfoad@btopenworld.com" <julianfoad@btopenworld.com>; "mtherieau@gmail.com" <mtherieau@gmail.com>;
> Subversion Development <dev@subversion.apache.org>
>Sent: Wednesday, April 4, 2012 9:23 PM
>Subject: Re: Compressed Pristines (Summary)
> 
>On Wed, Apr 4, 2012 at 1:18 PM, Ashod Nakashian
><ashodnakashian@yahoo.com> wrote:
>
>> That's an easy question. The answer is that at *best* they'll do as good
>> as in-place compression. However, in practice they'll do much worse. The
>> reason is that the OS level compression works on not only the single file
>> level, but actually at the block level. This is to make modifications
>> reasonably fast (read compressed data, uncompress, modify, write
>> recompressed data). If the complete file is compressed then even changing
>> a single byte (neglecting that no storage works on the byte-level anyway)
>> will yield performance that will at least linearly degrade by the
>> filesize.
>
>FWIW, that is exactly my concern with your custom file format.  I do
>not see how you can achieve the benefits you expect without needing to
>repack files and I do not see how that can perform reasonably.

That's the tricky part, of course. To attack this problem we need to strike a balance between
pack size and how aggressively we repack to reclaim wasted "holes". It's not difficult to find
a good middle ground, because working with a few MB is reasonably fast (please see the size/speed
estimates in the proposal) and wasting a full block is negligible even for a file of only
1 MB. I'm oversimplifying to make a point: we don't need optimality, we need a practical
approach that yields the biggest bang for our buck. As far as that goes, I agree
with the sentiment of settling for the easiest solution that gets us the farthest. It's just
that we haven't yet reached consensus on what that is! :-)
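
To make the trade-off above concrete, here is a rough sketch in Python. It is not
taken from the proposal: the names (PackStats, should_repack, block_overhead), the
4 KB block size and the 25% waste threshold are all assumptions picked for
illustration. It only shows the shape of the decision: repack a pack file once the
holes exceed some fraction of its size, and otherwise leave it alone.

```python
from dataclasses import dataclass

BLOCK_SIZE = 4096  # assumed filesystem block size in bytes (illustrative)


@dataclass
class PackStats:
    """Bookkeeping for one hypothetical pack file."""
    total_bytes: int  # current size of the pack on disk
    hole_bytes: int   # bytes held by deleted or replaced entries


def wasted_fraction(stats: PackStats) -> float:
    """Return the share of the pack that is dead space (0.0 to 1.0)."""
    if stats.total_bytes == 0:
        return 0.0
    return stats.hole_bytes / stats.total_bytes


def should_repack(stats: PackStats, max_waste: float = 0.25) -> bool:
    """Repack only once holes exceed max_waste of the pack (assumed policy)."""
    return wasted_fraction(stats) > max_waste


def block_overhead(file_bytes: int, block_size: int = BLOCK_SIZE) -> float:
    """Cost of wasting one whole block, relative to the file size."""
    return block_size / file_bytes
```

With these assumed numbers, block_overhead(1024 * 1024) comes to about 0.4%, which
is the sense in which a wasted block is negligible for a 1 MB file. A lower
max_waste means more frequent repacking and less wasted space; a higher one means
the opposite.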

>
>That said, you also seem aware that the solution has to perform well
>so at worst it is just a question as to whether you want to spend the
>cycles to prove it can work and achieve all the goals.  I am skeptical
>but look forward to being wrong.
>

>The lazy part of me thinks storing files up to 32KB in SQLite and
>storing the rest as just single compressed files would give 99% of our
>users what they want and would be less likely to have issues.


I have to agree with you here. We just need to get working on something that actually
works and is verified to meet this goal. If we can do that *with decent performance* then we have
a clear winner.
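
For illustration only, here is a minimal Python sketch of the split Mark describes:
files up to 32 KB go into SQLite, everything else is stored as a single compressed
file. The HybridStore class, the table layout, the ".z" suffix, the SHA-1 key and
the use of zlib are assumptions of this sketch, not Subversion code or an agreed
design. Small blobs are stored uncompressed here, which is also an assumption.

```python
import hashlib
import sqlite3
import zlib
from pathlib import Path

SMALL_LIMIT = 32 * 1024  # the 32 KB threshold from the suggestion above


class HybridStore:
    """Hypothetical store: small blobs in SQLite, large ones as zlib files."""

    def __init__(self, root: Path):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.db = sqlite3.connect(str(self.root / "small.db"))
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS small (key TEXT PRIMARY KEY, data BLOB)"
        )

    def put(self, data: bytes) -> str:
        """Store data and return its key (SHA-1 hex digest, assumed here)."""
        key = hashlib.sha1(data).hexdigest()
        if len(data) <= SMALL_LIMIT:
            with self.db:  # commits the insert on success
                self.db.execute(
                    "INSERT OR REPLACE INTO small (key, data) VALUES (?, ?)",
                    (key, data),
                )
        else:
            # One compressed file per large blob; no packing, so no repacking.
            (self.root / (key + ".z")).write_bytes(zlib.compress(data))
        return key

    def get(self, key: str) -> bytes:
        """Return the original bytes for key, checking SQLite first."""
        row = self.db.execute(
            "SELECT data FROM small WHERE key = ?", (key,)
        ).fetchone()
        if row is not None:
            return bytes(row[0])
        return zlib.decompress((self.root / (key + ".z")).read_bytes())
```

A sketch like this says nothing about performance, which is exactly the part that
would still have to be measured.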

-Ash

>
>-- 
>Thanks
>
>Mark Phippard
>http://markphip.blogspot.com/
>
>
> 

