subversion-users mailing list archives

From Mark Phippard <markp...@gmail.com>
Subject Re: Efficiency of rep-sharing (deduplication) in 1.8 and later
Date Fri, 12 Sep 2014 15:24:43 GMT
On Fri, Sep 12, 2014 at 11:17 AM, Thomas Harold <thomas-lists@nybeta.com>
wrote:

> I have a question about how efficient SVN is at de-duplication within a
> repository with regards to files that appear in multiple locations, but
> which have the same content.
>
> I know a small improvement was made in 1.8...
>
> http://subversion.apache.org/docs/release-notes/1.8.html#fsfs-enhancements
>
> > When representation sharing has been enabled, Subversion 1.8 will now
> > be able to detect files and properties with identical contents within
> > the same revision and only store them once. This is a common
> > situation when you for instance import a non-incremental dump file or
> > when users apply the same change to multiple branches in a single
> > commit.
>
> #1 - If a commit puts files A, B and C into the repository, and a later
> commit puts files B, C and D into the repository at a different
> location, is SVN smart enough to realize that B and C are already stored
> in the repository?
>
> In other words, does it track each individual file separately, even if
> they were all part of one big revision?
>

The representation cache is keyed on the SHA-1 checksum of the rep, so it
does not matter what the filename is or where it is stored.  If a new rep
has the same checksum as an existing rep, it will be shared.

The small improvement in 1.8 was simply to do this for files being added
within the same revision, but the other scenario was already supported.

I think it is worth pointing out that a rep is not necessarily a "file".
It is the specific delta that SVN would be storing in the repository DB.
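
To make the lookup concrete, here is a minimal sketch in Python of a store
keyed by content checksum.  It is a toy model and not Subversion's FSFS
code: ToyRepStore, commit() and the paths are hypothetical names made up
for illustration, and it hashes whole file contents, whereas a real rep
may be a delta.

```python
import hashlib


class ToyRepStore:
    """Toy content-addressed store (illustration only, not FSFS)."""

    def __init__(self):
        self.reps = {}   # SHA-1 hex digest -> stored bytes, one copy per content
        self.paths = {}  # (revision, path) -> SHA-1 hex digest

    def commit(self, revision, files):
        """Record files (path -> bytes); return how many new reps were written."""
        written = 0
        for path, data in files.items():
            key = hashlib.sha1(data).hexdigest()
            if key not in self.reps:
                # First time this content is seen: store it.
                self.reps[key] = data
                written += 1
            # Either way, the path just points at the shared rep.
            self.paths[(revision, path)] = key
        return written


store = ToyRepStore()
first = store.commit(1, {"/proj1/A": b"aaa", "/proj1/B": b"bbb", "/proj1/C": b"ccc"})
second = store.commit(2, {"/proj2/B": b"bbb", "/proj2/C": b"ccc", "/proj2/D": b"ddd"})
third = store.commit(3, {"/x/E": b"eee", "/y/E": b"eee"})
print(first, second, third)
```

In this model the second commit writes only D, because B and C hash to
reps that already exist, and the third commit writes E once even though
it appears at two paths in the same revision.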

-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/
