subversion-dev mailing list archives

From Johan Corveleyn <jcor...@gmail.com>
Subject Re: Populating the rep-cache
Date Fri, 29 May 2015 07:50:13 GMT
On Thu, May 28, 2015 at 6:00 PM, Stefan Fuhrmann
<stefan.fuhrmann@wandisco.com> wrote:
> On Wed, May 27, 2015 at 8:14 PM, Philip Martin <philip.martin@wandisco.com>
> wrote:
>>
>> Julian Foad <julianfoad@gmail.com> writes:
>>
>> > Stefan Fuhrmann wrote:
>> >> * clear the rep-cache.db
>> >
>> > Clearing the cache and continuing operation may make subsequent
>> > commits much larger than they should be, and there is no easy way to
>> > undo that if it happens.
>>
>> I've been thinking of writing some code to populate the rep-cache from
>> existing revisions.  This code would parse the revision, a bit like
>> verify, identify checksums in that revision and add any that are found
>> to the rep-cache.  This would be time consuming if run on the whole
>> repository but would run perfectly well in a separate process while the
>> repository remains live.  It could also be run over a revision range
>> rather than just the whole repository, and running on a single revision
>> such as HEAD would be fast.
>
>
> Makes sense.
>
>>
>> I believe the code will be relatively straightforward; if anything, it is
>> the API that is more of a problem.
>>
>>  - We could add a public svn_fs_rep_cache().  This is backend specific
>>    but there is precedent: we have svn_fs_berkeley_logfiles() and
>>    svn_fs_pack().
>>
>>  - We could add a more general svn_fs_optimize().  This would do backend
>>    specific optimizations that may change in future versions.  Perhaps
>>    passing backend-specific flags?
>
>
> I think svn_fs_optimize(bool online) would make sense
> in the longer term.
>
> In the "offline" case, it could do anything from removing
> duplicate reps as we build the cache to sharding repos
> or repacking shards. Not that I would want to implement
> any of that soon.

I was wondering about that too. I think repopulating the rep-cache
(without the need to take the repos offline) is very interesting, but
I immediately think: functionality to repopulate the rep-cache *and*
(optionally) rewrite rev files to let them use rep sharing (i.e.
effectively deduplicating the repository) ... that would be even
better.

But a big +1 already on the initial idea of offering the ability to
rebuild a broken rep-cache (without having to dump/load).
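
To make the repopulation idea concrete, here is a minimal Python sketch of
scanning a revision range and recording the first location of each distinct
representation by checksum. It is a toy model only: the `toy_rep_cache`
table, the `revisions` dict and both function names are assumptions made up
for illustration, not the real FSFS rev-file format, rep-cache.db schema or
any existing svn_fs API.

```python
import hashlib
import sqlite3


def make_toy_cache():
    """Create an in-memory stand-in for a rep-cache (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE toy_rep_cache ("
        " hash TEXT PRIMARY KEY,"      # SHA-1 of the representation content
        " revision INTEGER NOT NULL,"  # revision where it was first seen
        " item INTEGER NOT NULL)"      # position within that revision
    )
    return db


def populate_toy_rep_cache(db, revisions, start, end):
    """Walk revisions start..end (inclusive) and add unseen checksums.

    `revisions` maps a revision number to a list of byte strings, one per
    representation. Returns the number of entries added. Existing entries
    are left untouched, so the function can be re-run safely.
    """
    added = 0
    for rev in range(start, end + 1):
        for item, content in enumerate(revisions.get(rev, [])):
            digest = hashlib.sha1(content).hexdigest()
            seen = db.execute(
                "SELECT 1 FROM toy_rep_cache WHERE hash = ?", (digest,)
            ).fetchone()
            if seen is None:
                db.execute(
                    "INSERT INTO toy_rep_cache (hash, revision, item)"
                    " VALUES (?, ?, ?)",
                    (digest, rev, item),
                )
                added += 1
    db.commit()
    return added


# Hypothetical repository: r2 and r3 each repeat content from r1.
toy_revisions = {
    1: [b"hello\n", b"world\n"],
    2: [b"hello\n"],
    3: [b"new\n", b"world\n"],
}
toy_db = make_toy_cache()
print(populate_toy_rep_cache(toy_db, toy_revisions, 1, 3))
```

In this toy model, the optional deduplication step would amount to a second
pass that replaces each repeated representation with a reference to the
location stored in the cache. Doing that for real means rewriting rev files,
which is the part that would presumably need the repository to be offline.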

-- 
Johan
