subversion-dev mailing list archives

From Stefan Fuhrmann <stefan.fuhrm...@wandisco.com>
Subject Re: Issue #4588, part 1: FSFS access error
Date Tue, 25 Aug 2015 14:35:42 GMT
On Tue, Aug 25, 2015 at 12:47 AM, Evgeny Kotkov
<evgeny.kotkov@visualsvn.com> wrote:

> Stefan Fuhrmann <stefan.fuhrmann@wandisco.com> writes:
>
> > My current hypothesis is that the server did not get restarted after
> > replacing the repository.  Because we decided not to make the instance ID
> > part of the cache key, we could easily have picked up cached format 6
> > data for the format 7 repository.
>
> [...]
>
> > That said, are we still happy with the decision to not make the instance
> > ID part of the cache key? The rationale has basically been "fail early"
> > because failure to restart or reconfigure the server after the repo files
> > got modified might lead to any kind of unknown problems (much) further
> > down the road.
>
> As I see it, there are two separate problems:
>
[...]


> 2) The second part of the problem, to my mind, is the offset and item-based
>    addressing.  Irrespectively of whether we use instance IDs in the cache
>    keys, or not, I find it rather questionable that the same entry in the
>    cache can mean two different things, depending on how you look at it.
>

Three different things, in fact: two in format 7, two in
older formats. The three addressing modes are:

1. Absolute position ("phys") in a rev / pack file.
   This is where we need to navigate to when reading data.
2. Offset within a revision: phys = manifest[rev] + offset
3. Item number within a revision: phys = L2P[rev, item]

So, we always need to translate pairs of (rev, item-index)
to phys. SVN 1.9 uses "item-index" as the umbrella term
for cases 2 and 3. Only a single function,
svn_fs_fs__item_offset, deals with the differences between
the two, modulo the format-specific ways to find the rev
root node and changed paths list.
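
To illustrate the translation, here is a rough Python sketch.
It is not the actual FSFS code: item_offset, the mode
constants, and the manifest / L2P contents are all made up
for this example, and the manifest is simplified to a plain
rev -> start-offset map.

```python
# Hypothetical sketch of the (rev, item-index) -> phys translation.
# Names and data are illustrative only, not taken from libsvn_fs_fs.

PHYSICAL = "physical"  # older formats: item-index is an offset in the revision
LOGICAL = "logical"    # format 7: item-index is an item number


def item_offset(mode, rev, item_index, manifest=None, l2p=None):
    """Return the absolute position ("phys") for (rev, item_index)."""
    if mode == PHYSICAL:
        # Case 2: phys = manifest[rev] + offset
        return manifest[rev] + item_index
    if mode == LOGICAL:
        # Case 3: phys = L2P[rev, item]
        return l2p[(rev, item_index)]
    raise ValueError(f"unknown addressing mode: {mode}")


# Made-up lookup data for a single pack file.
manifest = {0: 0, 1: 120, 2: 515}
l2p = {(2, 3): 4096, (2, 4): 700}

# The same (rev, item-index) pair resolves to different positions
# depending on the addressing mode.
phys_old = item_offset(PHYSICAL, 2, 3, manifest=manifest)
phys_new = item_offset(LOGICAL, 2, 3, l2p=l2p)
```

The same pair (2, 3) is a valid input in both modes but
lands on different positions, which is why data cached under
one mode must not be served under the other.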

In that sense, the cache content not only *means* the
same thing ("this is item xyz") but even uses the same
generalized data type. The problem here is simply that
the cache contents become invalid as soon as history
is being rewritten.


>    What happens if we're unlucky enough, and the offset in the revision
>    file also is a valid index in the l2p lookup table?  Is there something
>    we can do about it -- say, associate the addressing type with the
>    corresponding cache entry?
>

Since these are homogeneous throughout a repository, we
could simply add the repo format, addressing mode and
sharding size (manifest cache vs. resharding) to the cache
key prefix.

The practical implications would be similar to adding the
instance ID, except that it would also work for pre-f7
repos and be less specific. So, if we decide to add the
instance ID to the cache key, we should add the other
parts as well.
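
A minimal Python sketch of what such a prefix could look
like. Everything here is hypothetical (cache_key_prefix,
the field layout, the repo path and the cached value); it
only shows the effect of putting those parts into the key.

```python
# Hypothetical sketch of a cache key prefix that includes repo format,
# addressing mode and shard size, plus an optional instance ID.
# The layout is an assumption for illustration, not the FSFS key format.

def cache_key_prefix(repo_path, fs_format, addressing_mode, shard_size,
                     instance_id=None):
    parts = [repo_path, f"f{fs_format}", addressing_mode,
             f"shard{shard_size}"]
    if instance_id is not None:
        # Only available where the repository has an instance ID.
        parts.append(instance_id)
    return ":".join(parts)


cache = {}

# Data cached while the path held a format 6 repository.
old_prefix = cache_key_prefix("/repos/test", 6, "physical", 1000)
cache[(old_prefix, 2, 3)] = "format 6 item data"

# The repository gets replaced by a format 7 one; the server keeps running.
new_prefix = cache_key_prefix("/repos/test", 7, "logical", 1000)

# The lookup misses instead of returning the stale format 6 entry.
stale = cache.get((new_prefix, 2, 3))
```

With a prefix built from the path alone, the second lookup
would have hit the format 6 entry. Note that this variant
does not "fail early" anymore; it silently ignores the old
entries.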

-- Stefan^2.
