Mailing-List: contact dev-help@subversion.apache.org; run by ezmlm
Precedence: bulk
MIME-Version: 1.0
In-Reply-To: 
 <CA+t0gk1DgQVhrU_mezbJGMMaA-TdRJxYUXNVOQSmGauSK+QPUg@mail.gmail.com>
References: 
 <CAEcU=1aoCRNNA1A9axvgwZGefaxFzp0TE7U-0ZqrgaBNtxBZhA@mail.gmail.com>
 <CA+t0gk1DgQVhrU_mezbJGMMaA-TdRJxYUXNVOQSmGauSK+QPUg@mail.gmail.com>
From: Julian Foad <julianfoad@gmail.com>
Date: Wed, 27 May 2015 12:11:16 +0100
Message-ID: 
 <CAEcU=1a3xq9EYudhz67PTn8ytjyju=+cHBvcVC+cDLsRY3vFGQ@mail.gmail.com>
Subject: Re: Queries about rep cache: get_shared_rep()
To: Stefan Fuhrmann <stefan.fuhrmann@wandisco.com>
Cc: dev <dev@subversion.apache.org>
Content-Type: text/plain; charset=UTF-8

Stefan Fuhrmann wrote:
> Julian Foad wrote:
>> +      /* ### Would it be faster (and so better) to just try reading it,
>> +             and handle ENOENT, instead of first checking for presence? */
>>        SVN_ERR(svn_io_check_path(file_name, &kind, scratch_pool));
>>        if (kind == svn_node_file)
>>          {
>>            svn_stringbuf_t *rep_string;
>>            SVN_ERR(svn_stringbuf_from_file2(&rep_string, file_name,
>>                                             scratch_pool));
>
> This checks for duplicate representations within the same txn.
> It mainly finds them in a-typical or infrequent scenarios [...]
>
> I personally prefer checking for "preconditions" over just trying
> and then dissecting various error cases. Apart from that style
> issue, frequently constructing error codes can be expensive
> (when messages are localized).

Ack -- this way is reasonable in a case like this where it's usually
expected to fail. (It does have a disadvantage, which is that the
(rare) failure mode where the file becomes unreadable between the
check and the open will never get test coverage.)

>> @@ -2262,14 +2264,20 @@ get_shared_rep(representation_t **old_re
>>
>>    /* A simple guard against general rep-cache induced corruption. */
>>    if ((*old_rep)->expanded_size != rep->expanded_size)
>>      {
[...]
>> +         ### [JAF] Why should we assume the cache is corrupt rather than the
>> +                   new rep is corrupt? Is this really the logic we want?
>
> Hm. We don't say "the cache is corrupt" but log a warning with
> a "FS corruption" error code and tell the user that there is an
> inconsistency / mismatch.

I meant that we proceed with storing the new data into the repository
(ignoring the cache), which assumes the new data is correct and the
cache is wrong.

> It is simply our experience so far that it is more likely that
> the rep-cache.db contents is at fault rather than the rep
> we just calculated the sha1 for.

Acknowledged.

>> +                   Should we do something more forceful -- flag the cache as
>> +                   unusable, perhaps -- rather than just logging a warning?
>
> I don't have a particularly strong opinion on that one. I guess
> it is very unlikely that the mismatch has "legitimately" been
> caused by e.g. a SHA1 collision. Maybe, we should reset /
> clear the rep-cache.db at that point and say so in the warning.

I don't know what would be best. It's probably "good enough" based on
our empirical experience. But maybe we should think in terms of what
guarantees we should expect from the cache. It doesn't have to return
a result, but if it does then the result *must* be correct. If one
result is demonstrably wrong (as in this code block) then maybe we
should work on the assumption that the whole cache is bad, rather than
run the risk that some other entries are wrong but happen to have a
matching expanded-size.

- Julian