Return-Path: X-Original-To: apmail-subversion-dev-archive@minotaur.apache.org Delivered-To: apmail-subversion-dev-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 2366418415 for ; Wed, 27 May 2015 11:13:29 +0000 (UTC) Received: (qmail 73081 invoked by uid 500); 27 May 2015 11:12:25 -0000 Delivered-To: apmail-subversion-dev-archive@subversion.apache.org Received: (qmail 73027 invoked by uid 500); 27 May 2015 11:12:25 -0000 Mailing-List: contact dev-help@subversion.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list dev@subversion.apache.org Received: (qmail 73017 invoked by uid 99); 27 May 2015 11:12:25 -0000 Received: from Unknown (HELO spamd2-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 27 May 2015 11:12:25 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd2-us-west.apache.org (ASF Mail Server at spamd2-us-west.apache.org) with ESMTP id D4A8E1A34EE for ; Wed, 27 May 2015 11:12:24 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd2-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -0.821 X-Spam-Level: X-Spam-Status: No, score=-0.821 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, SPF_PASS=-0.001] autolearn=disabled Authentication-Results: spamd2-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx1-us-west.apache.org ([10.40.0.8]) by localhost (spamd2-us-west.apache.org [10.40.0.9]) (amavisd-new, port 10024) with ESMTP id Y6SB6blkYYN5 for ; Wed, 27 May 2015 11:12:24 +0000 (UTC) Received: from mail-wi0-f169.google.com (mail-wi0-f169.google.com [209.85.212.169]) by mx1-us-west.apache.org (ASF Mail Server at mx1-us-west.apache.org) with ESMTPS id 85A3E2309F for ; Wed, 27 May 2015 11:12:23 +0000 (UTC) Received: by wizo1 with SMTP id o1so17739567wiz.1 for ; Wed, 27 May 2015 04:11:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type; bh=ZzKv9SXyS1+gJSvv49RdxPZvN6xwvpba6FOn3CNXbWw=; b=pWZzS0AZie8+0g4CmmhmxZyx7myXnFC52/9XmbrSDfIDbvQmDU4NmUTpwI4IFByeWq 1lfsWA3JE1d/6CVg9l0xAx3iapHIZC4sBnRuzjo3aH/fRPChIxTk/sbJ2j1Fzm56zzJU hW7qS74P3gFGFHW5l4YOdBm1bAcIyrUTqsXyhIwT+Y5USubd/iivh3I9Vao+o1+Ui6Sv JGwEC86VPm9Y8zAyAY9yQ+kLALcUhshJytR+2pXYsIgSAkR8LVMyO+sQPX1yDQfNkgOh ZplcgA0Jg+/Yv11KbO9gNO71wSlgLu9umvvGDKvcANn1vZzfh1o9R84Mlwg2hZUw/GJR msBQ== X-Received: by 10.180.95.10 with SMTP id dg10mr4987683wib.41.1432725097286; Wed, 27 May 2015 04:11:37 -0700 (PDT) MIME-Version: 1.0 Received: by 10.194.51.38 with HTTP; Wed, 27 May 2015 04:11:16 -0700 (PDT) In-Reply-To: References: From: Julian Foad Date: Wed, 27 May 2015 12:11:16 +0100 Message-ID: Subject: Re: Queries about rep cache: get_shared_rep() To: Stefan Fuhrmann Cc: dev Content-Type: text/plain; charset=UTF-8 Stefan Fuhrmann wrote: > Julian Foad wrote: >> + /* ### Would it be faster (and so better) to just try reading it, >> + and handle ENOENT, instead of first checking for presence? */ >> SVN_ERR(svn_io_check_path(file_name, &kind, scratch_pool)); >> if (kind == svn_node_file) >> { >> svn_stringbuf_t *rep_string; >> SVN_ERR(svn_stringbuf_from_file2(&rep_string, file_name, >> scratch_pool)); > > This checks for duplicate representations within the same txn. > It mainly finds them in a-typical or infrequent scenarios [...] > > I personally prefer checking for "preconditions" over just trying > and then dissecting various error cases. Apart from that style > issue, frequently constructing error codes can be expensive > (when messages are localized). Ack -- this way is reasonable in a case like this where it's usually expected to fail. (It does have a disadvantage, which is that the (rare) failure mode where the file becomes unreadable between the check and the open will never get test coverage.) >> @@ -2262,14 +2264,20 @@ get_shared_rep(representation_t **old_re >> >> /* A simple guard against general rep-cache induced corruption. */ >> if ((*old_rep)->expanded_size != rep->expanded_size) >> { [...] >> + ### [JAF] Why should we assume the cache is corrupt rather than the >> + new rep is corrupt? Is this really the logic we want? > > Hm. We don't say "the cache is corrupt" but log a warning with > a "FS corruption" error code and tell the user that there is an > inconsistency / mismatch. I meant that we proceed with storing the new data into the repository (ignoring the cache), which assumes the new data is correct and the cache is wrong. > It is simply our experience so far that it is more likely that > the rep-cache.db contents is at fault rather than the rep > we just calculated the sha1 for. Acknowledged. >> + Should we do something more forceful -- flag the cache as >> + unusable, perhaps -- rather than just logging a warning? > > I don't have a particularly strong opinion on that one. I guess > it is very unlikely that the mismatch has "legitimately" been > caused by e.g. a SHA1 collision. Maybe, we should reset / > clear the rep-cache.db at that point and say so in the warning. I don't know what would be best. It's probably "good enough" based on our empirical experience. But maybe we should think in terms of what guarantees we should expect from the cache. It doesn't have to return a result, but if it does then the result *must* be correct. If one result is demonstrably wrong (as in this code block) then maybe we should work on the assumption that the whole cache is bad, rather than run the risk that some other entries are wrong but happen to have a matching expanded-size. - Julian