From dev-return-33109-apmail-subversion-dev-archive=subversion.apache.org@subversion.apache.org Wed May 27 09:00:31 2015 Return-Path: X-Original-To: apmail-subversion-dev-archive@minotaur.apache.org Delivered-To: apmail-subversion-dev-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id C98091805F for ; Wed, 27 May 2015 09:00:31 +0000 (UTC) Received: (qmail 19971 invoked by uid 500); 27 May 2015 09:00:15 -0000 Delivered-To: apmail-subversion-dev-archive@subversion.apache.org Received: (qmail 19918 invoked by uid 500); 27 May 2015 09:00:15 -0000 Mailing-List: contact dev-help@subversion.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list dev@subversion.apache.org Received: (qmail 19901 invoked by uid 99); 27 May 2015 09:00:15 -0000 Received: from Unknown (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 27 May 2015 09:00:15 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id 24F1EC8FBA for ; Wed, 27 May 2015 09:00:15 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -0.821 X-Spam-Level: X-Spam-Status: No, score=-0.821 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, SPF_PASS=-0.001] autolearn=disabled Authentication-Results: spamd1-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx1-us-west.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id VSf4IL9oK6qN for ; Wed, 27 May 2015 09:00:14 +0000 (UTC) Received: from mail-wg0-f53.google.com (mail-wg0-f53.google.com [74.125.82.53]) by mx1-us-west.apache.org (ASF Mail Server at mx1-us-west.apache.org) with ESMTPS id 1C09524BBB for ; Wed, 27 May 2015 09:00:14 +0000 (UTC) Received: by wgez8 with SMTP id z8so3422893wge.0 for ; Wed, 27 May 2015 01:58:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type; bh=oOG3OJjC8TZ9I1Zw6vTAUr/M/3DI5tFw41QEdMoUuaI=; b=UxePCk6Fde3i/RUv8jST4vzIVCqqy4H5eCDpvp7Sq4MUtMhTMcctcngWKpgXHoJNog Ly1VQyTKLKyVQQecUOPvbb2Xw7Wlk4Et+fm/V5oe/b58aPlk/NZE1t4k2Bw83i9k8t+S 7jGWa2BhztrQCmtPf8f59QQ5HvBIhW1mmyxpNgs+2rlGudep5zh1EsMpLyU95Bn8Wdot K9Rq4923/eqLsw1GO/Ih3Due5FP2hZH0EehI72djy1Zu84Rujy76hd9N6NnFE/dFQ7ux C2pLYduRK5tCCL6IA+Y3hNnC9zQgNDQ7vL4gI+czVfuCzwZj4A6IDB2I5tEIvMaIV9ni dfJQ== X-Received: by 10.194.11.73 with SMTP id o9mr58220591wjb.116.1432717122828; Wed, 27 May 2015 01:58:42 -0700 (PDT) MIME-Version: 1.0 Received: by 10.194.51.38 with HTTP; Wed, 27 May 2015 01:58:22 -0700 (PDT) In-Reply-To: <5564eee9.5203c20a.0a9d.ffffde22@mx.google.com> References: <5564eee9.5203c20a.0a9d.ffffde22@mx.google.com> From: Julian Foad Date: Wed, 27 May 2015 09:58:22 +0100 Message-ID: Subject: Re: Queries about rep cache: get_shared_rep() To: Bert Huijben Cc: dev Content-Type: text/plain; charset=UTF-8 Hi Bert. Bert Huijben wrote: > On Windows opening the file is sensitive to outside interactions and may > trigger a retry loop. [...] A filestat is +- a constant time operation that > doesn't have these problems. OK... > So this really depends on how common all cases are. If not existing or > possibly locked is common, then statting first could certainly be useful... > > But if it is +- an error if the file doesn't exist and the file must be > opened in almost every case then the open and handle errors keeps the code > clean. In this case we're looking to see if the rep is a duplicate of one that has already been inserted in the current transaction. In the majority of cases in typical work flows, the requested file will not exist. Therefore I think it is safe to say that the current code (statting first) will be reasonably efficient overall. However, I don't think that it would necessarily be inefficient to just try opening the file. It's important to distinguish *requesting* to open a file at a path where a file may or may not be present and accessible, from actually opening the file. At what point do these "outside interactions" typically occur? I spent a few minutes Googling ("file system filter driver" and "IRP_MJ_CREATE" are relevant terms) but couldn't find an answer. I imagine if the file does not exist then there would *not* be significant delays and retries due to waiting for a virus scanner (etc.) if we simply tried to open it. In conclusion, it seems to me now that the code is fine as it is, and the question of whether just trying to open a non-existent file would be generally efficient or inefficient is of only academic interest at this point. - Julian > From: Julian Foad > Index: subversion/libsvn_fs_fs/transaction.c > =================================================================== > --- subversion/libsvn_fs_fs/transaction.c (revision 1681856) > +++ subversion/libsvn_fs_fs/transaction.c (working copy) > @@ -2243,12 +2243,14 @@ (representation_t **old_re > const char *file_name > = path_txn_sha1(fs, &rep->txn_id, rep->sha1_digest, scratch_pool); > > /* in our txn, is there a rep file named with the wanted SHA1? > If so, read it and use that rep. > */ > + /* ### Would it be faster (and so better) to just try reading it, > + and handle ENOENT, instead of first checking for presence? */ > SVN_ERR(svn_io_check_path(file_name, &kind, scratch_pool)); > if (kind == svn_node_file) > { > svn_stringbuf_t *rep_string; > SVN_ERR(svn_stringbuf_from_file2(&rep_string, file_name, > scratch_pool)); > @@ -2262,14 +2264,20 @@ get_shared_rep(representation_t **old_re > > /* A simple guard against general rep-cache induced corruption. */ > if ((*old_rep)->expanded_size != rep->expanded_size) > { > /* Make the problem show up in the server log. > > - Because not sharing reps is always a save option, > + Because not sharing reps is always a safe option, > terminating the request would be inappropriate. > + > + ### [JAF] Why should we assume the cache is corrupt rather than the > + new rep is corrupt? Is this really the logic we want? > + > + Should we do something more forceful -- flag the cache as > + unusable, perhaps -- rather than just logging a warning? > */