Date: Sun, 2 Oct 2011 22:28:33 +0200
From: Daniel Shahaf
To: dev@subversion.apache.org
Cc: infrastructure@apache.org
Subject: First theory for explaining both kinds of corruption Re: Fwd: [Daniel Shahaf: Long-standing corruption on svn.apache.org]

tldr: The first part of this mail lists a few more instances. The second half of this mail offers a theory that explains both modes of corruption.

---

I've run 'svn log -ql 7000' for the following fspaths and repositories:

    asf:/hadoop
    asf:/openejb
    asf:/james
    asf:/jackrabbit
    asf:/karaf
    asf:/archiva
    asf:/hbase
    asf:/cxf
    asf:/tomcat
    asf:/incubator
    asf:/subversion
    infra:/websites
    infra:/websites/production/www

and I found two more instances: r931481 in ^/jackrabbit and r1136942 in ^/cxf.

[[[
r891679 | julianfoad | 2009-12-17 12:48:09 +0000 (Thu, 17 Dec 2009)
r891677 | stylesen | 2009-12-17 12:47:52 +0000 (Thu, 17 Dec 2009)
r891672 | stylesen | 2009-12-17 12:30:43 +0000 (Thu, 17 Dec 2009)

r965497 | rhuijben | 2010-07-19 14:26:54 +0000 (Mon, 19 Jul 2010)
r965496 | cmpilato | 2010-07-19 14:26:50 +0000 (Mon, 19 Jul 2010)
r965495 | artagnon | 2010-07-19 14:26:43 +0000 (Mon, 19 Jul 2010)

r931481 | mduerig | 2010-04-07 09:41:50 +0000 (Wed, 07 Apr 2010)
r931480 | jukka | 2010-04-07 09:41:48 +0000 (Wed, 07 Apr 2010)
r931479 | jukka | 2010-04-07 09:36:17 +0000 (Wed, 07 Apr 2010)

r1136942 | dkulp | 2011-06-17 17:11:26 +0000 (Fri, 17 Jun 2011)
r1136941 | dkulp | 2011-06-17 17:11:25 +0000 (Fri, 17 Jun 2011)
r1136938 | dkulp | 2011-06-17 17:01:54 +0000 (Fri, 17 Jun 2011)
]]]

In all cases, during a 'svn log -ql file://$REPOS_ROOT/$tlp' run, eris jumps from the youngest revision of the triplet to the oldest, while harmonia gives all three revisions of the triplet.

---

I also found a ridiculously bogus instance in one of the minfo-cnt blowup revisions from my previous email: the noderev of ^/@r908653 thinks its predecessor is 0.0.r908626/17893...

---

Analysis: it seems that in all cases, there is a relatively small time gap between the two youngest revisions in the triplet. In two cases, namely r891679 and r908653, the corruption of predecessors is accompanied by corruption of minfo-cnt. However, r1136942 is not accompanied by a similar corruption, nor is it preceded by an O(100)-file commit.

Theory: there are two independent bugs. One corrupts the predecessors when two commits are made in quick succession. The other causes the minfo-cnt values to become corrupt when a commit quickly follows an O(100)-file commit, i.e., a commit that would have triggered issue #3506 (also known as INFRA-2261), a failure to update rep-cache.db.
The first issue doesn't trigger on harmonia because it doesn't have cache disks in its zpool, and the second doesn't because the svnsync process that commits to harmonia's repositories takes an out-of-band lock to ensure that at most one svnsync process runs at any given moment.
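
For anyone who wants to double-check the "small time gap" observation, or to look for skipped revisions mechanically, here is a minimal Python sketch. It is not the tooling used for this investigation. The function names (parse_log, youngest_gaps, skipped) are made up for illustration, and the "eris view" is reconstructed from the description above (youngest and oldest revision of each triplet), not captured output.

```python
from datetime import datetime

# The four triplets quoted earlier in this mail (harmonia's view).
HARMONIA_LOG = """\
r891679 | julianfoad | 2009-12-17 12:48:09 +0000 (Thu, 17 Dec 2009)
r891677 | stylesen | 2009-12-17 12:47:52 +0000 (Thu, 17 Dec 2009)
r891672 | stylesen | 2009-12-17 12:30:43 +0000 (Thu, 17 Dec 2009)
r965497 | rhuijben | 2010-07-19 14:26:54 +0000 (Mon, 19 Jul 2010)
r965496 | cmpilato | 2010-07-19 14:26:50 +0000 (Mon, 19 Jul 2010)
r965495 | artagnon | 2010-07-19 14:26:43 +0000 (Mon, 19 Jul 2010)
r931481 | mduerig | 2010-04-07 09:41:50 +0000 (Wed, 07 Apr 2010)
r931480 | jukka | 2010-04-07 09:41:48 +0000 (Wed, 07 Apr 2010)
r931479 | jukka | 2010-04-07 09:36:17 +0000 (Wed, 07 Apr 2010)
r1136942 | dkulp | 2011-06-17 17:11:26 +0000 (Fri, 17 Jun 2011)
r1136941 | dkulp | 2011-06-17 17:11:25 +0000 (Fri, 17 Jun 2011)
r1136938 | dkulp | 2011-06-17 17:01:54 +0000 (Fri, 17 Jun 2011)
"""


def parse_log(text):
    """Parse 'svn log -q' lines into [(revnum, datetime)], keeping input order."""
    entries = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) != 3 or not fields[0].startswith("r"):
            continue  # skip the dashed separator lines and blanks
        stamp = fields[2].split(" (")[0]  # drop the "(Thu, 17 Dec 2009)" suffix
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S %z")
        entries.append((int(fields[0][1:]), when))
    return entries


def youngest_gaps(entries):
    """Seconds between the two youngest revisions of each consecutive triplet.

    Assumes the input is a flat list of triplets, youngest first, as above.
    """
    gaps = {}
    for i in range(0, len(entries) - 2, 3):
        (young, t_young), (_, t_mid) = entries[i], entries[i + 1]
        gaps[young] = int((t_young - t_mid).total_seconds())
    return gaps


def skipped(reference, suspect):
    """Revisions listed by the reference log but absent from the suspect log."""
    seen = {rev for rev, _ in suspect}
    return [rev for rev, _ in reference if rev not in seen]


if __name__ == "__main__":
    harmonia = parse_log(HARMONIA_LOG)
    # Hypothetical eris view: the middle revision of each triplet is missing.
    eris = [entry for i, entry in enumerate(harmonia) if i % 3 != 1]
    print(youngest_gaps(harmonia))
    print(skipped(harmonia, eris))
```

On the four triplets above, the gaps between the two youngest revisions come out as 17, 4, 2 and 1 seconds. For a real comparison, feed the output of the same 'svn log -q' run on each box to parse_log() and pass the two results to skipped().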