subversion-dev mailing list archives

From Daniel Shahaf <...@daniel.shahaf.name>
Subject First theory for explaining both kinds of corruption Re: Fwd: [Daniel Shahaf: Long-standing corruption on svn.apache.org]
Date Sun, 02 Oct 2011 20:28:33 GMT
tldr: The first part of this mail lists a few more instances.  The
second half of this mail offers a theory that explains both modes of
corruption.

---

I've run 'svn log -ql 7000' for the following fspaths and repositories:

asf:/hadoop asf:/openejb asf:/james asf:/jackrabbit asf:/karaf
asf:/archiva asf:/hbase asf:/cxf asf:/tomcat asf:/incubator
asf:/subversion infra:/websites infra:/websites/production/www

and I found two more instances: r931481 in ^/jackrabbit and r1136942 in
^/cxf.

[[[
r891679 | julianfoad | 2009-12-17 12:48:09 +0000 (Thu, 17 Dec 2009)
r891677 | stylesen | 2009-12-17 12:47:52 +0000 (Thu, 17 Dec 2009)
r891672 | stylesen | 2009-12-17 12:30:43 +0000 (Thu, 17 Dec 2009)

r965497 | rhuijben | 2010-07-19 14:26:54 +0000 (Mon, 19 Jul 2010)
r965496 | cmpilato | 2010-07-19 14:26:50 +0000 (Mon, 19 Jul 2010)
r965495 | artagnon | 2010-07-19 14:26:43 +0000 (Mon, 19 Jul 2010)

r931481 | mduerig | 2010-04-07 09:41:50 +0000 (Wed, 07 Apr 2010)
r931480 | jukka | 2010-04-07 09:41:48 +0000 (Wed, 07 Apr 2010)
r931479 | jukka | 2010-04-07 09:36:17 +0000 (Wed, 07 Apr 2010)

r1136942 | dkulp | 2011-06-17 17:11:26 +0000 (Fri, 17 Jun 2011)
r1136941 | dkulp | 2011-06-17 17:11:25 +0000 (Fri, 17 Jun 2011)
r1136938 | dkulp | 2011-06-17 17:01:54 +0000 (Fri, 17 Jun 2011)
]]]

In all cases, during a 'svn log -ql file://$REPOS_ROOT/$tlp' run, eris
jumps from the youngest revision of the triplet straight to the oldest
(skipping the middle one), while harmonia lists all three revisions.
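For anyone who wants to repeat the comparison on other paths, here is a
rough sketch of how the two logs could be diffed.  It assumes the output
of 'svn log -q' from each box has been saved as text.  The function
names are mine, and the eris-style sample is reconstructed from the
behaviour described above rather than pasted from an actual run.  Only
subtract in the direction shown, since the log with skipped revisions
reaches further back for the same -l limit.

```python
import re

# Matches the header lines printed by 'svn log -q', e.g.
# "r891679 | julianfoad | 2009-12-17 12:48:09 +0000 (Thu, 17 Dec 2009)"
# Separator lines made of dashes do not match and are ignored.
REV_LINE = re.compile(r"^r(\d+) \|")


def revisions(log_text):
    """Return the set of revision numbers found in 'svn log -q' output."""
    revs = set()
    for line in log_text.splitlines():
        m = REV_LINE.match(line)
        if m:
            revs.add(int(m.group(1)))
    return revs


def skipped(complete_log, suspect_log):
    """Revisions listed in complete_log but absent from suspect_log, newest first."""
    return sorted(revisions(complete_log) - revisions(suspect_log), reverse=True)


# First triplet: the harmonia-style log has all three revisions, the
# eris-style log (reconstructed, see above) jumps from youngest to oldest.
harmonia_log = """\
r891679 | julianfoad | 2009-12-17 12:48:09 +0000 (Thu, 17 Dec 2009)
r891677 | stylesen | 2009-12-17 12:47:52 +0000 (Thu, 17 Dec 2009)
r891672 | stylesen | 2009-12-17 12:30:43 +0000 (Thu, 17 Dec 2009)
"""
eris_log = """\
r891679 | julianfoad | 2009-12-17 12:48:09 +0000 (Thu, 17 Dec 2009)
r891672 | stylesen | 2009-12-17 12:30:43 +0000 (Thu, 17 Dec 2009)
"""

print(skipped(harmonia_log, eris_log))  # prints [891677]
```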

---

I also found a ridiculously bogus instance in one of the minfo-cnt
blowup revisions from my previous email: the noderev of ^/@r908653
thinks its predecessor is 0.0.r908626/17893...

---

Analysis: it seems that in all cases, there is a relatively small time
gap between the two youngest revisions in the triplet.  In two cases,
namely r891679 and r908653, the corruption of predecessors is
accompanied by corruption of minfo-cnt.  However, r1136942 is not
accompanied by a similar corruption, nor is it preceded by an
O(100)-file commit.
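To put numbers on "relatively small time gap", the following sketch
computes the interval between the two youngest revisions of each
triplet.  The timestamps are copied from the log excerpts earlier in
this mail; the helper name and data layout are just for illustration.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S %z"

# (youngest, middle) revision of each triplet, with the commit
# timestamps taken from the 'svn log -q' excerpts in this mail.
PAIRS = [
    (("r891679", "2009-12-17 12:48:09 +0000"), ("r891677", "2009-12-17 12:47:52 +0000")),
    (("r965497", "2010-07-19 14:26:54 +0000"), ("r965496", "2010-07-19 14:26:50 +0000")),
    (("r931481", "2010-04-07 09:41:50 +0000"), ("r931480", "2010-04-07 09:41:48 +0000")),
    (("r1136942", "2011-06-17 17:11:26 +0000"), ("r1136941", "2011-06-17 17:11:25 +0000")),
]


def gap_seconds(newer, older):
    """Seconds elapsed between two 'svn log' style timestamps."""
    delta = datetime.strptime(newer, FMT) - datetime.strptime(older, FMT)
    return delta.total_seconds()


for (young, t_young), (mid, t_mid) in PAIRS:
    print(f"{mid} -> {young}: {gap_seconds(t_young, t_mid):.0f}s")
```

The gaps come out as 17, 4, 2 and 1 seconds respectively.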

Theory: there are two independent bugs.  One corrupts the predecessors
when two commits are made in quick succession.  The other causes the
minfo-cnt values to become corrupt when a commit quickly follows an
O(100)-file commit, i.e., a commit that would have triggered issue
#3506 (also known as INFRA-2261), a failure to update rep-cache.db.
The first bug doesn't trigger on harmonia because harmonia doesn't have
cache disks in its zpool, and the second doesn't trigger there because
the svnsync process that commits to harmonia's repositories takes an
out-of-band lock to ensure that at most one svnsync process runs at any
given moment.
