subversion-dev mailing list archives

From Ben Reser <...@reser.org>
Subject Re: upgrade_tests.py 29 spurious failure while testing 1.7.17
Date Tue, 13 May 2014 15:58:38 GMT
On 5/13/14, 12:43 AM, Johan Corveleyn wrote:
> First, thanks a lot for taking a look and giving a plausible
> explanation. It's a possibility, but I'm not fully convinced yet :-).
> 
> Pro:
> - It fits theoretically (the one bit off etc).
> - It's the only explanation so far. And IIUC cache corruption is ruled out.
> - The machine is getting old (almost 8 years now -- I think the memory
> is 5 or 6 years old). Its operating system (WinXP) is EOL.
> 
> Con:
> - I've had zero stability issues with my machine so far. No crashes,
> no bluescreens. Not one for as far as I can remember.
> - I've been testing / signing svn releases for a couple of years. No
> problem, until the last two release cycles or so.
> - Ran memtest86 (version 4.0.0 that I still had on some boot CD) last
> night. It ran for 8 hours. No errors.
> 
> So either my machine really has a memory problem, or it's a unique
> machine that can (rarely) reproduce a bug in Subversion. I'm still not
> sure. If it's the latter it would be a waste to throw it in the trash
> :-). OTOH, if it's such a rare issue that nobody else is seeing this,
> maybe it's not worth further precious time (of me and you and others)
> ...
> 
> I'll continue pounding it a bit more, but I'll probably give up at
> some point (not determined yet).

There are two other possibilities that, while still memory corruption, don't
necessarily mean there's anything wrong with your memory or with Subversion.
These are kind of out there, but plausible.

1) An alpha particle hit your machine and flipped the bit.  If you aren't using
ECC memory this is more likely than if you are.  See this question on Stack
Overflow, which has a bunch of useful links:
http://stackoverflow.com/questions/4109218/do-gamma-rays-from-the-sun-really-flip-bits-every-once-in-a-while

2) The kernel on XP had a bug and flipped the bit in your memory.  This one
seems less plausible.
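
As an aside on the "one bit off" symptom mentioned above: a minimal Python
sketch of how one might check whether two byte strings (say, an expected and an
observed checksum) differ in exactly one bit.  The function name and the sample
values are made up for illustration; they are not taken from the Subversion
test suite or from the failing test.

```python
def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    # XOR leaves a 1 exactly where the two inputs disagree.
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# Hypothetical values: the second differs from the first by one flipped bit.
expected = bytes.fromhex("a1b2c3d4")
observed = bytes.fromhex("a1b2c3d5")
print(differing_bits(expected, observed))
```

A result of 1 is consistent with a single flipped bit, whatever the cause; it
says nothing about whether hardware or software flipped it.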

But back to the original theory.  Running memtest86 for 8 hours without errors
doesn't necessarily rule out hardware problems.  The trouble with memory faults
is that they often show up in one bit only when another nearby piece of memory
is manipulated in a particular way.  Memtest86 is aware of these things and
tries to run the patterns that cause the issues, but it takes a long time to
try them all.  If this is the case you'd actually be able to reproduce the
problem, but only if the memory positions lined up just right again.
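
To illustrate the kind of fault described above, here is a toy Python
simulation.  It is only a sketch under made-up assumptions: the FaultyMemory
class, the fault it models (writing a value with bit 0 set to one address flips
bit 0 of another), and both test functions are hypothetical, and this is not
how Memtest86 is implemented.

```python
class FaultyMemory:
    """Toy model of RAM with one hypothetical coupling fault.

    Writing a value with bit 0 set to the aggressor address flips
    bit 0 of the victim address. Everything else behaves normally.
    """

    def __init__(self, size, aggressor, victim):
        self.cells = [0] * size
        self.aggressor = aggressor
        self.victim = victim

    def write(self, addr, value):
        self.cells[addr] = value & 0xFF
        if addr == self.aggressor and value & 1:
            self.cells[self.victim] ^= 1  # the simulated fault

    def read(self, addr):
        return self.cells[addr]


def simple_test(mem, size):
    """Write then immediately read back each cell; misses coupling faults."""
    for addr in range(size):
        for pattern in (0x00, 0xFF, 0x55, 0xAA):
            mem.write(addr, pattern)
            if mem.read(addr) != pattern:
                return False
    return True


def neighbor_test(mem, size):
    """After each write, check that every other cell is still untouched."""
    for addr in range(size):
        mem.write(addr, 0x00)
    for addr in range(size):
        mem.write(addr, 0xFF)
        for other in range(size):
            if other != addr and mem.read(other) != 0x00:
                return False
        mem.write(addr, 0x00)
    return True


simple_ok = simple_test(FaultyMemory(16, aggressor=5, victim=6), 16)
neighbor_ok = neighbor_test(FaultyMemory(16, aggressor=5, victim=6), 16)
print(simple_ok, neighbor_ok)
```

The read-back test passes on the faulty memory while the neighbor check fails,
but the neighbor check does work proportional to the square of the memory size,
and it covers only one write pattern.  That is the sense in which trying all
the patterns takes a long time.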

At this point I'd stop trying to find the cause.  If you get further test
failures or strange behavior, then I'd look into it further.

