db-derby-user mailing list archives

From Narayanan <V.Naraya...@Sun.COM>
Subject Re: ERROR XSDG2: Invalid checksum on Page Page(0,Container(0, 1313))
Date Mon, 31 Mar 2008 04:17:20 GMT
Hi David,

You might find the following links to earlier discussions of a 
similar issue useful:





David Sitsky wrote:
> I have an intensive data-processing application which utilises Apache 
> Lucene and Derby, using 6 quad-core machines running Vista SP1 and/or 
> Vista Server 2008.
> I have found after 5 or 10 hours of processing, one or a couple of my 
> worker processes start reporting the following error in the derby.log 
> file:
> ERROR XSDG2: Invalid checksum on Page Page(0,Container(0, 1313))
> The worker process never seems to recover.  Derby locates the error, 
> reboots the database, but seems to inevitably report the same error 
> again.  It is always page 1313, and what is extra strange is it 
> doesn't matter which machine it occurs on, it is always page 1313!  I 
> know 13 is unlucky, but twice in a row must be extra unlucky. :)
> The quad-core machines have been configured with both hardware and 
> software raid, but the same error has been seen.  Windows does not 
> report any disk errors in the event log.
> The error is difficult to reproduce.  My runs typically run for 24 
> hours, involving 22 separate JVM processes spread across the machines, 
> each running their own Derby embedded database.  Sometimes I can get 
> through the run without any issues - sometimes I might see one or two 
> processes with this issue, and it seems to pick a different quad-core 
> machine each time, so the possibility of a hardware error seems 
> unlikely, especially given it is always page 1313.
> I have tried both and with the same results.
> Lucene doesn't report any problems with its index, so given all the 
> above evidence, I am starting to lean more to a software issue than 
> hardware.
> I have attached three derby.log files from different machines.  Does 
> anyone have any ideas what might be causing this?
