db-derby-user mailing list archives

From David Sitsky <s...@nuix.com>
Subject Re: ERROR XSDG2: Invalid checksum on Page Page(0,Container(0, 1313))
Date Mon, 31 Mar 2008 05:20:32 GMT
Hi Narayanan,

Yes, I have seen those links already.  I spent quite a bit of time 
confirming that my hardware is not at fault before posting here.

I think you'll agree that seeing exactly the same page number fail on 3 
separate machines points more to a software issue than a hardware one.

The OS has not reported any disk issues at all.
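
For anyone who wants to check the same pattern in their own logs, here is a minimal sketch that tallies XSDG2 checksum failures by page identifier across several derby.log files. The function names (`checksum_failures`, `tally`) are made up for illustration. The field labels are an assumption that Derby prints a page key as Page(<page number>,Container(<segment>, <container id>)); under that reading, 1313 would be the container id and 0 the page number.

```python
import re
from collections import Counter

# Matches lines such as:
#   ERROR XSDG2: Invalid checksum on Page Page(0,Container(0, 1313))
# The group names are an assumption about what each number means.
XSDG2 = re.compile(
    r"ERROR XSDG2: Invalid checksum on Page "
    r"Page\((?P<page>\d+),\s*Container\((?P<segment>\d+),\s*(?P<container>\d+)\)\)"
)


def checksum_failures(log_text):
    """Count XSDG2 errors in one log, keyed by (page, segment, container)."""
    counts = Counter()
    for line in log_text.splitlines():
        m = XSDG2.search(line)
        if m:
            key = (int(m["page"]), int(m["segment"]), int(m["container"]))
            counts[key] += 1
    return counts


def tally(logs):
    """Combine counts over a dict mapping a label (e.g. hostname) to log text."""
    combined = Counter()
    for text in logs.values():
        combined.update(checksum_failures(text))
    return combined


if __name__ == "__main__":
    sample = "ERROR XSDG2: Invalid checksum on Page Page(0,Container(0, 1313))\n"
    print(tally({"machine-a": sample, "machine-b": sample}))
```

If every machine's log yields the same single key, that is consistent with the failure following the data or the software rather than one particular disk.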

Cheers,
David

Narayanan wrote:
> Hi David,
> 
> You might find the following links containing earlier discussions on the 
> similar issue useful,
> 
> http://www.nabble.com/invalid-checksum-tt9528741.html#a9528741
> 
> http://www.nabble.com/Derby-crash-%28urgent%29-tt16217446.html#a16265491
> 
> https://issues.apache.org/jira/browse/DERBY-2475
> 
> Narayanan
> 
> David Sitsky wrote:
>> I have an intensive data-processing application which utilises Apache 
>> Lucene and Derby, using 6 quad-core machines running Vista SP1 and/or 
>> Vista Server 2008.
>>
>> I have found after 5 or 10 hours of processing, one or a couple of my 
>> worker processes start reporting the following error in the derby.log 
>> file:
>>
>> ERROR XSDG2: Invalid checksum on Page Page(0,Container(0, 1313))
>>
>> The worker process never seems to recover.  Derby detects the error 
>> and reboots the database, but inevitably reports the same error 
>> again.  It is always page 1313, and what is extra strange is that it 
>> doesn't matter which machine it occurs on: it is always page 1313!  I 
>> know 13 is unlucky, but twice in a row must be extra unlucky. :)
>>
>> The quad-core machines have been configured with both hardware and 
>> software raid, but the same error has been seen.  Windows does not 
>> report any disk errors in the event log.
>>
>> The error is difficult to reproduce.  My runs typically run for 24 
>> hours, involving 22 separate JVM processes spread across the machines, 
>> each running its own Derby embedded database.  Sometimes I can get 
>> through the run without any issues; sometimes I see one or two 
>> processes with this issue, and it seems to pick a different quad-core 
>> machine each time, so a hardware error seems unlikely, especially 
>> given it is always page 1313.
>>
>> I have tried both 10.3.1.4 and 10.3.2.1 with the same results.
>>
>> Lucene doesn't report any problems with its index, so given all the 
>> above evidence, I am starting to lean more to a software issue than 
>> hardware.
>>
>> I have attached three derby.log files from different machines.  Does 
>> anyone have any ideas what might be causing this?
>>


-- 
Cheers,
David

Nuix Pty Ltd
Suite 79, 89 Jones St, Ultimo NSW 2007, Australia    Ph: +61 2 9280 0699
Web: http://www.nuix.com                            Fax: +61 2 9212 6902
