Date: Mon, 31 Mar 2008 09:47:20 +0530
From: Narayanan
To: Derby Discussion
Subject: Re: ERROR XSDG2: Invalid checksum on Page Page(0,Container(0, 1313))

Hi David,

You might find the following links, which contain earlier discussions of
a similar issue, useful:

http://www.nabble.com/invalid-checksum-tt9528741.html#a9528741
http://www.nabble.com/Derby-crash-%28urgent%29-tt16217446.html#a16265491
https://issues.apache.org/jira/browse/DERBY-2475

Narayanan

David Sitsky wrote:
> I have an intensive data-processing application which utilises Apache
> Lucene and Derby, using 6 quad-core machines running Vista SP1 and/or
> Vista Server 2008.
>
> I have found that after 5 or 10 hours of processing, one or two of my
> worker processes start reporting the following error in the derby.log
> file:
>
> ERROR XSDG2: Invalid checksum on Page Page(0,Container(0, 1313))
>
> The worker process never seems to recover. Derby detects the error and
> reboots the database, but inevitably seems to report the same error
> again. It is always page 1313, and what is extra strange is that it
> doesn't matter which machine it occurs on: it is always page 1313! I
> know 13 is unlucky, but twice in a row must be extra unlucky. :)
>
> The quad-core machines have been configured with both hardware and
> software RAID, but the same error has been seen. Windows does not
> report any disk errors in the event log.
>
> The error is difficult to reproduce. My runs typically last 24
> hours, involving 22 separate JVM processes spread across the machines,
> each running its own embedded Derby database.
> Sometimes I can get through the run without any issues; sometimes I
> might see one or two processes with this issue, and it seems to pick a
> different quad-core machine each time, so a hardware error seems
> unlikely, especially given that it is always page 1313.
>
> I have tried both 10.3.1.4 and 10.3.2.1 with the same results.
>
> Lucene doesn't report any problems with its index, so given all the
> above evidence, I am starting to lean more towards a software issue
> than a hardware one.
>
> I have attached three derby.log files from different machines. Does
> anyone have any ideas what might be causing this?
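A possible reading of the identifier in the error: assuming Derby formats page keys as Page(pageNumber,Container(segmentId, containerId)), which is what the layout of the message suggests, the recurring 1313 would be the container (conglomerate) number, and the failing page would be page 0 of that container. The hypothetical Python helper below (parse_page_id and PageLocation are illustrative names, not Derby APIs) decodes the identifier, derives the data file name under the assumption that Derby stores container N of segment S as segS/c<N in hex>.dat, and builds a catalog query against SYS.SYSCONGLOMERATES and SYS.SYSTABLES to find which table or index owns that container.

```python
import re
from typing import NamedTuple

# Matches identifiers such as "Page(0,Container(0, 1313))".
# Assumed layout: Page(pageNumber,Container(segmentId, containerId)).
PAGE_ID = re.compile(r"Page\((\d+),\s*Container\((\d+),\s*(\d+)\)\)")


class PageLocation(NamedTuple):
    page_number: int
    segment_id: int
    container_id: int

    def data_file(self) -> str:
        # Assumption: container N of segment S lives in segS/c<hex N>.dat
        return f"seg{self.segment_id}/c{self.container_id:x}.dat"

    def lookup_sql(self) -> str:
        # Catalog query mapping the container id to its table or index
        return (
            "SELECT t.TABLENAME, c.CONGLOMERATENAME, c.ISINDEX "
            "FROM SYS.SYSCONGLOMERATES c "
            "JOIN SYS.SYSTABLES t ON c.TABLEID = t.TABLEID "
            f"WHERE c.CONGLOMERATENUMBER = {self.container_id}"
        )


def parse_page_id(message: str) -> PageLocation:
    """Extract page, segment and container numbers from an error message."""
    match = PAGE_ID.search(message)
    if match is None:
        raise ValueError(f"no page identifier in: {message!r}")
    page, segment, container = (int(g) for g in match.groups())
    return PageLocation(page, segment, container)


if __name__ == "__main__":
    loc = parse_page_id(
        "ERROR XSDG2: Invalid checksum on Page Page(0,Container(0, 1313))"
    )
    print(loc)              # PageLocation(page_number=0, segment_id=0, container_id=1313)
    print(loc.data_file())  # seg0/c521.dat
    print(loc.lookup_sql())
```

Running the generated query through ij against an affected database would show which table or index sits in that container. If every worker creates the same schema in the same order, the same object would plausibly receive the same conglomerate number on every machine, which could explain why 1313 recurs regardless of the machine.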