db-derby-dev mailing list archives

From "David Sitsky (JIRA)" <j...@apache.org>
Subject [jira] Created: (DERBY-3611) ERROR XSDG2: Invalid checksum on Page occurs during mass inserts into two-column bigint PK table
Date Thu, 10 Apr 2008 01:26:04 GMT
ERROR XSDG2: Invalid checksum on Page occurs during mass inserts into two-column bigint PK
table
------------------------------------------------------------------------------------------------

                 Key: DERBY-3611
                 URL: https://issues.apache.org/jira/browse/DERBY-3611
             Project: Derby
          Issue Type: Bug
          Components: Store
    Affects Versions: 10.3.2.1, 10.3.1.4
         Environment: Occurred on 6 separate quad-core machines running Vista, Vista
SP1, or Server 2008.  Also seen on an AMD64 dual core 4200 with 4 GB of RAM running 32-bit XP
Pro.
            Reporter: David Sitsky
            Priority: Critical
         Attachments: derby-worker0.log

The original extensive email thread reporting this issue can be seen from here: http://www.nabble.com/ERROR-XSDG2%3A-Invalid-checksum-on-Page-Page%280%2CContainer%280%2C-1313%29%29-td16389697.html.

I have an intensive data-processing application which utilises Apache Derby, using 6 quad-core
machines running Vista SP1 and/or Server 2008.  Each quad-core machine typically runs
4 separate JVM worker processes, each running its own embedded Derby database.

I have found that after 5 or 10 hours of processing, one or two of my worker processes start
reporting the following error in their derby.log file:

ERROR XSDG2: Invalid checksum on Page Page(0,Container(0, 1313)) 

The worker process never seems to recover.  Derby detects the error and reboots the database,
but it inevitably seems to report the same error again.  I have tried both 10.3.1.4 and 10.3.2.1
with the same results.  The conglomerate and page number are always the same.

I know it is not a hardware issue, as this has occurred across 6 separate machines, with both
software and hardware RAID, and no disk errors have been reported.  A customer of our software
also reported this error occurring on their AMD64 dual core 4200 with 4 GB of RAM running
32-bit XP Pro.

The table the conglomerate refers to is as follows:

CREATE TABLE text_table (guidhigh BIGINT NOT NULL,
                         guid BIGINT NOT NULL,
                         data BLOB (1G) NOT NULL,
                         PRIMARY KEY (guidhigh, guid)) 

In this application, essentially random values for guidhigh and guid were being created, with
data being compressed text that could range in size from a few bytes to many megabytes.

The processing code effectively did a select from the table on guidhigh and guid to check
if an entry exists, before inserting a new row within a transaction.
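
The check-then-insert step described above can be sketched as the following pair of statements.
This is only an illustration: it assumes the application uses JDBC prepared statements (the ?
parameter markers) and runs both statements in the same transaction. The actual application
code is not shown in this report.

```sql
-- Sketch only; the ? markers are values bound by the application.
-- Step 1: check whether a row with this key already exists.
SELECT 1 FROM text_table WHERE guidhigh = ? AND guid = ?;

-- Step 2: only if step 1 returned no row, insert the new entry.
INSERT INTO text_table (guidhigh, guid, data) VALUES (?, ?, ?);
```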

If I forcibly shut the application down, I could connect to the database using ij, and would
get the same error:

ij> select count(*) from text_table;
ERROR XSDG2: Invalid checksum on Page Page(0,Container(0, 1313)), expected=304,608,373, on-disk
version=2,462,088,751, page dump follows: Hex dump:
00000000: 0076 0000 0001 0000 0000 0000 27ea 0000  .v..............
00000010: 0000 0006 0000 0000 0000 0000 0000 0000  ................
00000020: 0000 0000 0001 0000 0000 0000 0000 0000  ................
.... 

A workaround suggested by Stanley Bradbury on derby-user, which we managed to implement in our
application, was to not have the PK during the load.  We also replaced the two-column PK with
a single column, and the problem has not occurred since.
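
For reference, loading without the PK might look roughly like the sketch below.  It assumes
the constraint is added back once the load has finished, which this report does not state, and
the constraint name text_table_pk is hypothetical.  Note that without the PK the database will
not reject duplicate keys during the load.

```sql
-- Sketch only: same table as above, created without the primary key.
CREATE TABLE text_table (guidhigh BIGINT NOT NULL,
                         guid BIGINT NOT NULL,
                         data BLOB (1G) NOT NULL);

-- ... perform the mass inserts here, with no primary key in place ...

-- Assumption: the key is restored after the load (constraint name is hypothetical).
ALTER TABLE text_table
    ADD CONSTRAINT text_table_pk PRIMARY KEY (guidhigh, guid);
```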

I'll attach a number of example derby.log files which contain the error messages.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

