db-derby-dev mailing list archives

From "David Sitsky (JIRA)" <j...@apache.org>
Subject [jira] Commented: (DERBY-3611) ERROR XSDG2: Invalid checksum on Page occurs during mass inserts into two-column bigint PK table
Date Fri, 11 Apr 2008 03:50:05 GMT

    [ https://issues.apache.org/jira/browse/DERBY-3611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12587837#action_12587837 ]

David Sitsky commented on DERBY-3611:
-------------------------------------

Hi Knut,

Something I didn't write in the description of this bug record, but did write in my email, is
that unfortunately it is not easy to reproduce.  In a 24-hour run with 22 individual JVM
processes running across 6 quad-core machines (one machine runs only two processes), sometimes
they would all run successfully, and sometimes 1 or 2 processes would trigger the condition
after many hours.

Given that they are all independent processes, you can see it isn't that easy to reproduce:
22 days of "serial processing time" (22 processes x 24 hours) may trigger the condition.

We have quite a lot of customers, but so far, only one has reported this issue to us.

Unfortunately, our 6 quad-core system is heavily used, so I may not be able to access it soon
to run this test.  I will try to do so next week, time permitting, but I can't promise
anything.

I would actually recommend writing a small program with the table described in the bug report,
and just an endless loop where you create random numbers for the two guid columns, with some
random binary data for the other.  Each of our transactions did a select on the two guid
columns and, if the row didn't exist, did the insert and then the commit.  This is basically
what our application does.
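
A minimal sketch of such a test program follows. It is only an illustration of the
select-then-insert loop described above, not the actual application code. The class name
ChecksumRepro, the database name "reproDB", and the 1 MB payload cap are assumptions made
for the example, and it assumes an embedded Derby jar on the classpath with a JVM that
registers the JDBC driver automatically.

```java
import java.io.ByteArrayInputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Random;

class ChecksumRepro {
    // Assumption: "reproDB" is a hypothetical database name for this sketch.
    static final String URL = "jdbc:derby:reproDB;create=true";
    // Assumption: payloads are capped at 1 MB here; the report mentions up to many megabytes.
    static final int MAX_PAYLOAD_BYTES = 1 << 20;

    // Random binary data of length 1..maxBytes, standing in for the compressed text.
    static byte[] randomPayload(Random rnd, int maxBytes) {
        byte[] data = new byte[1 + rnd.nextInt(maxBytes)];
        rnd.nextBytes(data);
        return data;
    }

    // Creates the table from the bug report unless it already exists.
    static void createTableIfMissing(Connection conn) throws SQLException {
        try (ResultSet rs = conn.getMetaData().getTables(null, null, "TEXT_TABLE", null)) {
            if (rs.next()) {
                return;
            }
        }
        try (Statement st = conn.createStatement()) {
            st.executeUpdate("CREATE TABLE text_table (guidhigh BIGINT NOT NULL, "
                    + "guid BIGINT NOT NULL, data BLOB (1G) NOT NULL, "
                    + "PRIMARY KEY (guidhigh, guid))");
        }
        conn.commit();
    }

    // One transaction: select on the two guid columns, insert if absent, then commit.
    static boolean insertIfAbsent(Connection conn, PreparedStatement select,
                                  PreparedStatement insert, long high, long low,
                                  byte[] data) throws SQLException {
        boolean inserted = false;
        select.setLong(1, high);
        select.setLong(2, low);
        try (ResultSet rs = select.executeQuery()) {
            if (!rs.next()) {
                insert.setLong(1, high);
                insert.setLong(2, low);
                insert.setBinaryStream(3, new ByteArrayInputStream(data), data.length);
                insert.executeUpdate();
                inserted = true;
            }
        }
        conn.commit();
        return inserted;
    }

    public static void main(String[] args) throws SQLException {
        Random rnd = new Random();
        try (Connection conn = DriverManager.getConnection(URL)) {
            conn.setAutoCommit(false);
            createTableIfMissing(conn);
            try (PreparedStatement select = conn.prepareStatement(
                         "SELECT 1 FROM text_table WHERE guidhigh = ? AND guid = ?");
                 PreparedStatement insert = conn.prepareStatement(
                         "INSERT INTO text_table (guidhigh, guid, data) VALUES (?, ?, ?)")) {
                // Endless loop: runs until the process is stopped or Derby raises an error.
                while (true) {
                    insertIfAbsent(conn, select, insert, rnd.nextLong(), rnd.nextLong(),
                            randomPayload(rnd, MAX_PAYLOAD_BYTES));
                }
            }
        }
    }
}
```

Given how rarely the error appeared, such a loop would probably need to run for many hours,
ideally as several independent processes, before it could be expected to show anything.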




> ERROR XSDG2: Invalid checksum on Page occurs during mass inserts into two-column bigint PK table
> ------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-3611
>                 URL: https://issues.apache.org/jira/browse/DERBY-3611
>             Project: Derby
>          Issue Type: Bug
>          Components: Store
>    Affects Versions: 10.3.1.4, 10.3.2.1
>         Environment: Occurred on 6 separate quad-core machines running Vista,
>                      Vista SP1 or Server 2008.  Also seen on AMD64 dual core 4200 with
>                      4 GB of RAM running 32 bit XP pro.
>            Reporter: David Sitsky
>            Priority: Critical
>         Attachments: d3347-1a+2a.diff, derby-worker0.log, derby-worker3.log, derby-worker4.log
>
>
> The original extensive email thread reporting this issue can be seen from here: http://www.nabble.com/ERROR-XSDG2%3A-Invalid-checksum-on-Page-Page%280%2CContainer%280%2C-1313%29%29-td16389697.html.
> I have an intensive data-processing application which utilises Apache Derby, using 6
> quad-core machines running Vista SP1 and/or Vista Server 2008.  Each quad-core machine typically
> runs 4 separate JVM worker processes, each running their own embedded derby database.
> I have found that after 5 or 10 hours of processing, one or a couple of my worker processes
> start reporting the following error in their derby.log file:
> ERROR XSDG2: Invalid checksum on Page Page(0,Container(0, 1313)) 
> The worker process never seems to recover.  Derby locates the error, reboots the database,
> but seems to inevitably report the same error again.  I have tried both 10.3.1.4 and 10.3.2.1
> with the same results.  The conglomerate and page number are always the same.
> I know it is not a hardware issue, as this is across 6 separate machines, and it has
> happened with software / hardware RAID, and no disk errors have been reported.  A customer
> of our software also reported this error occurring on their AMD64 dual core 4200 with 4 GB
> of RAM running 32 bit XP pro.
> The table the conglomerate refers to is as follows:
> CREATE TABLE text_table (guidhigh BIGINT NOT NULL,
>                          guid BIGINT NOT NULL,
>                          data BLOB (1G) NOT NULL,
>                          PRIMARY KEY (guidhigh, guid)) 
> In this application, essentially random values for guidhigh and guid were being created,
> with data being compressed text that could range from a few bytes to many megabytes
> in size.
> The processing code effectively did a select from the table on guidhigh and guid to check
> if an entry exists, before inserting a new row within a transaction.
> If I forcibly shut the application down, I could connect to the database using ij, and
> would get the same error:
> ij> select count(*) from text_table;
> ERROR XSDG2: Invalid checksum on Page Page(0,Container(0, 1313)), expected=304,608,373, on-disk version=2,462,088,751, page dump follows: Hex dump:
> 00000000: 0076 0000 0001 0000 0000 0000 27ea 0000  .v..............
> 00000010: 0000 0006 0000 0000 0000 0000 0000 0000  ................
> 00000020: 0000 0000 0001 0000 0000 0000 0000 0000  ................
> .... 
> A workaround which we managed to implement in our application, as suggested on derby-user
> by Stanley Bradbury, was to not have the PK during the load.
> We also replaced the two column PK with a single column and the problem has never occurred since.
> I'll attach a number of example derby.log files which contain the error messages.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

