db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From David Sitsky <s...@nuix.com>
Subject Re: ERROR XSDG2: Invalid checksum on Page Page(0,Container(0, 1313))
Date Wed, 02 Apr 2008 05:19:02 GMT
> Hi Dave -
> The good news is that it looks like you do not have to recover this 
> database from backups because:
> 1)  since you can boot the database using IJ the corruption is not in 
> the transaction log (I assume you did not directly modify or delete any 
> files in the database directory tree in order to bypass recovery - *yes, 
> I have seen people do this*)

Correct.

> 2) since the corruption is in an index you should be able to drop and 
> recreate the index thus restoring the index without any data loss.
> 
> I also have an idea that might get around it.  Since the problem is with 
> an index and during a data load (I am assuming the load is a one time 
> database initialization streaming lots of data?) then I would try 

Yes.

> creating the index AFTER loading the data rather than declaring the 
> index in the table Create statement.  If the problem involves handling a 
> large number of database pages and the index pages at the same time 
> separating the activities will avoid it.  Doing this also insures that 
> index statistics are created (for more info on this see:  
> http://wiki.apache.org/db-derby/CheckingForIndexStatistics)

Ok - we are restructuring the application, so that now, we can get away 
with a primary key field of just an INTEGER, rather than the two BIGINT 
columns.  Also, because of the restructure, we don't need to check for 
the existence of a duplicate record before insertion, so we can now 
create the primary key once all of the data has been loaded.

I'm in the process of merging these changes in, and will try to do some 
more big loads to see if this has made the problem go away.  I'll let 
you know how I go in the next few days.

> As for the cause of the problem I can't be sure but since this happens 
> repeatedly and with different physical files it sounds like the problem 
> is not with the physical disk surface (I think the RAID architecture 
> helps protect against this type of failure anyway).

For these tests - I am just running software "scary raid" (raid-0), but 
the OS doesn't report any disk failures.

> I'd like to see this problem captured in an JIRA entry.  You have 
> provided a lot of good information here and the results of performing a 
> load then creating the Primary Key will narrow it down more.  My 
> gut-level feel is that reproducing this will be hard without an 
> environment similar to yours.  With the multiple processors and the RAID 
> array I imagine it is a pretty fast machine and this may be exposing a 
> thread synchronization issue in Derby (or not).  Having your information 
> and stack traces may give the stores folks enough information to perform 
> a code inspection for this or other issues.  Would you file a JIRA 
> describing your processing, the machine environment, the JVM and attach 
> your derby log files? 

Will do - once I have more information from the load with the modified 
application, I'll create the JIRA entry with all the relevant information.

Thanks so much for your advice.

-- 
Cheers,
David

Nuix Pty Ltd
Suite 79, 89 Jones St, Ultimo NSW 2007, Australia    Ph: +61 2 9280 0699
Web: http://www.nuix.com                            Fax: +61 2 9212 6902

Mime
View raw message