db-derby-user mailing list archives

From Mike Matrigali <mikem_...@sbcglobal.net>
Subject Re: 'Invalid checksum on Page' error
Date Tue, 13 Sep 2005 18:11:03 GMT
Sorry for the delay, I went away for the weekend.

I am not sure what made me ask, but I am glad I did - there
is VERY little testing of Derby on compact flash devices.  I
am not sure how appropriate they are for serious write
applications - any opinions out there?  The specs on these
things are all over the map; can you post the exact model
you are using?

You didn't say how big the db was; could you post that?  I think
if it is within the JIRA limitations the best place to post the
db would be to file a JIRA entry and attach the db to it.  That way
anyone in Derby can see it.

I don't have much experience with flash cards and Derby, and it
is of course very hardware dependent.   Anyone who understands the
hardware better, please correct me.   I found the following on
a SanDisk compact flash and I don't understand most of it
(http://www.sandisk.com/pdf/oem/WPaperWearLevelv1.0.pdf) - but
what I do get is that they seem to be using 2,000,000 updates as
an expected failure boundary (and I think that is for a very
good compact flash - I think I have seen others that only
guarantee 10,000 writes).  So given that you are doing 3.5 million
inserts, it may be that the hardware just is not going to be
able to support your application.

Now the actual number of I/Os to the disk depends on your
application (size of the buffer cache, size of rows, number of
checkpoints).
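
To put very rough numbers on that, here is a back-of-the-envelope sketch in Java. The 200-byte row size is a made-up placeholder (the real row size has not been posted), page and row overhead are ignored, and transaction log and index writes are not counted at all, so treat the results as loose bounds on data page writes rather than a prediction.

```java
final class PageWriteEstimate {
    /** Rows that fit on one page, ignoring page and row overhead (an assumption). */
    static long rowsPerPage(int pageSizeBytes, int rowSizeBytes) {
        return Math.max(1, pageSizeBytes / rowSizeBytes);
    }

    /** Best case: every data page is written exactly once, when it is full. */
    static long minPageWrites(long rows, int pageSizeBytes, int rowSizeBytes) {
        long perPage = rowsPerPage(pageSizeBytes, rowSizeBytes);
        return (rows + perPage - 1) / perPage; // ceiling division
    }

    /** Crude worst case: every insert forces its data page to disk (no caching). */
    static long maxPageWrites(long rows) {
        return rows;
    }

    public static void main(String[] args) {
        long rows = 3_500_000L; // record count from the original report
        int pageSize = 32_768;  // the table appears to be defaulting to 32k
        int rowSize = 200;      // hypothetical; the real row size was not posted
        System.out.println("min data page writes: "
                + minPageWrites(rows, pageSize, rowSize));
        System.out.println("max data page writes: " + maxPageWrites(rows));
    }
}
```

Where the real figure lands between the two depends on the buffer cache size and how often checkpoints flush partially filled pages, and how those writes map onto individual flash blocks depends on the card's wear leveling.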

Any compact flash experts out there: is it possible (maybe even
likely) that after some minimum number of writes, a write to the
card could mess up a bit but not log any error on the write or
on the subsequent read?
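
If that can happen, a page checksum is exactly the kind of check that would notice it. Here is a minimal sketch of the idea in Java. CRC32 is my assumption for the illustration, not a statement about what StoredPage.validateChecksum does internally, and the 8192-byte buffer filled with 'x' is just stand-in data.

```java
import java.util.zip.CRC32;

final class ChecksumDemo {
    /** CRC32 over the whole buffer. */
    static long checksum(byte[] page) {
        CRC32 crc = new CRC32();
        crc.update(page, 0, page.length);
        return crc.getValue();
    }

    /** Returns a copy of the page with a single bit inverted. */
    static byte[] flipBit(byte[] page, int byteIndex, int bit) {
        byte[] copy = page.clone();
        copy[byteIndex] ^= (byte) (1 << bit);
        return copy;
    }

    public static void main(String[] args) {
        byte[] page = new byte[8192];             // stand-in for one data page
        java.util.Arrays.fill(page, (byte) 'x');
        long stored = checksum(page);             // what would be saved with the page
        byte[] damaged = flipBit(page, 4000, 3);  // simulate one bad bit on the card
        // prints "match after bit flip: false"
        System.out.println("match after bit flip: " + (stored == checksum(damaged)));
    }
}
```

The point is only that one silently flipped bit is enough to make the recomputed value disagree with the stored one, even when neither the write nor the read reported an error.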

Would you be able to recognize a bad bit in the data returned from
this page?  For instance, on glancing at the page dump it looked like
the test char data was the same for every row.  If we get the
database we can probably hack a version that ignores the checksum
and returns the data; if the metadata on the page is all good, then
it would be interesting to know whether the data is all good too.
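
If the test data really is identical in every row, spotting a bad bit could be as simple as comparing each returned value against the expected bytes. A small Java sketch of that comparison follows; the "test data" string and the class and method names are placeholders of mine, not anything taken from the actual database.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class BitDiff {
    /** Lists "byteOffset:bitIndex" for every bit where actual differs from expected. */
    static List<String> differingBits(byte[] expected, byte[] actual) {
        if (expected.length != actual.length) {
            throw new IllegalArgumentException("lengths differ");
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i < expected.length; i++) {
            int diff = (expected[i] ^ actual[i]) & 0xFF; // set bits mark mismatches
            for (int bit = 0; bit < 8; bit++) {
                if ((diff & (1 << bit)) != 0) {
                    out.add(i + ":" + bit);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] expected = "test data".getBytes(StandardCharsets.US_ASCII);
        byte[] actual = expected.clone();
        actual[2] ^= 0x10; // simulate one flipped bit in the third byte
        System.out.println(differingBits(expected, actual)); // prints [2:4]
    }
}
```

Running each row's column bytes through something like this would show whether the damage is a single bit, a run of bytes, or something larger.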

Also, could you post the CREATE TABLE statement you used for the table?

I am not sure why, but your property did not affect the table: the
table is defaulting to 32k.  Possible reasons are that the property
was not there when you created the table, that there is some error in
the property file, or that the property file is not being found.
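
One way to take the properties file out of the picture is to set the page size as a database property over JDBC immediately before creating the table. The sketch below uses the SYSCS_UTIL.SYSCS_SET_DATABASE_PROPERTY system procedure; the class and method names are mine, the list of accepted page sizes is as I recall it from the tuning guide (please check it against your version), and the CREATE TABLE text is whatever you used originally.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

final class PageSizeSetup {
    static final String SET_PROPERTY_CALL =
            "CALL SYSCS_UTIL.SYSCS_SET_DATABASE_PROPERTY(?, ?)";

    /** Page sizes as I recall them from the tuning guide; verify for your release. */
    static boolean isDocumentedPageSize(int pageSize) {
        return pageSize == 4096 || pageSize == 8192
                || pageSize == 16384 || pageSize == 32768;
    }

    /** Sets derby.storage.pageSize database-wide, then runs the given DDL. */
    static void createTableWithPageSize(Connection conn, int pageSize,
                                        String createTableSql) throws SQLException {
        if (!isDocumentedPageSize(pageSize)) {
            throw new IllegalArgumentException("unexpected page size: " + pageSize);
        }
        try (CallableStatement cs = conn.prepareCall(SET_PROPERTY_CALL)) {
            cs.setString(1, "derby.storage.pageSize");
            cs.setString(2, Integer.toString(pageSize));
            cs.execute();
        }
        // The table is created while the property is in effect.
        try (Statement st = conn.createStatement()) {
            st.executeUpdate(createTableSql);
        }
    }
}
```

Calling createTableWithPageSize(conn, 8192, yourCreateTableSql) on a fresh database and then checking the resulting table would tell us whether the property itself works on your system, or whether the problem is only that derby.properties is not being picked up.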


SBarboza@ILSTechnology.com wrote:

> The device is using a Compact Flash card as its persistent storage.
> The derby.properties file entries are as follows:
> 
> derby.storage.pageSize=8192
> derby.storage.tempDirectory=/root/xqjava/dbtmp
> 
> 
> The database is not encrypted.
> When loading the DB  there were no crashes.
> 
> I think we can provide you with the database image. Where should we send it to?
> I'd hate to clog up the discussion group with the image.
> 
> Thanks
> Sunil
> 
> 
> 
> From:     Mike Matrigali <mikem_app@sbcglobal.net>
> Date:     09/08/2005 01:15 PM
> Reply-To: "Derby Discussion"
> To:       Derby Discussion <derby-user@db.apache.org>
> Subject:  Re: 'Invalid checksum on Page' error
> 
> 
> 
> 
> Thanks for the additional info.  Definitely interested if you can reproduce
> on a different device.  I did a quick look at the page dump and on the
> surface nothing jumped out: the ascii dump of the data looks reasonable,
> there is a set of 0's in the middle as expected with a set of what looks
> like a reasonable page offset table at the end, and the last page offset
> points at what looks like the last record.  The next step is to decode
> the actual values in things like the page headers, and see if the zeros in the
> middle are right or if there is missing data pointed to by the offset
> table.
> 
> 
> Some more questions:
> o what kind of device was this error on (i.e. IDE, SCSI, flash card, ...)?
> o were you setting any non-default derby properties?
> o was this database encrypted?
> o When you were loading the db was there any crash encountered?
> 
> When you try to reproduce, could you set the following property so that
> derby.log will have a complete record of any errors?  By default it gets
> overwritten every time:
> http://db.apache.org/derby/docs/10.1/tuning/rtunproper13217.html
> 
> If the data in your db is not sensitive, would you be willing to provide
> it?  I realize it is probably very big, so I am not sure of the best way.
> Derby db's do tend to compress well using standard zip.
> 
> SBarboza@ILSTechnology.com wrote:
> 
> 
>>The error is always on the same page (10031).
>>I ran the SYSCS_CHECK_TABLE command and I get the same error displayed
>>about the page checksum error that is listed in the derby.log.
>>I took a look at the OS logs but there was nothing that would indicate an IO
>>failure.
>>I am attaching the derby.log file.
>>
>>(See attached file: derby.log)
>>
>>I will run this scenario on several devices to try to recreate the problem.
>>
>>
>>
>>From:     Mike Matrigali <mikem_app@sbcglobal.net>
>>Date:     09/07/2005 12:47 PM
>>Reply-To: "Derby Discussion"
>>To:       Derby Discussion <derby-user@db.apache.org>
>>Subject:  Re: 'Invalid checksum on Page' error
>>
>>
>>
>>The most usual case that causes a bad checksum error is a
>>hardware problem on the data disk.  I have also seen OS I/O issues
>>where for some reason other data has been written into the derby
>>file.  Have you checked the OS log
>>to see if any errors are being generated?  Could you attach the
>>complete derby.log if it is not too big? Or if not could you at
>>least attach the complete error from this particular error - most
>>of the time the page dump won't help much but sometimes it is
>>interesting if there is something like all 0's in the end of
>>the page.
>>
>>From your description, it sounds like this problem is on the disk and
>>not a runtime error.  The current error is reporting an error
>>on page 10031; are all the errors you are seeing on the same page?
>>Running the following will check your table, and should report
>>the same error as encountered below if the problem is a persistent
>>on-disk error:
>>http://db.apache.org/derby/docs/10.1/ref/rrefsyscschecktablefunc.html
>>http://db.apache.org/derby/docs/10.1/adminguide/cadminconsist01.html
>>
>>The only supported way to recover from this error is to apply a backup
>>if you have one; if it was a roll-forward backup, it will bring
>>the database up to the current state.
>>
>>SBarboza@ILSTechnology.com wrote:
>>
>>
>>>Hi,
>>>       I have Apache Derby 10.0 running on a MontaVista Linux system (3.1
>>>Professional with Linux/i686 2.4.20)  in embedded mode using the
>>>EmbeddedConnectionPoolDataSource.
>>>The java level is Sun's jre 1.4.2_04.
>>>There are around 3.5 million records in a table in the DB.
>>>While adding the records I had  one thread inserting rows into this table
>>>at a rate of around 50 msecs.
>>>Another thread is periodically doing selects on this table and some
>>>deletes.
>>>
>>>When the record count was building up to 3.5 million , no deletes were
>>>being done on the table.
>>>I have the transaction log and db temp space in a different directory.
>>>
>>>When a thread attempts to delete a record from the table , it catches a
>>>SQLException with the following error message
>>>
>>>SQLError:0              SQLState:XJ001           SQLErrMsg:Java exception:
>>>': java.lang.NullPointerException'.
>>>
>>>
>>>The derby.log file (at the end of this posting )  indicates an invalid
>>>checksum on a page . I have only included the first few lines.
>>>This may have occurred when i was selecting data from the database.
>>>
>>>If I restart the application, I sometimes get the same SQLException on the
>>>thread that is inserting data, after a few successful inserts.
>>>
>>>When I run the command line client (ij), I am able to select and delete
>>>records from this database.
>>>
>>>What would typically cause a checksum error to occur ? Is there a way to
>>>recover from it without losing data ?
>>>
>>>
>>>====================  Begin derby.log
>>>========================================================================
>>>------------  BEGIN SHUTDOWN ERROR STACK -------------
>>>
>>>ERROR XSDG2: Invalid checksum on Page Page(10031,Container(0, 800)),
>>>expected=3,558,849,496, on-disk version=772,832,532, page dump follows: Hex dump:
>>>00000000: 0075 0000 0001 0000 0000 0000 003d 003c  .u..............
>>>00000010: 0000 0042 0000 0000 0000 0000 0000 0000  ...B............
>>>
>>>
>>>The trailing stack trace is as follows:
>>>
>>>       at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
>>>       at org.apache.derby.impl.store.raw.data.StoredPage.validateChecksum(Unknown Source)
>>>       at org.apache.derby.impl.store.raw.data.StoredPage.initFromData(Unknown Source)
>>>       at org.apache.derby.impl.store.raw.data.CachedPage.setIdentity(Unknown Source)
>>>       at org.apache.derby.impl.services.cache.CachedItem.takeOnIdentity(Unknown Source)
>>>       at org.apache.derby.impl.services.cache.Clock.addEntry(Unknown Source)
>>>       at org.apache.derby.impl.services.cache.Clock.find(Unknown Source)
>>>       at org.apache.derby.impl.store.raw.data.FileContainer.getUserPage(Unknown Source)
>>>       at org.apache.derby.impl.store.raw.data.FileContainer.getPage(Unknown Source)
>>>       at org.apache.derby.impl.store.raw.data.BaseContainerHandle.getPage(Unknown Source)
>>>       at org.apache.derby.impl.store.access.conglomerate.OpenConglomerate.latchPage(Unknown Source)
>>>       at org.apache.derby.impl.store.access.conglomerate.GenericConglomerateController.fetch(Unknown Source)
>>>       at org.apache.derby.impl.sql.execute.IndexRowToBaseRowResultSet.getNextRowCore(Unknown Source)
>>>       at org.apache.derby.impl.sql.execute.BasicNoPutResultSetImpl.getNextRow(Unknown Source)
>>>       at org.apache.derby.impl.jdbc.EmbedResultSet.movePosition(Unknown Source)
>>>       at org.apache.derby.impl.jdbc.EmbedResultSet.next(Unknown Source)
>>>      ......
>>>
>>>------------  END SHUTDOWN ERROR STACK -------------
>>>
>>>2005-09-07 13:54:01.041 GMT Thread[Thread-2,5,main] (XID = 2985973),
>>>(SESSIONID= 1), (DATABASE = /xqjava/db/SAF), (DRDAID = null), Cleanup
>>>action starting
>>>2005-09-07 13:54:01.042 GMT Thread[Thread-2,5,main] (XID = 2985973),
>>>(SESSIONID= 1), (DATABASE = /xqjava/db/SAF), (DRDAID = null), Failed
>>>Statement is: INSERT
>>>INTO  messages_1 ( msg_id, msg_timestamp, msg) VALUES (?,?,?)
>>>java.lang.NullPointerException
>>>       at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.openContainer(Unknown Source)
>>>       at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.openContainer(Unknown Source)
>>>       at org.apache.derby.impl.store.raw.xact.Xact.openContainer(Unknown Source)
>>>       at org.apache.derby.impl.store.access.conglomerate.OpenConglomerate.init(Unknown Source)
>>>       at org.apache.derby.impl.store.access.heap.Heap.open(Unknown Source)
>>>       at org.apache.derby.impl.store.access.RAMTransaction.openConglomerate(Unknown Source)
>>>       at org.apache.derby.impl.store.access.RAMTransaction.openCompiledConglomerate(Unknown Source)
>>>       at org.apache.derby.impl.sql.execute.RowChangerImpl.openForUpdate(Unknown Source)
>>>       at org.apache.derby.impl.sql.execute.RowChangerImpl.open(Unknown Source)
>>>       at org.apache.derby.impl.sql.execute.InsertResultSet.normalInsertCore(Unknown Source)
>>>       at org.apache.derby.impl.sql.execute.InsertResultSet.open(Unknown Source)
>>>       at org.apache.derby.impl.sql.GenericPreparedStatement.execute(Unknown Source)
>>>       at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown Source)
>>>       at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(Unknown Source)
>>>       at org.apache.derby.impl.jdbc.EmbedCallableStatement.executeStatement(Unknown Source)
>>>       at org.apache.derby.impl.jdbc.EmbedPreparedStatement.execute(Unknown Source)
>>>........
>>>Cleanup action completed
>>>
>>>==================== derby.log
>>>========================================================================
>>>
>>>
>>>Thanks in advance.
>>>Sunil.