db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Rick Hillegas (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DERBY-4252) intermittent corruption./java.io.EOFException in RandomAccessFile.readInt() on boot after compress with background checkpoint
Date Thu, 05 May 2011 16:33:03 GMT

    [ https://issues.apache.org/jira/browse/DERBY-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029417#comment-13029417
] 

Rick Hillegas commented on DERBY-4252:
--------------------------------------

I have run the log-file-inspecting tools against the files in the log directory of the bad
database attached to this issue. The log-file-inspecting tools can be found on DERBY-5195.
I can report the following facts about this data corruption:

1) The control files log.ctrl and logmirror.ctrl appear to be intact and identical.

2) There are 20 log files in the directory. The first 19 seem to be valid. At least, the LogFileReader
is able to successfully deserialize all of them.

3) The last log file is corrupt. It is only 17 bytes long so it doesn't even have a valid
header record. I get an EOF reading this file. The initial formatableID, obsoleteVersion,
and logFileNumber fields are correct but the previousRecordLogInstant is truncated. This is
what I see:

java LogFileReader wombat/log/log20.dat -n 0
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<logFile>
    <formatableID>128</formatableID>
    <obsoleteVersion>9</obsoleteVersion>
    <logFileNumber>20</logFileNumber>
    <message>java.io.EOFException: null</message>
    <stackTrace>java.io.EOFException
	at java.io.DataInputStream.readFully(DataInputStream.java:180)
	at java.io.DataInputStream.readLong(DataInputStream.java:399)
	at LogFileReader$LogFile.readFileHeader(LogFileReader.java:250)
	at LogFileReader$LogFile.readFile(LogFileReader.java:216)
	at LogFileReader$LogFile.printMe(LogFileReader.java:199)
	at LogFileReader.main(LogFileReader.java:88)
</stackTrace>
</logFile>

4) I have noticed the following oddity, which may be a red herring: The last good log file
(log19.dat) ends with a flurry of CheckpointOperations. Each CheckpointOperation is separated
from the preceding CheckpointOperation by only a couple hundred bytes. In the other log files,
the spacing of CheckpointOperations is generally several tens of thousands of bytes.

I would be happy to continue exploring these files if someone would like to suggest what I
should look for. Alternatively, the LogFileReader is very easy to use and would be even more
powerful in the hands of a real Store expert.

> intermittent corruption./java.io.EOFException in RandomAccessFile.readInt() on boot after
compress with background checkpoint
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-4252
>                 URL: https://issues.apache.org/jira/browse/DERBY-4252
>             Project: Derby
>          Issue Type: Bug
>          Components: Store
>    Affects Versions: 10.6.1.0
>         Environment: Windows XP dualcore.
> java version "1.6.0"
> Java(TM) SE Runtime Environment (build pwi3260sr4-20090219_01(SR4))
> IBM J9 VM (build 2.4, J2RE 1.6.0 IBM J9 2.4 Windows XP x86-32 jvmwi3260-20090215_29883
(JIT enabled, AOT enabled)
> J9VM - 20090215_029883_lHdSMr
> JIT  - r9_20090213_2028
> GC   - 20090213_AA)
> JCL  - 20090218_01
> Also tried with the latest development SR5 version and it reproduces.
>            Reporter: Kathey Marsden
>            Priority: Critical
>              Labels: derby_triage10_5_2
>         Attachments: corrupt_database_with_logs.zip, reproBackgroundCheckpoint.zip
>
>
> The attached reproduction reproBackgroundCheckpoint.zip occasionally results in an EOF
exception on boot with  IBM 1.6 (about 1 out of 200 runs) even after the fix for   DERBY-4239.
 The program inserts into /deletes some data from a table and runs compress, and as a daemon
thread that loops checkpoints.  I have not seen it with the Sun JVM.
> Before the fix for DERBY-4239, this issue was much more frequent.  Sometimes you would
get the DERBY-4239 trace and sometimes this one.  After the fix, this issue remains.  Mike
thought it looked like a different bug and asked that we file a separate issue.
> This is the trace:
> java.io.EOFException
> 	at java.io.RandomAccessFile.readInt(RandomAccessFile.java:739)
> 	at java.io.RandomAccessFile.readLong(RandomAccessFile.java:772)
> 	at org.apache.derby.impl.store.raw.log.Scan.getNextRecordForward(Unknown Source)
> 	at org.apache.derby.impl.store.raw.log.Scan.getNextRecord(Unknown Source)
> 	at org.apache.derby.impl.store.raw.log.FileLogger.redo(Unknown Source)
> 	at org.apache.derby.impl.store.raw.log.LogToFile.recover(Unknown Source)
> 	at org.apache.derby.impl.store.raw.RawStore.boot(Unknown Source)
> 	at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
> 	at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
> 	at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
> 	at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source)
> 	at org.apache.derby.impl.store.access.RAMAccessManager.boot(Unknown Source)
> 	at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
> 	at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
> 	at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
> 	at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source)
> 	at org.apache.derby.impl.db.BasicDatabase.bootStore(Unknown Source)
> 	at org.apache.derby.impl.db.BasicDatabase.boot(Unknown Source)
> 	at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
> 	at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
> 	at org.apache.derby.impl.services.monitor.BaseMonitor.bootService(Unknown Source)
> 	at org.apache.derby.impl.services.monitor.BaseMonitor.startProviderService(Unknown Source)
> 	at org.apache.derby.impl.services.monitor.BaseMonitor.findProviderAndStartService(Unknown
Source)
> 	at org.apache.derby.impl.services.monitor.BaseMonitor.startPersistentService(Unknown
Source)
> 	at org.apache.derby.iapi.services.monitor.Monitor.startPersistentService(Unknown Source)
> 	at org.apache.derby.impl.jdbc.EmbedConnection.bootDatabase(Unknown Source)
> 	at org.apache.derby.impl.jdbc.EmbedConnection.<init>(Unknown Source)
> 	at org.apache.derby.jdbc.Driver40.getNewEmbedConnection(Unknown Source)
> 	at org.apache.derby.jdbc.InternalDriver.connect(Unknown Source)
> 	at org.apache.derby.jdbc.AutoloadedDriver.connect(Unknown Source)
> 	at java.sql.DriverManager.getConnection(DriverManager.java:316)
> 	at java.sql.DriverManager.getConnection(DriverManager.java:273)
> 	at CheckTables.main(CheckTables.java:8)
> To reproduce, compile the java programs and run reprobckchkpt.ksh.  You may want to increase
the number of iterations and let it run overnight to see the failure or back out the fix for
DERBY-4239 to see it happen quicker.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message