db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mike Matrigali (JIRA)" <j...@apache.org>
Subject [jira] Updated: (DERBY-4239) corruption on z/OS with storerecovery oc_rec? tests. ERROR XSLA7: Cannot redo operation null in the log.
Date Thu, 21 May 2009 21:55:45 GMT

     [ https://issues.apache.org/jira/browse/DERBY-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Mike Matrigali updated DERBY-4239:
----------------------------------

    Attachment: derby.log

Kathey asked that I log suggestions about what to look at in corrupt db's, so logging them
here.  I don't have any conclusions yet what is happening, just
posting info as I see it.

The information I am posting is from looking at the wombat_with_keeplog.zip 
posted to this issue.  This is a good one to look at as kathey has managed to 
reproduce and had it set up so that all transaction logs are kept.   This means that
it is possible to read through these records and get an exact history of all "writing"
operations.   There a couple of ways to do this.  With a sane server you can set 
derby.storage.keepTransactionLog=true (i am not sure if this works with insane 
server or not).  A supported way of almost doing this is to take a online backup, 
which I believe will then stop logs from being deleted - but note that online backup
may change the behavior of some operations which are documented in the backup
docs.


In this case we are debugging a reproducible boot error, so just starting up 
reproduces the error.   First I set the following which will only work in a SANE
server.:
derby.debug.true=LogTrace
This will dump a short description of each log record that is processed.  For 
recovery this will include all the records that are read at boot time.  Note this 
will not be all the records in the log, it will just dump out the ones looked at during
normal reboot.

The first thing I look for is the first occurence of the error and if it is nested the lowest
nested error.  In this case I get the following.  From this I get that recovery
reboot is trying to redo a delete on page 16 of container 1073.  The page version
is 775 so there have been 775 updates to this page before this.  The instant
is the basically the log record address (6,447354) - the first part 6 means it is
in log6.dat, second part is the byte offset into log6.dat:
Page(16,Container(0, 1073)) pageVersion 775 :  Delete : Slot=1 recordId=7 delete=true instant
=
 (6,447354) logEnd = (6,447413) logIn at 25 available 18^M

>From the stack the operation is reading the page in from disk and getting a page
of all zero's when it expects to get a formated page at pageVersion 774.

snip from log:
DEBUG LogTrace OUTPUT: scanned 64300 : Page Operation: Page(17,Container(0, 1041)) pageVersion
776 :  Delete : Slot=1 recordId=7 delete=true instant =
 (6,447236) logEnd = (6,447295) logIn at 25 available 18^M
DEBUG LogTrace OUTPUT: scanned 64300 : Page Operation: Page(16,Container(0, 1057)) pageVersion
775 :  Delete : Slot=1 recordId=7 delete=true instant =
 (6,447295) logEnd = (6,447354) logIn at 25 available 18^M
DEBUG LogTrace OUTPUT: scanned 64300 : Page Operation: Page(16,Container(0, 1073)) pageVersion
775 :  Delete : Slot=1 recordId=7 delete=true instant =
 (6,447354) logEnd = (6,447413) logIn at 25 available 18^M

------------  BEGIN SHUTDOWN ERROR STACK -------------
^M
ERROR XSLA7: Cannot redo operation Page Operation: Page(16,Container(0, 1073)) pageVersion
775 :  Delete : Slot=1 recordId=7 delete=true in the log.^M
    at org.apache.derby.iapi.error.StandardException.newException(StandardException.java:296)^M
    at org.apache.derby.impl.store.raw.log.FileLogger.redo(FileLogger.java:1525)^M
    at org.apache.derby.impl.store.raw.log.LogToFile.recover(LogToFile.java:924)^M
    at org.apache.derby.impl.store.raw.RawStore.boot(RawStore.java:339)^M
    at org.apache.derby.impl.services.monitor.BaseMonitor.boot(BaseMonitor.java:2021)^M
    at org.apache.derby.impl.services.monitor.TopService.bootModule(TopService.java:291)^M
    at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(BaseMonitor.java:573)^M
    at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Monitor.java:427)^M
    at org.apache.derby.impl.store.access.RAMAccessManager.boot(RAMAccessManager.java:1019)^M
    at org.apache.derby.impl.services.monitor.BaseMonitor.boot(BaseMonitor.java:2021)^M
    at org.apache.derby.impl.services.monitor.TopService.bootModule(TopService.java:291)^M
    at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(BaseMonitor.java:573)^M
    at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Monitor.java:427)^M
    at org.apache.derby.impl.db.BasicDatabase.bootStore(BasicDatabase.java:780)^M
    at org.apache.derby.impl.db.BasicDatabase.boot(BasicDatabase.java:196)^M
    at org.apache.derby.impl.services.monitor.BaseMonitor.boot(BaseMonitor.java:2021)^M
    at org.apache.derby.impl.services.monitor.TopService.bootModule(TopService.java:291)^M
    at org.apache.derby.impl.services.monitor.BaseMonitor.bootService(BaseMonitor.java:1858)^M
    at org.apache.derby.impl.services.monitor.BaseMonitor.startProviderService(BaseMonitor.java:1724)^M
    at org.apache.derby.impl.services.monitor.BaseMonitor.findProviderAndStartService(BaseMonitor.java:1602)^M
    at org.apache.derby.impl.services.monitor.BaseMonitor.startPersistentService(BaseMonitor.java:1021)^M
    at org.apache.derby.iapi.services.monitor.Monitor.startPersistentService(Monitor.java:550)^M
    at org.apache.derby.impl.jdbc.EmbedConnection.bootDatabase(EmbedConnection.java:2581)^M
    at org.apache.derby.impl.jdbc.EmbedConnection.<init>(EmbedConnection.java:374)^M
    at org.apache.derby.jdbc.Driver40.getNewEmbedConnection(Driver40.java:68)^M
    at org.apache.derby.jdbc.InternalDriver.connect(InternalDriver.java:238)^M
    at org.apache.derby.jdbc.AutoloadedDriver.connect(AutoloadedDriver.java:119)^M
    at java.sql.DriverManager.getConnection(DriverManager.java:316)^M
    at java.sql.DriverManager.getConnection(DriverManager.java:297)^M
    at org.apache.derby.impl.tools.ij.util.startJBMS(util.java:462)^M
    at org.apache.derby.impl.tools.ij.util.startJBMS(util.java:542)^M
    at org.apache.derby.impl.tools.ij.ConnectionEnv.init(ConnectionEnv.java:64)^M
    at org.apache.derby.impl.tools.ij.utilMain.initFromEnvironment(utilMain.java:164)^M
    at org.apache.derby.impl.tools.ij.Main.<init>(Main.java:225)^M
    at org.apache.derby.impl.tools.ij.Main.getMain(Main.java:189)^M
    at org.apache.derby.impl.tools.ij.Main.mainCore(Main.java:174)^M
    at org.apache.derby.impl.tools.ij.Main.main(Main.java:73)^M
    at org.apache.derby.tools.ij.main(ij.java:59)^M
Caused by: ERROR XSDBB: Unknown page format at page Page(16,Container(0, 1073)), page dump
follows: Hex dump:
00000000: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000010: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000020: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000030: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000040: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000050: 0000 0000 0000 0000 0000 0000 0000 0000  ................
... complete page dump of a page of all ZERO's


> corruption on z/OS with storerecovery oc_rec? tests.  ERROR XSLA7: Cannot redo operation
null in the log.
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-4239
>                 URL: https://issues.apache.org/jira/browse/DERBY-4239
>             Project: Derby
>          Issue Type: Bug
>          Components: Store
>    Affects Versions: 10.5.1.1
>         Environment: z/OS z10 processor. 
> java version "1.6.0"
> Java(TM) SE Runtime Environment (build pmz3160sr4-20090219_01(SR4))
> IBM J9 VM (build 2.4, J2RE 1.6.0 IBM J9 2.4 z/OS s390-31 jvmmz3160-20090215_29883 (JIT
enabled, AOT enabled)
> J9VM - 20090215_029883_bHdSMr
> JIT  - r9_20090213_2028
> GC   - 20090213_AA)
> JCL  - 20090218_01
> also 
> java version "1.6.0"
> Java(TM) SE Runtime Environment (build pmz3160sr2ifix-20081021_01(SR2+IZ32776+IZ33456))
> IBM J9 VM (build 2.4, J2RE 1.6.0 IBM J9 2.4 z/OS s390-31 jvmmz3160ifx-20081010_24288
(JIT enabled, AOT enabled)
> J9VM - 20081009_024288_bHdSMr
> JIT  - r9_20080721_1330ifx2
> GC   - 20080724_AA)
> JCL  - 20080808_02
>            Reporter: Kathey Marsden
>            Priority: Critical
>         Attachments: derby.log, derby.log, reproDerby4239.zip, wombat_with_keeplog.zip
>
>
> I saw corruption on z/OS with the storerecovery tests and 10.5.1.1.  The failure comes
in oc_rec3 trying to connect to the database, but the actual problem seems to have occurred
with the prior test oc_rec2.  The problem is somewhat intermittent, happening approximately
1/4 times.  I extracted the case from the harness and will attach the reproduction and run
the script repro.ksh.  The script will loop up to 50 times until it gets the failure which
looks like.
> ERROR XSLA7: Cannot redo operation null in the log.
> 	at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
> 	at org.apache.derby.impl.store.raw.log.FileLogger.redo(Unknown Source)
> 	at org.apache.derby.impl.store.raw.log.LogToFile.recover(Unknown Source)
> 	at org.apache.derby.impl.store.raw.RawStore.boot(Unknown Source)
> 	at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
> 	at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
> 	at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
> 	at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source)
> 	at org.apache.derby.impl.store.access.RAMAccessManager.boot(Unknown Source)
> 	at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
> 	at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
> 	at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
> 	at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source)
> 	at org.apache.derby.impl.db.BasicDatabase.bootStore(Unknown Source)
> 	at org.apache.derby.impl.db.BasicDatabase.boot(Unknown Source)
> 	at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
> 	at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
> 	at org.apache.derby.impl.services.monitor.BaseMonitor.bootService(Unknown Source)
> 	at org.apache.derby.impl.services.monitor.BaseMonitor.startProviderService(Unknown Source)
> 	at org.apache.derby.impl.services.monitor.BaseMonitor.findProviderAndStartService(Unknown
Source)
> 	at org.apache.derby.impl.services.monitor.BaseMonitor.startPersistentService(Unknown
Source)
> 	at org.apache.derby.iapi.services.monitor.Monitor.startPersistentService(Unknown Source)
> 	at org.apache.derby.impl.jdbc.EmbedConnection.bootDatabase(Unknown Source)
> 	at org.apache.derby.impl.jdbc.EmbedConnection.<init>(Unknown Source)
> 	at org.apache.derby.jdbc.Driver40.getNewEmbedConnection(Unknown Source)
> 	at org.apache.derby.jdbc.InternalDriver.connect(Unknown Source)
> 	at org.apache.derby.jdbc.AutoloadedDriver.connect(Unknown Source)
> 	at java.sql.DriverManager.getConnection(DriverManager.java:311)
> 	at java.sql.DriverManager.getConnection(DriverManager.java:268)
> 	at CheckTables.main(CheckTables.java:8)
> Caused by: ERROR XSDBB: Unknown page format at page Page(16,Container(0, 1073)), page
dump follows: Hex dump:
> 00000000: 0000 0000 0000 0000 0000 0000 0000 0000  ................
> 00000010: 0000 0000 0000 0000 0000 0000 0000 0000  ................
> <snip lots of 000's>
> I ran it with 10.3 and it completed all 50 iterations, so whether JVM or Derby issue
it seems new since 10.3. (I haven't tried with 10.4).  Oddly I have run tests many times before
on this machine using in the 10.5.1.1 release and the same jvm and have never seen this failure,
so am looking into whether maybe something changed on the machine or environment.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message