From: "Mike Matrigali (JIRA)"
To: derby-dev@db.apache.org
Date: Tue, 26 May 2009 12:44:45 -0700 (PDT)
Message-ID: <1383018285.1243367085674.JavaMail.jira@brutus>
Subject: [jira] Updated: (DERBY-4239) corruption on z/OS with storerecovery oc_rec? tests. ERROR XSLA7: Cannot redo operation null in the log.
In-Reply-To: <947613656.1242862665627.JavaMail.jira@brutus>

     [ https://issues.apache.org/jira/browse/DERBY-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Matrigali updated DERBY-4239:
----------------------------------

    Attachment: derby-4239_1.diff

Preliminary patch for this issue. I have not run the full tests yet, but would like feedback from anyone who can reproduce the original error; I have not actually reproduced it myself. This patch includes only code changes, no new tests.

The fix adds interfaces that let compress table tell the underlying store that it needs a new checkpoint, and that it must wait until that checkpoint has made it into the log before proceeding. The compress operation shrinks the file, destroying pages that might otherwise participate in redo recovery.

I have altered the behavior only for the compress operation and left all other checkpoint() calling paths the same. However, some comments I read while looking at the code make me concerned that some of the backup code and the backup-for-encryption code may also have problems with an ongoing checkpoint. I would rather address those problems, if they exist, in another issue.

> corruption on z/OS with storerecovery oc_rec? tests. ERROR XSLA7: Cannot redo operation null in the log.
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-4239
>                 URL: https://issues.apache.org/jira/browse/DERBY-4239
>             Project: Derby
>          Issue Type: Bug
>          Components: Store
>    Affects Versions: 10.1.3.3, 10.2.2.1, 10.3.2.1, 10.4.2.0, 10.5.1.1, 10.6.0.0
>         Environment: z/OS z10 processor.
> java version "1.6.0"
> Java(TM) SE Runtime Environment (build pmz3160sr4-20090219_01(SR4))
> IBM J9 VM (build 2.4, J2RE 1.6.0 IBM J9 2.4 z/OS s390-31 jvmmz3160-20090215_29883 (JIT enabled, AOT enabled)
> J9VM - 20090215_029883_bHdSMr
> JIT  - r9_20090213_2028
> GC   - 20090213_AA)
> JCL  - 20090218_01
> also
> java version "1.6.0"
> Java(TM) SE Runtime Environment (build pmz3160sr2ifix-20081021_01(SR2+IZ32776+IZ33456))
> IBM J9 VM (build 2.4, J2RE 1.6.0 IBM J9 2.4 z/OS s390-31 jvmmz3160ifx-20081010_24288 (JIT enabled, AOT enabled)
> J9VM - 20081009_024288_bHdSMr
> JIT  - r9_20080721_1330ifx2
> GC   - 20080724_AA)
> JCL  - 20080808_02
>            Reporter: Kathey Marsden
>            Assignee: Mike Matrigali
>            Priority: Critical
>         Attachments: badlogsizes.txt, derby-4239_1.diff, derby.log, derby.log, derby_dumponly.zip, goodlogsizes.txt, identifyBadContainer.ksh, reproBackgroundCheckpoint.zip, reproDerby4239.zip, wombat_keeplog_notcorrupt.zip, wombat_with_keeplog.zip
>
>
> I saw corruption on z/OS with the storerecovery tests and 10.5.1.1. The failure comes in oc_rec3 trying to connect to the database, but the actual problem seems to have occurred with the prior test oc_rec2. The problem is somewhat intermittent, happening approximately 1/4 times. I extracted the case from the harness and will attach the reproduction and run the script repro.ksh. The script will loop up to 50 times until it gets the failure which looks like.
> ERROR XSLA7: Cannot redo operation null in the log.
>     at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
>     at org.apache.derby.impl.store.raw.log.FileLogger.redo(Unknown Source)
>     at org.apache.derby.impl.store.raw.log.LogToFile.recover(Unknown Source)
>     at org.apache.derby.impl.store.raw.RawStore.boot(Unknown Source)
>     at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
>     at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
>     at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
>     at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source)
>     at org.apache.derby.impl.store.access.RAMAccessManager.boot(Unknown Source)
>     at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
>     at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
>     at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
>     at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source)
>     at org.apache.derby.impl.db.BasicDatabase.bootStore(Unknown Source)
>     at org.apache.derby.impl.db.BasicDatabase.boot(Unknown Source)
>     at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
>     at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
>     at org.apache.derby.impl.services.monitor.BaseMonitor.bootService(Unknown Source)
>     at org.apache.derby.impl.services.monitor.BaseMonitor.startProviderService(Unknown Source)
>     at org.apache.derby.impl.services.monitor.BaseMonitor.findProviderAndStartService(Unknown Source)
>     at org.apache.derby.impl.services.monitor.BaseMonitor.startPersistentService(Unknown Source)
>     at org.apache.derby.iapi.services.monitor.Monitor.startPersistentService(Unknown Source)
>     at org.apache.derby.impl.jdbc.EmbedConnection.bootDatabase(Unknown Source)
>     at org.apache.derby.impl.jdbc.EmbedConnection.<init>(Unknown Source)
>     at org.apache.derby.jdbc.Driver40.getNewEmbedConnection(Unknown Source)
>     at org.apache.derby.jdbc.InternalDriver.connect(Unknown Source)
>     at org.apache.derby.jdbc.AutoloadedDriver.connect(Unknown Source)
>     at java.sql.DriverManager.getConnection(DriverManager.java:311)
>     at java.sql.DriverManager.getConnection(DriverManager.java:268)
>     at CheckTables.main(CheckTables.java:8)
> Caused by: ERROR XSDBB: Unknown page format at page Page(16,Container(0, 1073)), page dump follows: Hex dump:
> 00000000: 0000 0000 0000 0000 0000 0000 0000 0000  ................
> 00000010: 0000 0000 0000 0000 0000 0000 0000 0000  ................
>
> I ran it with 10.3 and it completed all 50 iterations, so whether it is a JVM or a Derby issue, it seems new since 10.3. (I haven't tried with 10.4.) Oddly, I have run the tests many times before on this machine using the 10.5.1.1 release and the same JVM and have never seen this failure, so I am looking into whether something changed on the machine or environment.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
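
The fix described in the comment above (compress table asks the store for a new checkpoint and waits until that checkpoint is in the log before it shrinks the file) could be sketched roughly as follows. This is an illustration only. The names CheckpointSketch, checkpoint() and checkpointAndWait() are hypothetical and are not taken from derby-4239_1.diff or from Derby's store code. The assumption that an ordinary checkpoint request returns without waiting when another checkpoint is already running is also mine.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch of a "checkpoint and wait" request; not Derby's API. */
class CheckpointSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition idle = lock.newCondition();
    private boolean inProgress = false;
    private long completed = 0; // checkpoints whose record has reached the log

    /** Best-effort request (assumed behavior): skip if a checkpoint is already running. */
    boolean checkpoint() {
        lock.lock();
        try {
            if (inProgress) {
                return false; // caller learns nothing about when that checkpoint finishes
            }
            inProgress = true;
        } finally {
            lock.unlock();
        }
        runCheckpoint();
        return true;
    }

    /**
     * Blocking request: wait out any running checkpoint, then run a NEW one
     * and return its sequence number only after it has been logged.
     */
    long checkpointAndWait() {
        lock.lock();
        try {
            while (inProgress) {
                idle.awaitUninterruptibly();
            }
            inProgress = true;
        } finally {
            lock.unlock();
        }
        return runCheckpoint();
    }

    private long runCheckpoint() {
        // Placeholder: a real store would flush dirty pages and write the
        // checkpoint log record here, outside the lock.
        lock.lock();
        try {
            inProgress = false;
            long seq = ++completed;
            idle.signalAll(); // wake any caller blocked in checkpointAndWait()
            return seq;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        CheckpointSketch store = new CheckpointSketch();
        long seq = store.checkpointAndWait();
        // Only now would a compress operation go on to truncate the container file.
        System.out.println("checkpoint " + seq + " is in the log");
    }
}
```

In this sketch a compress-style caller uses checkpointAndWait() so that redo recovery never has to start from a checkpoint older than the truncation, while other callers keep the non-blocking checkpoint() path.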