db-derby-dev mailing list archives

From "Dag H. Wanvik (JIRA)" <j...@apache.org>
Subject [jira] Updated: (DERBY-3719) '...replication.buffer.LogBufferFullException' causes failover to fail w/ 'XRE07, SQLERRMC: Could not perform operation because the database is not in replication master mode.'
Date Sat, 16 May 2009 18:37:45 GMT

     [ https://issues.apache.org/jira/browse/DERBY-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dag H. Wanvik updated DERBY-3719:
---------------------------------

    Attachment: derby-3719-1.diff

Uploading patch derby-3719-1. This moves the actual log send to inside
the forceFlushSemaphore monitor. In the failure scenario (that is, when
the user thread calls forceFlush after a send has been initiated), the
effect is to hold back the user thread doing forceFlush until the log
shipper thread has finished its send. That way, forceFlush will not
return until at least (*) ONE sending operation has been initiated and
completed, ensuring that at least one buffer has been returned to the
free pool. This in turn allows the 2nd attempt to append the log in
MasterController.appendLog (made after receiving a
LogBufferFullException) to succeed.

(*) the log shipper thread could possibly race past the user thread
and send more than once, but that would not be harmful because the
sending thread would ultimately call notify again, allowing the user
thread to continue and find free buffers.
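
A minimal sketch of the locking idea, not the patch itself. The class
ForceFlushSketch, its free-buffer counter and the sendStarted latch are
hypothetical names made up for illustration; only the monitor name
forceFlushSemaphore comes from the description above. The sketch also
leaves out the wait/notify handshake between the two threads and only
shows that a caller entering the monitor after a send has started
cannot proceed until that send has returned a buffer.

```java
import java.util.concurrent.CountDownLatch;

/** Hypothetical stand-in for the log shipper; not Derby's actual class. */
final class ForceFlushSketch {
    private final Object forceFlushSemaphore = new Object();
    private int freeBuffers = 0; // guarded by forceFlushSemaphore
    private final CountDownLatch sendStarted = new CountDownLatch(1);

    /** Shipper side: the whole send runs while holding the monitor. */
    void shipOneChunk() throws InterruptedException {
        synchronized (forceFlushSemaphore) {
            sendStarted.countDown();      // a send has been initiated
            Thread.sleep(30);             // simulated slow send; keeps the monitor
            freeBuffers++;                // buffer goes back to the free pool
            forceFlushSemaphore.notify(); // wake a waiting flusher, if any
        }
    }

    /** User side: blocks on the monitor while a send is in progress. */
    int forceFlush() {
        synchronized (forceFlushSemaphore) {
            return freeBuffers;
        }
    }

    /** Runs the failure scenario once and returns the free buffers seen. */
    static int demo() {
        ForceFlushSketch s = new ForceFlushSketch();
        Thread shipper = new Thread(() -> {
            try {
                s.shipOneChunk();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        shipper.start();
        try {
            s.sendStarted.await(); // user thread arrives after the send began
            int free = s.forceFlush();
            shipper.join();
            return free;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("free buffers seen by forceFlush: " + demo());
    }
}
```

Because the latch is released only after the shipper already holds the
monitor, the user thread in this sketch always finds one free buffer
when forceFlush returns.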

This trace fragment from db_master/derby.log of the master (with the
trace patch applied) in a tight spot (when running
ReplicationRun_Local_StateTest_part1) shows how the sequence of events
changes with the patch:

@1242436667589 Sending
@1242436667590 Sending done
@1242436667590 ship sleep 100
@1242436667667 >= FI_HIGH
@1242436667668 Sending
@1242436667668 >= FI_HIGH
@1242436667670 >= FI_HIGH
@1242436667673 >= FI_HIGH
@1242436667676 >= FI_HIGH
@1242436667679 log buffer full, try to force flush
@1242436667679 forceflush
@1242436667695 Sending done
@1242436667696 Sending
@1242436667696 Sending done
@1242436667696 Sending

Sending takes somewhat long here (7695 ms - 7668 ms = 27 ms), and the
user thread finds few free buffers left, then finally none, and goes on
to force a flush. But with the patch, the call to forceflush at instant
7679 must wait until the shipper thread's send is done, at instant
7695. Since the shipper thread has held the monitor since before the
instant it released a free buffer (implying that the user thread cannot
have been able to grab it yet!), by the time the user thread gets the
monitor on forceFlushSemaphore, the sender is done, and a free buffer
is guaranteed to have been returned to the pool.
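
The arithmetic above can be checked mechanically. The following is a
small sketch that assumes the trace format is "@<epoch millis> <event>"
as in the fragment; the class TraceTiming and its helper names are
hypothetical, and only a subset of the trace lines is copied in.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for reading the trace fragment; not part of Derby. */
final class TraceTiming {
    // Subset of the trace lines from the fragment, copied verbatim.
    static final List<String> TRACE = Arrays.asList(
        "@1242436667668 Sending",
        "@1242436667676 >= FI_HIGH",
        "@1242436667679 log buffer full, try to force flush",
        "@1242436667679 forceflush",
        "@1242436667695 Sending done");

    /** Millisecond timestamp between the leading '@' and the first space. */
    static long timestamp(String line) {
        return Long.parseLong(line.substring(1, line.indexOf(' ')));
    }

    /** Event text after the first space. */
    static String event(String line) {
        return line.substring(line.indexOf(' ') + 1);
    }

    /** Timestamp of the first line whose event text equals the given one. */
    static long timeOf(String wanted) {
        for (String line : TRACE) {
            if (event(line).equals(wanted)) {
                return timestamp(line);
            }
        }
        throw new IllegalArgumentException("no such event: " + wanted);
    }

    /** Duration of the slow send. */
    static long sendMillis() {
        return timeOf("Sending done") - timeOf("Sending");
    }

    /** How long the forceflush call is held back by the in-progress send. */
    static long heldBackMillis() {
        return timeOf("Sending done") - timeOf("forceflush");
    }

    public static void main(String[] args) {
        System.out.println("send took " + sendMillis() + " ms");
        System.out.println("forceflush held back " + heldBackMillis() + " ms");
    }
}
```

For this fragment the send takes 27 ms and the forceflush call is held
back for 16 ms (7695 - 7679).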

I have run ReplicationRun_Local_StateTest_part1 now for 24 hours
without seeing a problem with it (ca 350 runs). Running full
regressions now.

Ready for review.


> '...replication.buffer.LogBufferFullException' causes failover to fail w/ 'XRE07, SQLERRMC: Could not perform operation because the database is not in replication master mode.'
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-3719
>                 URL: https://issues.apache.org/jira/browse/DERBY-3719
>             Project: Derby
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 10.4.2.0, 10.5.1.1
>         Environment: HW: 2 X i86pc i386 (AMD Opteron(tm) Processor 252): 2593 MHz, unknown cache. 3968 Megabytes Total Memory.
> OS: Solaris 10 5/08 s10x_u5wos_10 X86 64bits - SunOS 5.10 Generic_127128-11
> JVM: Sun Microsystems Inc.
>     java version "1.6.0_06"
>     Java(TM) SE Runtime Environment (build 1.6.0_06-b02)
>     Java HotSpot(TM) Client VM (build 10.0-b22, mixed mode)
>            Reporter: Ole Solberg
>         Attachments: 12.tar.gz, derby-3719-1.diff, traceLogShipping.diff, traceLogShipping.stat
>
>
> With the patch for DERBY-3709, derby-3709_p1-v2.diff.txt, I was able to provoke this error twice in 30 test runs on this platform. (On another platform I saw none in 100 test runs.)
> I will upload the full test run log dir.
> "Summary":
> 1) testReplication_Local_StateTest_part2(org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_StateTest_part2)junit.framework.ComparisonFailure: Unexpected SQL state. expected:<XRE[20]> but was:<XRE[07]>
> Master derby.log:
> -----------------------------------------
> ----  BEGIN REPLICATION ERROR MESSAGE (6/10/08 4:08 PM) ----
> Exception occurred during log shipping.
> org.apache.derby.impl.store.replication.buffer.LogBufferFullException
> 	at org.apache.derby.impl.store.replication.buffer.ReplicationLogBuffer.switchDirtyBuffer(ReplicationLogBuffer.java:357)
> 	at org.apache.derby.impl.store.replication.buffer.ReplicationLogBuffer.appendLog(ReplicationLogBuffer.java:146)
> 	at org.apache.derby.impl.store.replication.master.MasterController.appendLog(MasterController.java:428)
> 	at org.apache.derby.impl.store.raw.log.LogAccessFile.writeToLog(LogAccessFile.java:787)
> 	at org.apache.derby.impl.store.raw.log.LogAccessFile.flushDirtyBuffers(LogAccessFile.java:534)
> 	at org.apache.derby.impl.store.raw.log.LogAccessFile.flushLogAccessFile(LogAccessFile.java:574)
> 	at org.apache.derby.impl.store.raw.log.LogAccessFile.writeLogRecord(LogAccessFile.java:332)
> 	at org.apache.derby.impl.store.raw.log.LogToFile.appendLogRecord(LogToFile.java:3759)
> 	at org.apache.derby.impl.store.raw.log.FileLogger.logAndDo(FileLogger.java:370)
> 	at org.apache.derby.impl.store.raw.xact.Xact.logAndDo(Xact.java:1193)
> 	at org.apache.derby.impl.store.raw.data.LoggableActions.doAction(LoggableActions.java:221)
> 	at org.apache.derby.impl.store.raw.data.LoggableActions.actionUpdate(LoggableActions.java:85)
> 	at org.apache.derby.impl.store.raw.data.StoredPage.doUpdateAtSlot(StoredPage.java:8463)
> 	at org.apache.derby.impl.store.raw.data.StoredPage.updateOverflowDetails(StoredPage.java:8336)
> 	at org.apache.derby.impl.store.raw.data.StoredPage.updateOverflowDetails(StoredPage.java:8319)
> 	at org.apache.derby.impl.store.raw.data.BasePage.insertAllowOverflow(BasePage.java:808)
> 	at org.apache.derby.impl.store.raw.data.BasePage.insert(BasePage.java:653)
> 	at org.apache.derby.impl.store.access.heap.HeapController.doInsert(HeapController.java:307)
> 	at org.apache.derby.impl.store.access.heap.HeapController.insert(HeapController.java:575)
> 	at org.apache.derby.impl.sql.execute.RowChangerImpl.insertRow(RowChangerImpl.java:457)
> 	at org.apache.derby.impl.sql.execute.InsertResultSet.normalInsertCore(InsertResultSet.java:1011)
> 	at org.apache.derby.impl.sql.execute.InsertResultSet.open(InsertResultSet.java:487)
> 	at org.apache.derby.impl.sql.GenericPreparedStatement.execute(GenericPreparedStatement.java:384)
> 	at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(EmbedStatement.java:1235)
> 	at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(EmbedPreparedStatement.java:1652)
> 	at org.apache.derby.impl.jdbc.EmbedPreparedStatement.execute(EmbedPreparedStatement.java:1307)
> 	at org.apache.derby.impl.drda.DRDAStatement.execute(DRDAStatement.java:672)
> 	at org.apache.derby.impl.drda.DRDAConnThread.parseEXCSQLSTTobjects(DRDAConnThread.java:4197)
> 	at org.apache.derby.impl.drda.DRDAConnThread.parseEXCSQLSTT(DRDAConnThread.java:4001)
> 	at org.apache.derby.impl.drda.DRDAConnThread.processCommands(DRDAConnThread.java:991)
> 	at org.apache.derby.impl.drda.DRDAConnThread.run(DRDAConnThread.java:278)
> --------------------  END REPLICATION ERROR MESSAGE ---------------------
> Slave derby.log:
> -------------------------------------------------------------------------------------------
> 2008-06-10 14:05:56.408 GMT Thread[DRDAConnThread_3,5,main] (DATABASE = /export/home/tmp/os136789/testingInMyDerbySandbox/12/db_slave/wombat), (DRDAID = {2}), Replication slave mode started successfully for database '/export/home/tmp/os136789/testingInMyDerbySandbox/12/db_slave/wombat'. Connection refused because the database is in replication slave mode.
> Replication slave role was stopped for database '/export/home/tmp/os136789/testingInMyDerbySandbox/12/db_slave/wombat'.
> ------------  BEGIN SHUTDOWN ERROR STACK -------------
> ERROR XSLA7: Cannot redo operation null in the log.
> 	at org.apache.derby.iapi.error.StandardException.newException(StandardException.java:296)
> 	at org.apache.derby.impl.store.raw.log.FileLogger.redo(FileLogger.java:1525)
> 	at org.apache.derby.impl.store.raw.log.LogToFile.recover(LogToFile.java:920)
> 	at org.apache.derby.impl.store.raw.RawStore.boot(RawStore.java:334)
> 	at org.apache.derby.impl.services.monitor.BaseMonitor.boot(BaseMonitor.java:1999)
> 	at org.apache.derby.impl.services.monitor.TopService.bootModule(TopService.java:291)
> 	at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(BaseMonitor.java:553)
> 	at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Monitor.java:427)
> 	at org.apache.derby.impl.store.access.RAMAccessManager.boot(RAMAccessManager.java:1019)
> 	at org.apache.derby.impl.services.monitor.BaseMonitor.boot(BaseMonitor.java:1999)
> 	at org.apache.derby.impl.services.monitor.TopService.bootModule(TopService.java:291)
> 	at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(BaseMonitor.java:553)
> 	at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Monitor.java:427)
> 	at org.apache.derby.impl.db.BasicDatabase.bootStore(BasicDatabase.java:780)
> 	at org.apache.derby.impl.db.BasicDatabase.boot(BasicDatabase.java:196)
> 	at org.apache.derby.impl.db.SlaveDatabase.bootBasicDatabase(SlaveDatabase.java:424)
> 	at org.apache.derby.impl.db.SlaveDatabase.access$000(SlaveDatabase.java:70)
> 	at org.apache.derby.impl.db.SlaveDatabase$SlaveDatabaseBootThread.run(SlaveDatabase.java:311)
> 	at java.lang.Thread.run(Thread.java:619)
> Caused by: ERROR 08006: Database '{0}' shutdown.
> 	at org.apache.derby.iapi.error.StandardException.newException(StandardException.java:276)
> 	at org.apache.derby.impl.store.raw.log.LogToFile.stopReplicationSlaveRole(LogToFile.java:5142)
> 	at org.apache.derby.impl.store.replication.slave.SlaveController.stopSlave(SlaveController.java:266)
> 	at org.apache.derby.impl.store.replication.slave.SlaveController.access$500(SlaveController.java:64)
> 	at org.apache.derby.impl.store.replication.slave.SlaveController$SlaveLogReceiverThread.run(SlaveController.java:531)
> ============= begin nested exception, level (1) ===========
> ERROR 08006: Database '{0}' shutdown.
> 	at org.apache.derby.iapi.error.StandardException.newException(StandardException.java:276)
> 	at org.apache.derby.impl.store.raw.log.LogToFile.stopReplicationSlaveRole(LogToFile.java:5142)
> 	at org.apache.derby.impl.store.replication.slave.SlaveController.stopSlave(SlaveController.java:266)
> 	at org.apache.derby.impl.store.replication.slave.SlaveController.access$500(SlaveController.java:64)
> 	at org.apache.derby.impl.store.replication.slave.SlaveController$SlaveLogReceiverThread.run(SlaveController.java:531)
> ============= end nested exception, level (1) ===========
> ------------  END SHUTDOWN ERROR STACK -------------

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

