Return-Path: X-Original-To: apmail-activemq-dev-archive@www.apache.org Delivered-To: apmail-activemq-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 9BB30793D for ; Mon, 19 Dec 2011 18:17:58 +0000 (UTC) Received: (qmail 64031 invoked by uid 500); 19 Dec 2011 18:17:58 -0000 Delivered-To: apmail-activemq-dev-archive@activemq.apache.org Received: (qmail 63991 invoked by uid 500); 19 Dec 2011 18:17:58 -0000 Mailing-List: contact dev-help@activemq.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@activemq.apache.org Delivered-To: mailing list dev@activemq.apache.org Received: (qmail 63983 invoked by uid 99); 19 Dec 2011 18:17:58 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 19 Dec 2011 18:17:58 +0000 X-ASF-Spam-Status: No, hits=-2002.5 required=5.0 tests=ALL_TRUSTED,RP_MATCHES_RCVD X-Spam-Check-By: apache.org Received: from [140.211.11.116] (HELO hel.zones.apache.org) (140.211.11.116) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 19 Dec 2011 18:17:53 +0000 Received: from hel.zones.apache.org (hel.zones.apache.org [140.211.11.116]) by hel.zones.apache.org (Postfix) with ESMTP id 6555E11B7BE for ; Mon, 19 Dec 2011 18:17:31 +0000 (UTC) Date: Mon, 19 Dec 2011 18:17:31 +0000 (UTC) From: "Gary Tully (Updated) (JIRA)" To: dev@activemq.apache.org Message-ID: <1778310339.27201.1324318651416.JavaMail.tomcat@hel.zones.apache.org> Subject: [jira] [Updated] (AMQ-2595) NFS problems can cause broker to hang on shutdown requiring a hard kill MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 X-Virus-Checked: Checked by ClamAV on apache.org [ https://issues.apache.org/jira/browse/AMQ-2595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Tully updated AMQ-2595: ---------------------------- Fix Version/s: (was: 5.x) > NFS problems can cause broker to hang on shutdown requiring a hard kill > ------------------------------------------------------------------------ > > Key: AMQ-2595 > URL: https://issues.apache.org/jira/browse/AMQ-2595 > Project: ActiveMQ > Issue Type: New Feature > Components: Broker > Affects Versions: 5.3.0 > Reporter: Bruce Snyder > Fix For: 5.4.3 > > > When the ActiveMQ data directory exists on a NFS mounted volume and that volume experiences some type of failure, it can cause the broker to hang and only a manual kill will shut it down. Below is a sequence of errors demonstrating this: > {code} > ERROR | Failed to fill batch > java.io.IOException: I/O error > at java.io.RandomAccessFile.readBytes(Native Method) > at java.io.RandomAccessFile.read(RandomAccessFile.java:322) > at java.io.RandomAccessFile.readFully(RandomAccessFile.java:381) > at java.io.RandomAccessFile.readFully(RandomAccessFile.java:361) > at org.apache.kahadb.journal.DataFileAccessor.readRecord(DataFileAccessor.java:90) > at org.apache.kahadb.journal.Journal.read(Journal.java:573) > at org.apache.activemq.store.kahadb.MessageDatabase.load(MessageDatabase.java:656) > at org.apache.activemq.store.kahadb.KahaDBStore.loadMessage(KahaDBStore.java:534) > at org.apache.activemq.store.kahadb.KahaDBStore$KahaDBMessageStore$4.execute(KahaDBStore.java:229) > at org.apache.kahadb.page.Transaction.execute(Transaction.java:728) > at org.apache.activemq.store.kahadb.KahaDBStore$KahaDBMessageStore.recoverNextMessages(KahaDBStore.java:222) > at org.apache.activemq.broker.region.cursors.QueueStorePrefetch.doFillBatch(QueueStorePrefetch.java:81) > at org.apache.activemq.broker.region.cursors.AbstractStoreCursor.fillBatch(AbstractStoreCursor.java:227) > at org.apache.activemq.broker.region.cursors.AbstractStoreCursor.reset(AbstractStoreCursor.java:99) > at org.apache.activemq.broker.region.cursors.StoreQueueCursor.reset(StoreQueueCursor.java:157) > at org.apache.activemq.broker.region.Queue.doPageIn(Queue.java:1363) > at org.apache.activemq.broker.region.Queue.pageInMessages(Queue.java:1503) > at org.apache.activemq.broker.region.Queue.iterate(Queue.java:1178) > at org.apache.activemq.thread.DeterministicTaskRunner.runTask(DeterministicTaskRunner.java:84) > at org.apache.activemq.thread.DeterministicTaskRunner$1.run(DeterministicTaskRunner.java:41) > at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:619) > ERROR | Failed to fill batch > java.lang.RuntimeException: java.io.IOException: I/O error > at org.apache.activemq.broker.region.cursors.AbstractStoreCursor.fillBatch(AbstractStoreCursor.java:230) > at org.apache.activemq.broker.region.cursors.AbstractStoreCursor.reset(AbstractStoreCursor.java:99) > at org.apache.activemq.broker.region.cursors.StoreQueueCursor.reset(StoreQueueCursor.java:157) > at org.apache.activemq.broker.region.Queue.doPageIn(Queue.java:1363) > at org.apache.activemq.broker.region.Queue.pageInMessages(Queue.java:1503) > at org.apache.activemq.broker.region.Queue.iterate(Queue.java:1178) > at org.apache.activemq.thread.DeterministicTaskRunner.runTask(DeterministicTaskRunner.java:84) > at org.apache.activemq.thread.DeterministicTaskRunner$1.run(DeterministicTaskRunner.java:41) > at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:619) > Caused by: java.io.IOException: I/O error > at java.io.RandomAccessFile.readBytes(Native Method) > at java.io.RandomAccessFile.read(RandomAccessFile.java:322) > at java.io.RandomAccessFile.readFully(RandomAccessFile.java:381) > at java.io.RandomAccessFile.readFully(RandomAccessFile.java:361) > at org.apache.kahadb.journal.DataFileAccessor.readRecord(DataFileAccessor.java:90) > at org.apache.kahadb.journal.Journal.read(Journal.java:573) > at org.apache.activemq.store.kahadb.MessageDatabase.load(MessageDatabase.java:656) > at org.apache.activemq.store.kahadb.KahaDBStore.loadMessage(KahaDBStore.java:534) > at org.apache.activemq.store.kahadb.KahaDBStore$KahaDBMessageStore$4.execute(KahaDBStore.java:229) > at org.apache.kahadb.page.Transaction.execute(Transaction.java:728) > at org.apache.activemq.store.kahadb.KahaDBStore$KahaDBMessageStore.recoverNextMessages(KahaDBStore.java:222) > at org.apache.activemq.broker.region.cursors.QueueStorePrefetch.doFillBatch(QueueStorePrefetch.java:81) > at org.apache.activemq.broker.region.cursors.AbstractStoreCursor.fillBatch(AbstractStoreCursor.java:227) > ... 10 more > ERROR | Failed to page in more queue messages > java.lang.RuntimeException: java.lang.RuntimeException: > java.io.IOException: I/O error > at org.apache.activemq.broker.region.cursors.AbstractStoreCursor.reset(AbstractStoreCursor.java:102) > at org.apache.activemq.broker.region.cursors.StoreQueueCursor.reset(StoreQueueCursor.java:157) > at org.apache.activemq.broker.region.Queue.doPageIn(Queue.java:1363) > at org.apache.activemq.broker.region.Queue.pageInMessages(Queue.java:1503) > at org.apache.activemq.broker.region.Queue.iterate(Queue.java:1178) > at org.apache.activemq.thread.DeterministicTaskRunner.runTask(DeterministicTaskRunner.java:84) > at org.apache.activemq.thread.DeterministicTaskRunner$1.run(DeterministicTaskRunner.java:41) > at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:619) > Caused by: java.lang.RuntimeException: java.io.IOException: I/O error > at org.apache.activemq.broker.region.cursors.AbstractStoreCursor.fillBatch(AbstractStoreCursor.java:230) > at org.apache.activemq.broker.region.cursors.AbstractStoreCursor.reset(AbstractStoreCursor.java:99) > ... 9 more > Caused by: java.io.IOException: I/O error > at java.io.RandomAccessFile.readBytes(Native Method) > at java.io.RandomAccessFile.read(RandomAccessFile.java:322) > at java.io.RandomAccessFile.readFully(RandomAccessFile.java:381) > at java.io.RandomAccessFile.readFully(RandomAccessFile.java:361) > at org.apache.kahadb.journal.DataFileAccessor.readRecord(DataFileAccessor.java:90) > at org.apache.kahadb.journal.Journal.read(Journal.java:573) > at org.apache.activemq.store.kahadb.MessageDatabase.load(MessageDatabase.java:656) > at org.apache.activemq.store.kahadb.KahaDBStore.loadMessage(KahaDBStore.java:534) > at org.apache.activemq.store.kahadb.KahaDBStore$KahaDBMessageStore$4.execute(KahaDBStore.java:229) > at org.apache.kahadb.page.Transaction.execute(Transaction.java:728) > at org.apache.activemq.store.kahadb.KahaDBStore$KahaDBMessageStore.recoverNextMessages(KahaDBStore.java:222) > at org.apache.activemq.broker.region.cursors.QueueStorePrefetch.doFillBatch(QueueStorePrefetch.java:81) > at org.apache.activemq.broker.region.cursors.AbstractStoreCursor.fillBatch(AbstractStoreCursor.java:227) > ... 10 more > java.io.IOException: I/O error > at java.io.RandomAccessFile.write(Native Method) > at java.io.RandomAccessFile.writeLong(RandomAccessFile.java:967) > at org.apache.kahadb.page.PageFile.writeBatch(PageFile.java:974) > at org.apache.kahadb.page.PageFile.flush(PageFile.java:443) > at org.apache.activemq.store.kahadb.MessageDatabase.checkpointUpdate(MessageDatabase.java:947) > at org.apache.activemq.store.kahadb.MessageDatabase.access$200(MessageDatabase.java:79) > at org.apache.activemq.store.kahadb.MessageDatabase$9.execute(MessageDatabase.java:587) > at org.apache.kahadb.page.Transaction.execute(Transaction.java:728) > at org.apache.activemq.store.kahadb.MessageDatabase.checkpointCleanup(MessageDatabase.java:585) > at org.apache.activemq.store.kahadb.MessageDatabase$3.run(MessageDatabase.java:256) > Exception in thread "ActiveMQ Journal Checkpoint Worker" > java.lang.NullPointerException > at java.util.zip.Adler32.update(Adler32.java:44) > at org.apache.kahadb.page.PageFile.writeBatch(PageFile.java:962) > at org.apache.kahadb.page.PageFile.flush(PageFile.java:443) > at org.apache.activemq.store.kahadb.MessageDatabase.checkpointUpdate(MessageDatabase.java:947) > at org.apache.activemq.store.kahadb.MessageDatabase.access$200(MessageDatabase.java:79) > at org.apache.activemq.store.kahadb.MessageDatabase$9.execute(MessageDatabase.java:587) > at org.apache.kahadb.page.Transaction.execute(Transaction.java:728) > at org.apache.activemq.store.kahadb.MessageDatabase.checkpointCleanup(MessageDatabase.java:585) > at org.apache.activemq.store.kahadb.MessageDatabase$3.run(MessageDatabase.java:256) > ^C INFO | ActiveMQ Message Broker (dynamic-broker1, > ID:tssol10s-32974-1265131581698-0:0) is shutting down > INFO | Connector vm://dynamic-broker1 Stopped > INFO | dynamic-broker1 bridge to dynamic-broker2 stopped > INFO | Network Connector localhost Stopped > INFO | The connection to '/127.0.0.1:32982' is taking a long time to shutdown. > INFO | The connection to '/127.0.0.1:32982' is taking a long time to shutdown. > INFO | The connection to '/127.0.0.1:32982' is taking a long time to shutdown. > INFO | The connection to '/127.0.0.1:32982' is taking a long time to shutdown. > INFO | The connection to '/127.0.0.1:32982' is taking a long time to shutdown. > INFO | The connection to '/127.0.0.1:32982' is taking a long time to shutdown. > INFO | The connection to '/127.0.0.1:32982' is taking a long time to shutdown. > INFO | The connection to '/127.0.0.1:32982' is taking a long time to shutdown. > INFO | The connection to '/127.0.0.1:32982' is taking a long time to shutdown. > INFO | The connection to '/127.0.0.1:32982' is taking a long time to shutdown. > INFO | The connection to '/127.0.0.1:32982' is taking a long time to shutdown. > INFO | The connection to '/127.0.0.1:32982' is taking a long time to shutdown. > INFO | The connection to '/127.0.0.1:32982' is taking a long time to shutdown. > INFO | The connection to '/127.0.0.1:32982' is taking a long time to shutdown. > INFO | The connection to '/127.0.0.1:32982' is taking a long time to shutdown. > INFO | The connection to '/127.0.0.1:32982' is taking a long time to shutdown. > INFO | The connection to '/127.0.0.1:32982' is taking a long time to shutdown. > INFO | The connection to '/127.0.0.1:32982' is taking a long time to shutdown. > INFO | The connection to '/127.0.0.1:32982' is taking a long time to shutdown. > INFO | The connection to '/127.0.0.1:32982' is taking a long time to shutdown. > INFO | The connection to '/127.0.0.1:32982' is taking a long time to shutdown. > INFO | The connection to '/127.0.0.1:32982' is taking a long time to shutdown. > {code} > In this case, a problem occurred with the NFS mount point where the data directory lives which resulted in the I/O error. After sending a ctrl-c to shut down the broker, it would not shut down on it's own so it had to be killed. Only then could failover take place allowing another broker could grab the lock. > Given that the broker cannot recover from such a problem, it should probably force itself to shut down so that failover can take place. Also, such a failure should be a FATAL and not an ERROR in the logging. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira