Message-ID: <13119492.7801293169726705.JavaMail.jira@thor>
Date: Fri, 24 Dec 2010 00:48:46 -0500 (EST)
From: "Swapnonil Mukherjee (JIRA)"
To: dev@activemq.apache.org
Reply-To: dev@activemq.apache.org
Mailing-List: contact dev-help@activemq.apache.org; run by ezmlm
Subject: [jira] Issue Comment Edited: (AMQ-3103) Queue stalls after Job Scheduler component shuts down.
In-Reply-To: <23575293.286261293101043114.JavaMail.jira@thor>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

    [ https://issues.apache.org/jira/browse/AMQ-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974842#action_12974842 ]

Swapnonil Mukherjee edited comment on AMQ-3103 at 12/24/10 12:48 AM:
---------------------------------------------------------------------

Hi Everybody,

Another observation, and this one is funny. We are observing that the Job Scheduler component *shuts down every day at precisely 01:48 hours GMT.*

As mentioned earlier, the way we recover from these failures is by
* Stopping the broker using $> ./activemq stop
* Deleting the db.redo file from the kahadb directory
* Restarting the broker using $> ./activemq start

Are we missing a step in the recovery?

I am attaching another log file, named activemq-1.log. This one is from our staging servers, where you can observe that the Job Scheduler shuts itself down at precisely 01:48 hours.

      was (Author: swapnonil):
    Hi Everybody,

Another observation, and this one is funny. We are observing that the Job Scheduler component *shuts down every day at precisely 01:48 hours GMT.*

As mentioned earlier, the way we recover from these failures is by
* Stopping the broker using $> ./activemq stop
* Deleting the db.redo file from the kahadb directory
* Restarting the broker using $> ./activemq start

Are we missing a step in the recovery?

I am attaching another activemq.log file. This one is from our staging servers.

> Queue stalls after Job Scheduler component shuts down.
> ------------------------------------------------------
>
>                 Key: AMQ-3103
>                 URL: https://issues.apache.org/jira/browse/AMQ-3103
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 5.4.2
>         Environment: Redhat Enterprise Linux 5.X,
> JDK 1.5 32-bit
> JDK 1.6 64-bit
>            Reporter: Swapnonil Mukherjee
>         Attachments: activemq-1.log, activemq.log, activemq.xml
>
>
> Observation
> ----
> ActiveMQ stops accepting all incoming messages destined for a particular queue after the scheduler component that processes scheduled messages on that queue encounters a NullPointerException.
> Environment
> ----
> We are using the Spring JmsTemplate component to post messages onto a queue. We also place a delay of 30 seconds on each message before posting it:
> {noformat}
> message.setLongProperty(ScheduledMessage.AMQ_SCHEDULED_DELAY, Integer.parseInt("30") * 1000);
> {noformat}
> We use the Spring DefaultMessageListenerContainer to receive messages.
> Normally the broker runs fine: messages appear under the "Scheduled" tab on the ActiveMQ Console, after which they are processed normally, and we can tally them using the "Messages Enqueued" and "Messages Dequeued" numbers. But occasionally the Job Scheduler fails with the following exception.
> {code:xml}
> 2010-12-10 16:31:38,522 | ERROR | JMS Failed to schedule job | org.apache.activemq.broker.scheduler.JobSchedulerImpl | JobScheduler:JMS
> java.lang.NullPointerException
> 	at org.apache.kahadb.index.BTreeIndex.loadNode(BTreeIndex.java:264)
> 	at org.apache.kahadb.index.BTreeNode.getChild(BTreeNode.java:225)
> 	at org.apache.kahadb.index.BTreeNode.remove(BTreeNode.java:330)
> 	at org.apache.kahadb.index.BTreeIndex.remove(BTreeIndex.java:194)
> 	at org.apache.activemq.broker.scheduler.JobSchedulerImpl.remove(JobSchedulerImpl.java:347)
> 	at org.apache.activemq.broker.scheduler.JobSchedulerImpl$4.execute(JobSchedulerImpl.java:125)
> 	at org.apache.kahadb.page.Transaction.execute(Transaction.java:728)
> 	at org.apache.activemq.broker.scheduler.JobSchedulerImpl.remove(JobSchedulerImpl.java:123)
> 	at org.apache.activemq.broker.scheduler.JobSchedulerImpl.mainLoop(JobSchedulerImpl.java:515)
> 	at org.apache.activemq.broker.scheduler.JobSchedulerImpl.run(JobSchedulerImpl.java:429)
> 	at java.lang.Thread.run(Thread.java:619)
> 2010-12-10 16:31:39,561 | INFO | JobSchedulerStore:activemq-data/primary/scheduler stopped | org.apache.activemq.broker.scheduler.JobSchedulerStore | JobScheduler:JMS
> {code}
> Why does the Job Scheduler fail? One possible cause we have found is that the clocks on the VMs running the producers, the broker, and the consumers are all set differently. The Job Scheduler may be shutting itself down arbitrarily because of this clock difference. We are in the process of syncing all clocks, but we are not sure whether this will solve the problem.
> Bug
> ----
> But the real bug is this: even if the Job Scheduler encounters a NullPointerException, why should it shut down? Even more problematic is the fact that the queue itself stalls and does not accept any more messages after the Job Scheduler shuts down.
> We have tried deleting the db.redo log to recover from this type of shutdown.
> The broker recovers fine, but all messages posted to this queue after the Job Scheduler shut itself down were lost. We have not been able to recover those messages.
> I am attaching the activemq log and activemq configuration file.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
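
Editorial sketch (not part of the original report): the question raised under "Bug" is why one failing job should stop the whole scheduler. The Java below only illustrates that idea, assuming a loop that isolates each job's failure and carries on. It is not ActiveMQ's JobSchedulerImpl code, and the names ResilientJobLoop and runAll are hypothetical. It uses lambdas, so it needs Java 8 or later, which is newer than the JDKs listed in the report.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not how ActiveMQ implements its scheduler.
public class ResilientJobLoop {

    /**
     * Runs every job in order. A job that throws a RuntimeException is
     * recorded and skipped, so later jobs still run.
     */
    public static List<Throwable> runAll(List<Runnable> jobs) {
        List<Throwable> failures = new ArrayList<>();
        for (Runnable job : jobs) {
            try {
                job.run();
            } catch (RuntimeException e) {
                // Isolate the failure instead of letting it end the loop.
                failures.add(e);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        List<String> fired = new ArrayList<>();
        List<Runnable> jobs = new ArrayList<>();
        jobs.add(() -> fired.add("job-1"));
        // Simulated failure standing in for the NullPointerException in the log.
        jobs.add(() -> { throw new NullPointerException("simulated index failure"); });
        jobs.add(() -> fired.add("job-3"));

        List<Throwable> failures = runAll(jobs);
        System.out.println("fired=" + fired + ", failures=" + failures.size());
    }
}
```

Running main prints fired=[job-1, job-3], failures=1: the third job still fires after the second one fails. Whether skipping is safe in the real scheduler depends on the state of the underlying index, which this sketch does not model.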