Return-Path:
X-Original-To: apmail-activemq-dev-archive@www.apache.org
Delivered-To: apmail-activemq-dev-archive@www.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id AEB32EA51 for ; Fri, 14 Dec 2012 14:04:16 +0000 (UTC)
Received: (qmail 88729 invoked by uid 500); 14 Dec 2012 14:04:16 -0000
Delivered-To: apmail-activemq-dev-archive@activemq.apache.org
Received: (qmail 88463 invoked by uid 500); 14 Dec 2012 14:04:16 -0000
Mailing-List: contact dev-help@activemq.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: dev@activemq.apache.org
Delivered-To: mailing list dev@activemq.apache.org
Received: (qmail 88134 invoked by uid 99); 14 Dec 2012 14:04:14 -0000
Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 14 Dec 2012 14:04:14 +0000
Date: Fri, 14 Dec 2012 14:04:14 +0000 (UTC)
From: "Michael Black (JIRA)"
To: dev@activemq.apache.org
Message-ID:
In-Reply-To:
References:
Subject: [jira] [Commented] (AMQ-4214) PageFile is not loaded
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

    [ https://issues.apache.org/jira/browse/AMQ-4214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13532309#comment-13532309 ]

Michael Black commented on AMQ-4214:
------------------------------------

One thing I can add....here is the script I was running to test our error checking. Our system retrieves a small message containing a document name from "activemq" below and then gets the document itself from another system (a mysql database called UDR).

The script either blocks the IP address completely (network down) or rejects the packet (service down).

So my suspicion is that doing "./fw.sh activemq down" or "./fw.sh activemq reject" or the combo is what caused this.
Perhaps in the middle of a receive or while the receive was in wait status. And messages were being deposited in the topic (though probably at a rate of maybe 1 to 10 per second; it's a real-time news feed, so it varies).

During my testing the topic queue did have thousands of messages waiting, since I was not pulling them off until it was empty. We are on a WAN with about 25ms latency to the activemq system.

I probably tested doing this several dozen times and eventually noticed the negative numbers in the topic it was using. All other topics were fine (though I think we're the ones mainly using the activemq system).

I could try to recreate this, but it's going to be random, I'm sure. And my guys on the other end of the WAN won't be happy if they have to restart ActiveMQ again.

And I'll note again: we had a bug where we were trying to reconnect an already-opened connection. But I doubt that was the cause, since it never gets to the queue in that situation.

So the 2 most likely situations IMHO:
#1 receive() is blocked and packets are either dropped or rejected during that time
#2 receive() is receiving a message and packets are either dropped or rejected during that time, perhaps messing with the auto-acknowledge.

I'd say the odds of #2 seem small, but with 25ms delays it's more than feasible for me to have hit that occurrence. So if you receive a message and drop the connection before it's auto-acknowledged, could that cause it?
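Situation #2 can be pictured with a toy model. The sketch below is plain Java with no ActiveMQ or JMS calls: `ToyBroker` and all of its methods are hypothetical names made up for illustration, and it assumes the simplest at-least-once behaviour, where a delivered message stays in flight until it is acknowledged and goes back to the pending queue if the connection is lost first. It only shows why the window between delivery and acknowledgement matters; it is not a claim about how the broker keeps its counters or what caused the negative numbers.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class AckWindowSketch {

    /** Hypothetical toy broker, NOT ActiveMQ: one consumer, one in-flight message. */
    static final class ToyBroker {
        private final Deque<String> pending = new ArrayDeque<>();
        private String inFlight;   // delivered but not yet acknowledged
        private int dispatched;    // total deliveries, including redeliveries

        void publish(String message) {
            pending.addLast(message);
        }

        /** Hands the next pending message to the consumer, or null if none is waiting. */
        String deliver() {
            if (inFlight != null) {
                throw new IllegalStateException("previous message not acknowledged");
            }
            inFlight = pending.pollFirst();
            if (inFlight != null) {
                dispatched++;
            }
            return inFlight;
        }

        /** The consumer's acknowledgement arrived; the message is done. */
        void ack() {
            inFlight = null;
        }

        /** Connection dropped (e.g. firewall DROP or REJECT): requeue anything unacknowledged. */
        void connectionLost() {
            if (inFlight != null) {
                pending.addFirst(inFlight);
                inFlight = null;
            }
        }

        int pendingCount() {
            return pending.size();
        }

        int dispatchedCount() {
            return dispatched;
        }
    }

    public static void main(String[] args) {
        ToyBroker broker = new ToyBroker();
        broker.publish("doc-1");
        broker.publish("doc-2");

        // Normal case: the message is delivered and the acknowledgement gets through.
        String first = broker.deliver();
        broker.ack();

        // Situation #2: the message is delivered, but the connection is cut
        // before the acknowledgement reaches the broker.
        String second = broker.deliver();
        broker.connectionLost();

        // After reconnecting, the same message is handed out again.
        String redelivered = broker.deliver();
        broker.ack();

        System.out.println("first=" + first + " second=" + second
                + " redelivered=" + redelivered);
        System.out.println("pending=" + broker.pendingCount()
                + " dispatched=" + broker.dispatchedCount());
    }
}
```

In this toy run the consumer sees doc-2 twice, and the broker records three dispatches for two published messages. That mismatch between deliveries and acknowledgements is the kind of bookkeeping edge that a dropped connection in the acknowledgement window exercises.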

#!/bin/sh
if [ "$1" == "" ];then
    echo Usage: $0 [udr activemq up] [down reject]
    echo To turn off firewall "$0 up"
    exit 1
fi
if [ "$1" == "udr" ];then
    IPADDR=10.2.100.214
    PORT=8080
    shift
fi
if [ "$1" == "activemq" ];then
    IPADDR=10.2.100.209
    PORT=61616
    shift
fi
if [ "$1" == "down" ];then
    echo $IPADDR down
    iptables -A INPUT -s "$IPADDR" -j DROP
fi
if [ "$1" == "reject" ];then
    echo port $PORT reject
    iptables -A OUTPUT -p TCP --destination $IPADDR -j REJECT --reject-with tcp-reset
    #iptables -A OUTPUT -p TCP --dport $PORT -j REJECT --reject-with tcp-reset
fi
if [ "$1" == "up" ];then
    echo iptables flushed
    iptables --flush
fi

Michael D. Black
Senior Scientist
Advanced Analytics Directorate
Advanced GEOINT Solutions Operating Unit
Northrop Grumman Information Systems

> PageFile is not loaded
> ----------------------
>
>                 Key: AMQ-4214
>                 URL: https://issues.apache.org/jira/browse/AMQ-4214
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: Message Store
>    Affects Versions: 5.7.0
>         Environment: Red Hat Enterprise Linux Server release 5.7 (Tikanga)
> System had been up for 15 days.
> The "Messages Enqueued" in the Topics tab for our topic was negative (~-63000)
> The Subscriber tab showed about 14,000 in the Pending Queue Size and was not decreasing. Messages Enqueued/Dequeued were incrementing and messages were being delivered (this is a real-time queue).
> So I deleted the Q. Then we could not recreate the Q and got the message below.
> Other topics in the system appeared to be fine.
> I had been doing a fair amount of testing on client error recovery, dropping port connections and IP connections using the firewall, so was aborting connections frequently (though not doing a lot at once).
> A number of times (due to a bug) we tried to make a duplicate connection to the Q. I'm not sure exactly when the topic went haywire.
> We are running over a WAN with about 30ms latency.
> Can't think of anything else we were doing different than anybody else.
> Here's our connection
>         ActiveMQConnectionFactory tcf = new ActiveMQConnectionFactory(cs);
>         conn = tcf.createTopicConnection();
>         String iName = prop.getTopicSubscriberName();
>         conn.setClientID(iName);
>         session = conn.createTopicSession(false, TopicSession.AUTO_ACKNOWLEDGE);
>         topic = session.createTopic(prop.getBigDataRepositoryTopicName());
>         conn.start();
>         String sName = iName;
>         subscriber = session.createDurableSubscriber(topic, sName);
>            Reporter: Michael Black
>
> 2012-12-07 13:07:06,317 | WARN  | Failed to browse Topic: AllDocumentsTopic | org.apache.activemq.broker.region.Topic | ActiveMQ Broker[localhost] Scheduler
> java.lang.IllegalStateException: PageFile is not loaded
>         at org.apache.kahadb.page.PageFile.assertLoaded(PageFile.java:809)
>         at org.apache.kahadb.page.PageFile.tx(PageFile.java:303)
>         at org.apache.activemq.store.kahadb.KahaDBStore$KahaDBMessageStore.recover(KahaDBStore.java:523)
>         at org.apache.activemq.store.ProxyTopicMessageStore.recover(ProxyTopicMessageStore.java:62)
>         at org.apache.activemq.store.ProxyTopicMessageStore.recover(ProxyTopicMessageStore.java:62)
>         at org.apache.activemq.broker.region.Topic.doBrowse(Topic.java:570)
>         at org.apache.activemq.broker.region.Topic.access$100(Topic.java:63)
>         at org.apache.activemq.broker.region.Topic$6.run(Topic.java:695)
>         at org.apache.activemq.thread.SchedulerTimerTask.run(SchedulerTimerTask.java:33)
>         at java.util.TimerThread.mainLoop(Timer.java:512)
>         at java.util.TimerThread.run(Timer.java:462)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira