Reply-To: dev@activemq.apache.org
Date: Fri, 14 Dec 2012 14:04:14 +0000 (UTC)
From: "Michael Black (JIRA)" <jira@apache.org>
To: dev@activemq.apache.org
Message-ID: <JIRA.12623010.1354970816421.7225.1355493854167@arcas>
In-Reply-To: <JIRA.12623010.1354970816421@arcas>
References: <JIRA.12623010.1354970816421@arcas>
Subject: [jira] [Commented] (AMQ-4214) PageFile is not loaded


    [ https://issues.apache.org/jira/browse/AMQ-4214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13532309#comment-13532309 ] 

Michael Black commented on AMQ-4214:
------------------------------------

One thing I can add: here is the script I was running to test our error handling.
Our system retrieves a small message containing a document name from the ActiveMQ host ("activemq" in the script below), then fetches the document itself from another system (a MySQL database called UDR, "udr" in the script).

The script either silently drops all packets from the IP address (simulating the network going down) or rejects outgoing TCP packets to it with a reset (simulating the service going down).
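The two modes fail very differently for a client sitting in a blocking read. With `down` the packets simply vanish, so the client gets no error at all and a blocking read waits until something times out; with `reject`, the next TCP packet the client sends to that host is answered with a reset, so the failure should surface as a connection error rather than silence. The sketch below uses plain `java.net` sockets as an analogy for the `down` case (it is not ActiveMQ's transport code, and `SilentPeerDemo` and the 200 ms timeout are illustrative choices): a local server accepts a connection and never writes, and a read with a socket timeout fails with `SocketTimeoutException` instead of hanging. ActiveMQ's OpenWire transport has its own inactivity monitor for this purpose (configured with `wireFormat.maxInactivityDuration`).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Plain-socket analogy for fw.sh "down": the peer is connected but nothing ever
// arrives. This is NOT ActiveMQ's transport code.
public class SilentPeerDemo {

    /** Returns true if a read from a connected but silent peer times out. */
    public static boolean readTimesOut(int timeoutMillis) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket peer = server.accept()) { // accepted, but never written to
            // Without a timeout this read would block indefinitely, just as a
            // consumer can when neither data nor a TCP reset ever arrives.
            client.setSoTimeout(timeoutMillis);
            try {
                client.getInputStream().read();
                return false; // data or EOF arrived: the peer was not silent
            } catch (SocketTimeoutException e) {
                return true;  // the bounded wait expired
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read timed out: " + readTimesOut(200));
    }
}
```

The JMS-level analogue would be calling `receive(long timeout)` instead of the blocking `receive()`, so that a long silence on a feed that normally delivers several messages per second becomes visible to the client.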

So my suspicion is that running "./fw.sh activemq down", "./fw.sh activemq reject", or a combination of the two is what caused this, perhaps in the middle of a receive or while receive() was waiting. Messages were being published to the topic the whole time (probably at 1 to 10 per second; it's a real-time news feed, so the rate varies). During my testing the topic had thousands of messages pending, since I was not draining the subscription completely.
We are on a WAN with about 25ms latency to the activemq system.

I probably ran this test several dozen times and eventually noticed the negative numbers on the topic it was using. All other topics were fine (though I think they were the ones mainly using the activemq system). I could try to recreate this, but I'm sure it will be random. And my guys on the other end of the WAN won't be happy if they have to restart ActiveMQ again.

I'll note again that we had a bug where we were trying to reconnect an already-open connection. But I doubt that was the cause, since in that situation the client never gets as far as the queue.
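The reconnect bug can be guarded against by making connect idempotent: reuse the connection if it is still open, and close the old one before opening a new one when a reconnect is forced, so the same client ID is never in use twice. This matters for a durable subscriber because the subscription is tied to the client ID, and ActiveMQ refuses a second connection whose client ID is already connected. The sketch keeps JMS out of it so it runs on its own: `Link` is a hypothetical stand-in for `javax.jms.Connection`, and `Reconnector` and `FakeLink` are illustrative names, not ActiveMQ classes.

```java
import java.util.function.Supplier;

public class ReconnectGuard {

    // Hypothetical stand-in for javax.jms.Connection; only its lifecycle matters here.
    public interface Link {
        boolean isOpen();
        void close();
    }

    // Toy Link that tracks how many instances are open at the same time.
    public static final class FakeLink implements Link {
        static int openCount = 0;
        private boolean open = true;

        public FakeLink() { openCount++; }

        @Override public boolean isOpen() { return open; }

        @Override public void close() {
            if (open) { open = false; openCount--; }
        }
    }

    // Guard: at most one open Link at a time, so a client ID is never registered twice.
    public static final class Reconnector {
        private final Supplier<Link> factory;
        private Link current;

        public Reconnector(Supplier<Link> factory) { this.factory = factory; }

        /** Returns the open link, creating one only if none is open. */
        public synchronized Link connect() {
            if (current == null || !current.isOpen()) {
                current = factory.get();
            }
            return current;
        }

        /** Closes the old link (if any) before opening a fresh one, e.g. after an error. */
        public synchronized Link reconnect() {
            if (current != null) {
                current.close();
                current = null;
            }
            return connect();
        }
    }

    public static void main(String[] args) {
        Reconnector r = new Reconnector(FakeLink::new);
        Link first = r.connect();
        Link again = r.connect();               // reuses the open link
        r.reconnect();                          // closes first, then opens a new one
        System.out.println(first == again);     // true
        System.out.println(FakeLink.openCount); // 1
    }
}
```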

So the two most likely scenarios, IMHO:
#1 receive() is blocked waiting, and packets are dropped or rejected during that time.
#2 receive() is in the middle of receiving a message when packets are dropped or rejected, perhaps interfering with the auto-acknowledge.

The odds of #2 seem small, but with 25ms latency it's entirely feasible that I hit that case.

So if you receive a message and the connection drops before it's auto-acknowledged, could that cause this?
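Under JMS AUTO_ACKNOWLEDGE the session acknowledges a message only after receive() has returned it, so there is a window in which the message has left the broker but the acknowledgement has not reached it. The JMS specification warns that AUTO_ACKNOWLEDGE clients must be prepared for redelivery of the last consumed message for exactly this reason. The toy model below (plain Java, not ActiveMQ's KahaDB or region code; `ToySubscription` and its methods are hypothetical) tracks that window for one durable subscription: a dropped connection returns in-flight messages for redelivery, and an ack only removes a message that is actually in flight, so the pending count cannot go below zero.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Set;

// Toy model of one durable subscription's accounting: NOT ActiveMQ's implementation.
public class ToySubscription {
    private final Deque<Long> pending = new ArrayDeque<>();      // stored, not yet delivered
    private final Set<Long> inFlight = new LinkedHashSet<>();    // delivered, ack not yet received

    public void enqueue(long id) { pending.addLast(id); }

    /** Broker hands the next message to the consumer (what receive() returns). */
    public Long deliver() {
        Long id = pending.pollFirst();
        if (id != null) inFlight.add(id);
        return id;
    }

    /** AUTO_ACKNOWLEDGE: the session acks after receive() has returned. */
    public boolean ack(long id) {
        // Only an in-flight message can be acked; a duplicate or late ack is ignored,
        // so the pending count never drops below zero.
        return inFlight.remove(id);
    }

    /** Connection lost before the ack arrived: make in-flight messages deliverable again. */
    public void connectionDropped() {
        Long[] ids = inFlight.toArray(new Long[0]);
        for (int i = ids.length - 1; i >= 0; i--) {
            pending.addFirst(ids[i]); // redeliver in original order
        }
        inFlight.clear();
    }

    public int pendingCount() { return pending.size() + inFlight.size(); }

    public static void main(String[] args) {
        ToySubscription sub = new ToySubscription();
        sub.enqueue(1);
        sub.enqueue(2);
        long got = sub.deliver();                 // consumer receives message 1 ...
        sub.connectionDropped();                  // ... but the link dies before the auto-ack
        System.out.println(sub.deliver() == got); // true: message 1 is redelivered
        System.out.println(sub.ack(got));         // true
        System.out.println(sub.ack(got));         // false: a second ack changes nothing
        System.out.println(sub.pendingCount());   // 1 (message 2 remains)
    }
}
```

In this model, dropping the connection inside the ack window yields a redelivery rather than a negative count.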

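#!/bin/sh
# fw.sh: simulate network or service failures with iptables (run as root).
# Usage: ./fw.sh {udr|activemq} {down|reject}    or    ./fw.sh up
if [ "$#" -eq 0 ]; then
    echo "Usage: $0 {udr|activemq} {down|reject}"
    echo "To turn off the firewall: $0 up"
    exit 1
fi
IPADDR=
PORT=
case "$1" in
    udr)      IPADDR=10.2.100.214; PORT=8080;  shift ;;
    activemq) IPADDR=10.2.100.209; PORT=61616; shift ;;
esac
case "$1" in
    down)
        [ -n "$IPADDR" ] || { echo "$0: down needs udr or activemq" >&2; exit 1; }
        echo "$IPADDR down"
        # Silently drop every inbound packet from the host ("network down").
        iptables -A INPUT -s "$IPADDR" -j DROP
        ;;
    reject)
        [ -n "$IPADDR" ] || { echo "$0: reject needs udr or activemq" >&2; exit 1; }
        echo "$IPADDR reject"
        # Answer outbound TCP to the host with a reset ("service down").
        iptables -A OUTPUT -p tcp --destination "$IPADDR" -j REJECT --reject-with tcp-reset
        # Alternative: reject only the service port instead of the whole host.
        #iptables -A OUTPUT -p tcp --dport "$PORT" -j REJECT --reject-with tcp-reset
        ;;
    up)
        # Flushes ALL rules in the filter table, not only the ones added above.
        iptables --flush
        echo "iptables flushed"
        ;;
    *)
        echo "$0: unknown action '$1'" >&2
        exit 1
        ;;
esac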


Michael D. Black
Senior Scientist
Advanced Analytics Directorate
Advanced GEOINT Solutions Operating Unit
Northrop Grumman Information Systems


> PageFile is not loaded
> ----------------------
>
>                 Key: AMQ-4214
>                 URL: https://issues.apache.org/jira/browse/AMQ-4214
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: Message Store
>    Affects Versions: 5.7.0
>         Environment: Red Hat Enterprise Linux Server release 5.7 (Tikanga)
> System had been up for 15 days.
> The "Messages Enqueued" in the Topics tab for our topic was negative (~-63000)
> The Subscriber tab showed about 14,000 in the Pending Queue Size and was not decreasing.  Messages Enqueued/Dequeued were incrementing and messages were being delivered (this is a real-time queue).
> So I deleted the Q.  Then we could not recreate the Q and got the message below.
> Other topics in the system appeared to be fine.
> I had been doing a fair amount of testing of client error recovery, dropping port connections and IP connections using the firewall, so I was aborting connections frequently (though not many at once).
> A number of times (due to a bug) we tried to make a duplicate connection to the Q.  I'm not sure exactly when the topic went haywire.
> We are running over a WAN with about 30ms latency.
> Can't think of anything else we were doing different than anybody else.
> Here's our connection
>             ActiveMQConnectionFactory tcf = new ActiveMQConnectionFactory(cs);
>             conn = tcf.createTopicConnection();
>             String iName = prop.getTopicSubscriberName();
>             conn.setClientID(iName);
>             session = conn.createTopicSession(false,TopicSession.AUTO_ACKNOWLEDGE);
>             topic = session.createTopic(prop.getBigDataRepositoryTopicName());
>             conn.start();
>             String sName = iName;
>             subscriber = session.createDurableSubscriber(topic, sName);
>            Reporter: Michael Black
>
> 2012-12-07 13:07:06,317 | WARN  | Failed to browse Topic: AllDocumentsTopic | org.apache.activemq.broker.region.Topic | ActiveMQ Broker[localhost] Scheduler
> java.lang.IllegalStateException: PageFile is not loaded
>         at org.apache.kahadb.page.PageFile.assertLoaded(PageFile.java:809)
>         at org.apache.kahadb.page.PageFile.tx(PageFile.java:303)
>         at org.apache.activemq.store.kahadb.KahaDBStore$KahaDBMessageStore.recover(KahaDBStore.java:523)
>         at org.apache.activemq.store.ProxyTopicMessageStore.recover(ProxyTopicMessageStore.java:62)
>         at org.apache.activemq.store.ProxyTopicMessageStore.recover(ProxyTopicMessageStore.java:62)
>         at org.apache.activemq.broker.region.Topic.doBrowse(Topic.java:570)
>         at org.apache.activemq.broker.region.Topic.access$100(Topic.java:63)
>         at org.apache.activemq.broker.region.Topic$6.run(Topic.java:695)
>         at org.apache.activemq.thread.SchedulerTimerTask.run(SchedulerTimerTask.java:33)
>         at java.util.TimerThread.mainLoop(Timer.java:512)
>         at java.util.TimerThread.run(Timer.java:462)
