Date: Wed, 20 Nov 2013 13:41:36 +0000 (UTC)
From: "Tenzin giatso (JIRA)"
To: dev@activemq.apache.org
Subject: [jira] [Commented] (AMQ-4837) LevelDB corrupted in AMQ cluster

    [ https://issues.apache.org/jira/browse/AMQ-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827654#comment-13827654 ]

Tenzin giatso commented on AMQ-4837:
------------------------------------

I tried this overnight with AMQ 5.9 and LevelDB (no cluster) at 30 messages/s, and there was no error or reboot (about 17 hours of processing).
So the error happens only at a minimum message rate (200 messages per second or more).

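To compare the two rates mentioned in the comment above (30 versus 200 messages per second), a load generator has to pace its sends. The following is only a minimal sketch of such pacing, not the test that was actually run: `PacedSender`, `intervalNanos`, and `sendAtRate` are hypothetical names, and the `send` callback stands in for whatever JMS producer call the real test uses (not shown here).

```java
import java.util.concurrent.locks.LockSupport;

// Hypothetical helper: paces calls to a send callback at a fixed target rate.
final class PacedSender {

    /** Nanoseconds between two sends for the given target rate. */
    public static long intervalNanos(int messagesPerSecond) {
        if (messagesPerSecond <= 0) {
            throw new IllegalArgumentException("rate must be positive");
        }
        return 1_000_000_000L / messagesPerSecond;
    }

    /**
     * Calls send.run() `total` times, never sooner than its scheduled slot,
     * and returns the elapsed time in nanoseconds.
     * The callback is an assumption: in a real test it would wrap the
     * producer's send call.
     */
    public static long sendAtRate(int messagesPerSecond, int total, Runnable send) {
        long interval = intervalNanos(messagesPerSecond);
        long start = System.nanoTime();
        for (int i = 0; i < total; i++) {
            long due = start + i * interval;
            long wait;
            // parkNanos may return early, so re-check until the slot is reached.
            while ((wait = due - System.nanoTime()) > 0) {
                LockSupport.parkNanos(wait);
            }
            send.run();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int[] sent = {0};
        // 200 messages/s is the rate at which the comment reports the error.
        long elapsed = sendAtRate(200, 20, () -> sent[0]++);
        System.out.printf("sent %d messages in %d ms%n", sent[0], elapsed / 1_000_000);
    }
}
```

Scheduling each send against the start time, rather than sleeping a fixed interval after each send, keeps the average rate close to the target even when individual sends are slow.
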
> LevelDB corrupted in AMQ cluster
> --------------------------------
>
>                 Key: AMQ-4837
>                 URL: https://issues.apache.org/jira/browse/AMQ-4837
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: activemq-leveldb-store
>    Affects Versions: 5.9.0
>         Environment: CentOS, Linux version 2.6.32-71.29.1.el6.x86_64
> java-1.7.0-openjdk.x86_64/java-1.6.0-openjdk.x86_64
> zookeeper-3.4.5.2
>            Reporter: Guillaume
>            Assignee: Hiram Chirino
>            Priority: Critical
>         Attachments: LevelDBCorrupted.zip, activemq.xml
>
>
> I have clustered 3 ActiveMQ instances using replicated LevelDB and ZooKeeper. When performing some tests using the web UI, I came across issues that appear to corrupt the LevelDB data files.
> The issue can be reproduced by performing the following steps:
> 1. Start 3 ActiveMQ nodes.
> 2. Push a message to the master (Node1) and browse the queue using the web UI.
> 3. Stop the master node (Node1).
> 4. Push a message to the new master (Node2) and browse the queue using the web UI. Message summary and queue content are OK.
> 5. Start Node1.
> 6. Stop the master node (Node2).
> 7. Browse the queue using the web UI on the new master (Node3). The message summary is OK; however, when clicking on the queue, no message details are shown. An error (see below) is logged by the master, which attempts a restart.
> From this point, the database appears to be corrupted and the same error occurs on each node indefinitely (shutdown/restart). The only workaround is to stop the nodes and clear the data files.
> However, when a message is pushed between steps 5 and 6, the error doesn't occur.
> =================================
> LevelDB configuration on the 3 instances:
>       directory="${activemq.data}/leveldb"
>       replicas="3"
>       bind="tcp://0.0.0.0:0"
>       zkAddress="zkserver:2181"
>       zkPath="/activemq/leveldb-stores"
>       />
> =================================
> The error is:
> INFO | Stopping BrokerService[localhost] due to exception, java.io.IOException
> java.io.IOException
>         at org.apache.activemq.util.IOExceptionSupport.create(IOExceptionSupport.java:39)
>         at org.apache.activemq.leveldb.LevelDBClient.might_fail(LevelDBClient.scala:543)
>         at org.apache.activemq.leveldb.LevelDBClient.might_fail_using_index(LevelDBClient.scala:974)
>         at org.apache.activemq.leveldb.LevelDBClient.collectionCursor(LevelDBClient.scala:1270)
>         at org.apache.activemq.leveldb.LevelDBClient.queueCursor(LevelDBClient.scala:1194)
>         at org.apache.activemq.leveldb.DBManager.cursorMessages(DBManager.scala:708)
>         at org.apache.activemq.leveldb.LevelDBStore$LevelDBMessageStore.recoverNextMessages(LevelDBStore.scala:741)
>         at org.apache.activemq.broker.region.cursors.QueueStorePrefetch.doFillBatch(QueueStorePrefetch.java:106)
>         at org.apache.activemq.broker.region.cursors.AbstractStoreCursor.fillBatch(AbstractStoreCursor.java:258)
>         at org.apache.activemq.broker.region.cursors.AbstractStoreCursor.reset(AbstractStoreCursor.java:108)
>         at org.apache.activemq.broker.region.cursors.StoreQueueCursor.reset(StoreQueueCursor.java:157)
>         at org.apache.activemq.broker.region.Queue.doPageInForDispatch(Queue.java:1875)
>         at org.apache.activemq.broker.region.Queue.pageInMessages(Queue.java:2086)
>         at org.apache.activemq.broker.region.Queue.iterate(Queue.java:1581)
>         at org.apache.activemq.thread.PooledTaskRunner.runTask(PooledTaskRunner.java:129)
>         at org.apache.activemq.thread.PooledTaskRunner$1.run(PooledTaskRunner.java:47)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:722)
> Caused by: java.lang.NullPointerException
>         at org.apache.activemq.leveldb.LevelDBClient$$anonfun$queueCursor$1.apply(LevelDBClient.scala:1198)
>         at org.apache.activemq.leveldb.LevelDBClient$$anonfun$queueCursor$1.apply(LevelDBClient.scala:1194)
>         at org.apache.activemq.leveldb.LevelDBClient$$anonfun$collectionCursor$1$$anonfun$apply$mcV$sp$12.apply(LevelDBClient.scala:1272)
>         at org.apache.activemq.leveldb.LevelDBClient$$anonfun$collectionCursor$1$$anonfun$apply$mcV$sp$12.apply(LevelDBClient.scala:1271)
>         at org.apache.activemq.leveldb.LevelDBClient$RichDB.check$4(LevelDBClient.scala:315)
>         at org.apache.activemq.leveldb.LevelDBClient$RichDB.cursorRange(LevelDBClient.scala:317)
>         at org.apache.activemq.leveldb.LevelDBClient$$anonfun$collectionCursor$1.apply$mcV$sp(LevelDBClient.scala:1271)
>         at org.apache.activemq.leveldb.LevelDBClient$$anonfun$collectionCursor$1.apply(LevelDBClient.scala:1271)
>         at org.apache.activemq.leveldb.LevelDBClient$$anonfun$collectionCursor$1.apply(LevelDBClient.scala:1271)
>         at org.apache.activemq.leveldb.LevelDBClient.usingIndex(LevelDBClient.scala:968)
>         at org.apache.activemq.leveldb.LevelDBClient$$anonfun$might_fail_using_index$1.apply(LevelDBClient.scala:974)
>         at org.apache.activemq.leveldb.LevelDBClient.might_fail(LevelDBClient.scala:540)
>         ... 17 more

--
This message was sent by Atlassian JIRA
(v6.1#6144)