activemq-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Artem Karpenko (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AMQ-5618) Infinite loop in log replay with Replicated LevelDB
Date Mon, 09 Mar 2015 11:42:39 GMT

    [ https://issues.apache.org/jira/browse/AMQ-5618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352843#comment-14352843
] 

Artem Karpenko commented on AMQ-5618:
-------------------------------------

Hello Pablo.
The steps outlined above _always_ reproduce the problem for me. The steps in description alone
always do it on a separate environment I've setup in Google Compute Engine. And the steps
with corrrections in the first comment do the same in testing environment of our internal
project. Not sure why the first variant does not "work" there though.

<offtopic>The number and severity of bugs we've encountered when using Replicated LevelDB
(f.e. see AMQ-5619, AMQ-5612, AMQ-5611) made us switch to the old solution with JDBC and HA
MySQL (Google provides that). Previously we changed to Replicated LevelDB because change of
master occurred rather regularly (like once a day) with JDBC and made queue counters reset.
But now we'll see that as an improvement upon the current situation - at least there were
no cases when store was corrupted beyond repair.</offtopic>

> Infinite loop in log replay with Replicated LevelDB
> ---------------------------------------------------
>
>                 Key: AMQ-5618
>                 URL: https://issues.apache.org/jira/browse/AMQ-5618
>             Project: ActiveMQ
>          Issue Type: Bug
>    Affects Versions: 5.11.0, 5.11.1
>         Environment: Linux, Google Compute Engine
>            Reporter: Artem Karpenko
>            Priority: Critical
>
> This is very similar to AMQ-5300 except that I use replicatedLevelDB persistence adapter
and in order to reproduce I don't have to delete any index files.
> Setup: 1 ZK instance, 3 AMQ nodes.
> One of the AMQ configs:
> {code}
> <replicatedLevelDB directory="${activemq.data}/replicatedLevelDB"
>                                 replicas="3"
>                                 bind="tcp://0.0.0.0:61619"
>                                 zkAddress="instance-6:2181"
>                                 zkPath="/activemq/leveldb-stores"
>                                 hostname="instance-7" />
> {code}
> Difference between nodes is only in hostname attribute.
> The way to reproduce is almost the same as in AMQ-5300: 
> # Produce lots of messages to generate several log files in leveldb data directory.
> # Consume _some_ messages until you see "Deleting log" in activemq.log.
> # Restart master. Wait for system to rebalance itself. Everything's fine at this point.
> # Restart the second master.
> # Observe the massive (infinite?) logging on slave and relatively calm but still possibly
infinite logging on master.
> This is what the first master logs after it's restarted:
> {code}
> 2015-02-25 21:37:08,338 | DEBUG | Download session connected... | org.apache.activemq.leveldb.replicated.SlaveLevelDBStore
| hawtdispatch-DEFAULT-1
> 2015-02-25 21:37:08,582 | INFO  | Slave skipping download of: log/00000000190be289.log
| org.apache.activemq.leveldb.replicated.SlaveLevelDBStore | hawtdispatch-DEFAULT-1
> 2015-02-25 21:37:09,099 | INFO  | Slave skipping download of: log/000000000642f848.log
| org.apache.activemq.leveldb.replicated.SlaveLevelDBStore | hawtdispatch-DEFAULT-1
> 2015-02-25 21:37:09,411 | INFO  | Slave skipping download of: log/000000000c85f06d.log
| org.apache.activemq.leveldb.replicated.SlaveLevelDBStore | hawtdispatch-DEFAULT-1
> 2015-02-25 21:37:09,838 | INFO  | Slave skipping download of: log/0000000012c8e921.log
| org.apache.activemq.leveldb.replicated.SlaveLevelDBStore | hawtdispatch-DEFAULT-1
> 2015-02-25 21:37:09,842 | INFO  | Slave requested: 000000001c9373b4.index/CURRENT | org.apache.activemq.leveldb.replicated.SlaveLevelDBStore
| hawtdispatch-DEFAULT-1
> 2015-02-25 21:37:09,846 | INFO  | Slave requested: 000000001c9373b4.index/MANIFEST-000002
| org.apache.activemq.leveldb.replicated.SlaveLevelDBStore | hawtdispatch-DEFAULT-1
> 2015-02-25 21:37:09,850 | INFO  | Slave requested: 000000001c9373b4.index/000003.log
| org.apache.activemq.leveldb.replicated.SlaveLevelDBStore | hawtdispatch-DEFAULT-1
> 2015-02-25 21:37:09,857 | INFO  | Attaching... Downloaded 0.02/95.65 kb and 1/3 files
| org.apache.activemq.leveldb.replicated.SlaveLevelDBStore | hawtdispatch-DEFAULT-1
> 2015-02-25 21:37:09,859 | INFO  | Attaching... Downloaded 0.06/95.65 kb and 2/3 files
| org.apache.activemq.leveldb.replicated.SlaveLevelDBStore | hawtdispatch-DEFAULT-1
> 2015-02-25 21:37:09,861 | INFO  | Attaching... Downloaded 95.65/95.65 kb and 3/3 files
| org.apache.activemq.leveldb.replicated.SlaveLevelDBStore | hawtdispatch-DEFAULT-1
> 2015-02-25 21:37:09,862 | INFO  | Attached | org.apache.activemq.leveldb.replicated.SlaveLevelDBStore
| hawtdispatch-DEFAULT-1
> 2015-02-25 21:37:09,878 | DEBUG | Taking a snapshot of the current index: /usr/local/apache-activemq-5.11.1/data/replicatedLevelDB/000000001c9373b4.index
| org.apache.activemq.leveldb.LevelDBClient | Thread-2
> 2015-02-25 21:37:10,352 | DEBUG | Recovering from last index snapshot at: /usr/local/apache-activemq-5.11.1/data/replicatedLevelDB/dirty.index
| org.apache.activemq.leveldb.LevelDBClient | Thread-2
> {code}
> Right after that everything seems fine. But as soon as I stop the new master, the another
new master (that would be the third one) logs
> {code}
> 2015-02-25 21:38:43,876 | INFO  | Promoted to master | org.apache.activemq.leveldb.replicated.MasterElector
| main-EventThread
> 2015-02-25 21:38:43,894 | INFO  | Using the pure java LevelDB implementation. | org.apache.activemq.leveldb.LevelDBClient
| ActiveMQ BrokerService[localhost] Task-5
> 2015-02-25 21:38:45,954 | WARN  | Invalid log position: 44 | org.apache.activemq.leveldb.LevelDBClient
| ActiveMQ BrokerService[localhost] Task-5
> 2015-02-25 21:38:45,955 | WARN  | Invalid log position: 325280 | org.apache.activemq.leveldb.LevelDBClient
| ActiveMQ BrokerService[localhost] Task-5
> ...
> 2015-02-25 21:38:46,696 | WARN  | Invalid log position: 104726993 | org.apache.activemq.leveldb.LevelDBClient
| ActiveMQ BrokerService[localhost] Task-5
> 2015-02-25 21:38:46,902 | INFO  | Master started: tcp://instance-8:61619 | org.apache.activemq.leveldb.replicated.MasterElector
| ActiveMQ BrokerService[localhost] Task-6
> ...
> 2015-02-25 21:38:52,127 | INFO  | Initializing Spring FrameworkServlet 'dispatcher' |
/admin | main
> 2015-02-25 21:38:53,181 | INFO  | jolokia-agent: No access restrictor found at classpath:/jolokia-access.xml,
access to all MBeans is allowed | /api | main
> 2015-02-25 21:38:56,881 | WARN  | Invalid log position: 0 | org.apache.activemq.leveldb.LevelDBClient
| Thread-4
> 2015-02-25 21:39:06,884 | WARN  | Invalid log position: 0 | org.apache.activemq.leveldb.LevelDBClient
| Thread-4
> 2015-02-25 21:39:16,887 | WARN  | Invalid log position: 0 | org.apache.activemq.leveldb.LevelDBClient
| Thread-4
> ...
> {code}
> the last message starts repeating every 10 seconds without hint of stopping.
> At the same time the one left slave (that was the 1st master) logs
> {code}
> 2015-02-25 21:38:50,124 | INFO  | Attached | org.apache.activemq.leveldb.replicated.SlaveLevelDBStore
| hawtdispatch-DEFAULT-1
> 2015-02-25 21:38:50,134 | DEBUG | Taking a snapshot of the current index: /usr/local/apache-activemq-5.11.1/data/replicatedLevelDB/000000001c9373b4.index
| org.apache.activemq.leveldb.LevelDBClient | Thread-3
> 2015-02-25 21:38:50,149 | DEBUG | Recovering from last index snapshot at: /usr/local/apache-activemq-5.11.1/data/replicatedLevelDB/dirty.index
| org.apache.activemq.leveldb.LevelDBClient | Thread-3
> 2015-02-25 21:38:50,189 | DEBUG | Replay of journal from: 0 to 479425460. | org.apache.activemq.leveldb.LevelDBClient
| Thread-3
> 2015-02-25 21:38:50,328 | WARN  | No reader available for position: 0, log_infos: {105052232=LogInfo(/usr/local/apache-activemq-5.11.1/data/replicatedLevelDB/000000000642f848.log,105052232,105052197),
210104429=LogInfo(/usr/local/apache-activemq-5.11.1/data/replicatedLevelDB/000000000c85f06d.log,210104429,105052340),
315156769=LogInfo(/usr/local/apache-activemq-5.11.1/data/replicatedLevelDB/0000000012c8e921.log,315156769,105052520),
420209289=LogInfo(/usr/local/apache-activemq-5.11.1/data/replicatedLevelDB/00000000190be289.log,420209289,0)}
| org.apache.activemq.leveldb.RecordLog | Thread-3
> 2015-02-25 21:38:50,332 | WARN  | No reader available for position: 0, log_infos: {105052232=LogInfo(/usr/local/apache-activemq-5.11.1/data/replicatedLevelDB/000000000642f848.log,105052232,105052197),
210104429=LogInfo(/usr/local/apache-activemq-5.11.1/data/replicatedLevelDB/000000000c85f06d.log,210104429,105052340),
315156769=LogInfo(/usr/local/apache-activemq-5.11.1/data/replicatedLevelDB/0000000012c8e921.log,315156769,105052520),
420209289=LogInfo(/usr/local/apache-activemq-5.11.1/data/replicatedLevelDB/00000000190be289.log,420209289,0)}
| org.apache.activemq.leveldb.RecordLog | Thread-3
> ...
> {code}
> the last message repeats infinitely and every ~1ms, with activemq process taking 100%
CPU.
> If I would now start the stopped node (that was the 2d master) it would log exactly the
same message with the same intensity.
> It should also be noted that while I can observe correct diminished (relatively to initial,
due to some consumption) count of messages in a queue via console after the first restart,
the second restart changes that and queue size becomes bigger (although I'm not sure if it
is equal to the size of queue before any consumption).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message