storm-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "P. Taylor Goetz" <ptgo...@gmail.com>
Subject Re: Storm 0.9.0.1 - topology halts processing after a while
Date Fri, 14 Feb 2014 02:34:46 GMT
Yeah, we had to disable issues on nathanmarz/storm in order to get everyone to start using
the Apache infrastructure. An unfortunate side effect of that was that now links to old github
issues return a 404.

All the github issues before the move have been migrated to JIRA now (https://issues.apache.org/jira/browse/STORM).
You may have to do some searching to find them, but James Xu did a lot of work to ensure that
most of the comments got transferred to JIRA.

I’ll consolidate them under a single JIRA when I get the chance, but the relevant issues
are:

STORM-202, STORM-105, STORM-29, STORM-143, and STORM-95

Or use this query: https://issues.apache.org/jira/browse/STORM-202?jql=project%20%3D%20STORM%20AND%20text%20~%20%22stdout%22

It would be great if we could come up with a proper solution. For those who have been bitten
by this it’s painful — zero indication of what the issue is.

At the very least, it should be well documented. Heisenbugs suck. ;)

- Taylor

On Feb 13, 2014, at 11:10 AM, Marc Vaillant <vaillant@animetrics.com> wrote:

> I was going to mention something similar.  The stdout/stderr issue is
> insidious.  Only happens in cluster mode and while it was fixed for some
> stdout/stderr pathways, others have not been fixed AFAIK.  Taylor has
> mentioned one, another is stdout/stderr coming from native code via JNI
> (i.e.  printf, std::cout, std::cerr from C and C++).   Unfortunately, I
> seem to be unable to access the original issue thread from github that I
> tried to reopen about 9 months ago. 
> 
> Marc
> 
> On Wed, Feb 12, 2014 at 03:50:52PM -0500, P. Taylor Goetz wrote:
>> It looks like you have GC logging turned on, but you haven t specified an
>> output file:
>> 
>> "worker.childopts" "-verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails
>> -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.ssl=false
>> -Dcom.sun.management.jmxremote.authenticate=false
>> -Dcom.sun.management.jmxremote.port=1%ID% -Xmx1200m 
>> 
>> I ve run into this before:
>> 
>> "The root cause was that GC logging for workers had been turned on without
>> specifying an output file. As a result the GC logging went to standard output
>> without being redirected to logback. Eventually the buffer filled and the JVM
>> would hang (and thus stop heartbeating) and be killed. The amount of time the
>> worker would last depended on memory pressure and allocated heap size
>> (obviously, in hindsight   the more GC, the more GC logging, the faster the
>> buffer fills).
>> 
>> The symptoms were workers timing out and being killed for no apparent reason. 
>> 
>> I would try with GC logging turned off or redirected to a file.
>> 
>> - Taylor
>> 
>> On Feb 12, 2014, at 3:12 PM, Mark Hu <markhu101@gmail.com> wrote:
>> 
>> 
>>    Hi guys,
>> 
>> 
>>    Just an update.
>> 
>>    1. After investigating the hang, it looked like the Spout was properly
>>    sending tuples to a bolt, but the bolt would never acknowledge the incoming
>>    tuple. After switching from using the Netty transport layer back to 0MQ,
>>    recovery and reliability guarantees were restored. Please note that we're
>>    using jruby-1.7.3 on top of storm java - could this cause issues with
>>    netty?
>>    2. Can anyone confirm if the above worker restart behaviour is normal?
>> 
>>    Thanks,
>>    Mark
>> 
>> 
>> 
>> On Feb 6, 2014, at 10:20 PM, Mark Hu <markhu101@gmail.com> wrote:
>> 
>> 
>>    HISTORY:
>>    I have past experience with Storm 0.8.2 and encountered silent worker death
>>    during development. After experimenting with various parallelism, worker,
>>    and max_spout configurations, we were able to find one that worked well
>>    with high reliability despite this issue. 
>> 
>>    The silent worker death basically involves a worker not responding and
>>    waiting 30 seconds before starting to process work again. A full load test
>>    reveals no data loss here.
>> 
>>    I have this topology running well for the past 3 months and the only errors
>>    were due to the user.
>> 
>>    PRESENT:
>>    I am now testing Storm 0.9.0.1 and it appears the silent worker death issue
>>    is still present, but now the topology hangs. Here are the details:
>> 
>>    Type and description for my test topology
>>    1. One spout - makes web calls to a rest service to retrieve ids.
>>    2. One bolt - matchRetrieveBolt - retrieves blobs for those ids.
>>    3. 50 bolts - matchProcessBolt - does some processing on the blobs.
>> 
>>    Storm performed swimmingly for about 63,000 blobs before it hung. After
>>    examining the logs, it appears that one worker was processing both bolt
>>    type 2 and bolt type 3. For some reason, a reset occurs and the bolts
>>    reinitialize but become unresponsive. The spout continues to send work but
>>    times out and retries.
>> 
>>    Here is the data for worker-6710.log:
>> 
>> 
>>    2014-02-07 01:31:26 MatchProcessBolt [INFO] [BOLT-MatchProcessBolt] bitfield already
allocated, ttl: -1
>>    2014-02-07 01:31:26 MatchProcessBolt [INFO] [BOLT-MatchProcessBolt] All done.
Acking now
>>    2014-02-07 01:31:26 MatchRetrieveBolt [INFO] [BOLT-1] process: first = 66300,
last = 66350
>>    2014-02-07 01:31:26 MatchRetrieveBolt [INFO] [BOLT-1] url: SOMEURL
>> 
>> 
>>    As per the above, this worker is performing bolt1 and bolt2 work.
>> 
>>    Suddenly, a restart occurs as per the below:
>> 
>> 
>>    2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client environment:zookeeper.version=3.3.3-1073969,
built on 02/23/2011 22:27 GMT
>>    2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client environment:host.name=dev-shared2-supervisor-i-e67015c7
>>    2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client environment:java.version=1.6.0_27
>>    2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client environment:java.vendor=Sun
Microsystems Inc.
>>    2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client environment:java.home=/usr/lib/jvm/java-6-openjdk-amd64/jre
>>    2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client environment:java.class.path=/opt/storm/storm-0.9.0.1/storm-netty-0.9.0.1.jar:/opt/storm/storm-0.9.0.1/storm-console-logging-0.9.0.1.jar:/opt/storm/storm-0.9.0.1/storm-core-0.9.0.1.jar:/opt/storm/storm-0.9.0.1/lib/jline-0.9.94.jar:/opt/storm/storm-0.9.0.1/lib/servlet-api-2.5.jar:/opt/storm/storm-0.9.0.1/lib/minlog-1.2.jar:/opt/storm/storm-0.9.0.1/lib/curator-framework-1.0.1.jar:/opt/storm/storm-0.9.0.1/lib/snakeyaml-1.11.jar:/opt/storm/storm-0.9.0.1/lib/log4j-over-slf4j-1.6.6.jar:/opt/storm/storm-0.9.0.1/lib/clj-time-0.4.1.jar:/opt/storm/storm-0.9.0.1/lib/ring-core-1.1.5.jar:/opt/storm/storm-0.9.0.1/lib/mockito-all-1.9.5.jar:/opt/storm/storm-0.9.0.1/lib/tools.logging-0.2.3.jar:/opt/storm/storm-0.9.0.1/lib/disruptor-2.10.1.jar:/opt/storm/storm-0.9.0.1/lib/zookeeper-3.3.3.jar:/opt/storm/storm-0.9.0.1/lib/servlet-api-2.5-20081211.jar:/opt/storm/storm-0.9.0.1/lib/carbonite-1.5.0.jar:/opt/storm/storm-0.9.0.1/lib/guava-13.0.jar:/opt/storm/storm-0.9.0.1/lib/commo>
ns-fileupload-1.2.1.jar:/opt/storm/storm-0.9.0.1/lib/joda-time-2.0.jar:/opt/storm/storm-0.9.0.1/lib/jgrapht-0.8.3.jar:/opt/storm/storm-0.9.0.1/lib/netty-3.6.3.Final.jar:/opt/storm/storm-0.9.0.1/lib/clojure-complete-0.2.3.jar:/opt/storm/storm-0.9.0.1/lib/tools.cli-0.2.2.jar:/opt/storm/storm-0.9.0.1/lib/logback-classic-1.0.6.jar:/opt/storm/storm-0.9.0.1/lib/ring-servlet-0.3.11.jar:/opt/storm/storm-0.9.0.1/lib/asm-4.0.jar:/opt/storm/storm-0.9.0.1/lib/commons-exec-1.1.jar:/opt/storm/storm-0.9.0.1/lib/ring-jetty-adapter-0.3.11.jar:/opt/storm/storm-0.9.0.1/lib/commons-lang-2.5.jar:/opt/storm/storm-0.9.0.1/lib/commons-logging-1.1.1.jar:/opt/storm/storm-0.9.0.1/lib/clout-1.0.1.jar:/opt/storm/storm-0.9.0.1/lib/clj-stacktrace-0.2.2.jar:/opt/storm/storm-0.9.0.1/lib/reflectasm-1.07-shaded.jar:/opt/storm/storm-0.9.0.1/lib/junit-3.8.1.jar:/opt/storm/storm-0.9.0.1/lib/jzmq-2.1.0.jar:/opt/storm/storm-0.9.0.1/lib/jetty-6.1.26.jar:/opt/storm/storm-0.9.0.1/lib/kryo-2.17.jar:/opt/storm/storm-0.9.0.1/lib/ring-devel-0.3.11.jar:/o>
pt/storm/storm-0.9.0.1/lib/hiccup-0.3.6.jar:/opt/storm/storm-0.9.0.1/lib/compojure-1.1.3.jar:/opt/storm/storm-0.9.0.1/lib/jetty-util-6.1.26.jar:/opt/storm/storm-0.9.0.1/lib/commons-io-1.4.jar:/opt/storm/storm-0.9.0.1/lib/slf4j-api-1.6.5.jar:/opt/storm/storm-0.9.0.1/lib/clojure-1.4.0.jar:/opt/storm/storm-0.9.0.1/lib/tools.nrepl-0.2.3.jar:/opt/storm/storm-0.9.0.1/lib/curator-client-1.0.1.jar:/opt/storm/storm-0.9.0.1/lib/httpcore-4.1.jar:/opt/storm/storm-0.9.0.1/lib/logback-core-1.0.6.jar:/opt/storm/storm-0.9.0.1/lib/commons-codec-1.4.jar:/opt/storm/storm-0.9.0.1/lib/json-simple-1.1.jar:/opt/storm/storm-0.9.0.1/lib/core.incubator-0.1.0.jar:/opt/storm/storm-0.9.0.1/lib/libthrift7-0.7.0-2.jar:/opt/storm/storm-0.9.0.1/lib/tools.macro-0.1.0.jar:/opt/storm/storm-0.9.0.1/lib/httpclient-4.1.1.jar:/opt/storm/storm-0.9.0.1/lib/objenesis-1.2.jar:/opt/storm/storm-0.9.0.1/lib/math.numeric-tower-0.0.1.jar:/opt/storm/storm-0.9.0.1/conf:/mnt/storm/supervisor/stormdist/match_parser-12-1391735866/stormjar.jar
>>    2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client environment:java.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
>>    2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client environment:java.io.tmpdir=/tmp
>>    2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client environment:java.compiler=<NA>
>>    2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client environment:os.name=Linux
>>    2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client environment:os.arch=amd64
>>    2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client environment:os.version=3.2.0-48-virtual
>>    2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client environment:user.name=storm
>>    2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client environment:user.home=/home/storm
>>    2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client environment:user.dir=/opt/storm/storm-0.9.0.1
>>    2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server environment:zookeeper.version=3.3.3-1073969,
built on 02/23/2011 22:27 GMT
>>    2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server environment:host.name=dev-shared2-supervisor-i-e67015c7
>>    2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server environment:java.version=1.6.0_27
>>    2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server environment:java.vendor=Sun
Microsystems Inc.
>>    2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server environment:java.home=/usr/lib/jvm/java-6-openjdk-amd64/jre
>>    2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server environment:java.class.path=/opt/storm/storm-0.9.0.1/storm-netty-0.9.0.1.jar:/opt/storm/storm-0.9.0.1/storm-console-logging-0.9.0.1.jar:/opt/storm/storm-0.9.0.1/storm-core-0.9.0.1.jar:/opt/storm/storm-0.9.0.1/lib/jline-0.9.94.jar:/opt/storm/storm-0.9.0.1/lib/servlet-api-2.5.jar:/opt/storm/storm-0.9.0.1/lib/minlog-1.2.jar:/opt/storm/storm-0.9.0.1/lib/curator-framework-1.0.1.jar:/opt/storm/storm-0.9.0.1/lib/snakeyaml-1.11.jar:/opt/storm/storm-0.9.0.1/lib/log4j-over-slf4j-1.6.6.jar:/opt/storm/storm-0.9.0.1/lib/clj-time-0.4.1.jar:/opt/storm/storm-0.9.0.1/lib/ring-core-1.1.5.jar:/opt/storm/storm-0.9.0.1/lib/mockito-all-1.9.5.jar:/opt/storm/storm-0.9.0.1/lib/tools.logging-0.2.3.jar:/opt/storm/storm-0.9.0.1/lib/disruptor-2.10.1.jar:/opt/storm/storm-0.9.0.1/lib/zookeeper-3.3.3.jar:/opt/storm/storm-0.9.0.1/lib/servlet-api-2.5-20081211.jar:/opt/storm/storm-0.9.0.1/lib/carbonite-1.5.0.jar:/opt/storm/storm-0.9.0.1/lib/guava-13.0.jar:/opt/storm/storm-0.9.0.1/l>
ib/commons-fileupload-1.2.1.jar:/opt/storm/storm-0.9.0.1/lib/joda-time-2.0.jar:/opt/storm/storm-0.9.0.1/lib/jgrapht-0.8.3.jar:/opt/storm/storm-0.9.0.1/lib/netty-3.6.3.Final.jar:/opt/storm/storm-0.9.0.1/lib/clojure-complete-0.2.3.jar:/opt/storm/storm-0.9.0.1/lib/tools.cli-0.2.2.jar:/opt/storm/storm-0.9.0.1/lib/logback-classic-1.0.6.jar:/opt/storm/storm-0.9.0.1/lib/ring-servlet-0.3.11.jar:/opt/storm/storm-0.9.0.1/lib/asm-4.0.jar:/opt/storm/storm-0.9.0.1/lib/commons-exec-1.1.jar:/opt/storm/storm-0.9.0.1/lib/ring-jetty-adapter-0.3.11.jar:/opt/storm/storm-0.9.0.1/lib/commons-lang-2.5.jar:/opt/storm/storm-0.9.0.1/lib/commons-logging-1.1.1.jar:/opt/storm/storm-0.9.0.1/lib/clout-1.0.1.jar:/opt/storm/storm-0.9.0.1/lib/clj-stacktrace-0.2.2.jar:/opt/storm/storm-0.9.0.1/lib/reflectasm-1.07-shaded.jar:/opt/storm/storm-0.9.0.1/lib/junit-3.8.1.jar:/opt/storm/storm-0.9.0.1/lib/jzmq-2.1.0.jar:/opt/storm/storm-0.9.0.1/lib/jetty-6.1.26.jar:/opt/storm/storm-0.9.0.1/lib/kryo-2.17.jar:/opt/storm/storm-0.9.0.1/lib/ring-devel-0.3.1>
1.jar:/opt/storm/storm-0.9.0.1/lib/hiccup-0.3.6.jar:/opt/storm/storm-0.9.0.1/lib/compojure-1.1.3.jar:/opt/storm/storm-0.9.0.1/lib/jetty-util-6.1.26.jar:/opt/storm/storm-0.9.0.1/lib/commons-io-1.4.jar:/opt/storm/storm-0.9.0.1/lib/slf4j-api-1.6.5.jar:/opt/storm/storm-0.9.0.1/lib/clojure-1.4.0.jar:/opt/storm/storm-0.9.0.1/lib/tools.nrepl-0.2.3.jar:/opt/storm/storm-0.9.0.1/lib/curator-client-1.0.1.jar:/opt/storm/storm-0.9.0.1/lib/httpcore-4.1.jar:/opt/storm/storm-0.9.0.1/lib/logback-core-1.0.6.jar:/opt/storm/storm-0.9.0.1/lib/commons-codec-1.4.jar:/opt/storm/storm-0.9.0.1/lib/json-simple-1.1.jar:/opt/storm/storm-0.9.0.1/lib/core.incubator-0.1.0.jar:/opt/storm/storm-0.9.0.1/lib/libthrift7-0.7.0-2.jar:/opt/storm/storm-0.9.0.1/lib/tools.macro-0.1.0.jar:/opt/storm/storm-0.9.0.1/lib/httpclient-4.1.1.jar:/opt/storm/storm-0.9.0.1/lib/objenesis-1.2.jar:/opt/storm/storm-0.9.0.1/lib/math.numeric-tower-0.0.1.jar:/opt/storm/storm-0.9.0.1/conf:/mnt/storm/supervisor/stormdist/match_parser-12-1391735866/stormjar.jar
>>    2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server environment:java.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
>>    2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server environment:java.io.tmpdir=/tmp
>>    2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server environment:java.compiler=<NA>
>>    2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server environment:os.name=Linux
>>    2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server environment:os.arch=amd64
>>    2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server environment:os.version=3.2.0-48-virtual
>>    2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server environment:user.name=storm
>>    2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server environment:user.home=/home/storm
>>    2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server environment:user.dir=/opt/storm/storm-0.9.0.1
>>    2014-02-07 01:32:01 b.s.d.worker [INFO] Launching worker for match_parser-12-1391735866
on f1b66213-7237-483e-a5db-e543bb36bcd9:6710 with id 80483ed6-2c51-4f86-9d06-9d2bed705748
and conf {"dev.zookeeper.path" "/tmp/dev-storm-zookeeper", "topology.tick.tuple.freq.secs"
nil, "topology.builtin.metrics.bucket.size.secs" 60, "topology.fall.back.on.java.serialization"
true, "topology.max.error.report.per.interval" 5, "zmq.linger.millis" 5000, "topology.skip.missing.kryo.registrations"
false, "storm.messaging.netty.client_worker_threads" 1, "ui.childopts" "-Xmx768m", "storm.zookeeper.session.timeout"
20000, "nimbus.reassign" true, "topology.trident.batch.emit.interval.millis" 500, "nimbus.monitor.freq.secs"
10, "logviewer.childopts" "-Xmx128m", "java.library.path" "/usr/local/lib:/opt/local/lib:/usr/lib",
"topology.executor.send.buffer.size" 1024, "storm.local.dir" "/mnt/storm", "storm.messaging.netty.buffer_size"
5242880, "supervisor.worker.start.timeout.secs" 120, "topology.enable.message.timeouts" true,
"nim> bus.cleanup.inbox.freq.secs" 600, "nimbus.inbox.jar.expiration.secs" 3600, "drpc.worker.threads"
64, "topology.worker.shared.thread.pool.size" 4, "nimbus.host" "10.0.5.50", "storm.messaging.netty.min_wait_ms"
100, "storm.zookeeper.port" 2181, "transactional.zookeeper.port" nil, "topology.executor.receive.buffer.size"
1024, "transactional.zookeeper.servers" nil, "storm.zookeeper.root" "/storm", "storm.zookeeper.retry.intervalceiling.millis"
30000, "supervisor.enable" true, "storm.messaging.netty.server_worker_threads" 1, "storm.zookeeper.servers"
["10.0.6.131" "10.0.6.130" "10.0.6.55"], "transactional.zookeeper.root" "/transactional",
"topology.acker.executors" nil, "topology.transfer.buffer.size" 1024, "topology.worker.childopts"
nil, "drpc.queue.size" 128, "worker.childopts" "-verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails
-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.port=1%ID% -Xmx1200m", "sup> ervisor.heartbeat.frequency.secs"
5, "topology.error.throttle.interval.secs" 10, "zmq.hwm" 0, "drpc.port" 3772, "supervisor.monitor.frequency.secs"
3, "drpc.childopts" "-Xmx768m", "topology.receiver.buffer.size" 8, "task.heartbeat.frequency.secs"
3, "topology.tasks" nil, "storm.messaging.netty.max_retries" 100, "topology.spout.wait.strategy"
"backtype.storm.spout.SleepSpoutWaitStrategy", "topology.max.spout.pending" nil, "storm.zookeeper.retry.interval"
1000, "topology.sleep.spout.wait.strategy.time.ms" 1, "nimbus.topology.validator" "backtype.storm.nimbus.DefaultTopologyValidator",
"supervisor.slots.ports" [6700 6701 6702 6703 6704 6705 6706 6707 6708 6709 6710 6711 6712
6713 6714 6715 6716 6717 6718 6719 6720 6721], "topology.debug" false, "nimbus.task.launch.secs"
120, "nimbus.supervisor.timeout.secs" 60, "topology.message.timeout.secs" 30, "task.refresh.poll.secs"
10, "topology.workers" 1, "supervisor.childopts" "-verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails
-Dcom.sun.management.jmxremote -Dcom.> sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.port=8091", "nimbus.thrift.port" 6627, "topology.stats.sample.rate"
0.05, "worker.heartbeat.frequency.secs" 1, "topology.tuple.serializer" "backtype.storm.serialization.types.ListDelegateSerializer",
"topology.disruptor.wait.strategy" "com.lmax.disruptor.BlockingWaitStrategy", "nimbus.task.timeout.secs"
30, "storm.zookeeper.connection.timeout" 15000, "topology.kryo.factory" "backtype.storm.serialization.DefaultKryoFactory",
"drpc.invocations.port" 3773, "logviewer.port" 8000, "zmq.threads" 1, "storm.zookeeper.retry.times"
5, "storm.thrift.transport" "backtype.storm.security.auth.SimpleTransportPlugin", "topology.state.synchronization.timeout.secs"
60, "supervisor.worker.timeout.secs" 30, "nimbus.file.copy.expiration.secs" 600, "storm.messaging.transport"
"backtype.storm.messaging.netty.Context", "logviewer.appender.name" "A1", "storm.messaging.netty.max_wait_ms"
1000, "drpc.request.timeout.secs> " 600, "storm.local.mode.zmq" false, "ui.port" 8080,
"nimbus.childopts" "-verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.port=8090", "storm.cluster.mode" "distributed", "topology.optimize"
true, "topology.max.task.parallelism" nil}
>>    2014-02-07 01:32:01 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
>>    2014-02-07 01:32:01 o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=10.0.6.131:2181,10.0.6.130:2181,10.0.6.55:2181
sessionTimeout=20000 watcher=com.netflix.curator.ConnectionState@25de152f
>>    2014-02-07 01:32:01 o.a.z.ClientCnxn [INFO] Opening socket connection to server
/10.0.6.131:2181
>>    2014-02-07 01:32:01 o.a.z.ClientCnxn [INFO] Socket connection established to ip-10-0-6-131.ec2.internal/10.0.6.131:2181,
initiating session
>>    2014-02-07 01:32:01 o.a.z.ClientCnxn [INFO] Session establishment complete on
server ip-10-0-6-131.ec2.internal/10.0.6.131:2181, sessionid = 0xf343fa0ee4d600f0, negotiated
timeout = 20000
>>    2014-02-07 01:32:01 b.s.zookeeper [INFO] Zookeeper state update: :connected:none
>>    2014-02-07 01:32:01 o.a.z.ZooKeeper [INFO] Session: 0xf343fa0ee4d600f0 closed
>>    2014-02-07 01:32:01 o.a.z.ClientCnxn [INFO] EventThread shut down
>>    2014-02-07 01:32:01 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
>>    2014-02-07 01:32:01 o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=10.0.6.131:2181,10.0.6.130:2181,10.0.6.55:2181/storm
sessionTimeout=20000 watcher=com.netflix.curator.ConnectionState@31930611
>>    2014-02-07 01:32:01 o.a.z.ClientCnxn [INFO] Opening socket connection to server
/10.0.6.131:2181
>>    2014-02-07 01:32:01 o.a.z.ClientCnxn [INFO] Socket connection established to ip-10-0-6-131.ec2.internal/10.0.6.131:2181,
initiating session
>>    2014-02-07 01:32:01 o.a.z.ClientCnxn [INFO] Session establishment complete on
server ip-10-0-6-131.ec2.internal/10.0.6.131:2181, sessionid = 0xf343fa0ee4d600f1, negotiated
timeout = 20000
>>    2014-02-07 01:32:01 b.s.m.TransportFactory [INFO] Storm peer transport plugin:backtype.storm.messaging.netty.Context
>>    2014-02-07 01:32:01 b.s.d.executor [INFO] Loading executor 2:[2 2]
>>    2014-02-07 01:32:01 b.s.d.executor [INFO] Loaded executor tasks 2:[2 2]
>>    2014-02-07 01:32:01 b.s.d.executor [INFO] Finished loading executor 2:[2 2]
>>    2014-02-07 01:32:01 b.s.d.executor [INFO] Loading executor __acker:[102 102]
>>    2014-02-07 01:32:01 b.s.d.executor [INFO] Preparing bolt 2:(2)
>>    2014-02-07 01:32:01 b.s.d.executor [INFO] Loaded executor tasks __acker:[102 102]
>>    2014-02-07 01:32:01 b.s.d.executor [INFO] Timeouts disabled for executor __acker:[102
102]
>>    2014-02-07 01:32:01 b.s.d.executor [INFO] Preparing bolt __acker:(102)
>>    2014-02-07 01:32:01 b.s.d.executor [INFO] Finished loading executor __acker:[102
102]
>>    2014-02-07 01:32:01 b.s.d.executor [INFO] Prepared bolt __acker:(102)
>>    2014-02-07 01:32:01 b.s.d.executor [INFO] Loading executor 3:[52 52]
>>    2014-02-07 01:32:01 b.s.d.executor [INFO] Loaded executor tasks 3:[52 52]
>>    2014-02-07 01:32:01 b.s.d.executor [INFO] Finished loading executor 3:[52 52]
>>    2014-02-07 01:32:01 b.s.d.executor [INFO] Preparing bolt 3:(52)
>>    2014-02-07 01:32:01 b.s.d.executor [INFO] Loading executor __system:[-1 -1]
>>    2014-02-07 01:32:01 b.s.d.executor [INFO] Loaded executor tasks __system:[-1 -1]
>>    2014-02-07 01:32:01 b.s.d.executor [INFO] Finished loading executor __system:[-1
-1]
>>    2014-02-07 01:32:01 b.s.d.executor [INFO] Preparing bolt __system:(-1)
>>    2014-02-07 01:32:01 b.s.d.worker [INFO] Launching receive-thread for f1b66213-7237-483e-a5db-e543bb36bcd9:6710
>>    2014-02-07 01:32:01 b.s.d.executor [INFO] Prepared bolt __system:(-1)
>>    2014-02-07 01:32:02 b.s.d.worker [INFO] Worker has topology config {"storm.id"
"match_parser-12-1391735866", "dev.zookeeper.path" "/tmp/dev-storm-zookeeper", "topology.tick.tuple.freq.secs"
nil, "topology.builtin.metrics.bucket.size.secs" 60, "topology.fall.back.on.java.serialization"
true, "topology.max.error.report.per.interval" 5, "zmq.linger.millis" 5000, "topology.skip.missing.kryo.registrations"
false, "storm.messaging.netty.client_worker_threads" 1, "ui.childopts" "-Xmx768m", "storm.zookeeper.session.timeout"
20000, "nimbus.reassign" true, "topology.trident.batch.emit.interval.millis" 500, "nimbus.monitor.freq.secs"
10, "logviewer.childopts" "-Xmx128m", "java.library.path" "/usr/local/lib:/opt/local/lib:/usr/lib",
"executor.send.buffer.size" 16384, "topology.executor.send.buffer.size" 1024, "storm.local.dir"
"/mnt/storm", "iwparser_socket" "/tmp/iwparser2Sockets/iwparserClients.ipc", "storm.messaging.netty.buffer_size"
5242880, "supervisor.worker.start.timeout.secs" 120, "topology.enable.message.t> imeouts"
true, "nimbus.cleanup.inbox.freq.secs" 600, "batch_size" 50, "nimbus.inbox.jar.expiration.secs"
3600, "aws_creds.aws_secret_access_key" "wmGLQ4ItvdhniRFMJe3yDJvBwTJUTqbuLYAS5ZPU", "drpc.worker.threads"
64, "redis.processed.match.log.size" 100000, "redis.key.match.ok" "MATCH-OK", "topology.worker.shared.thread.pool.size"
4, "nimbus.host" "10.0.5.50", "storm.messaging.netty.min_wait_ms" 100, "storm.zookeeper.port"
2181, "transactional.zookeeper.port" nil, "topology.executor.receive.buffer.size" 1024, "aws_creds.aws_access_key_id"
"AKIAJP4O6MTXON3MLV2Q", "executor.receive.buffer.size" 16384, "transactional.zookeeper.servers"
nil, "storm.zookeeper.root" "/storm", "aws.s3.match_bucket_name" "prod-matches", "storm.zookeeper.retry.intervalceiling.millis"
30000, "supervisor.enable" true, "storm.messaging.netty.server_worker_threads" 1, "storm.zookeeper.servers"
["10.0.6.131" "10.0.6.130" "10.0.6.55"], "transactional.zookeeper.root" "/transactional",
"topology.acker.executors" 50, "topology.kryo.decorators" > (), "topology.name" "match_parser",
"topology.transfer.buffer.size" 1024, "topology.worker.childopts" "-Djruby.compat.version=RUBY1_9",
"drpc.queue.size" 128, "worker.childopts" "-verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails
-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.port=1%ID% -Xmx1200m", "supervisor.heartbeat.frequency.secs"
5, "transfer.buffer.size" 32, "topology.error.throttle.interval.secs" 10, "throttle.spout"
1, "zmq.hwm" 0, "drpc.port" 3772, "supervisor.monitor.frequency.secs" 3, "drpc.childopts"
"-Xmx768m", "topology.receiver.buffer.size" 8, "task.heartbeat.frequency.secs" 3, "topology.tasks"
nil, "storm.messaging.netty.max_retries" 100, "topology.spout.wait.strategy" "backtype.storm.spout.SleepSpoutWaitStrategy",
"receiver.buffer.size" 8, "debug.logging.redis" true, "topology.max.spout.pending" 1, "storm.zookeeper.retry.interval"
1000, "topology.sleep.spout.wait.strategy.time.ms" 1> , "platform" "xbox", "nimbus.topology.validator"
"backtype.storm.nimbus.DefaultTopologyValidator", "supervisor.slots.ports" [6700 6701 6702
6703 6704 6705 6706 6707 6708 6709 6710 6711 6712 6713 6714 6715 6716 6717 6718 6719 6720
6721], "iwparser_port" 12330, "redis.port" 6379, "topology.debug" false, "nimbus.task.launch.secs"
120, "nimbus.supervisor.timeout.secs" 60, "debug.logging.stdout" true, "topology.kryo.register"
nil, "topology.message.timeout.secs" 30, "task.refresh.poll.secs" 10, "dw_retries.max_attempts"
6, "topology.workers" 50, "supervisor.childopts" "-verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails
-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.port=8091", "nimbus.thrift.port" 6627, "redis_prefix" "prod",
"topology.stats.sample.rate" 0.05, "worker.heartbeat.frequency.secs" 1, "topology.tuple.serializer"
"backtype.storm.serialization.types.ListDelegateSerializer", "topology.disruptor.w> ait.strategy"
"com.lmax.disruptor.BlockingWaitStrategy", "http_proxy" "proxy", "nimbus.task.timeout.secs"
30, "storm.zookeeper.connection.timeout" 15000, "topology.kryo.factory" "backtype.storm.serialization.DefaultKryoFactory",
"drpc.invocations.port" 3773, "logviewer.port" 8000, "zmq.threads" 1, "storm.zookeeper.retry.times"
5, "storm.thrift.transport" "backtype.storm.security.auth.SimpleTransportPlugin", "topology.state.synchronization.timeout.secs"
60, "supervisor.worker.timeout.secs" 30, "nimbus.file.copy.expiration.secs" 600, "storm.messaging.transport"
"backtype.storm.messaging.netty.Context", "logviewer.appender.name" "A1", "storm.messaging.netty.max_wait_ms"
1000, "debug.logging.kibana" false, "iwparser_host" "10.200.6.145", "drpc.request.timeout.secs"
600, "demonware.endpoint" "http://10.0.1.78:9000/matches/", "redis_simple_prefix" "prod",
"redis.host" "storm090-redis.uvvujo.0001.use1.cache.amazonaws.com", "dw_retries.wait_interval"
2, "storm.local.mode.zmq" false, "ui.port" 8080, "nimbus.childopts> " "-verbose:gc -XX:+PrintGCTimeStamps
-XX:+PrintGCDetails -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.port=8090",
"storm.cluster.mode" "distributed", "topology.optimize" true, "topology.max.task.parallelism"
nil, "statsd.endpoint" "statsd", "redis.log.size" 100000}
>>    2014-02-07 01:32:02 b.s.d.worker [INFO] Worker 80483ed6-2c51-4f86-9d06-9d2bed705748
for storm match_parser-12-1391735866 on f1b66213-7237-483e-a5db-e543bb36bcd9:6710 has finished
loading
>>    2014-02-07 01:32:08 MatchRetrieveBolt [INFO] [BOLT-1] on_init
>>    2014-02-07 01:32:08 MatchProcessBolt [INFO] [BOlT-MatchProcessBolt] on_init
>>    2014-02-07 01:32:08 b.s.d.executor [INFO] Prepared bolt 2:(2)
>>    2014-02-07 01:32:08 b.s.d.executor [INFO] Prepared bolt 3:(52)
>> 
>> 
>>    The bolts have reinitialized. The Storm UI is showing that bolt1 is still being
handled by this specific worker.
>> 
>> 
>> 
>>    The spout still sends tuples to matchRetrieveBolt but never gets a response. The
failure is caught and retries continue.
>> 
>> 
>>    Can anyone comment or provide any thoughts on the above issue?
>> 
>>    Thanks,
>>    Mark
>> 
>> 
> 
> 


Mime
View raw message