storm-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "P. Taylor Goetz" <ptgo...@gmail.com>
Subject Re: Storm 0.9.0.1 - topology halts processing after a while
Date Wed, 12 Feb 2014 20:50:52 GMT
It looks like you have GC logging turned on, but you haven’t specified an output file:

"worker.childopts" "-verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.port=1%ID% -Xmx1200m”

I’ve run into this before:

"The root cause was that GC logging for workers had been turned on without specifying an output
file. As a result the GC logging went to standard output without being redirected to logback.
Eventually the buffer filled and the JVM would hang (and thus stop heartbeating) and be killed.
The amount of time the worker would last depended on memory pressure and allocated heap size
(obviously, in hindsight — the more GC, the more GC logging, the faster the buffer fills).

The symptoms were workers timing out and being killed for no apparent reason.”

I would try with GC logging turned off or redirected to a file.

- Taylor

On Feb 12, 2014, at 3:12 PM, Mark Hu <markhu101@gmail.com> wrote:

> Hi guys,
> 
> Just an update.
> 
> 1. After investigating the hang, it looked like the Spout was properly sending tuples
to a bolt, but the bolt would never acknowledge the incoming tuple. After switching from using
the Netty transport layer back to 0MQ, recovery and reliability guarantees were restored.
Please note that we're using jruby-1.7.3 on top of storm java - could this cause issues with
netty?
> 2. Can anyone confirm if the above worker restart behaviour is normal?
> 
> Thanks,
> Mark
> 

On Feb 6, 2014, at 10:20 PM, Mark Hu <markhu101@gmail.com> wrote:

> HISTORY:
> I have past experience with Storm 0.8.2 and encountered silent worker death during development.
After experimenting with various parallelism, worker, and max_spout configurations, we were
able to find one that worked well with high reliability despite this issue. 
> 
> The silent worker death basically involves a worker not responding and waiting 30 seconds
before starting to process work again. A full load test reveals no data loss here.
> 
> I have this topology running well for the past 3 months and the only errors were due
to the user.
> 
> PRESENT:
> I am now testing Storm 0.9.0.1 and it appears the silent worker death issue is still
present, but now the topology hangs. Here are the details:
> 
> Type and description for my test topology
> 1. One spout - makes web calls to a rest service to retrieve ids.
> 2. One bolt - matchRetrieveBolt - retrieves blobs for those ids.
> 3. 50 bolts - matchProcessBolt - does some processing on the blobs.
> 
> Storm performed swimmingly for about 63,000 blobs before it hung. After examining the
logs, it appears that one worker was processing both bolt type 2 and bolt type 3. For some
reason, a reset occurs and the bolts reinitialize but become unresponsive. The spout continues
to send work but times out and retries.
> 
> Here is the data for worker-6710.log:
> 
> 2014-02-07 01:31:26 MatchProcessBolt [INFO] [BOLT-MatchProcessBolt] bitfield already
allocated, ttl: -1
> 2014-02-07 01:31:26 MatchProcessBolt [INFO] [BOLT-MatchProcessBolt] All done. Acking
now
> 2014-02-07 01:31:26 MatchRetrieveBolt [INFO] [BOLT-1] process: first = 66300, last =
66350
> 2014-02-07 01:31:26 MatchRetrieveBolt [INFO] [BOLT-1] url: SOMEURL
> 
> As per the above, this worker is performing bolt1 and bolt2 work.
> Suddenly, a restart occurs as per the below:
> 
> 2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client environment:zookeeper.version=3.3.3-1073969,
built on 02/23/2011 22:27 GMT
> 2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client environment:host.name=dev-shared2-supervisor-i-e67015c7
> 2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client environment:java.version=1.6.0_27
> 2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client environment:java.vendor=Sun Microsystems
Inc.
> 2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client environment:java.home=/usr/lib/jvm/java-6-openjdk-amd64/jre
> 2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client environment:java.class.path=/opt/storm/storm-0.9.0.1/storm-netty-0.9.0.1.jar:/opt/storm/storm-0.9.0.1/storm-console-logging-0.9.0.1.jar:/opt/storm/storm-0.9.0.1/storm-core-0.9.0.1.jar:/opt/storm/storm-0.9.0.1/lib/jline-0.9.94.jar:/opt/storm/storm-0.9.0.1/lib/servlet-api-2.5.jar:/opt/storm/storm-0.9.0.1/lib/minlog-1.2.jar:/opt/storm/storm-0.9.0.1/lib/curator-framework-1.0.1.jar:/opt/storm/storm-0.9.0.1/lib/snakeyaml-1.11.jar:/opt/storm/storm-0.9.0.1/lib/log4j-over-slf4j-1.6.6.jar:/opt/storm/storm-0.9.0.1/lib/clj-time-0.4.1.jar:/opt/storm/storm-0.9.0.1/lib/ring-core-1.1.5.jar:/opt/storm/storm-0.9.0.1/lib/mockito-all-1.9.5.jar:/opt/storm/storm-0.9.0.1/lib/tools.logging-0.2.3.jar:/opt/storm/storm-0.9.0.1/lib/disruptor-2.10.1.jar:/opt/storm/storm-0.9.0.1/lib/zookeeper-3.3.3.jar:/opt/storm/storm-0.9.0.1/lib/servlet-api-2.5-20081211.jar:/opt/storm/storm-0.9.0.1/lib/carbonite-1.5.0.jar:/opt/storm/storm-0.9.0.1/lib/guava-13.0.jar:/opt/storm/storm-0.9.0.1/lib/commons-fileupload-1.2.1.jar:/opt/storm/storm-0.9.0.1/lib/joda-time-2.0.jar:/opt/storm/storm-0.9.0.1/lib/jgrapht-0.8.3.jar:/opt/storm/storm-0.9.0.1/lib/netty-3.6.3.Final.jar:/opt/storm/storm-0.9.0.1/lib/clojure-complete-0.2.3.jar:/opt/storm/storm-0.9.0.1/lib/tools.cli-0.2.2.jar:/opt/storm/storm-0.9.0.1/lib/logback-classic-1.0.6.jar:/opt/storm/storm-0.9.0.1/lib/ring-servlet-0.3.11.jar:/opt/storm/storm-0.9.0.1/lib/asm-4.0.jar:/opt/storm/storm-0.9.0.1/lib/commons-exec-1.1.jar:/opt/storm/storm-0.9.0.1/lib/ring-jetty-adapter-0.3.11.jar:/opt/storm/storm-0.9.0.1/lib/commons-lang-2.5.jar:/opt/storm/storm-0.9.0.1/lib/commons-logging-1.1.1.jar:/opt/storm/storm-0.9.0.1/lib/clout-1.0.1.jar:/opt/storm/storm-0.9.0.1/lib/clj-stacktrace-0.2.2.jar:/opt/storm/storm-0.9.0.1/lib/reflectasm-1.07-shaded.jar:/opt/storm/storm-0.9.0.1/lib/junit-3.8.1.jar:/opt/storm/storm-0.9.0.1/lib/jzmq-2.1.0.jar:/opt/storm/storm-0.9.0.1/lib/jetty-6.1.26.jar:/opt/storm/storm-0.9.0.1/lib/kryo-2.17.jar:/opt/storm/storm-0.9.0.1/lib/ring-devel-0.3.11.jar:/opt/storm/storm-0.9.0.1/lib/hiccup-0.3.6.jar:/opt/storm/storm-0.9.0.1/lib/compojure-1.1.3.jar:/opt/storm/storm-0.9.0.1/lib/jetty-util-6.1.26.jar:/opt/storm/storm-0.9.0.1/lib/commons-io-1.4.jar:/opt/storm/storm-0.9.0.1/lib/slf4j-api-1.6.5.jar:/opt/storm/storm-0.9.0.1/lib/clojure-1.4.0.jar:/opt/storm/storm-0.9.0.1/lib/tools.nrepl-0.2.3.jar:/opt/storm/storm-0.9.0.1/lib/curator-client-1.0.1.jar:/opt/storm/storm-0.9.0.1/lib/httpcore-4.1.jar:/opt/storm/storm-0.9.0.1/lib/logback-core-1.0.6.jar:/opt/storm/storm-0.9.0.1/lib/commons-codec-1.4.jar:/opt/storm/storm-0.9.0.1/lib/json-simple-1.1.jar:/opt/storm/storm-0.9.0.1/lib/core.incubator-0.1.0.jar:/opt/storm/storm-0.9.0.1/lib/libthrift7-0.7.0-2.jar:/opt/storm/storm-0.9.0.1/lib/tools.macro-0.1.0.jar:/opt/storm/storm-0.9.0.1/lib/httpclient-4.1.1.jar:/opt/storm/storm-0.9.0.1/lib/objenesis-1.2.jar:/opt/storm/storm-0.9.0.1/lib/math.numeric-tower-0.0.1.jar:/opt/storm/storm-0.9.0.1/conf:/mnt/storm/supervisor/stormdist/match_parser-12-1391735866/stormjar.jar
> 2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client environment:java.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
> 2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client environment:java.io.tmpdir=/tmp
> 2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client environment:java.compiler=<NA>
> 2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client environment:os.name=Linux
> 2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client environment:os.arch=amd64
> 2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client environment:os.version=3.2.0-48-virtual
> 2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client environment:user.name=storm
> 2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client environment:user.home=/home/storm
> 2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client environment:user.dir=/opt/storm/storm-0.9.0.1
> 2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server environment:zookeeper.version=3.3.3-1073969,
built on 02/23/2011 22:27 GMT
> 2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server environment:host.name=dev-shared2-supervisor-i-e67015c7
> 2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server environment:java.version=1.6.0_27
> 2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server environment:java.vendor=Sun
Microsystems Inc.
> 2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server environment:java.home=/usr/lib/jvm/java-6-openjdk-amd64/jre
> 2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server environment:java.class.path=/opt/storm/storm-0.9.0.1/storm-netty-0.9.0.1.jar:/opt/storm/storm-0.9.0.1/storm-console-logging-0.9.0.1.jar:/opt/storm/storm-0.9.0.1/storm-core-0.9.0.1.jar:/opt/storm/storm-0.9.0.1/lib/jline-0.9.94.jar:/opt/storm/storm-0.9.0.1/lib/servlet-api-2.5.jar:/opt/storm/storm-0.9.0.1/lib/minlog-1.2.jar:/opt/storm/storm-0.9.0.1/lib/curator-framework-1.0.1.jar:/opt/storm/storm-0.9.0.1/lib/snakeyaml-1.11.jar:/opt/storm/storm-0.9.0.1/lib/log4j-over-slf4j-1.6.6.jar:/opt/storm/storm-0.9.0.1/lib/clj-time-0.4.1.jar:/opt/storm/storm-0.9.0.1/lib/ring-core-1.1.5.jar:/opt/storm/storm-0.9.0.1/lib/mockito-all-1.9.5.jar:/opt/storm/storm-0.9.0.1/lib/tools.logging-0.2.3.jar:/opt/storm/storm-0.9.0.1/lib/disruptor-2.10.1.jar:/opt/storm/storm-0.9.0.1/lib/zookeeper-3.3.3.jar:/opt/storm/storm-0.9.0.1/lib/servlet-api-2.5-20081211.jar:/opt/storm/storm-0.9.0.1/lib/carbonite-1.5.0.jar:/opt/storm/storm-0.9.0.1/lib/guava-13.0.jar:/opt/storm/storm-0.9.0.1/lib/commons-fileupload-1.2.1.jar:/opt/storm/storm-0.9.0.1/lib/joda-time-2.0.jar:/opt/storm/storm-0.9.0.1/lib/jgrapht-0.8.3.jar:/opt/storm/storm-0.9.0.1/lib/netty-3.6.3.Final.jar:/opt/storm/storm-0.9.0.1/lib/clojure-complete-0.2.3.jar:/opt/storm/storm-0.9.0.1/lib/tools.cli-0.2.2.jar:/opt/storm/storm-0.9.0.1/lib/logback-classic-1.0.6.jar:/opt/storm/storm-0.9.0.1/lib/ring-servlet-0.3.11.jar:/opt/storm/storm-0.9.0.1/lib/asm-4.0.jar:/opt/storm/storm-0.9.0.1/lib/commons-exec-1.1.jar:/opt/storm/storm-0.9.0.1/lib/ring-jetty-adapter-0.3.11.jar:/opt/storm/storm-0.9.0.1/lib/commons-lang-2.5.jar:/opt/storm/storm-0.9.0.1/lib/commons-logging-1.1.1.jar:/opt/storm/storm-0.9.0.1/lib/clout-1.0.1.jar:/opt/storm/storm-0.9.0.1/lib/clj-stacktrace-0.2.2.jar:/opt/storm/storm-0.9.0.1/lib/reflectasm-1.07-shaded.jar:/opt/storm/storm-0.9.0.1/lib/junit-3.8.1.jar:/opt/storm/storm-0.9.0.1/lib/jzmq-2.1.0.jar:/opt/storm/storm-0.9.0.1/lib/jetty-6.1.26.jar:/opt/storm/storm-0.9.0.1/lib/kryo-2.17.jar:/opt/storm/storm-0.9.0.1/lib/ring-devel-0.3.11.jar:/opt/storm/storm-0.9.0.1/lib/hiccup-0.3.6.jar:/opt/storm/storm-0.9.0.1/lib/compojure-1.1.3.jar:/opt/storm/storm-0.9.0.1/lib/jetty-util-6.1.26.jar:/opt/storm/storm-0.9.0.1/lib/commons-io-1.4.jar:/opt/storm/storm-0.9.0.1/lib/slf4j-api-1.6.5.jar:/opt/storm/storm-0.9.0.1/lib/clojure-1.4.0.jar:/opt/storm/storm-0.9.0.1/lib/tools.nrepl-0.2.3.jar:/opt/storm/storm-0.9.0.1/lib/curator-client-1.0.1.jar:/opt/storm/storm-0.9.0.1/lib/httpcore-4.1.jar:/opt/storm/storm-0.9.0.1/lib/logback-core-1.0.6.jar:/opt/storm/storm-0.9.0.1/lib/commons-codec-1.4.jar:/opt/storm/storm-0.9.0.1/lib/json-simple-1.1.jar:/opt/storm/storm-0.9.0.1/lib/core.incubator-0.1.0.jar:/opt/storm/storm-0.9.0.1/lib/libthrift7-0.7.0-2.jar:/opt/storm/storm-0.9.0.1/lib/tools.macro-0.1.0.jar:/opt/storm/storm-0.9.0.1/lib/httpclient-4.1.1.jar:/opt/storm/storm-0.9.0.1/lib/objenesis-1.2.jar:/opt/storm/storm-0.9.0.1/lib/math.numeric-tower-0.0.1.jar:/opt/storm/storm-0.9.0.1/conf:/mnt/storm/supervisor/stormdist/match_parser-12-1391735866/stormjar.jar
> 2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server environment:java.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
> 2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server environment:java.io.tmpdir=/tmp
> 2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server environment:java.compiler=<NA>
> 2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server environment:os.name=Linux
> 2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server environment:os.arch=amd64
> 2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server environment:os.version=3.2.0-48-virtual
> 2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server environment:user.name=storm
> 2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server environment:user.home=/home/storm
> 2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server environment:user.dir=/opt/storm/storm-0.9.0.1
> 2014-02-07 01:32:01 b.s.d.worker [INFO] Launching worker for match_parser-12-1391735866
on f1b66213-7237-483e-a5db-e543bb36bcd9:6710 with id 80483ed6-2c51-4f86-9d06-9d2bed705748
and conf {"dev.zookeeper.path" "/tmp/dev-storm-zookeeper", "topology.tick.tuple.freq.secs"
nil, "topology.builtin.metrics.bucket.size.secs" 60, "topology.fall.back.on.java.serialization"
true, "topology.max.error.report.per.interval" 5, "zmq.linger.millis" 5000, "topology.skip.missing.kryo.registrations"
false, "storm.messaging.netty.client_worker_threads" 1, "ui.childopts" "-Xmx768m", "storm.zookeeper.session.timeout"
20000, "nimbus.reassign" true, "topology.trident.batch.emit.interval.millis" 500, "nimbus.monitor.freq.secs"
10, "logviewer.childopts" "-Xmx128m", "java.library.path" "/usr/local/lib:/opt/local/lib:/usr/lib",
"topology.executor.send.buffer.size" 1024, "storm.local.dir" "/mnt/storm", "storm.messaging.netty.buffer_size"
5242880, "supervisor.worker.start.timeout.secs" 120, "topology.enable.message.timeouts" true,
"nimbus.cleanup.inbox.freq.secs" 600, "nimbus.inbox.jar.expiration.secs" 3600, "drpc.worker.threads"
64, "topology.worker.shared.thread.pool.size" 4, "nimbus.host" "10.0.5.50", "storm.messaging.netty.min_wait_ms"
100, "storm.zookeeper.port" 2181, "transactional.zookeeper.port" nil, "topology.executor.receive.buffer.size"
1024, "transactional.zookeeper.servers" nil, "storm.zookeeper.root" "/storm", "storm.zookeeper.retry.intervalceiling.millis"
30000, "supervisor.enable" true, "storm.messaging.netty.server_worker_threads" 1, "storm.zookeeper.servers"
["10.0.6.131" "10.0.6.130" "10.0.6.55"], "transactional.zookeeper.root" "/transactional",
"topology.acker.executors" nil, "topology.transfer.buffer.size" 1024, "topology.worker.childopts"
nil, "drpc.queue.size" 128, "worker.childopts" "-verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails
-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.port=1%ID% -Xmx1200m", "supervisor.heartbeat.frequency.secs"
5, "topology.error.throttle.interval.secs" 10, "zmq.hwm" 0, "drpc.port" 3772, "supervisor.monitor.frequency.secs"
3, "drpc.childopts" "-Xmx768m", "topology.receiver.buffer.size" 8, "task.heartbeat.frequency.secs"
3, "topology.tasks" nil, "storm.messaging.netty.max_retries" 100, "topology.spout.wait.strategy"
"backtype.storm.spout.SleepSpoutWaitStrategy", "topology.max.spout.pending" nil, "storm.zookeeper.retry.interval"
1000, "topology.sleep.spout.wait.strategy.time.ms" 1, "nimbus.topology.validator" "backtype.storm.nimbus.DefaultTopologyValidator",
"supervisor.slots.ports" [6700 6701 6702 6703 6704 6705 6706 6707 6708 6709 6710 6711 6712
6713 6714 6715 6716 6717 6718 6719 6720 6721], "topology.debug" false, "nimbus.task.launch.secs"
120, "nimbus.supervisor.timeout.secs" 60, "topology.message.timeout.secs" 30, "task.refresh.poll.secs"
10, "topology.workers" 1, "supervisor.childopts" "-verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails
-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.port=8091", "nimbus.thrift.port" 6627, "topology.stats.sample.rate"
0.05, "worker.heartbeat.frequency.secs" 1, "topology.tuple.serializer" "backtype.storm.serialization.types.ListDelegateSerializer",
"topology.disruptor.wait.strategy" "com.lmax.disruptor.BlockingWaitStrategy", "nimbus.task.timeout.secs"
30, "storm.zookeeper.connection.timeout" 15000, "topology.kryo.factory" "backtype.storm.serialization.DefaultKryoFactory",
"drpc.invocations.port" 3773, "logviewer.port" 8000, "zmq.threads" 1, "storm.zookeeper.retry.times"
5, "storm.thrift.transport" "backtype.storm.security.auth.SimpleTransportPlugin", "topology.state.synchronization.timeout.secs"
60, "supervisor.worker.timeout.secs" 30, "nimbus.file.copy.expiration.secs" 600, "storm.messaging.transport"
"backtype.storm.messaging.netty.Context", "logviewer.appender.name" "A1", "storm.messaging.netty.max_wait_ms"
1000, "drpc.request.timeout.secs" 600, "storm.local.mode.zmq" false, "ui.port" 8080, "nimbus.childopts"
"-verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.port=8090",
"storm.cluster.mode" "distributed", "topology.optimize" true, "topology.max.task.parallelism"
nil}
> 2014-02-07 01:32:01 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
> 2014-02-07 01:32:01 o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=10.0.6.131:2181,10.0.6.130:2181,10.0.6.55:2181
sessionTimeout=20000 watcher=com.netflix.curator.ConnectionState@25de152f
> 2014-02-07 01:32:01 o.a.z.ClientCnxn [INFO] Opening socket connection to server /10.0.6.131:2181
> 2014-02-07 01:32:01 o.a.z.ClientCnxn [INFO] Socket connection established to ip-10-0-6-131.ec2.internal/10.0.6.131:2181,
initiating session
> 2014-02-07 01:32:01 o.a.z.ClientCnxn [INFO] Session establishment complete on server
ip-10-0-6-131.ec2.internal/10.0.6.131:2181, sessionid = 0xf343fa0ee4d600f0, negotiated timeout
= 20000
> 2014-02-07 01:32:01 b.s.zookeeper [INFO] Zookeeper state update: :connected:none
> 2014-02-07 01:32:01 o.a.z.ZooKeeper [INFO] Session: 0xf343fa0ee4d600f0 closed
> 2014-02-07 01:32:01 o.a.z.ClientCnxn [INFO] EventThread shut down
> 2014-02-07 01:32:01 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
> 2014-02-07 01:32:01 o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=10.0.6.131:2181,10.0.6.130:2181,10.0.6.55:2181/storm
sessionTimeout=20000 watcher=com.netflix.curator.ConnectionState@31930611
> 2014-02-07 01:32:01 o.a.z.ClientCnxn [INFO] Opening socket connection to server /10.0.6.131:2181
> 2014-02-07 01:32:01 o.a.z.ClientCnxn [INFO] Socket connection established to ip-10-0-6-131.ec2.internal/10.0.6.131:2181,
initiating session
> 2014-02-07 01:32:01 o.a.z.ClientCnxn [INFO] Session establishment complete on server
ip-10-0-6-131.ec2.internal/10.0.6.131:2181, sessionid = 0xf343fa0ee4d600f1, negotiated timeout
= 20000
> 2014-02-07 01:32:01 b.s.m.TransportFactory [INFO] Storm peer transport plugin:backtype.storm.messaging.netty.Context
> 2014-02-07 01:32:01 b.s.d.executor [INFO] Loading executor 2:[2 2]
> 2014-02-07 01:32:01 b.s.d.executor [INFO] Loaded executor tasks 2:[2 2]
> 2014-02-07 01:32:01 b.s.d.executor [INFO] Finished loading executor 2:[2 2]
> 2014-02-07 01:32:01 b.s.d.executor [INFO] Loading executor __acker:[102 102]
> 2014-02-07 01:32:01 b.s.d.executor [INFO] Preparing bolt 2:(2)
> 2014-02-07 01:32:01 b.s.d.executor [INFO] Loaded executor tasks __acker:[102 102]
> 2014-02-07 01:32:01 b.s.d.executor [INFO] Timeouts disabled for executor __acker:[102
102]
> 2014-02-07 01:32:01 b.s.d.executor [INFO] Preparing bolt __acker:(102)
> 2014-02-07 01:32:01 b.s.d.executor [INFO] Finished loading executor __acker:[102 102]
> 2014-02-07 01:32:01 b.s.d.executor [INFO] Prepared bolt __acker:(102)
> 2014-02-07 01:32:01 b.s.d.executor [INFO] Loading executor 3:[52 52]
> 2014-02-07 01:32:01 b.s.d.executor [INFO] Loaded executor tasks 3:[52 52]
> 2014-02-07 01:32:01 b.s.d.executor [INFO] Finished loading executor 3:[52 52]
> 2014-02-07 01:32:01 b.s.d.executor [INFO] Preparing bolt 3:(52)
> 2014-02-07 01:32:01 b.s.d.executor [INFO] Loading executor __system:[-1 -1]
> 2014-02-07 01:32:01 b.s.d.executor [INFO] Loaded executor tasks __system:[-1 -1]
> 2014-02-07 01:32:01 b.s.d.executor [INFO] Finished loading executor __system:[-1 -1]
> 2014-02-07 01:32:01 b.s.d.executor [INFO] Preparing bolt __system:(-1)
> 2014-02-07 01:32:01 b.s.d.worker [INFO] Launching receive-thread for f1b66213-7237-483e-a5db-e543bb36bcd9:6710
> 2014-02-07 01:32:01 b.s.d.executor [INFO] Prepared bolt __system:(-1)
> 2014-02-07 01:32:02 b.s.d.worker [INFO] Worker has topology config {"storm.id" "match_parser-12-1391735866",
"dev.zookeeper.path" "/tmp/dev-storm-zookeeper", "topology.tick.tuple.freq.secs" nil, "topology.builtin.metrics.bucket.size.secs"
60, "topology.fall.back.on.java.serialization" true, "topology.max.error.report.per.interval"
5, "zmq.linger.millis" 5000, "topology.skip.missing.kryo.registrations" false, "storm.messaging.netty.client_worker_threads"
1, "ui.childopts" "-Xmx768m", "storm.zookeeper.session.timeout" 20000, "nimbus.reassign" true,
"topology.trident.batch.emit.interval.millis" 500, "nimbus.monitor.freq.secs" 10, "logviewer.childopts"
"-Xmx128m", "java.library.path" "/usr/local/lib:/opt/local/lib:/usr/lib", "executor.send.buffer.size"
16384, "topology.executor.send.buffer.size" 1024, "storm.local.dir" "/mnt/storm", "iwparser_socket"
"/tmp/iwparser2Sockets/iwparserClients.ipc", "storm.messaging.netty.buffer_size" 5242880,
"supervisor.worker.start.timeout.secs" 120, "topology.enable.message.timeouts" true, "nimbus.cleanup.inbox.freq.secs"
600, "batch_size" 50, "nimbus.inbox.jar.expiration.secs" 3600, "aws_creds.aws_secret_access_key"
"wmGLQ4ItvdhniRFMJe3yDJvBwTJUTqbuLYAS5ZPU", "drpc.worker.threads" 64, "redis.processed.match.log.size"
100000, "redis.key.match.ok" "MATCH-OK", "topology.worker.shared.thread.pool.size" 4, "nimbus.host"
"10.0.5.50", "storm.messaging.netty.min_wait_ms" 100, "storm.zookeeper.port" 2181, "transactional.zookeeper.port"
nil, "topology.executor.receive.buffer.size" 1024, "aws_creds.aws_access_key_id" "AKIAJP4O6MTXON3MLV2Q",
"executor.receive.buffer.size" 16384, "transactional.zookeeper.servers" nil, "storm.zookeeper.root"
"/storm", "aws.s3.match_bucket_name" "prod-matches", "storm.zookeeper.retry.intervalceiling.millis"
30000, "supervisor.enable" true, "storm.messaging.netty.server_worker_threads" 1, "storm.zookeeper.servers"
["10.0.6.131" "10.0.6.130" "10.0.6.55"], "transactional.zookeeper.root" "/transactional",
"topology.acker.executors" 50, "topology.kryo.decorators" (), "topology.name" "match_parser",
"topology.transfer.buffer.size" 1024, "topology.worker.childopts" "-Djruby.compat.version=RUBY1_9",
"drpc.queue.size" 128, "worker.childopts" "-verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails
-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.port=1%ID% -Xmx1200m", "supervisor.heartbeat.frequency.secs"
5, "transfer.buffer.size" 32, "topology.error.throttle.interval.secs" 10, "throttle.spout"
1, "zmq.hwm" 0, "drpc.port" 3772, "supervisor.monitor.frequency.secs" 3, "drpc.childopts"
"-Xmx768m", "topology.receiver.buffer.size" 8, "task.heartbeat.frequency.secs" 3, "topology.tasks"
nil, "storm.messaging.netty.max_retries" 100, "topology.spout.wait.strategy" "backtype.storm.spout.SleepSpoutWaitStrategy",
"receiver.buffer.size" 8, "debug.logging.redis" true, "topology.max.spout.pending" 1, "storm.zookeeper.retry.interval"
1000, "topology.sleep.spout.wait.strategy.time.ms" 1, "platform" "xbox", "nimbus.topology.validator"
"backtype.storm.nimbus.DefaultTopologyValidator", "supervisor.slots.ports" [6700 6701 6702
6703 6704 6705 6706 6707 6708 6709 6710 6711 6712 6713 6714 6715 6716 6717 6718 6719 6720
6721], "iwparser_port" 12330, "redis.port" 6379, "topology.debug" false, "nimbus.task.launch.secs"
120, "nimbus.supervisor.timeout.secs" 60, "debug.logging.stdout" true, "topology.kryo.register"
nil, "topology.message.timeout.secs" 30, "task.refresh.poll.secs" 10, "dw_retries.max_attempts"
6, "topology.workers" 50, "supervisor.childopts" "-verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails
-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.port=8091", "nimbus.thrift.port" 6627, "redis_prefix" "prod",
"topology.stats.sample.rate" 0.05, "worker.heartbeat.frequency.secs" 1, "topology.tuple.serializer"
"backtype.storm.serialization.types.ListDelegateSerializer", "topology.disruptor.wait.strategy"
"com.lmax.disruptor.BlockingWaitStrategy", "http_proxy" "proxy", "nimbus.task.timeout.secs"
30, "storm.zookeeper.connection.timeout" 15000, "topology.kryo.factory" "backtype.storm.serialization.DefaultKryoFactory",
"drpc.invocations.port" 3773, "logviewer.port" 8000, "zmq.threads" 1, "storm.zookeeper.retry.times"
5, "storm.thrift.transport" "backtype.storm.security.auth.SimpleTransportPlugin", "topology.state.synchronization.timeout.secs"
60, "supervisor.worker.timeout.secs" 30, "nimbus.file.copy.expiration.secs" 600, "storm.messaging.transport"
"backtype.storm.messaging.netty.Context", "logviewer.appender.name" "A1", "storm.messaging.netty.max_wait_ms"
1000, "debug.logging.kibana" false, "iwparser_host" "10.200.6.145", "drpc.request.timeout.secs"
600, "demonware.endpoint" "http://10.0.1.78:9000/matches/", "redis_simple_prefix" "prod",
"redis.host" "storm090-redis.uvvujo.0001.use1.cache.amazonaws.com", "dw_retries.wait_interval"
2, "storm.local.mode.zmq" false, "ui.port" 8080, "nimbus.childopts" "-verbose:gc -XX:+PrintGCTimeStamps
-XX:+PrintGCDetails -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.port=8090",
"storm.cluster.mode" "distributed", "topology.optimize" true, "topology.max.task.parallelism"
nil, "statsd.endpoint" "statsd", "redis.log.size" 100000}
> 2014-02-07 01:32:02 b.s.d.worker [INFO] Worker 80483ed6-2c51-4f86-9d06-9d2bed705748 for
storm match_parser-12-1391735866 on f1b66213-7237-483e-a5db-e543bb36bcd9:6710 has finished
loading
> 2014-02-07 01:32:08 MatchRetrieveBolt [INFO] [BOLT-1] on_init
> 2014-02-07 01:32:08 MatchProcessBolt [INFO] [BOlT-MatchProcessBolt] on_init
> 2014-02-07 01:32:08 b.s.d.executor [INFO] Prepared bolt 2:(2)
> 2014-02-07 01:32:08 b.s.d.executor [INFO] Prepared bolt 3:(52)
> 
> The bolts have reinitialized. The Storm UI is showing that bolt1 is still being handled
by this specific worker. 
> 
> 
> The spout still sends tuples to matchRetrieveBolt but never gets a response. The failure
is caught and retries continue.
> 
> Can anyone comment or provide any thoughts on the above issue?
> 
> Thanks,
> Mark



Mime
View raw message