storm-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mark Hu <markhu...@gmail.com>
Subject re: Storm 0.9.0.1 - topology halts processing after a while
Date Fri, 07 Feb 2014 03:20:47 GMT
HISTORY:
I have past experience with Storm 0.8.2 and encountered silent worker death
during development. After experimenting with various parallelism, worker,
and max_spout configurations, we were able to find one that worked well
with high reliability despite this issue.

The silent worker death basically involves a worker not responding and
waiting 30 seconds before starting to process work again. A full load test
reveals no data loss here.

I have this topology running well for the past 3 months and the only errors
were due to the user.

PRESENT:
I am now testing Storm 0.9.0.1 and it appears the silent worker death issue
is still present, but now the topology hangs. Here are the details:

Type and description for my test topology
1. One spout - makes web calls to a rest service to retrieve ids.
2. One bolt - matchRetrieveBolt - retrieves blobs for those ids.
3. 50 bolts - matchProcessBolt - does some processing on the blobs.

Storm performed swimmingly for about 63,000 blobs before it hung. After
examining the logs, it appears that one worker was processing both bolt
type 2 and bolt type 3. For some reason, a reset occurs and the bolts
reinitialize but become unresponsive. The spout continues to send work but
times out and retries.

Here is the data for worker-6710.log:

2014-02-07 01:31:26 MatchProcessBolt [INFO] [BOLT-MatchProcessBolt]
bitfield already allocated, ttl: -1
2014-02-07 01:31:26 MatchProcessBolt [INFO] [BOLT-MatchProcessBolt]
All done. Acking now
2014-02-07 01:31:26 MatchRetrieveBolt [INFO] [BOLT-1] process: first =
66300, last = 66350
2014-02-07 01:31:26 MatchRetrieveBolt [INFO] [BOLT-1] url: SOMEURL


As per the above, this worker is performing bolt1 and bolt2 work.

Suddenly, a restart occurs as per the below:


2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client
environment:zookeeper.version=3.3.3-1073969, built on 02/23/2011 22:27
GMT
2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client
environment:host.name=dev-shared2-supervisor-i-e67015c7
2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client
environment:java.version=1.6.0_27
2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client
environment:java.vendor=Sun Microsystems Inc.
2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client
environment:java.home=/usr/lib/jvm/java-6-openjdk-amd64/jre
2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client
environment:java.class.path=/opt/storm/storm-0.9.0.1/storm-netty-0.9.0.1.jar:/opt/storm/storm-0.9.0.1/storm-console-logging-0.9.0.1.jar:/opt/storm/storm-0.9.0.1/storm-core-0.9.0.1.jar:/opt/storm/storm-0.9.0.1/lib/jline-0.9.94.jar:/opt/storm/storm-0.9.0.1/lib/servlet-api-2.5.jar:/opt/storm/storm-0.9.0.1/lib/minlog-1.2.jar:/opt/storm/storm-0.9.0.1/lib/curator-framework-1.0.1.jar:/opt/storm/storm-0.9.0.1/lib/snakeyaml-1.11.jar:/opt/storm/storm-0.9.0.1/lib/log4j-over-slf4j-1.6.6.jar:/opt/storm/storm-0.9.0.1/lib/clj-time-0.4.1.jar:/opt/storm/storm-0.9.0.1/lib/ring-core-1.1.5.jar:/opt/storm/storm-0.9.0.1/lib/mockito-all-1.9.5.jar:/opt/storm/storm-0.9.0.1/lib/tools.logging-0.2.3.jar:/opt/storm/storm-0.9.0.1/lib/disruptor-2.10.1.jar:/opt/storm/storm-0.9.0.1/lib/zookeeper-3.3.3.jar:/opt/storm/storm-0.9.0.1/lib/servlet-api-2.5-20081211.jar:/opt/storm/storm-0.9.0.1/lib/carbonite-1.5.0.jar:/opt/storm/storm-0.9.0.1/lib/guava-13.0.jar:/opt/storm/storm-0.9.0.1/lib/commons-fileupload-1.2.1.jar:/opt/storm/storm-0.9.0.1/lib/joda-time-2.0.jar:/opt/storm/storm-0.9.0.1/lib/jgrapht-0.8.3.jar:/opt/storm/storm-0.9.0.1/lib/netty-3.6.3.Final.jar:/opt/storm/storm-0.9.0.1/lib/clojure-complete-0.2.3.jar:/opt/storm/storm-0.9.0.1/lib/tools.cli-0.2.2.jar:/opt/storm/storm-0.9.0.1/lib/logback-classic-1.0.6.jar:/opt/storm/storm-0.9.0.1/lib/ring-servlet-0.3.11.jar:/opt/storm/storm-0.9.0.1/lib/asm-4.0.jar:/opt/storm/storm-0.9.0.1/lib/commons-exec-1.1.jar:/opt/storm/storm-0.9.0.1/lib/ring-jetty-adapter-0.3.11.jar:/opt/storm/storm-0.9.0.1/lib/commons-lang-2.5.jar:/opt/storm/storm-0.9.0.1/lib/commons-logging-1.1.1.jar:/opt/storm/storm-0.9.0.1/lib/clout-1.0.1.jar:/opt/storm/storm-0.9.0.1/lib/clj-stacktrace-0.2.2.jar:/opt/storm/storm-0.9.0.1/lib/reflectasm-1.07-shaded.jar:/opt/storm/storm-0.9.0.1/lib/junit-3.8.1.jar:/opt/storm/storm-0.9.0.1/lib/jzmq-2.1.0.jar:/opt/storm/storm-0.9.0.1/lib/jetty-6.1.26.jar:/opt/storm/storm-0.9.0.1/lib/kryo-2.17.jar:/opt/storm/storm-0.9.0.1/lib/ring-devel-0.3.11.jar:/opt/storm/storm-0.9.0.1/lib/hiccup-0.3.6.jar:/opt/storm/storm-0.9.0.1/lib/compojure-1.1.3.jar:/opt/storm/storm-0.9.0.1/lib/jetty-util-6.1.26.jar:/opt/storm/storm-0.9.0.1/lib/commons-io-1.4.jar:/opt/storm/storm-0.9.0.1/lib/slf4j-api-1.6.5.jar:/opt/storm/storm-0.9.0.1/lib/clojure-1.4.0.jar:/opt/storm/storm-0.9.0.1/lib/tools.nrepl-0.2.3.jar:/opt/storm/storm-0.9.0.1/lib/curator-client-1.0.1.jar:/opt/storm/storm-0.9.0.1/lib/httpcore-4.1.jar:/opt/storm/storm-0.9.0.1/lib/logback-core-1.0.6.jar:/opt/storm/storm-0.9.0.1/lib/commons-codec-1.4.jar:/opt/storm/storm-0.9.0.1/lib/json-simple-1.1.jar:/opt/storm/storm-0.9.0.1/lib/core.incubator-0.1.0.jar:/opt/storm/storm-0.9.0.1/lib/libthrift7-0.7.0-2.jar:/opt/storm/storm-0.9.0.1/lib/tools.macro-0.1.0.jar:/opt/storm/storm-0.9.0.1/lib/httpclient-4.1.1.jar:/opt/storm/storm-0.9.0.1/lib/objenesis-1.2.jar:/opt/storm/storm-0.9.0.1/lib/math.numeric-tower-0.0.1.jar:/opt/storm/storm-0.9.0.1/conf:/mnt/storm/supervisor/stormdist/match_parser-12-1391735866/stormjar.jar
2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client
environment:java.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client
environment:java.io.tmpdir=/tmp
2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client environment:java.compiler=<NA>
2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client environment:os.name=Linux
2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client environment:os.arch=amd64
2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client
environment:os.version=3.2.0-48-virtual
2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client environment:user.name=storm
2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client
environment:user.home=/home/storm
2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client
environment:user.dir=/opt/storm/storm-0.9.0.1
2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server
environment:zookeeper.version=3.3.3-1073969, built on 02/23/2011 22:27
GMT
2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server
environment:host.name=dev-shared2-supervisor-i-e67015c7
2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server
environment:java.version=1.6.0_27
2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server
environment:java.vendor=Sun Microsystems Inc.
2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server
environment:java.home=/usr/lib/jvm/java-6-openjdk-amd64/jre
2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server
environment:java.class.path=/opt/storm/storm-0.9.0.1/storm-netty-0.9.0.1.jar:/opt/storm/storm-0.9.0.1/storm-console-logging-0.9.0.1.jar:/opt/storm/storm-0.9.0.1/storm-core-0.9.0.1.jar:/opt/storm/storm-0.9.0.1/lib/jline-0.9.94.jar:/opt/storm/storm-0.9.0.1/lib/servlet-api-2.5.jar:/opt/storm/storm-0.9.0.1/lib/minlog-1.2.jar:/opt/storm/storm-0.9.0.1/lib/curator-framework-1.0.1.jar:/opt/storm/storm-0.9.0.1/lib/snakeyaml-1.11.jar:/opt/storm/storm-0.9.0.1/lib/log4j-over-slf4j-1.6.6.jar:/opt/storm/storm-0.9.0.1/lib/clj-time-0.4.1.jar:/opt/storm/storm-0.9.0.1/lib/ring-core-1.1.5.jar:/opt/storm/storm-0.9.0.1/lib/mockito-all-1.9.5.jar:/opt/storm/storm-0.9.0.1/lib/tools.logging-0.2.3.jar:/opt/storm/storm-0.9.0.1/lib/disruptor-2.10.1.jar:/opt/storm/storm-0.9.0.1/lib/zookeeper-3.3.3.jar:/opt/storm/storm-0.9.0.1/lib/servlet-api-2.5-20081211.jar:/opt/storm/storm-0.9.0.1/lib/carbonite-1.5.0.jar:/opt/storm/storm-0.9.0.1/lib/guava-13.0.jar:/opt/storm/storm-0.9.0.1/lib/commons-fileupload-1.2.1.jar:/opt/storm/storm-0.9.0.1/lib/joda-time-2.0.jar:/opt/storm/storm-0.9.0.1/lib/jgrapht-0.8.3.jar:/opt/storm/storm-0.9.0.1/lib/netty-3.6.3.Final.jar:/opt/storm/storm-0.9.0.1/lib/clojure-complete-0.2.3.jar:/opt/storm/storm-0.9.0.1/lib/tools.cli-0.2.2.jar:/opt/storm/storm-0.9.0.1/lib/logback-classic-1.0.6.jar:/opt/storm/storm-0.9.0.1/lib/ring-servlet-0.3.11.jar:/opt/storm/storm-0.9.0.1/lib/asm-4.0.jar:/opt/storm/storm-0.9.0.1/lib/commons-exec-1.1.jar:/opt/storm/storm-0.9.0.1/lib/ring-jetty-adapter-0.3.11.jar:/opt/storm/storm-0.9.0.1/lib/commons-lang-2.5.jar:/opt/storm/storm-0.9.0.1/lib/commons-logging-1.1.1.jar:/opt/storm/storm-0.9.0.1/lib/clout-1.0.1.jar:/opt/storm/storm-0.9.0.1/lib/clj-stacktrace-0.2.2.jar:/opt/storm/storm-0.9.0.1/lib/reflectasm-1.07-shaded.jar:/opt/storm/storm-0.9.0.1/lib/junit-3.8.1.jar:/opt/storm/storm-0.9.0.1/lib/jzmq-2.1.0.jar:/opt/storm/storm-0.9.0.1/lib/jetty-6.1.26.jar:/opt/storm/storm-0.9.0.1/lib/kryo-2.17.jar:/opt/storm/storm-0.9.0.1/lib/ring-devel-0.3.11.jar:/opt/storm/storm-0.9.0.1/lib/hiccup-0.3.6.jar:/opt/storm/storm-0.9.0.1/lib/compojure-1.1.3.jar:/opt/storm/storm-0.9.0.1/lib/jetty-util-6.1.26.jar:/opt/storm/storm-0.9.0.1/lib/commons-io-1.4.jar:/opt/storm/storm-0.9.0.1/lib/slf4j-api-1.6.5.jar:/opt/storm/storm-0.9.0.1/lib/clojure-1.4.0.jar:/opt/storm/storm-0.9.0.1/lib/tools.nrepl-0.2.3.jar:/opt/storm/storm-0.9.0.1/lib/curator-client-1.0.1.jar:/opt/storm/storm-0.9.0.1/lib/httpcore-4.1.jar:/opt/storm/storm-0.9.0.1/lib/logback-core-1.0.6.jar:/opt/storm/storm-0.9.0.1/lib/commons-codec-1.4.jar:/opt/storm/storm-0.9.0.1/lib/json-simple-1.1.jar:/opt/storm/storm-0.9.0.1/lib/core.incubator-0.1.0.jar:/opt/storm/storm-0.9.0.1/lib/libthrift7-0.7.0-2.jar:/opt/storm/storm-0.9.0.1/lib/tools.macro-0.1.0.jar:/opt/storm/storm-0.9.0.1/lib/httpclient-4.1.1.jar:/opt/storm/storm-0.9.0.1/lib/objenesis-1.2.jar:/opt/storm/storm-0.9.0.1/lib/math.numeric-tower-0.0.1.jar:/opt/storm/storm-0.9.0.1/conf:/mnt/storm/supervisor/stormdist/match_parser-12-1391735866/stormjar.jar
2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server
environment:java.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server
environment:java.io.tmpdir=/tmp
2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server
environment:java.compiler=<NA>
2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server
environment:os.name=Linux
2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server
environment:os.arch=amd64
2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server
environment:os.version=3.2.0-48-virtual
2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server
environment:user.name=storm
2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server
environment:user.home=/home/storm
2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server
environment:user.dir=/opt/storm/storm-0.9.0.1
2014-02-07 01:32:01 b.s.d.worker [INFO] Launching worker for
match_parser-12-1391735866 on
f1b66213-7237-483e-a5db-e543bb36bcd9:6710 with id
80483ed6-2c51-4f86-9d06-9d2bed705748 and conf {"dev.zookeeper.path"
"/tmp/dev-storm-zookeeper", "topology.tick.tuple.freq.secs" nil,
"topology.builtin.metrics.bucket.size.secs" 60,
"topology.fall.back.on.java.serialization" true,
"topology.max.error.report.per.interval" 5, "zmq.linger.millis" 5000,
"topology.skip.missing.kryo.registrations" false,
"storm.messaging.netty.client_worker_threads" 1, "ui.childopts"
"-Xmx768m", "storm.zookeeper.session.timeout" 20000, "nimbus.reassign"
true, "topology.trident.batch.emit.interval.millis" 500,
"nimbus.monitor.freq.secs" 10, "logviewer.childopts" "-Xmx128m",
"java.library.path" "/usr/local/lib:/opt/local/lib:/usr/lib",
"topology.executor.send.buffer.size" 1024, "storm.local.dir"
"/mnt/storm", "storm.messaging.netty.buffer_size" 5242880,
"supervisor.worker.start.timeout.secs" 120,
"topology.enable.message.timeouts" true,
"nimbus.cleanup.inbox.freq.secs" 600,
"nimbus.inbox.jar.expiration.secs" 3600, "drpc.worker.threads" 64,
"topology.worker.shared.thread.pool.size" 4, "nimbus.host"
"10.0.5.50", "storm.messaging.netty.min_wait_ms" 100,
"storm.zookeeper.port" 2181, "transactional.zookeeper.port" nil,
"topology.executor.receive.buffer.size" 1024,
"transactional.zookeeper.servers" nil, "storm.zookeeper.root"
"/storm", "storm.zookeeper.retry.intervalceiling.millis" 30000,
"supervisor.enable" true,
"storm.messaging.netty.server_worker_threads" 1,
"storm.zookeeper.servers" ["10.0.6.131" "10.0.6.130" "10.0.6.55"],
"transactional.zookeeper.root" "/transactional",
"topology.acker.executors" nil, "topology.transfer.buffer.size" 1024,
"topology.worker.childopts" nil, "drpc.queue.size" 128,
"worker.childopts" "-verbose:gc -XX:+PrintGCTimeStamps
-XX:+PrintGCDetails -Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.port=1%ID% -Xmx1200m",
"supervisor.heartbeat.frequency.secs" 5,
"topology.error.throttle.interval.secs" 10, "zmq.hwm" 0, "drpc.port"
3772, "supervisor.monitor.frequency.secs" 3, "drpc.childopts"
"-Xmx768m", "topology.receiver.buffer.size" 8,
"task.heartbeat.frequency.secs" 3, "topology.tasks" nil,
"storm.messaging.netty.max_retries" 100,
"topology.spout.wait.strategy"
"backtype.storm.spout.SleepSpoutWaitStrategy",
"topology.max.spout.pending" nil, "storm.zookeeper.retry.interval"
1000, "topology.sleep.spout.wait.strategy.time.ms" 1,
"nimbus.topology.validator"
"backtype.storm.nimbus.DefaultTopologyValidator",
"supervisor.slots.ports" [6700 6701 6702 6703 6704 6705 6706 6707 6708
6709 6710 6711 6712 6713 6714 6715 6716 6717 6718 6719 6720 6721],
"topology.debug" false, "nimbus.task.launch.secs" 120,
"nimbus.supervisor.timeout.secs" 60, "topology.message.timeout.secs"
30, "task.refresh.poll.secs" 10, "topology.workers" 1,
"supervisor.childopts" "-verbose:gc -XX:+PrintGCTimeStamps
-XX:+PrintGCDetails -Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.port=8091", "nimbus.thrift.port" 6627,
"topology.stats.sample.rate" 0.05, "worker.heartbeat.frequency.secs"
1, "topology.tuple.serializer"
"backtype.storm.serialization.types.ListDelegateSerializer",
"topology.disruptor.wait.strategy"
"com.lmax.disruptor.BlockingWaitStrategy", "nimbus.task.timeout.secs"
30, "storm.zookeeper.connection.timeout" 15000,
"topology.kryo.factory"
"backtype.storm.serialization.DefaultKryoFactory",
"drpc.invocations.port" 3773, "logviewer.port" 8000, "zmq.threads" 1,
"storm.zookeeper.retry.times" 5, "storm.thrift.transport"
"backtype.storm.security.auth.SimpleTransportPlugin",
"topology.state.synchronization.timeout.secs" 60,
"supervisor.worker.timeout.secs" 30,
"nimbus.file.copy.expiration.secs" 600, "storm.messaging.transport"
"backtype.storm.messaging.netty.Context", "logviewer.appender.name"
"A1", "storm.messaging.netty.max_wait_ms" 1000,
"drpc.request.timeout.secs" 600, "storm.local.mode.zmq" false,
"ui.port" 8080, "nimbus.childopts" "-verbose:gc -XX:+PrintGCTimeStamps
-XX:+PrintGCDetails -Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.port=8090", "storm.cluster.mode"
"distributed", "topology.optimize" true,
"topology.max.task.parallelism" nil}
2014-02-07 01:32:01 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
2014-02-07 01:32:01 o.a.z.ZooKeeper [INFO] Initiating client
connection, connectString=10.0.6.131:2181,10.0.6.130:2181,10.0.6.55:2181
sessionTimeout=20000
watcher=com.netflix.curator.ConnectionState@25de152f
2014-02-07 01:32:01 o.a.z.ClientCnxn [INFO] Opening socket connection
to server /10.0.6.131:2181
2014-02-07 01:32:01 o.a.z.ClientCnxn [INFO] Socket connection
established to ip-10-0-6-131.ec2.internal/10.0.6.131:2181, initiating
session
2014-02-07 01:32:01 o.a.z.ClientCnxn [INFO] Session establishment
complete on server ip-10-0-6-131.ec2.internal/10.0.6.131:2181,
sessionid = 0xf343fa0ee4d600f0, negotiated timeout = 20000
2014-02-07 01:32:01 b.s.zookeeper [INFO] Zookeeper state update: :connected:none
2014-02-07 01:32:01 o.a.z.ZooKeeper [INFO] Session: 0xf343fa0ee4d600f0 closed
2014-02-07 01:32:01 o.a.z.ClientCnxn [INFO] EventThread shut down
2014-02-07 01:32:01 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
2014-02-07 01:32:01 o.a.z.ZooKeeper [INFO] Initiating client
connection, connectString=10.0.6.131:2181,10.0.6.130:2181,10.0.6.55:2181/storm
sessionTimeout=20000
watcher=com.netflix.curator.ConnectionState@31930611
2014-02-07 01:32:01 o.a.z.ClientCnxn [INFO] Opening socket connection
to server /10.0.6.131:2181
2014-02-07 01:32:01 o.a.z.ClientCnxn [INFO] Socket connection
established to ip-10-0-6-131.ec2.internal/10.0.6.131:2181, initiating
session
2014-02-07 01:32:01 o.a.z.ClientCnxn [INFO] Session establishment
complete on server ip-10-0-6-131.ec2.internal/10.0.6.131:2181,
sessionid = 0xf343fa0ee4d600f1, negotiated timeout = 20000
2014-02-07 01:32:01 b.s.m.TransportFactory [INFO] Storm peer transport
plugin:backtype.storm.messaging.netty.Context
2014-02-07 01:32:01 b.s.d.executor [INFO] Loading executor 2:[2 2]
2014-02-07 01:32:01 b.s.d.executor [INFO] Loaded executor tasks 2:[2 2]
2014-02-07 01:32:01 b.s.d.executor [INFO] Finished loading executor 2:[2 2]
2014-02-07 01:32:01 b.s.d.executor [INFO] Loading executor __acker:[102 102]
2014-02-07 01:32:01 b.s.d.executor [INFO] Preparing bolt 2:(2)
2014-02-07 01:32:01 b.s.d.executor [INFO] Loaded executor tasks
__acker:[102 102]
2014-02-07 01:32:01 b.s.d.executor [INFO] Timeouts disabled for
executor __acker:[102 102]
2014-02-07 01:32:01 b.s.d.executor [INFO] Preparing bolt __acker:(102)
2014-02-07 01:32:01 b.s.d.executor [INFO] Finished loading executor
__acker:[102 102]
2014-02-07 01:32:01 b.s.d.executor [INFO] Prepared bolt __acker:(102)
2014-02-07 01:32:01 b.s.d.executor [INFO] Loading executor 3:[52 52]
2014-02-07 01:32:01 b.s.d.executor [INFO] Loaded executor tasks 3:[52 52]
2014-02-07 01:32:01 b.s.d.executor [INFO] Finished loading executor 3:[52 52]
2014-02-07 01:32:01 b.s.d.executor [INFO] Preparing bolt 3:(52)
2014-02-07 01:32:01 b.s.d.executor [INFO] Loading executor __system:[-1 -1]
2014-02-07 01:32:01 b.s.d.executor [INFO] Loaded executor tasks __system:[-1 -1]
2014-02-07 01:32:01 b.s.d.executor [INFO] Finished loading executor
__system:[-1 -1]
2014-02-07 01:32:01 b.s.d.executor [INFO] Preparing bolt __system:(-1)
2014-02-07 01:32:01 b.s.d.worker [INFO] Launching receive-thread for
f1b66213-7237-483e-a5db-e543bb36bcd9:6710
2014-02-07 01:32:01 b.s.d.executor [INFO] Prepared bolt __system:(-1)
2014-02-07 01:32:02 b.s.d.worker [INFO] Worker has topology config
{"storm.id" "match_parser-12-1391735866", "dev.zookeeper.path"
"/tmp/dev-storm-zookeeper", "topology.tick.tuple.freq.secs" nil,
"topology.builtin.metrics.bucket.size.secs" 60,
"topology.fall.back.on.java.serialization" true,
"topology.max.error.report.per.interval" 5, "zmq.linger.millis" 5000,
"topology.skip.missing.kryo.registrations" false,
"storm.messaging.netty.client_worker_threads" 1, "ui.childopts"
"-Xmx768m", "storm.zookeeper.session.timeout" 20000, "nimbus.reassign"
true, "topology.trident.batch.emit.interval.millis" 500,
"nimbus.monitor.freq.secs" 10, "logviewer.childopts" "-Xmx128m",
"java.library.path" "/usr/local/lib:/opt/local/lib:/usr/lib",
"executor.send.buffer.size" 16384,
"topology.executor.send.buffer.size" 1024, "storm.local.dir"
"/mnt/storm", "iwparser_socket"
"/tmp/iwparser2Sockets/iwparserClients.ipc",
"storm.messaging.netty.buffer_size" 5242880,
"supervisor.worker.start.timeout.secs" 120,
"topology.enable.message.timeouts" true,
"nimbus.cleanup.inbox.freq.secs" 600, "batch_size" 50,
"nimbus.inbox.jar.expiration.secs" 3600,
"aws_creds.aws_secret_access_key"
"wmGLQ4ItvdhniRFMJe3yDJvBwTJUTqbuLYAS5ZPU", "drpc.worker.threads" 64,
"redis.processed.match.log.size" 100000, "redis.key.match.ok"
"MATCH-OK", "topology.worker.shared.thread.pool.size" 4, "nimbus.host"
"10.0.5.50", "storm.messaging.netty.min_wait_ms" 100,
"storm.zookeeper.port" 2181, "transactional.zookeeper.port" nil,
"topology.executor.receive.buffer.size" 1024,
"aws_creds.aws_access_key_id" "AKIAJP4O6MTXON3MLV2Q",
"executor.receive.buffer.size" 16384,
"transactional.zookeeper.servers" nil, "storm.zookeeper.root"
"/storm", "aws.s3.match_bucket_name" "prod-matches",
"storm.zookeeper.retry.intervalceiling.millis" 30000,
"supervisor.enable" true,
"storm.messaging.netty.server_worker_threads" 1,
"storm.zookeeper.servers" ["10.0.6.131" "10.0.6.130" "10.0.6.55"],
"transactional.zookeeper.root" "/transactional",
"topology.acker.executors" 50, "topology.kryo.decorators" (),
"topology.name" "match_parser", "topology.transfer.buffer.size" 1024,
"topology.worker.childopts" "-Djruby.compat.version=RUBY1_9",
"drpc.queue.size" 128, "worker.childopts" "-verbose:gc
-XX:+PrintGCTimeStamps -XX:+PrintGCDetails
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.port=1%ID% -Xmx1200m",
"supervisor.heartbeat.frequency.secs" 5, "transfer.buffer.size" 32,
"topology.error.throttle.interval.secs" 10, "throttle.spout" 1,
"zmq.hwm" 0, "drpc.port" 3772, "supervisor.monitor.frequency.secs" 3,
"drpc.childopts" "-Xmx768m", "topology.receiver.buffer.size" 8,
"task.heartbeat.frequency.secs" 3, "topology.tasks" nil,
"storm.messaging.netty.max_retries" 100,
"topology.spout.wait.strategy"
"backtype.storm.spout.SleepSpoutWaitStrategy", "receiver.buffer.size"
8, "debug.logging.redis" true, "topology.max.spout.pending" 1,
"storm.zookeeper.retry.interval" 1000,
"topology.sleep.spout.wait.strategy.time.ms" 1, "platform" "xbox",
"nimbus.topology.validator"
"backtype.storm.nimbus.DefaultTopologyValidator",
"supervisor.slots.ports" [6700 6701 6702 6703 6704 6705 6706 6707 6708
6709 6710 6711 6712 6713 6714 6715 6716 6717 6718 6719 6720 6721],
"iwparser_port" 12330, "redis.port" 6379, "topology.debug" false,
"nimbus.task.launch.secs" 120, "nimbus.supervisor.timeout.secs" 60,
"debug.logging.stdout" true, "topology.kryo.register" nil,
"topology.message.timeout.secs" 30, "task.refresh.poll.secs" 10,
"dw_retries.max_attempts" 6, "topology.workers" 50,
"supervisor.childopts" "-verbose:gc -XX:+PrintGCTimeStamps
-XX:+PrintGCDetails -Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.port=8091", "nimbus.thrift.port" 6627,
"redis_prefix" "prod", "topology.stats.sample.rate" 0.05,
"worker.heartbeat.frequency.secs" 1, "topology.tuple.serializer"
"backtype.storm.serialization.types.ListDelegateSerializer",
"topology.disruptor.wait.strategy"
"com.lmax.disruptor.BlockingWaitStrategy", "http_proxy" "proxy",
"nimbus.task.timeout.secs" 30, "storm.zookeeper.connection.timeout"
15000, "topology.kryo.factory"
"backtype.storm.serialization.DefaultKryoFactory",
"drpc.invocations.port" 3773, "logviewer.port" 8000, "zmq.threads" 1,
"storm.zookeeper.retry.times" 5, "storm.thrift.transport"
"backtype.storm.security.auth.SimpleTransportPlugin",
"topology.state.synchronization.timeout.secs" 60,
"supervisor.worker.timeout.secs" 30,
"nimbus.file.copy.expiration.secs" 600, "storm.messaging.transport"
"backtype.storm.messaging.netty.Context", "logviewer.appender.name"
"A1", "storm.messaging.netty.max_wait_ms" 1000, "debug.logging.kibana"
false, "iwparser_host" "10.200.6.145", "drpc.request.timeout.secs"
600, "demonware.endpoint" "http://10.0.1.78:9000/matches/",
"redis_simple_prefix" "prod", "redis.host"
"storm090-redis.uvvujo.0001.use1.cache.amazonaws.com",
"dw_retries.wait_interval" 2, "storm.local.mode.zmq" false, "ui.port"
8080, "nimbus.childopts" "-verbose:gc -XX:+PrintGCTimeStamps
-XX:+PrintGCDetails -Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.port=8090", "storm.cluster.mode"
"distributed", "topology.optimize" true,
"topology.max.task.parallelism" nil, "statsd.endpoint" "statsd",
"redis.log.size" 100000}
2014-02-07 01:32:02 b.s.d.worker [INFO] Worker
80483ed6-2c51-4f86-9d06-9d2bed705748 for storm
match_parser-12-1391735866 on
f1b66213-7237-483e-a5db-e543bb36bcd9:6710 has finished loading
2014-02-07 01:32:08 MatchRetrieveBolt [INFO] [BOLT-1] on_init
2014-02-07 01:32:08 MatchProcessBolt [INFO] [BOlT-MatchProcessBolt] on_init
2014-02-07 01:32:08 b.s.d.executor [INFO] Prepared bolt 2:(2)
2014-02-07 01:32:08 b.s.d.executor [INFO] Prepared bolt 3:(52)


The bolts have reinitialized. The Storm UI is showing that bolt1 is
still being handled by this specific worker.


The spout still sends tuples to matchRetrieveBolt but never gets a
response. The failure is caught and retries continue.


Can anyone comment or provide any thoughts on the above issue?

Thanks,
Mark

Mime
View raw message