storm-user mailing list archives

From Andrew Montalenti <and...@parsely.com>
Subject Re: Netty transport errors.
Date Wed, 23 Jul 2014 17:43:17 GMT
Tomas,

You don't happen to be running Ubuntu 14.04 on a Xen kernel, do you? E.g., on
Amazon EC2.

I discovered an issue where running Storm across many workers on that OS
led to me hitting an annoying network driver bug that would cause timeouts
and topology freezes like you are seeing. Check dmesg for odd messages from
your network stack. Just a guess.
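
A minimal sketch of that dmesg check, assuming Python 3.7+ and a `dmesg`
binary on the PATH. The keyword list is an illustrative guess at what "odd
messages" might contain, not text quoted from this thread, and the function
names are hypothetical.

```python
import subprocess

# Hypothetical keywords; adjust them for your own driver and interface names.
SUSPECT_KEYWORDS = ("xen", "netfront", "eth0", "timed out", "link is down")


def find_network_warnings(dmesg_text, keywords=SUSPECT_KEYWORDS):
    """Return the dmesg lines that mention any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [
        line
        for line in dmesg_text.splitlines()
        if any(k in line.lower() for k in lowered)
    ]


def read_dmesg():
    """Run dmesg and return its output as text (may need root on some hosts)."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=False)
    return result.stdout
```

On a worker host, printing each line of `find_network_warnings(read_dmesg())`
gives a short list to read through by hand. A match only marks a line as
worth a look; it does not confirm a driver bug.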

(copied from my reply to another similar thread)

-AM
On Jul 23, 2014 10:07 AM, "Tomas Mazukna" <tomas.mazukna@gmail.com> wrote:

> I am really puzzled why processing stopped in the topology.
> Looks like the acking threads all stopped communicating. Only hint I saw
> was this netty exception:
> Any hints how to prevent this from happening again?
>
> 2014-07-23 08:56:03 b.s.m.n.Client [INFO] Closing Netty Client
> Netty-Client-ndhhadappp3.tsh.mis.mckesson.com/10.48.132.224:9703
>
> 2014-07-23 08:56:03 b.s.m.n.Client [INFO] Waiting for pending batchs to be
> sent with
> Netty-Client-ndhhadappp3.tsh.mis.mckesson.com/10.48.132.224:9703...,
> timeout: 600000ms, pendings: 0
>
> 2014-07-23 08:56:03 b.s.m.n.Client [INFO] Closing Netty Client
> Netty-Client-ndhhadappp3.tsh.mis.mckesson.com/10.48.132.224:9700
>
> 2014-07-23 08:56:03 b.s.m.n.Client [INFO] Waiting for pending batchs to be
> sent with
> Netty-Client-ndhhadappp3.tsh.mis.mckesson.com/10.48.132.224:9700...,
> timeout: 600000ms, pendings: 0
>
> 2014-07-23 08:56:03 b.s.util [ERROR] Async loop died!
>
> java.lang.RuntimeException: java.lang.RuntimeException: Client is being
> closed, and does not take requests any more
>
>         at
> backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128)
> ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
>
>         at
> backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99)
> ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
>
>         at
> backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80)
> ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
>
>         at
> backtype.storm.disruptor$consume_loop_STAR_$fn__758.invoke(disruptor.clj:94)
> ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
>
>         at backtype.storm.util$async_loop$fn__457.invoke(util.clj:431)
> ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
>
>         at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
>
>         at java.lang.Thread.run(Thread.java:745) [na:1.7.0_60]
>
> Caused by: java.lang.RuntimeException: Client is being closed, and does
> not take requests any more
>
>         at backtype.storm.messaging.netty.Client.send(Client.java:194)
> ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
>
>         at
> backtype.storm.utils.TransferDrainer.send(TransferDrainer.java:54)
> ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
>
>         at
> backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__5927$fn__5928.invoke(worker.clj:322)
> ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
>
>         at
> backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__5927.invoke(worker.clj:320)
> ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
>
>         at
> backtype.storm.disruptor$clojure_handler$reify__745.onEvent(disruptor.clj:58)
> ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
>
>         at
> backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125)
> ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
>
>         ... 6 common frames omitted
>
> 2014-07-23 08:56:03 b.s.util [INFO] Halting process: ("Async loop died!")
>
>
> Configuration:
>
> worker.childopts: "-Xmx2048m -Xss256k -XX:MaxPermSize=256m
> -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
> -XX:NewSize=128m -XX:CMSInitiatingOccupancyFraction=70
> -XX:-CMSConcurrentMTEnabled -Djava.net.preferIPv4Stack=true"
>
> supervisor.childopts: "-Xmx256m -Djava.net.preferIPv4Stack=true"
>
> nimbus.childopts: "-Xmx1024m -Djava.net.preferIPv4Stack=true"
>
> ui.childopts: "-Xmx768m -Djava.net.preferIPv4Stack=true"
>
> nimbus.thrift.threads: 256
>
>
> storm.messaging.transport: "backtype.storm.messaging.netty.Context"
>
> storm.messaging.netty.server_worker_threads: 1
>
> storm.messaging.netty.client_worker_threads: 1
>
> storm.messaging.netty.buffer_size: 5242880
>
> storm.messaging.netty.max_retries: 100
>
> storm.messaging.netty.max_wait_ms: 1000
>
> storm.messaging.netty.min_wait_ms: 100
>
>
> Thanks,
> --
> Tomas Mazukna
> 678-557-3834
>
