flink-user mailing list archives

From Robert Metzger <rmetz...@apache.org>
Subject Re: flink on my cluster gets stuck
Date Fri, 31 Oct 2014 05:41:00 GMT
Were you able to increase the number of file handles in your cluster?

I think the TaskManager is not reporting any heartbeats because it
effectively crashed once the "Too many open files" exception occurred.
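
To check this on a running TaskManager, it helps to compare the number of
descriptors the process actually holds against the limit that process was
started with. That limit can differ from what "ulimit -n" prints in a fresh
shell. Below is a minimal sketch, assuming Linux with /proc available and
permission to read the target process's entries. The helper names
(parse_max_open_files, fd_usage) are hypothetical and not part of Flink.

```python
import os


def _to_int(value):
    # /proc reports "unlimited" when no numeric limit is set.
    return None if value == "unlimited" else int(value)


def parse_max_open_files(limits_text):
    """Extract (soft, hard) from the 'Max open files' row of /proc/<pid>/limits."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            fields = line.split()
            # Row layout: Max open files <soft> <hard> files
            return _to_int(fields[3]), _to_int(fields[4])
    raise ValueError("no 'Max open files' row found")


def fd_usage(pid):
    """Return (open_fds, soft_limit, hard_limit) for a running process (Linux only)."""
    # Each entry in /proc/<pid>/fd is one open descriptor (files, sockets, pipes).
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    with open(f"/proc/{pid}/limits") as f:
        soft, hard = parse_max_open_files(f.read())
    return open_fds, soft, hard
```

Calling fd_usage(<TaskManager pid>) after each job submission should show
whether the open descriptor count keeps growing toward the soft limit, which
is what leaked channels would look like.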

On Tue, Oct 21, 2014 at 3:56 AM, Attila Bernáth <bernath.athos@gmail.com>
wrote:

> Dear Ufuk,
>
> ulimit -n
> says
> 8192
>
> It seems that some of the task managers do not report a heartbeat
> (this is what I find in the job manager's log), and the job manager
> fails to cancel the job.
>
> Attila
>
>
> 2014-10-21 12:05 GMT+02:00 Ufuk Celebi <uce@apache.org>:
> > Hey Attila,
> >
> > this means that your system is running out of file handles. Can you
> > execute "ulimit -n" on your machines and report the value back? You will
> > have to increase that value.
> >
> > We actually multiplex multiple logical channels over the same TCP
> > connection in order to reduce the number of concurrently open file
> > handles. The problem that leads to "too many open files" is that channels
> > are not closed. Let me look into that and get back to you.
> >
> > – Ufuk
> >
> > On 21 Oct 2014, at 11:25, Attila Bernáth <bernath.athos@gmail.com> wrote:
> >
> >> Dear Developers,
> >>
> >> I ran some experiments on my cluster. I submitted the same job several
> >> times; it finished on the first 5-6 runs, but the next one failed and
> >> got stuck (the web dashboard stopped updating).
> >>
> >> I use flink 0.7, compiled from source.
> >>
> >> In the log file of one of my task managers I find the following
> >> (a similar message is written every second; I copy only the last two):
> >>
> >> 10:58:21,540 WARN  io.netty.channel.DefaultChannelPipeline - An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
> >> java.io.IOException: Too many open files
> >>        at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
> >>        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:241)
> >>        at io.netty.channel.socket.nio.NioServerSocketChannel.doReadMessages(NioServerSocketChannel.java:135)
> >>        at io.netty.channel.nio.AbstractNioMessageChannel$NioMessageUnsafe.read(AbstractNioMessageChannel.java:68)
> >>        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
> >>        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
> >>        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
> >>        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
> >>        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
> >>        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
> >>        at java.lang.Thread.run(Thread.java:745)
> >> 10:58:22,541 WARN  io.netty.channel.DefaultChannelPipeline - An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
> >> java.io.IOException: Too many open files
> >>        at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
> >>        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:241)
> >>        at io.netty.channel.socket.nio.NioServerSocketChannel.doReadMessages(NioServerSocketChannel.java:135)
> >>        at io.netty.channel.nio.AbstractNioMessageChannel$NioMessageUnsafe.read(AbstractNioMessageChannel.java:68)
> >>        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
> >>        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
> >>        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
> >>        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
> >>        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
> >>        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
> >>        at java.lang.Thread.run(Thread.java:745)
> >>
> >> Any ideas what this can be?
> >>
> >> Attila
> >
>
