flink-user mailing list archives

From Attila Bernáth <bernath.at...@gmail.com>
Subject Re: flink on my cluster gets stuck
Date Tue, 21 Oct 2014 10:56:54 GMT
Dear Ufuk,

ulimit -n
says
8192

It seems that some of the task managers do not report a heartbeat
(this is what I find in the job manager's log), and the job manager
fails to cancel the job.

Attila
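
A soft limit of 8192 says little by itself; what matters is how many descriptors the task manager process actually holds relative to that limit. The following is a minimal sketch, not part of Flink, for checking both numbers on a Linux machine. The helper names (fd_limits, count_open_fds, fd_usage) are hypothetical, and the /proc layout is assumed to be the standard Linux one.

```python
import os
import resource


def fd_limits():
    """Return the (soft, hard) open-file limits of the current process.

    The soft limit is the value that `ulimit -n` prints in a shell.
    """
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def count_open_fds(pid=None):
    """Count the open file descriptors of a process (Linux only).

    pid=None inspects the current process. Listing the directory opens
    one descriptor itself, so the count for the current process can be
    one higher than the steady-state value.
    """
    target = "self" if pid is None else str(pid)
    return len(os.listdir(f"/proc/{target}/fd"))


def fd_usage(pid=None):
    """Return (open_descriptors, soft_limit).

    Note: the limit is the one of the *current* process; this assumes
    the inspected process was started with the same limits.
    """
    soft, _hard = fd_limits()
    return count_open_fds(pid), soft


if __name__ == "__main__":
    used, soft = fd_usage()
    print(f"open descriptors: {used}, soft limit: {soft}")
```

To inspect a task manager rather than the script itself, pass its process ID (for example as reported by jps) to fd_usage; this needs to run as the same user or as root. A count that keeps growing across job submissions and approaches the soft limit would be consistent with channels not being closed.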


2014-10-21 12:05 GMT+02:00 Ufuk Celebi <uce@apache.org>:
> Hey Attila,
>
> this means that your system is running out of file handles. Can you execute "ulimit -n"
> on your machines and report the value back? You will have to increase that value.
>
> We actually multiplex multiple logical channels over the same TCP connection in order
> to reduce the number of concurrently open file handles. The problem that leads to "too
> many open files" is that channels are not closed. Let me look into that and get back to you.
>
> – Ufuk
>
> On 21 Oct 2014, at 11:25, Attila Bernáth <bernath.athos@gmail.com> wrote:
>
>> Dear Developers,
>>
>> I ran some experiments on my cluster. I submitted the same job several
>> times; it finished on the first 5-6 runs, but the next one
>> failed and got stuck (the web dashboard stops updating).
>>
>> I use flink 0.7, compiled from source.
>>
>> In the log file of one of my task managers I find the following
>> (a similar message is written every second; I copy only the last 2):
>>
>> 10:58:21,540 WARN  io.netty.channel.DefaultChannelPipeline
>>          - An exceptionCaught() event was fired, and it reached at
>> the tail of the pipeline. It usually means the last handler in the
>> pipeline did not handle the exception.
>> java.io.IOException: Too many open files
>>        at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
>>        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:241)
>>        at io.netty.channel.socket.nio.NioServerSocketChannel.doReadMessages(NioServerSocketChannel.java:135)
>>        at io.netty.channel.nio.AbstractNioMessageChannel$NioMessageUnsafe.read(AbstractNioMessageChannel.java:68)
>>        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>>        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>>        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>>        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>>        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
>>        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
>>        at java.lang.Thread.run(Thread.java:745)
>> 10:58:22,541 WARN  io.netty.channel.DefaultChannelPipeline
>>          - An exceptionCaught() event was fired, and it reached at
>> the tail of the pipeline. It usually means the last handler in the
>> pipeline did not handle the exception.
>> java.io.IOException: Too many open files
>>        at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
>>        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:241)
>>        at io.netty.channel.socket.nio.NioServerSocketChannel.doReadMessages(NioServerSocketChannel.java:135)
>>        at io.netty.channel.nio.AbstractNioMessageChannel$NioMessageUnsafe.read(AbstractNioMessageChannel.java:68)
>>        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>>        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>>        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>>        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>>        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
>>        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
>>        at java.lang.Thread.run(Thread.java:745)
>>
>> Any ideas what this can be?
>>
>> Attila
>
