flink-dev mailing list archives

From Till Rohrmann <trohrm...@apache.org>
Subject Re: Flink HA on AWS: Network related issue
Date Fri, 09 Sep 2016 07:34:34 GMT
Hi Deepak,

Could you check the logs to see whether the JobManager has been quarantined
and thus can no longer be connected to? The logs should at least contain a
hint as to why the TaskManager lost the connection initially.
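
A rough way to run that check is to scan the log files for any mention of
quarantining. The sketch below is only an illustration: the substring
"quarantin" is an assumed search term (the exact log wording may differ
between versions), and the "*.log" directory layout is hypothetical.

```python
import re
from pathlib import Path

# Assumption: lines about a quarantined peer contain the substring "quarantin"
# (matches "quarantined", "Quarantine", ...). Adjust if your logs differ.
PATTERN = re.compile(r"quarantin", re.IGNORECASE)


def find_quarantine_lines(log_path):
    """Return (line_number, text) pairs for lines that mention quarantine."""
    hits = []
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if PATTERN.search(line):
                hits.append((number, line.rstrip("\n")))
    return hits


def scan_log_dir(log_dir):
    """Scan every *.log file in log_dir (hypothetical layout); map file name to hits."""
    results = {}
    for path in sorted(Path(log_dir).glob("*.log")):
        hits = find_quarantine_lines(path)
        if hits:
            results[path.name] = hits
    return results
```

Calling scan_log_dir on the log directory of each TaskManager and JobManager
returns the matching lines per file, which can then be compared against the
time at which the TaskManager detached.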

Cheers,
Till

On Thu, Sep 8, 2016 at 7:08 PM, Deepak Jha <dkjhanitt@gmail.com> wrote:

> Hi,
> I've set up Flink HA on AWS (3 TaskManagers and 2 JobManagers, each on an
> EC2 m4.large instance, with checkpointing to S3 enabled). My topology works
> fine, but after a few hours the TaskManagers get detached from the
> JobManager. I tried to reach the JobManager using telnet at the same time
> and it worked, but the TaskManager does not succeed in connecting again. It
> reattaches only after I restart it. I tried the following settings, but the
> problem persists.
>
> akka.ask.timeout: 20 s
> akka.lookup.timeout: 20 s
> akka.watch.heartbeat.interval: 20 s
>
> Please find attached a snapshot from one of the TaskManagers. Is there any
> setting that I need to change?
>
> --
> Thanks,
> Deepak Jha
>
>
