flink-dev mailing list archives

From Deepak Jha <dkjhan...@gmail.com>
Subject Re: Remote TaskManager Connection Problem
Date Mon, 07 Mar 2016 18:37:10 GMT
Hi Stephan,
Thanks for the response. I was able to resolve the issue: I was using
localhost as the JobManager name instead of the container name. There were a few
more issues which I would like to mention:
- I'm using S3 for storage/checkpoints in Flink HA mode. I realized that I
have to set fs.hdfs.hadoopconf in conf/flink-conf.yaml and add
core-site.xml in conf/. Since I'm deploying on AWS, I had to place
hadoop-aws.jar as well.
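A minimal sketch of the configuration described above, written as a small Python helper that renders the files. The container name `jobmanager`, the directory `/opt/flink/conf`, and the use of the S3A filesystem (`fs.s3a.impl`, provided by hadoop-aws.jar) are assumptions for illustration, not values taken from this thread. A working HA setup needs further keys that are not shown here.

```python
import xml.etree.ElementTree as ET


def flink_conf(jobmanager_host: str, hadoop_conf_dir: str) -> str:
    """Render the two flink-conf.yaml entries discussed in this message."""
    return (
        # Use the JobManager container's name here, not "localhost".
        f"jobmanager.rpc.address: {jobmanager_host}\n"
        # Directory in which Flink looks for core-site.xml.
        f"fs.hdfs.hadoopconf: {hadoop_conf_dir}\n"
    )


def core_site(properties: dict) -> str:
    """Render a Hadoop core-site.xml from a {name: value} mapping."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical values: "jobmanager" as the container name, /opt/flink/conf
# as the directory holding core-site.xml inside the container.
print(flink_conf("jobmanager", "/opt/flink/conf"))
# Assumes the S3A filesystem from hadoop-aws.jar handles s3a:// paths.
print(core_site({"fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem"}))
```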


On Fri, Mar 4, 2016 at 1:22 AM, Stephan Ewen <sewen@apache.org> wrote:

> The pull request https://github.com/apache/flink/pull/1758 should improve
> the TaskManager's network interface selection.
>
>
> On Fri, Mar 4, 2016 at 10:19 AM, Stephan Ewen <sewen@apache.org> wrote:
>
> > Hi!
> >
> > This registration phase means that the TaskManager tries to tell the
> > JobManager that it is available.
> > If that fails, there can be two reasons:
> >
> >   1) Network communication to the port is not possible
> >       1.1) JobManager IP really not reachable (not the case, as you described)
> >       1.2) TaskManager selected a wrong network interface to work with
> >   2) JobManager not listening
> >
> >
> > To look into 1.2, can you check the TaskManager log at the beginning,
> > where it says what interface/hostname the TaskManager selected to use?
> >
> > Thanks,
> > Stephan
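To complement Stephan's point 1.2, here is a small Python sketch (not Flink's own interface-selection logic) that asks the operating system which local address it would use to reach a given host. It relies on a standard trick: calling connect() on a UDP socket sends no packets but makes the kernel pick a route and a source address, which getsockname() then reports. The function name and the default port are illustrative.

```python
import socket


def local_address_towards(remote_host: str, remote_port: int = 6123) -> str:
    """Return the local IPv4 address the OS would use to reach remote_host.

    connect() on a UDP socket transmits nothing; it only selects a route,
    and with it the source address that getsockname() reports.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.connect((remote_host, remote_port))
        return sock.getsockname()[0]
```

Run inside the TaskManager container, `local_address_towards("192.168.99.104")` shows the address the kernel picks for the JobManager in this thread. If it does not match the interface or hostname the TaskManager log reports at startup, that points toward case 1.2.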
> >
> > On Fri, Mar 4, 2016 at 2:48 AM, Deepak Jha <dkjhanitt@gmail.com> wrote:
> >
> >> Hi All,
> >> I've created 2 Docker containers on my local machine, one running the
> >> JM (192.168.99.104) and the other running the TM. I was expecting to see
> >> the TM in the JM UI, but it did not appear. Looking into the TM logs, I see
> >> the following lines:
> >>
> >>
> >> 01:29:50,862 DEBUG org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager process reaper
> >> 01:29:50,868 INFO  org.apache.flink.runtime.filecache.FileCache - User file cache uses directory /tmp/flink-dist-cache-be63f351-2bce-48ef-bbc4-fb0f40fecd49
> >> 01:29:51,093 INFO  org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager actor at akka://flink/user/taskmanager#1222392284.
> >> 01:29:51,095 INFO  org.apache.flink.runtime.taskmanager.TaskManager - TaskManager data connection information: 140efeb188cc (dataPort=6122)
> >> 01:29:51,096 INFO  org.apache.flink.runtime.taskmanager.TaskManager - TaskManager has 1 task slot(s).
> >> 01:29:51,097 INFO  org.apache.flink.runtime.taskmanager.TaskManager - Memory usage stats: [HEAP: 386/494/494 MB, NON HEAP: 30/31/-1 MB (used/committed/max)]
> >> 01:29:51,104 INFO  org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@192.168.99.104:6123/user/jobmanager (attempt 1, timeout: 500 milliseconds)
> >> 01:29:51,633 INFO  org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@192.168.99.104:6123/user/jobmanager (attempt 2, timeout: 1000 milliseconds)
> >> 01:29:52,652 INFO  org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@192.168.99.104:6123/user/jobmanager (attempt 3, timeout: 2000 milliseconds)
> >> 01:29:54,672 INFO  org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@192.168.99.104:6123/user/jobmanager (attempt 4, timeout: 4000 milliseconds)
> >> 01:29:58,693 INFO  org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@192.168.99.104:6123/user/jobmanager (attempt 5, timeout: 8000 milliseconds)
> >> 01:30:06,702 INFO  org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@192.168.99.104:6123/user/jobmanager (attempt 6, timeout: 16000 milliseconds)
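The registration timeouts in this log double on every attempt (500, 1000, 2000, ... ms), and each new attempt starts roughly when the previous one times out (about 0.5 s, 1 s, 2 s, 4 s and 8 s apart). The minimal Python sketch below reproduces that schedule. It is not Flink's own code, and the upper bound `max_ms` is an assumption, since the log never reaches a cap.

```python
def registration_schedule(initial_ms=500, attempts=6, max_ms=30_000):
    """Return (attempt, start_offset_ms, timeout_ms) for a doubling backoff.

    max_ms is an assumed cap; the log above only shows values up to 16000 ms.
    """
    schedule = []
    offset_ms, timeout_ms = 0, initial_ms
    for attempt in range(1, attempts + 1):
        schedule.append((attempt, offset_ms, timeout_ms))
        offset_ms += timeout_ms                   # next try starts when this one times out
        timeout_ms = min(timeout_ms * 2, max_ms)  # double, but never beyond the cap
    return schedule


for attempt, start, timeout in registration_schedule():
    print(f"attempt {attempt}: +{start} ms, timeout {timeout} ms")
```

The computed start offsets (0, 500, 1500, 3500, 7500, 15500 ms) match the logged timestamps to within about a hundred milliseconds.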
> >>
> >>
> >> However, from the TM I am able to reach the JM on port 6123:
> >> root@140efeb188cc:/# nc -v 192.168.99.104 6123
> >> Connection to 192.168.99.104 6123 port [tcp/*] succeeded!
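A rough Python equivalent of the `nc -v` check above, using only the standard library. It is worth keeping in mind what such a check does and does not show: a completed TCP handshake means something is listening on port 6123, not that the JobManager will accept registration messages addressed to 192.168.99.104. That gap is consistent with the resolution reported at the top of this thread, where the JobManager name had been set to localhost instead of the container name.

```python
import socket


def port_accepts_connections(host: str, port: int, timeout_s: float = 2.0) -> bool:
    """Roughly what `nc -v host port` tests: does a TCP handshake succeed?"""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # connection refused, host unreachable, or timed out
        return False
```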
> >>
> >>
> >> The masters file on the TM contains:
> >> 192.168.99.104:8080
> >>
> >> Has anyone faced this issue with a remote JM/TM combination?
> >>
> >> --
> >> Thanks,
> >> Deepak Jha
> >>
> >
> >
>



-- 
Thanks,
Deepak Jha
