flink-user mailing list archives

From Stephan Ewen <se...@apache.org>
Subject Re: Flink on YARN: Stuck on "Trying to register at JobManager"
Date Sat, 06 Feb 2016 15:22:17 GMT
Yeah, sounds a lot like the client cannot connect to the JobManager port.

The ports to communicate with HDFS and the YARN resource manager may be
whitelisted or forwarded, so you can submit the YARN session, but then not
connect to the JobManager afterwards.
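
One way to check this from the client machine is a plain TCP connect to the
address printed in the "Trying to register at JobManager" log line. Below is a
minimal sketch, assuming Python 3 is available on the VM. The helper names
parse_akka_address and can_connect are made up for illustration and are not
part of Flink.

```python
import socket
from urllib.parse import urlsplit


def parse_akka_address(address):
    """Extract (host, port) from an address such as
    akka.tcp://flink@host:port/user/jobmanager (hypothetical helper)."""
    parts = urlsplit(address)
    if parts.hostname is None or parts.port is None:
        raise ValueError("no host:port found in %r" % address)
    return parts.hostname, parts.port


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within the timeout
    (hypothetical helper; it only tests reachability, not the Akka protocol)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Calling can_connect(*parse_akka_address(...)) with the address from the current
session's log should tell you whether the port is reachable at all. If ping
works but this returns False, a firewall or missing port forward between the
client and the JobManager is the likely cause. The port changes with each
session, so take the address from the latest log output.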



On Sat, Feb 6, 2016 at 2:11 PM, Pieter Hameete <phameete@gmail.com> wrote:

> Hi Max!
>
> I'm using Flink 0.10.1 and indeed the cluster seems to be created fine,
> all in the JobManager Web UI looks good.
>
> It seems like the JobManager initiates the connection with my VM and
> cannot reach it. It could be that this is similar to the problem here:
>
>
> http://apache-spark-user-list.1001560.n3.nabble.com/spark-with-docker-errors-with-akka-NAT-td7702.html
>
> I probably have to make some changes to the networking configuration of my
> VM so it can be reached by the JobManager despite using a different port
> each time.
>
> - Pieter
>
> 2016-02-06 14:05 GMT+01:00 Maximilian Michels <mxm@apache.org>:
>
>> Hi Pieter,
>>
>> Which version of Flink are you using? It appears you've created a
>> Flink YARN cluster but you can't reach the JobManager afterwards.
>>
>> Cheers,
>> Max
>>
>> On Sat, Feb 6, 2016 at 1:42 PM, Pieter Hameete <phameete@gmail.com>
>> wrote:
>> > Hi Robert,
>> >
>> > unfortunately there are no signs of what is going wrong in the logs. The
>> > last log messages are about successful registration of the TaskManagers.
>> >
>> > I'm also fairly sure it must be something in my VM that is causing this,
>> > because when I start the yarn-session from a login node that is on the
>> > same network as the hadoop cluster there are no problems registering with
>> > the JobManager. I did also notice the following message in the local
>> > console:
>> >
>> > 12:30:27,173 WARN  Remoting
>> > - Tried to associate with unreachable remote address
>> > [akka.tcp://flink@145.100.41.13:41539]. Address is now gated for 5000 ms,
>> > all messages to this address will be delivered to dead letters. Reason:
>> > connection timed out: /145.100.41.13:41539
>> >
>> > I can ping the JobManager fine from within the VM. Could there be some
>> > invalid or missing configuration on my side?
>> >
>> > Cheers,
>> >
>> > Pieter
>> >
>> >
>> > 2016-02-06 12:54 GMT+01:00 Robert Metzger <rmetzger@apache.org>:
>> >>
>> >> Hi,
>> >>
>> >> did you check the logs of the JobManager itself? Maybe it'll tell us
>> >> already what's going on.
>> >>
>> >> On Sat, Feb 6, 2016 at 12:14 PM, Pieter Hameete <phameete@gmail.com>
>> >> wrote:
>> >>>
>> >>> Hi Guys!
>> >>>
>> >>> I'm attempting to run Flink on YARN, but I run into an issue. I'm
>> >>> starting the Flink YARN session from an Ubuntu 14.04 VM. All goes well
>> >>> until after the JobManager web UI is started:
>> >>>
>> >>> JobManager web interface address
>> >>> http://head05.hathi.surfsara.nl:8088/proxy/application_1452780322684_10532/
>> >>> Waiting until all TaskManagers have connected
>> >>> 11:09:51,557 INFO  org.apache.flink.yarn.ApplicationClient
>> >>> - Notification about new leader address
>> >>> akka.tcp://flink@145.100.41.148:35666/user/jobmanager with session ID null.
>> >>> No status updates from the YARN cluster received so far. Waiting ...
>> >>> 11:09:51,578 INFO  org.apache.flink.yarn.ApplicationClient
>> >>> - Received address of new leader
>> >>> akka.tcp://flink@145.100.41.148:35666/user/jobmanager with session ID null.
>> >>> 11:09:51,583 INFO  org.apache.flink.yarn.ApplicationClient
>> >>> - Disconnect from JobManager null.
>> >>> 11:09:51,595 INFO  org.apache.flink.yarn.ApplicationClient
>> >>> - Trying to register at JobManager
>> >>> akka.tcp://flink@145.100.41.148:35666/user/jobmanager.
>> >>> No status updates from the YARN cluster received so far. Waiting ...
>> >>> No status updates from the YARN cluster received so far. Waiting ...
>> >>>
>> >>> It then hangs on these last steps (trying to register, no status
>> >>> updates..)
>> >>>
>> >>> I'm sure there must be a problem on my side that is causing me not to be
>> >>> able to register at the JobManager. What could cause such connection
>> >>> problems?
>> >>>
>> >>> Any tips are very welcome :-)
>> >>>
>> >>> Cheers and have a good weekend!
>> >>>
>> >>> - Pieter
>> >>>
>> >>>
>> >>
>> >
>>
>
>
