flink-user mailing list archives

From Pieter Hameete <phame...@gmail.com>
Subject Re: Flink on YARN: Stuck on "Trying to register at JobManager"
Date Sun, 07 Feb 2016 18:25:00 GMT
Hi Stephan,

it surely seems that way! I can't be the first with this issue, though?
I'll have to contact the cluster admins to find a solution together. What
would be a way to make the JobManager accessible from outside the network,
given that its IP and port number change every time?

Alternatively, I can ask for ssh access to a node within the network. That
will surely work, but it's not my preferred solution.
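
A quick way to confirm the diagnosis from the VM is a plain TCP connect test against the address printed in the log. A successful ping only shows that ICMP gets through, not that the JobManager's port is open. The following is a minimal sketch in Python, independent of Flink; the helper names `parse_akka_address` and `tcp_reachable` are made up for illustration, and the regular expression assumes the `akka.tcp://flink@host:port` form seen in the log lines quoted below.

```python
import re
import socket

# Matches the host and port in an Akka address such as
# akka.tcp://flink@145.100.41.148:35666/user/jobmanager
AKKA_ADDRESS = re.compile(r"akka\.tcp://[^@/\s]+@([^:/\s]+):(\d+)")


def parse_akka_address(line):
    """Return (host, port) from a log line, or None if no address is found."""
    match = AKKA_ADDRESS.search(line)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def tcp_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False
```

Calling `tcp_reachable(*parse_akka_address(line))` on the "Trying to register at JobManager" line from the VM, and again from a login node inside the cluster network, should show whether only the outside path is blocked.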

- Pieter

2016-02-06 16:22 GMT+01:00 Stephan Ewen <sewen@apache.org>:

> Yeah, sounds a lot like the client cannot connect to the JobManager port.
>
> The ports to communicate with HDFS and the YARN resource manager may be
> whitelisted or forwarded, so you can submit the YARN session, but then not
> connect to the JobManager afterwards.
>
>
>
> On Sat, Feb 6, 2016 at 2:11 PM, Pieter Hameete <phameete@gmail.com> wrote:
>
>> Hi Max!
>>
>> I'm using Flink 0.10.1 and indeed the cluster seems to be created fine,
>> everything in the JobManager web UI looks good.
>>
>> It seems like the JobManager initiates the connection with my VM and
>> cannot reach it. It could be that this is similar to the problem here:
>>
>>
>> http://apache-spark-user-list.1001560.n3.nabble.com/spark-with-docker-errors-with-akka-NAT-td7702.html
>>
>> I probably have to make some changes to the networking configuration of
>> my VM so it can be reached by the JobManager despite using a different port
>> each time.
>>
>> - Pieter
>>
>> 2016-02-06 14:05 GMT+01:00 Maximilian Michels <mxm@apache.org>:
>>
>>> Hi Pieter,
>>>
>>> Which version of Flink are you using? It appears you've created a
>>> Flink YARN cluster but you can't reach the JobManager afterwards.
>>>
>>> Cheers,
>>> Max
>>>
>>> On Sat, Feb 6, 2016 at 1:42 PM, Pieter Hameete <phameete@gmail.com>
>>> wrote:
>>> > Hi Robert,
>>> >
>>> > unfortunately there are no signs of what is going wrong in the logs. The
>>> > last log messages are about successful registration of the TaskManagers.
>>> >
>>> > I'm also fairly sure it must be something in my VM that is causing this,
>>> > because when I start the yarn-session from a login node that is on the
>>> > same network as the Hadoop cluster there are no problems registering with
>>> > the JobManager. I also noticed the following message in the local console:
>>> >
>>> > 12:30:27,173 WARN  Remoting
>>> > - Tried to associate with unreachable remote address
>>> > [akka.tcp://flink@145.100.41.13:41539]. Address is now gated for 5000 ms,
>>> > all messages to this address will be delivered to dead letters. Reason:
>>> > connection timed out: /145.100.41.13:41539
>>> >
>>> > I can ping the JobManager fine from within the VM. Could there be some
>>> > invalid or missing configuration on my side?
>>> >
>>> > Cheers,
>>> >
>>> > Pieter
>>> >
>>> >
>>> > 2016-02-06 12:54 GMT+01:00 Robert Metzger <rmetzger@apache.org>:
>>> >>
>>> >> Hi,
>>> >>
>>> >> did you check the logs of the JobManager itself? Maybe they'll already
>>> >> tell us what's going on.
>>> >>
>>> >> On Sat, Feb 6, 2016 at 12:14 PM, Pieter Hameete <phameete@gmail.com>
>>> >> wrote:
>>> >>>
>>> >>> Hi Guys!
>>> >>>
>>> >>> I'm attempting to run Flink on YARN, but I've run into an issue. I'm
>>> >>> starting the Flink YARN session from an Ubuntu 14.04 VM. All goes well
>>> >>> until after the JobManager web UI is started:
>>> >>>
>>> >>> JobManager web interface address
>>> >>> http://head05.hathi.surfsara.nl:8088/proxy/application_1452780322684_10532/
>>> >>> Waiting until all TaskManagers have connected
>>> >>> 11:09:51,557 INFO  org.apache.flink.yarn.ApplicationClient
>>> >>> - Notification about new leader address
>>> >>> akka.tcp://flink@145.100.41.148:35666/user/jobmanager with session ID null.
>>> >>> No status updates from the YARN cluster received so far. Waiting ...
>>> >>> 11:09:51,578 INFO  org.apache.flink.yarn.ApplicationClient
>>> >>> - Received address of new leader
>>> >>> akka.tcp://flink@145.100.41.148:35666/user/jobmanager with session ID null.
>>> >>> 11:09:51,583 INFO  org.apache.flink.yarn.ApplicationClient
>>> >>> - Disconnect from JobManager null.
>>> >>> 11:09:51,595 INFO  org.apache.flink.yarn.ApplicationClient
>>> >>> - Trying to register at JobManager
>>> >>> akka.tcp://flink@145.100.41.148:35666/user/jobmanager.
>>> >>> No status updates from the YARN cluster received so far. Waiting ...
>>> >>> No status updates from the YARN cluster received so far. Waiting ...
>>> >>>
>>> >>> It then hangs on these last steps (trying to register, no status
>>> >>> updates ...).
>>> >>>
>>> >>> I'm sure there must be a problem on my side that is preventing me from
>>> >>> registering at the JobManager. What could cause such connection
>>> >>> problems?
>>> >>>
>>> >>> Any tips are very welcome :-)
>>> >>>
>>> >>> Cheers and have a good weekend!
>>> >>>
>>> >>> - Pieter
>>> >>>
>>> >>>
>>> >>
>>> >
>>>
>>
>>
>
