flink-user mailing list archives

From Robert Metzger <rmetz...@apache.org>
Subject Re: Flink on YARN: Stuck on "Trying to register at JobManager"
Date Mon, 08 Feb 2016 16:06:31 GMT
You said earlier that you are using Flink 0.10. The
yarn.application-master.port option is only available in 1.0-SNAPSHOT.

On Mon, Feb 8, 2016 at 4:53 PM, Pieter Hameete <phameete@gmail.com> wrote:

> I've tried setting the yarn.application-master.port property in
> flink-conf.yaml to a range suggested in
> https://ci.apache.org/projects/flink/flink-docs-master/setup/yarn_setup.html#running-flink-on-yarn-behind-firewalls
>
> The JobManager does not seem to be picking the property up. Am I setting
> this in the wrong place? Or is there another way to enforce this property?
>
> Cheers,
>
> Pieter
>
> 2016-02-07 20:04 GMT+01:00 Pieter Hameete <phameete@gmail.com>:
>
>> I found the relevant information on the website. I'll consult with the
>> cluster admin tomorrow, thanks for the help :-)
>>
>> - Pieter
>>
>> 2016-02-07 19:31 GMT+01:00 Robert Metzger <rmetzger@apache.org>:
>>
>>> Hi,
>>>
>>> we had other users with a similar issue as well. There is a
>>> configuration value which allows you to specify a single port or a range of
>>> ports for the JobManager to allocate when running on YARN.
>>> Note that when using this with a single port, the JMs may collide.
>>>
>>>
>>>
>>> On Sun, Feb 7, 2016 at 7:25 PM, Pieter Hameete <phameete@gmail.com>
>>> wrote:
>>>
>>>> Hi Stephan,
>>>>
>>>> surely it seems this way! I must not be the first with this issue
>>>> though? I'll have to contact the cluster admins to find a solution
>>>> together. What would be a way to make the JobManagers accessible from
>>>> outside the network, given that the IP and port number change every time?
>>>>
>>>> Alternatively, I can ask for ssh access to a node within the network.
>>>> That will surely work but it's not my preferred solution.
>>>>
>>>> - Pieter
>>>>
>>>> 2016-02-06 16:22 GMT+01:00 Stephan Ewen <sewen@apache.org>:
>>>>
>>>>> Yeah, sounds a lot like the client cannot connect to the JobManager
>>>>> port.
>>>>>
>>>>> The ports to communicate with HDFS and the YARN resource manager may
>>>>> be whitelisted or forwarded, so you can submit the YARN session, but then
>>>>> not connect to the JobManager afterwards.
>>>>>
>>>>>
>>>>>
>>>>> On Sat, Feb 6, 2016 at 2:11 PM, Pieter Hameete <phameete@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Max!
>>>>>>
>>>>>> I'm using Flink 0.10.1 and indeed the cluster seems to be created
>>>>>> fine; everything in the JobManager Web UI looks good.
>>>>>>
>>>>>> It seems like the JobManager initiates the connection with my VM and
>>>>>> cannot reach it. It could be that this is similar to the problem here:
>>>>>>
>>>>>>
>>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/spark-with-docker-errors-with-akka-NAT-td7702.html
>>>>>>
>>>>>> I probably have to make some changes to the networking configuration
>>>>>> of my VM so it can be reached by the JobManager despite using a different
>>>>>> port each time.
>>>>>>
>>>>>> - Pieter
>>>>>>
>>>>>> 2016-02-06 14:05 GMT+01:00 Maximilian Michels <mxm@apache.org>:
>>>>>>
>>>>>>> Hi Pieter,
>>>>>>>
>>>>>>> Which version of Flink are you using? It appears you've created a
>>>>>>> Flink YARN cluster but you can't reach the JobManager afterwards.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Max
>>>>>>>
>>>>>>> On Sat, Feb 6, 2016 at 1:42 PM, Pieter Hameete <phameete@gmail.com>
>>>>>>> wrote:
>>>>>>> > Hi Robert,
>>>>>>> >
>>>>>>> > unfortunately there are no signs of what is going wrong in the logs. The
>>>>>>> > last log messages are about successful registration of the TaskManagers.
>>>>>>> >
>>>>>>> > I'm also fairly sure it must be something in my VM that is causing this,
>>>>>>> > because when I start the yarn-session from a login node that is on the same
>>>>>>> > network as the Hadoop cluster there are no problems registering with the
>>>>>>> > JobManager. I did also notice the following message in the local console:
>>>>>>> >
>>>>>>> > 12:30:27,173 WARN  Remoting
>>>>>>> > - Tried to associate with unreachable remote address
>>>>>>> > [akka.tcp://flink@145.100.41.13:41539]. Address is now gated for 5000 ms,
>>>>>>> > all messages to this address will be delivered to dead letters. Reason:
>>>>>>> > connection timed out: /145.100.41.13:41539
>>>>>>> >
>>>>>>> > I can ping the JobManager fine from within the VM. Could there be some
>>>>>>> > invalid or missing configuration on my side?
>>>>>>> >
>>>>>>> > Cheers,
>>>>>>> >
>>>>>>> > Pieter
>>>>>>> >
>>>>>>> >
>>>>>>> > 2016-02-06 12:54 GMT+01:00 Robert Metzger <rmetzger@apache.org>:
>>>>>>> >>
>>>>>>> >> Hi,
>>>>>>> >>
>>>>>>> >> did you check the logs of the JobManager itself? Maybe it'll tell us
>>>>>>> >> already what's going on.
>>>>>>> >>
>>>>>>> >> On Sat, Feb 6, 2016 at 12:14 PM, Pieter Hameete <phameete@gmail.com>
>>>>>>> >> wrote:
>>>>>>> >>>
>>>>>>> >>> Hi Guys!
>>>>>>> >>>
>>>>>>> >>> I'm attempting to run Flink on YARN, but I run into an issue. I'm starting
>>>>>>> >>> the Flink YARN session from an Ubuntu 14.04 VM. All goes well until after
>>>>>>> >>> the JobManager web UI is started:
>>>>>>> >>>
>>>>>>> >>> JobManager web interface address
>>>>>>> >>> http://head05.hathi.surfsara.nl:8088/proxy/application_1452780322684_10532/
>>>>>>> >>> Waiting until all TaskManagers have connected
>>>>>>> >>> 11:09:51,557 INFO  org.apache.flink.yarn.ApplicationClient
>>>>>>> >>> - Notification about new leader address
>>>>>>> >>> akka.tcp://flink@145.100.41.148:35666/user/jobmanager with session ID null.
>>>>>>> >>> No status updates from the YARN cluster received so far. Waiting ...
>>>>>>> >>> 11:09:51,578 INFO  org.apache.flink.yarn.ApplicationClient
>>>>>>> >>> - Received address of new leader
>>>>>>> >>> akka.tcp://flink@145.100.41.148:35666/user/jobmanager with session ID null.
>>>>>>> >>> 11:09:51,583 INFO  org.apache.flink.yarn.ApplicationClient
>>>>>>> >>> - Disconnect from JobManager null.
>>>>>>> >>> 11:09:51,595 INFO  org.apache.flink.yarn.ApplicationClient
>>>>>>> >>> - Trying to register at JobManager
>>>>>>> >>> akka.tcp://flink@145.100.41.148:35666/user/jobmanager.
>>>>>>> >>> No status updates from the YARN cluster received so far. Waiting ...
>>>>>>> >>> No status updates from the YARN cluster received so far. Waiting ...
>>>>>>> >>>
>>>>>>> >>> It then hangs on these last steps (trying to register, no status
>>>>>>> >>> updates...)
>>>>>>> >>>
>>>>>>> >>> I'm sure there must be a problem on my side that is causing me not to be
>>>>>>> >>> able to register at the JobManager. What could cause such connection
>>>>>>> >>> problems?
>>>>>>> >>>
>>>>>>> >>> Any tips are very welcome :-)
>>>>>>> >>>
>>>>>>> >>> Cheers and have a good weekend!
>>>>>>> >>>
>>>>>>> >>> - Pieter
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>
>>>>>>> >
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
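
The symptom discussed in this thread (ping to the JobManager host succeeds, but the Akka connection to its port times out) can be checked independently of Flink with a plain TCP connect from the client machine. The following is only a minimal sketch, assuming Python 3 is available on the client VM; can_connect is a hypothetical helper name, not part of Flink or YARN.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable or unknown hosts.
        return False
```

For example, calling can_connect("145.100.41.148", 35666) from the VM would test the leader address shown in the log above; that address is specific to that session and changes with every new YARN session. A False result while ping works would be consistent with a firewall that blocks the port, though it does not prove it.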
