flink-user mailing list archives

From Ufuk Celebi <...@apache.org>
Subject Re: Master (1.1-SNAPSHOT) Can't run on YARN
Date Tue, 19 Apr 2016 16:53:57 GMT
Hey Stefano,

Flink's resource management was recently refactored for 1.1, so this
could be a regression introduced by that change. Max can probably help
you with more details. Is this currently a blocker for you?

– Ufuk

On Tue, Apr 19, 2016 at 6:31 PM, Stefano Baghino
<stefano.baghino@radicalbit.io> wrote:
> Hi everyone,
>
> I'm currently experiencing a weird situation and I hope you can help me out
> with it.
>
> I've cloned and built from master, then edited the default config file
> by adding my Hadoop config path, exported the HADOOP_CONF_DIR env var
> and ran bin/yarn-session.sh -n 1 -s 2 -jm 2048 -tm 2048
>
> The first thing I noticed is that I had to pass "-s 2", or the task managers
> get created with -1 slots (!) by default.
>
> After adding "-s 2", the YARN session startup hangs when trying to register
> the task managers. I stopped the session, aggregated the logs and found
> several thousand messages like the ones I attach at the bottom; any idea
> of what this may be?
>
> Thank you a lot in advance!
>
> 2016-04-19 12:15:59,507 INFO  org.apache.flink.yarn.YarnTaskManager
> - Trying to register at JobManager
> akka.tcp://flink@172.31.20.101:57379/user/jobmanager (attempt 1, timeout:
> 500 milliseconds)
>
> 2016-04-19 12:15:59,649 ERROR org.apache.flink.yarn.YarnTaskManager
> - The registration at JobManager
> Some(akka.tcp://flink@172.31.20.101:57379/user/jobmanager) was refused,
> because: java.lang.IllegalStateException: Resource
> ResourceID{resourceId='container_e02_1461077293721_0016_01_000002'} not
> registered with resource manager.. Retrying later...
>
> 2016-04-19 12:16:00,025 INFO  org.apache.flink.yarn.YarnTaskManager
> - Trying to register at JobManager
> akka.tcp://flink@172.31.20.101:57379/user/jobmanager (attempt 2, timeout:
> 1000 milliseconds)
>
> 2016-04-19 12:16:00,033 ERROR org.apache.flink.yarn.YarnTaskManager
> - The registration at JobManager
> Some(akka.tcp://flink@172.31.20.101:57379/user/jobmanager) was refused,
> because: java.lang.IllegalStateException: Resource
> ResourceID{resourceId='container_e02_1461077293721_0016_01_000002'} not
> registered with resource manager.. Retrying later...
>
> 2016-04-19 12:16:01,045 INFO  org.apache.flink.yarn.YarnTaskManager
> - Trying to register at JobManager
> akka.tcp://flink@172.31.20.101:57379/user/jobmanager (attempt 3, timeout:
> 2000 milliseconds)
>
> 2016-04-19 12:16:01,053 ERROR org.apache.flink.yarn.YarnTaskManager
> - The registration at JobManager
> Some(akka.tcp://flink@172.31.20.101:57379/user/jobmanager) was refused,
> because: java.lang.IllegalStateException: Resource
> ResourceID{resourceId='container_e02_1461077293721_0016_01_000002'} not
> registered with resource manager.. Retrying later...
>
> 2016-04-19 12:16:03,064 INFO  org.apache.flink.yarn.YarnTaskManager
> - Trying to register at JobManager
> akka.tcp://flink@172.31.20.101:57379/user/jobmanager (attempt 4, timeout:
> 4000 milliseconds)
>
> 2016-04-19 12:16:03,072 ERROR org.apache.flink.yarn.YarnTaskManager
> - The registration at JobManager
> Some(akka.tcp://flink@172.31.20.101:57379/user/jobmanager) was refused,
> because: java.lang.IllegalStateException: Resource
> ResourceID{resourceId='container_e02_1461077293721_0016_01_000002'} not
> registered with resource manager.. Retrying later...
>
> 2016-04-19 12:16:07,085 INFO  org.apache.flink.yarn.YarnTaskManager
> - Trying to register at JobManager
> akka.tcp://flink@172.31.20.101:57379/user/jobmanager (attempt 5, timeout:
> 8000 milliseconds)
>
> 2016-04-19 12:16:07,092 ERROR org.apache.flink.yarn.YarnTaskManager
> - The registration at JobManager
> Some(akka.tcp://flink@172.31.20.101:57379/user/jobmanager) was refused,
> because: java.lang.IllegalStateException: Resource
> ResourceID{resourceId='container_e02_1461077293721_0016_01_000002'} not
> registered with resource manager.. Retrying later...
>
> 2016-04-19 12:16:09,664 INFO  org.apache.flink.yarn.YarnTaskManager
> - Trying to register at JobManager
> akka.tcp://flink@172.31.20.101:57379/user/jobmanager (attempt 1, timeout:
> 500 milliseconds)
>
>
> --
> BR,
> Stefano Baghino
>
> Software Engineer @ Radicalbit
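
With several thousand of these messages in the aggregated logs, a small script can help confirm whether every refusal has the same cause. The following is only a sketch, not part of Flink: the function name `summarize_registration_log` is hypothetical, and the regular expressions assume the exact message wording shown in the quoted excerpt (they also tolerate the "> " quote markers and the hard line wrapping of the mail).

```python
import re
from collections import Counter

# Patterns assume the message wording seen in the quoted log excerpt.
ATTEMPT_RE = re.compile(r"\(attempt (\d+), timeout: (\d+) milliseconds\)")
REFUSED_RE = re.compile(r"was refused, because: ([\w.]+): (.*?)\.\. Retrying later")

# First two entries of the quoted log, copied with their quote markers.
SAMPLE = """\
> 2016-04-19 12:15:59,507 INFO  org.apache.flink.yarn.YarnTaskManager
> - Trying to register at JobManager
> akka.tcp://flink@172.31.20.101:57379/user/jobmanager (attempt 1, timeout:
> 500 milliseconds)
>
> 2016-04-19 12:15:59,649 ERROR org.apache.flink.yarn.YarnTaskManager
> - The registration at JobManager
> Some(akka.tcp://flink@172.31.20.101:57379/user/jobmanager) was refused,
> because: java.lang.IllegalStateException: Resource
> ResourceID{resourceId='container_e02_1461077293721_0016_01_000002'} not
> registered with resource manager.. Retrying later...
"""


def summarize_registration_log(text):
    """Return (attempts, refusals) found in a TaskManager log excerpt.

    attempts is a list of (attempt number, timeout in ms) tuples;
    refusals is a Counter keyed by "ExceptionClass: reason".
    """
    # Drop e-mail quote markers, then undo hard line wrapping.
    lines = (re.sub(r"^>\s?", "", line) for line in text.splitlines())
    flat = " ".join(" ".join(lines).split())

    attempts = [(int(n), int(ms)) for n, ms in ATTEMPT_RE.findall(flat)]
    refusals = Counter(
        f"{exc}: {reason}" for exc, reason in REFUSED_RE.findall(flat)
    )
    return attempts, refusals


if __name__ == "__main__":
    attempts, refusals = summarize_registration_log(SAMPLE)
    print("attempts (number, timeout ms):", attempts)
    for reason, count in refusals.most_common():
        print(count, "x", reason)
```

Running it over the full aggregated log instead of `SAMPLE` would show whether all refusals share the "not registered with resource manager" reason and how the timeouts grow between attempts.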
