flink-user mailing list archives

From "Bajaj, Abhinav" <abhinav.ba...@here.com>
Subject Submit Flink Jobs to YARN running on AWS
Date Mon, 07 Mar 2016 23:25:53 GMT
Hi,

I am new to Flink and am trying to use it on AWS.
I have created a YARN cluster on AWS EC2 machines.
I am trying to submit a Flink job to the remote YARN cluster using the Flink client running
on my local machine.

The JobManager starts successfully in the YARN container, but the client is not able to
connect to the JobManager.

Flink client logs:

13:57:34,877 INFO  org.apache.flink.yarn.FlinkYarnClient                         - Deploying cluster, current state ACCEPTED
13:57:35,951 INFO  org.apache.flink.yarn.FlinkYarnClient                         - Deploying cluster, current state ACCEPTED
13:57:37,027 INFO  org.apache.flink.yarn.FlinkYarnClient                         - YARN application has been deployed successfully.
13:57:37,100 INFO  org.apache.flink.yarn.FlinkYarnCluster                        - Start actor system.
13:57:37,532 INFO  org.apache.flink.yarn.FlinkYarnCluster                        - Start application client.
YARN cluster started
JobManager web interface address http://ec2-XX-XX-XX-XX.compute-1.amazonaws.com:8088/proxy/application_1456184947990_0003/
Waiting until all TaskManagers have connected
13:57:37,540 INFO  org.apache.flink.yarn.ApplicationClient                       - Notification about new leader address akka.tcp://flink@54.35.41.12:41292/user/jobmanager with session ID null.
No status updates from the YARN cluster received so far. Waiting ...
13:57:37,543 INFO  org.apache.flink.yarn.ApplicationClient                       - Received address of new leader akka.tcp://flink@54.35.41.12:41292/user/jobmanager with session ID null.
13:57:37,543 INFO  org.apache.flink.yarn.ApplicationClient                       - Disconnect from JobManager null.
13:57:37,545 INFO  org.apache.flink.yarn.ApplicationClient                       - Trying to register at JobManager akka.tcp://flink@54.35.41.12:41292/user/jobmanager.
No status updates from the YARN cluster received so far. Waiting ...

The JobManager logs contain the following:

21:57:39,142 ERROR akka.remote.EndpointWriter                                    - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@54.35.41.12:41292/]] arriving at [akka.tcp://flink@54.35.41.12:41292] inbound addresses are [akka.tcp://flink@172.31.23.18:41292]
21:57:40,782 INFO  org.apache.flink.runtime.instance.InstanceManager             - Registered TaskManager at ec2-54-35-41-12 (akka.tcp://flink@172.31.23.18:60565/user/taskmanager) as 72101dd2ee94caa7a5ec5a75488359aa. Current number of registered hosts is 1. Current number of alive task slots is 1.
21:57:41,162 ERROR akka.remote.EndpointWriter                                    - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@54.35.41.12:41292/]] arriving at [akka.tcp://flink@54.35.41.12:41292] inbound addresses are [akka.tcp://flink@172.31.23.18:41292]

The problem seems to be a mismatch between the address the JobManager's Akka actor system
is bound to and the address used by the client.
172.31.23.18 is the internal private IP of the EC2 machine where the JobManager container
is running.
54.35.41.12 is the external IP of the EC2 machine, used by the Flink client to submit the
job.
Because of this mismatch, the messages are dropped by the Akka actor system.
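
To make the mismatch concrete, here is a minimal Python sketch that pulls the recipient address and the inbound addresses out of one of the "dropping message" lines above and checks whether they agree. The helper name check_dropped_message and the regular expression are only illustrations written for this log format; they are not part of Flink or Akka.

```python
import re

# Matches an Akka TCP address such as akka.tcp://flink@172.31.23.18:41292
# (pattern is an assumption based on the log lines in this message).
AKKA_ADDR = re.compile(
    r"akka\.tcp://(?P<system>[\w-]+)@(?P<host>[\w.\-]+):(?P<port>\d+)"
)


def check_dropped_message(log_line):
    """Return (recipient, inbound, matches) for a 'dropping message' log line.

    recipient is the (host, port) the message was addressed to,
    inbound is the list of (host, port) pairs the actor system listens on,
    matches says whether the recipient is one of the inbound addresses.
    """
    recipient_part, _, inbound_part = log_line.partition("inbound addresses are")
    first = AKKA_ADDR.search(recipient_part)
    if first is None:
        raise ValueError("no Akka address found in log line")
    recipient = first.group("host", "port")
    inbound = [m.group("host", "port") for m in AKKA_ADDR.finditer(inbound_part)]
    return recipient, inbound, recipient in inbound


# The ERROR line from the JobManager log, joined onto one line.
LOG_LINE = (
    "dropping message [class akka.actor.ActorSelectionMessage] for non-local "
    "recipient [Actor[akka.tcp://flink@54.35.41.12:41292/]] arriving at "
    "[akka.tcp://flink@54.35.41.12:41292] inbound addresses are "
    "[akka.tcp://flink@172.31.23.18:41292]"
)

recipient, inbound, matches = check_dropped_message(LOG_LINE)
print("recipient:", recipient)
print("inbound:  ", inbound)
print("matches:  ", matches)
```

For the line above, the recipient is the external IP while the only inbound address is the private IP, so the check reports no match even though the port is the same.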

Can someone please help me with this issue?
I can share the detailed logs if required.

Thanks,
Abhi
