flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Robert Metzger <rmetz...@apache.org>
Subject Re: Submit Flink Jobs to YARN running on AWS
Date Tue, 26 Apr 2016 08:47:28 GMT
Hi Abhi,

I'll try to reproduce the issue and come up with a solution.

On Tue, Apr 26, 2016 at 1:13 AM, Bajaj, Abhinav <abhinav.bajaj@here.com>
wrote:

> Hi Fabian,
>
> Thanks for your reply and the pointers to documentation.
>
> In these steps, I think the Flink client is installed on the master node,
> referring to steps mentioned in Flink docs here
> <https://ci.apache.org/projects/flink/flink-docs-release-1.0/setup/aws.html>
> .
> However, the scenario I have is to run the client on my local machine and
> submit jobs remotely to the YARN Cluster (running on EMR or independently).
>
> Let me describe in more detail here.
> I am trying to submit a single Flink Job to YARN using the client, running
> on my dev machine -
>
> ./bin/flink run -m yarn-cluster -yn 4 -yjm 1024 -ytm 4096
>  ./examples/batch/WordCount.jar
>
> In my understanding, YARN (running in AWS) allocates a container for the
> Jobmanager.
> Jobmanager discovers the IP and started the Actor system. At this step the
> IP it uses is the internal IP address, of the EC2 instance.
>
> The client, running on my dev machine, is not able to connect to the
> Jobmanager for reasons explained in my mail below.
>
> Is there a way, where I can set Jobmanager to use the hostname and not the
> IP address?
>
> Or any other suggestions?
>
> Thanks,
> Abhi
>
> *[image: cid:DACBF116-FD8C-48DB-B91D-D54510B306E8]*
>
> *Abhinav Bajaj*
>
> Senior Engineer
>
> HERE Predictive Analytics
>
> Office:  +12062092767
>
> Mobile: +17083299516
>
> *HERE Seattle*
>
> 701 Pike Street, #2000, Seattle, WA 98101, USA
>
> *47° 36' 41" N. 122° 19' 57" W*
>
> *HERE Maps*
>
>
>
>
> From: Fabian Hueske <fhueske@gmail.com>
> Reply-To: "user@flink.apache.org" <user@flink.apache.org>
> Date: Wednesday, March 9, 2016 at 12:51 AM
> To: "user@flink.apache.org" <user@flink.apache.org>
> Subject: Re: Submit Flink Jobs to YARN running on AWS
>
> Hi Abhi,
>
> I have used Flink on EMR via YARN a couple of times without problems.
> I started a Flink YARN session like this:
>
> ./bin/yarn-session.sh -n 4 -jm 1024 -tm 4096
>
> This will start five YARN containers (1 JobManager with 1024MB, 4
> Taskmanagers with 4096MB). See more config options in the documentation [1].
> In one of the last lines of the std-out output you should find a line that
> tells you the IP and port of the JobManager.
>
> With the IP and port, you can submit a job as follows:
>
> ./bin/flink run -m jmIP:jmPort -p 4 jobJarFile.jar <arguments>
>
> This will send the job to the JobManager specified by IP and port and
> execute the program with a parallelism of 4. See more config options in the
> documentation [2].
>
> If this does not help, could you share the exact command that you use to
> start the YARN session and submit the job?
>
> Best, Fabian
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.0/setup/yarn_setup.html
> [2]
> https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/cli.html
>
> 2016-03-08 0:25 GMT+01:00 Bajaj, Abhinav <abhinav.bajaj@here.com>:
>
>> Hi,
>>
>> I am a newbie to Flink and trying to use it in AWS.
>> I have created a YARN cluster on AWS EC2 machines.
>> Trying to submit Flink job to the remote YARN cluster using the Flink
>> Client running on my local machine.
>>
>> The Jobmanager start successfully on the YARN container but the client is
>> not able to connect to the Jobmanager.
>>
>> Flink Client Logs -
>>
>> 13:57:34,877 INFO  org.apache.flink.yarn.FlinkYarnClient
>>         - Deploying cluster, current state ACCEPTED
>> 13:57:35,951 INFO  org.apache.flink.yarn.FlinkYarnClient
>>         - Deploying cluster, current state ACCEPTED
>> 13:57:37,027 INFO  org.apache.flink.yarn.FlinkYarnClient
>>         - YARN application has been deployed successfully.
>> 13:57:37,100 INFO  org.apache.flink.yarn.FlinkYarnCluster
>>        - Start actor system.
>> 13:57:37,532 INFO  org.apache.flink.yarn.FlinkYarnCluster
>>        - Start application client.
>> YARN cluster started
>> JobManager web interface address
>> http://ec2-XX-XX-XX-XX.compute-1.amazonaws.com:8088/proxy/application_1456184947990_0003/
>> Waiting until all TaskManagers have connected
>> 13:57:37,540 INFO  org.apache.flink.yarn.ApplicationClient
>>         - Notification about new leader address akka.tcp:
>> //flink@54.35.41.12:41292/user/jobmanager with session ID null.
>> No status updates from the YARN cluster received so far. Waiting ...
>> 13:57:37,543 INFO  org.apache.flink.yarn.ApplicationClient
>>         - Received address of new leader akka.tcp://flink@54.35.41.12:41292/user/jobmanager
>> with session ID null.
>> 13:57:37,543 INFO  org.apache.flink.yarn.ApplicationClient
>>         - Disconnect from JobManager null.
>> 13:57:37,545 INFO  org.apache.flink.yarn.ApplicationClient
>>         - Trying to register at JobManager akka.tcp://flink@54.35.41.12
>> :41292/user/jobmanager.
>> No status updates from the YARN cluster received so far. Waiting ...
>>
>> The logs of the Jobmanager contains the following -
>>
>> 21:57:39,142 ERROR akka.remote.EndpointWriter                                   
- dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@54.35.41.12:41292/]]
arriving at [akka.tcp://flink@54.35.41.12:41292] inbound addresses are [akka.tcp://flink@172.31.23.18:41292]
>> 21:57:40,782 INFO  org.apache.flink.runtime.instance.InstanceManager            
- Registered TaskManager at ec2-54-35-41-12 (akka.tcp://flink@172.31.23.18:60565/user/taskmanager)
as 72101dd2ee94caa7a5ec5a75488359aa. Current number of registered hosts is 1. Current number
of alive task slots is 1.
>> 21:57:41,162 ERROR akka.remote.EndpointWriter                                   
- dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@54.35.41.12:41292/]]
arriving at [akka.tcp://flink@54.35.41.12:41292] inbound addresses are [akka.tcp://flink@172.31.23.18:41292]
>>
>> It seems the problem is in the mismatch of the Jobmanager Akka actors
>> system running address and the one user by the Client.
>> 172.31.23.18 – is the internal private IP of the EC2 machine where the
>> Jobmanager container is running.
>> 54.35.41.12 – is the external IP of the EC2 machine, used by Flink client
>> to submit the Job.
>> Because of this mismatch the messages are ignored by the Akka actor
>> System.
>>
>> Can someone please help me with this issue.
>> I can share the detailed logs, if required.
>>
>> Thanks,
>> Abhi
>>
>>
>

Mime
View raw message