flink-user mailing list archives

From Yang Wang <danrtsey...@gmail.com>
Subject Re: Submitted Flink Jobs EMR are failing (Could not start rest endpoint on any port in port range 8081)
Date Tue, 23 Jun 2020 03:07:48 GMT
Hi Sateesh,

if "rest.port" or "rest.bind-port" is configured explicitly, that value is
used to start the REST server. So you need to either remove them from
flink-conf.yaml or set them to "0" or to a port range (for example
50100-50200).
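
A minimal sketch of that config change, in case it helps. The helper name set_rest_bind_port is made up for illustration, and the config path in the usage line is an assumption about where flink-conf.yaml lives on the cluster. It drops any explicit rest.port / rest.bind-port entries and appends a rest.bind-port range:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Flink): remove explicit rest.port and
# rest.bind-port entries from a flink-conf.yaml and append a port range.
set_rest_bind_port() {
  local conf_file="$1"    # path to flink-conf.yaml
  local port_range="$2"   # e.g. 50100-50200
  local tmp_file
  tmp_file="$(mktemp)"
  # Keep every line except existing rest.port / rest.bind-port settings.
  # grep exits non-zero when nothing is left to print, hence "|| true".
  grep -v -E '^[[:space:]]*rest\.(bind-)?port[[:space:]]*:' "$conf_file" > "$tmp_file" || true
  printf 'rest.bind-port: %s\n' "$port_range" >> "$tmp_file"
  # Write back through the original file so its permissions are kept.
  cat "$tmp_file" > "$conf_file"
  rm -f "$tmp_file"
}
```

Usage would be something like `set_rest_bind_port /etc/flink/conf/flink-conf.yaml 50100-50200` (path assumed; adjust to your installation). New clusters pick up the change on the next submission.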

By default, "flink run" starts a dedicated Flink cluster for each job. If
you want to use session mode, you need to start a session with
"yarn-session.sh" first, and then use "flink run ... -yid application_id"
to submit a Flink job to the existing cluster.
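
The session-mode sequence could look roughly like the sketch below. The helper session_submit_cmd is hypothetical and only prints the submit command so it can be inspected first. The application id and jar path are placeholders you would replace with your own values.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Flink): print the "flink run" command that
# targets an existing YARN session instead of starting a new per-job cluster.
session_submit_cmd() {
  local application_id="$1"   # YARN application id of the running session
  local job_jar="$2"          # path to the job jar
  printf 'flink run -yid %s %s\n' "$application_id" "$job_jar"
}

# Typical sequence (needs a Flink client on a YARN cluster, so it is only
# described here rather than executed):
#   1. yarn-session.sh -d        start a detached session once
#   2. yarn application -list    look up the session's application id
#   3. run the command printed by session_submit_cmd for each job
```

Every job submitted this way shares the session's single REST endpoint, so the jobs no longer compete for the same port.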


Best,
Yang

On Mon, Jun 22, 2020 at 9:58 PM Arvid Heise <arvid@ververica.com> wrote:

> Hi Sateesh,
>
> The solution still applies; not all entries are listed in the conf
> template.
>
> From what you have written, it is almost certain that the first jobs have
> not finished (hence the port is taken). Make sure you don't use detached
> mode when submitting.
> You can see the status of the jobs in YARN resource manager which also
> links to the respective Flink JobManagers.
>
> And yes, by default each job creates a new YARN session unless you use an
> existing session explicitly [1].
>
> If you need more help, please post your steps.
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/yarn_setup.html#flink-yarn-session
>
> On Thu, Jun 18, 2020 at 4:15 PM sk_acura@yahoo.com <sk_acura@yahoo.com>
> wrote:
>
>> I am using EMR 5.30.0 and trying to submit a Flink (1.10.0) job using the
>> following command
>>
>> flink run -m yarn-cluster /home/hadoop/flink--test-0.0.1-SNAPSHOT.jar
>>
>> and I am getting the following error:
>>
>>     Caused by: org.apache.flink.yarn.YarnClusterDescriptor$YarnDeploymentException: The YARN application unexpectedly switched to state FAILED during deployment.
>>
>> After going through the logs on the worker nodes and the job manager logs,
>> it looks like there is a port conflict:
>>
>>     2020-06-17 21:40:51,199 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Could not start cluster entrypoint YarnJobClusterEntrypoint.
>>     org.apache.flink.runtime.entrypoint.ClusterEntrypointException: Failed to initialize the cluster entrypoint YarnJobClusterEntrypoint.
>>             at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:187)
>>             at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:518)
>>             at org.apache.flink.yarn.entrypoint.YarnJobClusterEntrypoint.main(YarnJobClusterEntrypoint.java:119)
>>     Caused by: org.apache.flink.util.FlinkException: Could not create the DispatcherResourceManagerComponent.
>>             at org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory.create(DefaultDispatcherResourceManagerComponentFactory.java:261)
>>             at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:215)
>>             at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$0(ClusterEntrypoint.java:169)
>>             at java.security.AccessController.doPrivileged(Native Method)
>>             at javax.security.auth.Subject.doAs(Subject.java:422)
>>             at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
>>             at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>>             at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:168)
>>             ... 2 more
>>     Caused by: java.net.BindException: Could not start rest endpoint on any port in port range 8081
>>             at org.apache.flink.runtime.rest.RestServerEndpoint.start(RestServerEndpoint.java:219)
>>             at org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory.create(DefaultDispatcherResourceManagerComponentFactory.java:165)
>>             ... 9 more
>>
>> There seems to be a JIRA ticket (
>> https://issues.apache.org/jira/browse/FLINK-15394) open for this (though
>> it is for the 1.9 version of Flink), and the suggested solution is to use a
>> port range for **rest.bind-port** in the Flink config file.
>>
>> However, in the 1.10 version of Flink we only have the following in the
>> YARN conf YML file:
>>
>>     rest.port: 8081
>>
>> Another issue I am facing: I have submitted multiple Flink jobs (the same
>> job multiple times) using the AWS Console and via the Add Step UI. Only one
>> of the jobs succeeded and the rest failed with the error posted above. And
>> when I go to the Flink UI it doesn't show any jobs at all.
>>
>> I am wondering whether each of the submitted jobs is trying to create a new
>> Flink YARN session instead of using the existing one.
>>
>> Thanks
>> Sateesh
>>
>>
>
> --
>
> Arvid Heise | Senior Java Developer
>
