flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andrey Zagrebin <and...@ververica.com>
Subject Re: Jira issue Flink-11127
Date Fri, 22 Feb 2019 16:35:51 GMT
cc aleksey@ververica.com

On Fri, Feb 22, 2019 at 1:28 AM Boris Lublinsky <
boris.lublinsky@lightbend.com> wrote:

> Adding metric-query port makes it a bit better, but there is still an error
>
>
> 019-02-22 00:03:56,173 INFO
>  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not
> resolve ResourceManager address
> akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/user/resourcemanager,
> retrying in 10000 ms: Ask timed out on [ActorSelection[Anchor(
> akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/),
> Path(/user/resourcemanager)]] after [10000 ms]. Sender[null] sent message
> of type "akka.actor.Identify"..
> 2019-02-22 00:04:16,213 INFO
>  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not
> resolve ResourceManager address
> akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/user/resourcemanager,
> retrying in 10000 ms: Ask timed out on [ActorSelection[Anchor(
> akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/),
> Path(/user/resourcemanager)]] after [10000 ms]. Sender[null] sent message
> of type "akka.actor.Identify"..
> 2019-02-22 00:04:36,253 INFO
>  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not
> resolve ResourceManager address
> akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/user/resourcemanager,
> retrying in 10000 ms: Ask timed out on [ActorSelection[Anchor(
> akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/),
> Path(/user/resourcemanager)]] after [10000 ms]. Sender[null] sent message
> of type "akka.actor.Identify"..
> 2019-02-22 00:04:56,293 INFO
>  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not
> resolve ResourceManager address
> akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/user/resourcemanager,
> retrying in 10000 ms: Ask timed out on [ActorSelection[Anchor(
> akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/),
> Path(/user/resourcemanager)]] after [10000 ms]. Sender[null] sent message
> of type "akka.actor.Identify”..
>
> In the task manager and
>
> 2019-02-21 23:59:46,479 ERROR akka.remote.EndpointWriter
>                  - dropping message [class
> akka.actor.ActorSelectionMessage] for non-local recipient [Actor[
> akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/]] arriving at [
> akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123] inbound
> addresses are [akka.tcp://flink@127.0.0.1:6123]
> 2019-02-21 23:59:57,808 ERROR akka.remote.EndpointWriter
>                  - dropping message [class
> akka.actor.ActorSelectionMessage] for non-local recipient [Actor[
> akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/]] arriving at [
> akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123] inbound
> addresses are [akka.tcp://flink@127.0.0.1:6123]
> 2019-02-22 00:00:06,519 ERROR akka.remote.EndpointWriter
>                  - dropping message [class
> akka.actor.ActorSelectionMessage] for non-local recipient [Actor[
> akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/]] arriving at [
> akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123] inbound
> addresses are [akka.tcp://flink@127.0.0.1:6123]
> 2019-02-22 00:00:17,849 ERROR akka.remote.EndpointWriter
>                  - dropping message [class
> akka.actor.ActorSelectionMessage] for non-local recipient [Actor[
> akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/]] arriving at [
> akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123] inbound
> addresses are [akka.tcp://flink@127.0.0.1:6123]
> 2019-02-22 00:00:26,558 ERROR akka.remote.EndpointWriter
>                  - dropping message [class
> akka.actor.ActorSelectionMessage] for non-local recipient [Actor[
> akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/]] arriving at [
> akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123] inbound
> addresses are [akka.tcp://flink@127.0.0.1:6123]
> 2019-02-22 00:00:37,888 ERROR akka.remote.EndpointWriter
>                  - dropping message [class
> akka.actor.ActorSelectionMessage] for non-local recipient [Actor[
> akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/]] arriving at [
> akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123] inbound
> addresses are [akka.tcp://flink@127.0.0.1:6123]
>
> I the job manager
>
> Port 6123 is opened in both Job Manager deployment
>
> apiVersion: extensions/v1beta1
> kind: Deployment
> metadata:
>   name: {{ template "fullname" . }}-jobmanager
> spec:
>   replicas: 1
>   template:
>     metadata:
>       annotations:
>         prometheus.io/scrape: 'true'
>         prometheus.io/port: '9249'
>       labels:
>         server: flink
>         app: {{ template "fullname" . }}
>         component: jobmanager
>     spec:
>       containers:
>       - name: jobmanager
>         image: {{ .Values.image }}:{{ .Values.imageTag }}
>         imagePullPolicy: {{ .Values.imagePullPolicy }}
>         args:
>         - jobmanager
>         ports:
>         - containerPort: 6123
>           name: rpc
>         - containerPort: 6124
>           name: blob
>         - containerPort: 8081
>           name: ui
>         env:
>         - name: CONTAINER_METRIC_PORT
>           value: '{{ .Values.flink.metric_query_port }}'
>         - name: JOB_MANAGER_RPC_ADDRESS
>           value : {{ template "fullname" . }}-jobmanager
>         livenessProbe:
>           httpGet:
>             path: /overview
>             port: 8081
>           initialDelaySeconds: 30
>           periodSeconds: 10
>         resources:
>           limits:
>             cpu: {{ .Values.resources.jobmanager.limits.cpu }}
>             memory: {{ .Values.resources.jobmanager.limits.memory }}
>           requests:
>             cpu: {{ .Values.resources.jobmanager.requests.cpu }}
>             memory: {{ .Values.resources.jobmanager.requests.memory }}
>
> And Job manager service
>
> apiVersion: v1
> kind: Service
> metadata:
>   name: {{ template "fullname" . }}-jobmanager
> spec:
>   ports:
>   - name: rpc
>     port: 6123
>   - name: blob
>     port: 6124
>   - name: ui
>     port: 8081
>   selector:
>     app: {{ template "fullname" . }}
>     component: jobmanager
>
>
>
>
> Boris Lublinsky
> FDP Architect
> boris.lublinsky@lightbend.com
> https://www.lightbend.com/
>
> On Feb 21, 2019, at 6:13 PM, Boris Lublinsky <
> boris.lublinsky@lightbend.com> wrote:
>
>
> Boris Lublinsky
> FDP Architect
> boris.lublinsky@lightbend.com
> https://www.lightbend.com/
>
> On Feb 21, 2019, at 2:05 AM, Konstantin Knauf <konstantin@ververica.com>
> wrote:
>
> Hi Boris,
>
> the exact command depends on the docker-entrypoint.sh script and the image
> you are using. For the example contained in the Flink repository it is
> "task-manager", I think. The important thing is to pass "taskmanager.host"
> to the Taskmanager process. You can verify by checking the Taskmanager
> logs. These should contain lines like below:
>
> 2019-02-21 08:03:00,004 INFO
> org.apache.flink.runtime.taskexecutor.TaskManagerRunner      [] -  Program
> Arguments:
> 2019-02-21 08:03:00,008 INFO
> org.apache.flink.runtime.taskexecutor.TaskManagerRunner      [] -
> -Dtaskmanager.host=10.12.10.173
>
> In the Jobmanager logs you should see that the Taskmanager is registered
> under the IP above in a line similar to:
>
> 2019-02-21 08:03:26,874 INFO
> org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] -
> Registering TaskManager with ResourceID a0513ba2c472d2d1efc07626da9c1bda
> (akka.tcp://flink@10.12.10.173:46531/user/taskmanager_0) at
> ResourceManager
>
> A service per Taskmanager is not required. The purpose of the config
> parameter is that the Jobmanager addresses the taskmanagers by IP instead
> of hostname.
>
> Hope this helps!
>
> Cheers,
>
> Konstantin
>
>
>
> On Wed, Feb 20, 2019 at 4:37 PM Boris Lublinsky <
> boris.lublinsky@lightbend.com> wrote:
>
>> Also, The suggested workaround does not quite work.
>> 2019-02-20 15:27:43,928 WARN akka.remote.ReliableDeliverySupervisor -
>> Association with remote system [
>> akka.tcp://flink-metrics@flink-taskmanager-1:6170] has failed, address
>> is now gated for [50] ms. Reason: [Association failed with [
>> akka.tcp://flink-metrics@flink-taskmanager-1:6170]] Caused by:
>> [flink-taskmanager-1: No address associated with hostname]
>> 2019-02-20 15:27:48,750 ERROR
>> org.apache.flink.runtime.rest.handler.legacy.files.StaticFileServerHandler
>> - Caught exception
>>
>> I think the problem is that its trying to connect to flink-task-manager-1
>>
>> Using busybody to experiment with nslookup, I can see
>> / # nslookup flink-taskmanager-1.flink-taskmanager
>> Server:    10.0.11.151
>> Address 1: 10.0.11.151 ip-10-0-11-151.us-west-2.compute.internal
>>
>> Name:      flink-taskmanager-1.flink-taskmanager
>> Address 1: 10.131.2.136
>> flink-taskmanager-1.flink-taskmanager.flink.svc.cluster.local
>> / # nslookup flink-taskmanager-1
>> Server:    10.0.11.151
>> Address 1: 10.0.11.151 ip-10-0-11-151.us-west-2.compute.internal
>>
>> nslookup: can't resolve 'flink-taskmanager-1'
>> / # nslookup flink-taskmanager-0.flink-taskmanager
>> Server:    10.0.11.151
>> Address 1: 10.0.11.151 ip-10-0-11-151.us-west-2.compute.internal
>>
>> Name:      flink-taskmanager-0.flink-taskmanager
>> Address 1: 10.131.0.111
>> flink-taskmanager-0.flink-taskmanager.flink.svc.cluster.local
>> / # nslookup flink-taskmanager-0
>> Server:    10.0.11.151
>> Address 1: 10.0.11.151 ip-10-0-11-151.us-west-2.compute.internal
>>
>> nslookup: can't resolve 'flink-taskmanager-0'
>> / #
>>
>> So the name should be postfixed with the service name. How do I force it?
>> I suspect I am missing config parameter
>>
>>
>> Boris Lublinsky
>> FDP Architect
>> boris.lublinsky@lightbend.com
>> https://www.lightbend.com/
>>
>> On Feb 19, 2019, at 4:33 AM, Konstantin Knauf <konstantin@ververica.com>
>> wrote:
>>
>> Hi Boris,
>>
>> the solution is actually simpler than it sounds from the ticket. The only
>> thing you need to do is to set the "taskmanager.host" to the Pod's IP
>> address in the Flink configuration. The easiest way to do this is to pass
>> this config dynamically via a command-line parameter.
>>
>> The Deployment spec could looks something like this:
>>
>> containers:
>> - name: taskmanager
>>   [...]
>>   args:
>>   - "taskmanager.sh"
>>   - "start-foreground"
>>   - "-Dtaskmanager.host=$(K8S_POD_IP)"
>>   [...]
>>
>>   env:
>>   - name: K8S_POD_IP
>>     valueFrom:
>>       fieldRef:
>>         fieldPath: status.podIP
>>
>>
>> Hope this helps and let me know if this works.
>>
>> Best,
>>
>> Konstantin
>>
>> On Sun, Feb 17, 2019 at 9:51 PM Boris Lublinsky <
>> boris.lublinsky@lightbend.com> wrote:
>>
>>> I was looking at this issue
>>> https://issues.apache.org/jira/browse/FLINK-11127
>>> Apparently there is a workaround for it.
>>> Is it possible provide the complete helm chart for it.
>>> Bits and pieces are in the ticket, but it would be nice to see the full
>>> chart
>>>
>>> Boris Lublinsky
>>> FDP Architect
>>> boris.lublinsky@lightbend.com
>>> https://www.lightbend.com/
>>>
>>>
>>
>> --
>> Konstantin Knauf | Solutions Architect
>> +49 160 91394525
>>
>> <https://www.ververica.com/>
>>
>> Follow us @VervericaData
>> --
>> Join Flink Forward <https://flink-forward.org/> - The Apache Flink
>> Conference
>> Stream Processing | Event Driven | Real Time
>> --
>> Data Artisans GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
>> --
>> Data Artisans GmbH
>> Registered at Amtsgericht Charlottenburg: HRB 158244 B
>> Managing Directors: Dr. Kostas Tzoumas, Dr. Stephan Ewen
>>
>>
>>
>
> --
> Konstantin Knauf | Solutions Architect
> +49 160 91394525
> <https://www.ververica.com/>
>
> Follow us @VervericaData
> --
> Join Flink Forward <https://flink-forward.org/> - The Apache Flink
> Conference
> Stream Processing | Event Driven | Real Time
> --
> Data Artisans GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
> --
> Data Artisans GmbH
> Registered at Amtsgericht Charlottenburg: HRB 158244 B
> Managing Directors: Dr. Kostas Tzoumas, Dr. Stephan Ewen
>
>
>
>

Mime
View raw message