From: Boris Lublinsky <boris.lublinsky@lightbend.com>
To: Konstantin Knauf <konstantin@ververica.com>
Cc: user <user@flink.apache.org>
Subject: Re: Jira issue Flink-11127
Date: Thu, 21 Feb 2019 18:13:09 -0600
Message-Id: <28D34571-7F4F-477D-8E6E-DEF6BF9D33B4@lightbend.com>

Boris Lublinsky
FDP Architect
boris.lublinsky@lightbend.com
https://www.lightbend.com/

On Feb 21, 2019, at 2:05 AM, Konstantin Knauf <konstantin@ververica.com> wrote:

Hi Boris,

the exact command depends on the docker-entrypoint.sh script and the image you are using. For the example contained in the Flink repository it is "task-manager", I think. The important thing is to pass "taskmanager.host" to the Taskmanager process. You can verify by checking the Taskmanager logs. These should contain lines like below:

2019-02-21 08:03:00,004 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner      [] -  Program Arguments:
2019-02-21 08:03:00,008 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner      [] -     -Dtaskmanager.host=10.12.10.173

In the Jobmanager logs you should see that the Taskmanager is registered under the IP above in a line similar to:

2019-02-21 08:03:26,874 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registering TaskManager with ResourceID a0513ba2c472d2d1efc07626da9c1bda (akka.tcp://flink@10.12.10.173:46531/user/taskmanager_0) at ResourceManager
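
The check described above can also be scripted. What follows is only a minimal sketch, assuming the registration line has exactly the shape shown above; the function names `registered_host` and `is_ip_address` are made up for illustration and are not part of Flink.

```python
import ipaddress
import re

# Matches the akka address in a JobManager registration line shaped like
# the example above. The exact shape is an assumption based on that one line.
AKKA_ADDRESS = re.compile(r"akka\.tcp://flink@([^:/\s]+):(\d+)/user/taskmanager_\d+")


def registered_host(jobmanager_log_line):
    """Return the host part of the TaskManager's akka address, or None."""
    match = AKKA_ADDRESS.search(jobmanager_log_line)
    return match.group(1) if match else None


def is_ip_address(host):
    """True if host parses as an IP address rather than a hostname."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


line = (
    "2019-02-21 08:03:26,874 INFO  "
    "org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - "
    "Registering TaskManager with ResourceID a0513ba2c472d2d1efc07626da9c1bda "
    "(akka.tcp://flink@10.12.10.173:46531/user/taskmanager_0) at ResourceManager"
)
host = registered_host(line)
print(host, is_ip_address(host))  # prints "10.12.10.173 True"
```

If the host printed here is a name such as flink-taskmanager-1 instead of an IP, the "taskmanager.host" setting most likely did not reach the Taskmanager process.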

A service per Taskmanager is not required. The purpose of the config parameter is that the Jobmanager addresses the taskmanagers by IP instead of hostname.

Hope this helps!

Cheers,

Konstantin



On Wed, Feb 20, 2019 at 4:37 PM Boris Lublinsky <boris.lublinsky@lightbend.com> wrote:
Also, the suggested workaround does not quite work.
2019-02-20 15:27:43,928 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink-metrics@flink-taskmanager-1:6170] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink-metrics@flink-taskmanager-1:6170]] Caused by: [flink-taskmanager-1: No address associated with hostname]
2019-02-20 15:27:48,750 ERROR org.apache.flink.runtime.rest.handler.legacy.files.StaticFileServerHandler - Caught exception

I think the problem is that it's trying to connect to flink-taskmanager-1

Using busybox to experiment with nslookup, I can see
/ # nslookup flink-taskmanager-1.flink-taskmanager
Server:    10.0.11.151
Address 1: 10.0.11.151 ip-10-0-11-151.us-west-2.compute.internal

Name:      flink-taskmanager-1.flink-taskmanager
Address 1: 10.131.2.136 flink-taskmanager-1.flink-taskmanager.flink.svc.cluster.local
/ # nslookup flink-taskmanager-1
Server:    10.0.11.151
Address 1: 10.0.11.151 ip-10-0-11-151.us-west-2.compute.internal

nslookup: can't resolve 'flink-taskmanager-1'
/ # nslookup flink-taskmanager-0.flink-taskmanager
Server:    10.0.11.151
Address 1: 10.0.11.151 ip-10-0-11-151.us-west-2.compute.internal

Name:      flink-taskmanager-0.flink-taskmanager
Address 1: 10.131.0.111 flink-taskmanager-0.flink-taskmanager.flink.svc.cluster.local
/ # nslookup flink-taskmanager-0
Server:    10.0.11.151
Address 1: 10.0.11.151 ip-10-0-11-151.us-west-2.compute.internal

nslookup: can't resolve 'flink-taskmanager-0'
/ #

So the name should be suffixed with the service name. How do I force it? I suspect I am missing a config parameter.
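
To make the naming pattern in the nslookup output explicit, here is a small sketch of how the resolvable names are composed. It assumes the pods sit behind a service named flink-taskmanager in the namespace flink with the cluster domain cluster.local (all three read off the output above); the helper `pod_dns_names` is hypothetical and only illustrates the pattern.

```python
def pod_dns_names(pod, service, namespace, cluster_domain="cluster.local"):
    """Return the two name forms that resolved in the nslookup session:
    the service-qualified short name and the fully qualified name.

    The bare pod name (e.g. "flink-taskmanager-1") is deliberately not
    included, because it did not resolve.
    """
    short_name = f"{pod}.{service}"
    fqdn = f"{pod}.{service}.{namespace}.svc.{cluster_domain}"
    return short_name, fqdn


for index in (0, 1):
    print(pod_dns_names(f"flink-taskmanager-{index}", "flink-taskmanager", "flink"))
```

For index 1 this yields flink-taskmanager-1.flink-taskmanager and flink-taskmanager-1.flink-taskmanager.flink.svc.cluster.local, matching the lookups that succeeded.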

 

On Feb 19, 2019, at 4:33 AM, Konstantin Knauf <konstantin@ververica.com> wrote:

Hi Boris,

the solution is actually simpler than it sounds from the ticket. The only thing you need to do is to set the "taskmanager.host" to the Pod's IP address in the Flink configuration. The easiest way to do this is to pass this config dynamically via a command-line parameter.

The Deployment spec could look something like this:
containers:
- name: taskmanager
  [...]
  args:
  - "taskmanager.sh"
  - "start-foreground"
  - "-Dtaskmanager.host=$(K8S_POD_IP)"
  [...]
  env:
  - name: K8S_POD_IP
    valueFrom:
      fieldRef:
        fieldPath: status.podIP
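
For readers unfamiliar with the $(K8S_POD_IP) syntax: Kubernetes replaces $(VAR_NAME) references in args with values from the container's environment before the process starts, and leaves unresolved references unchanged. The sketch below only imitates that substitution to show what the Taskmanager ends up receiving. It is a simplification (it ignores the $$ escape and accepts only plain identifier-style names), and `expand_args` is a made-up helper, not a Kubernetes API.

```python
import re

# Simplified pattern for $(VAR_NAME) references; an assumption for
# illustration, not the exact grammar Kubernetes implements.
VAR_REF = re.compile(r"\$\(([A-Za-z_][A-Za-z0-9_]*)\)")


def expand_args(args, env):
    """Substitute $(VAR_NAME) in each argument with the value from env.

    References to variables that are not defined are left as they are.
    """
    def replace(match):
        return env.get(match.group(1), match.group(0))

    return [VAR_REF.sub(replace, arg) for arg in args]


args = ["taskmanager.sh", "start-foreground", "-Dtaskmanager.host=$(K8S_POD_IP)"]
print(expand_args(args, {"K8S_POD_IP": "10.12.10.173"}))
# prints ['taskmanager.sh', 'start-foreground', '-Dtaskmanager.host=10.12.10.173']
```

The resulting last argument is the same "-Dtaskmanager.host=..." entry that should then appear under "Program Arguments" in the Taskmanager log.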

Hope this helps and let me know if this works.

Best, 

Konstantin

On Sun, Feb 17, 2019 at 9:51 PM Boris Lublinsky <boris.lublinsky@lightbend.com> wrote:
I was looking at this issue https://issues.apache.org/jira/browse/FLINK-11127
Apparently there is a workaround for it.
Is it possible to provide the complete Helm chart for it?
Bits and pieces are in the ticket, but it would be nice to see the full chart.




--
Konstantin Knauf | Solutions Architect
+49 160 91394525

Follow us @VervericaData
--
Join Flink Forward - The Apache Flink Conference
Stream Processing | Event Driven | Real Time
--
Data Artisans GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
--
Data Artisans GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Dr. Kostas Tzoumas, Dr. Stephan Ewen


