From: Till Rohrmann
Date: Mon, 6 Nov 2017 13:40:33 +0100
Subject: Re: Docker-Flink Project: TaskManagers can't talk to JobManager if they are on different nodes
To: Piotr Nowojski
Cc: "Vergilio, Thalita", "user@flink.apache.org", Patrick Lucas
In-Reply-To: <49247705-BA31-4F91-80CC-F006BAD9A497@data-artisans.com>
References: <1509630241859-0.post@n4.nabble.com> <49247705-BA31-4F91-80CC-F006BAD9A497@data-artisans.com>
Mailing-List: contact user-help@flink.apache.org; run by ezmlm

Hi Thalita,

in order to make Flink work, I think you have to expose the JobManager RPC port and the Blob server port, and make sure that the TaskManagers can talk to each other by exposing the `taskmanager.data.port`. The query server port is only necessary if you want to use queryable state.

I've pulled in Patrick, who has more experience with running Flink on top of Docker. He'll definitely be able to provide more detailed recommendations.

Cheers,
Till

On Mon, Nov 6, 2017 at 9:22 AM, Piotr Nowojski <piotr@data-artisans.com> wrote:

> Till, is there a list somewhere of the ports that need to be exposed that's more up to date than the docker-flink README?
>
> Piotrek
>
> On 3 Nov 2017, at 10:23, Vergilio, Thalita <t.vergilio4822@student.leedsbeckett.ac.uk> wrote:
>
> Just an update: by changing the JOB_MANAGER_RPC_ADDRESS to the public IP of the JobManager and exposing port 6123 as {{PUBLIC_IP}}:6123:6123, I managed to get the TaskManagers from different nodes and even different subnets to talk to the JobManager.
>
> This is how I created the services:
>
> docker network create -d overlay overlay
>
> docker service create --name jobmanager --env JOB_MANAGER_RPC_ADDRESS={{PUBLIC_IP}} -p 8081:8081 -p {{PUBLIC_IP}}:6123:6123 -p 48081:48081 -p 6124:6124 -p 6125:6125 --network overlay --constraint 'node.hostname == ubuntu-swarm-manager' flink jobmanager
>
> docker service create --name taskmanager --env JOB_MANAGER_RPC_ADDRESS={{PUBLIC_IP}} -p 6121:6121 -p 6122:6122 --network overlay --constraint 'node.hostname != ubuntu-swarm-manager' flink taskmanager
>
> However, I am still encountering errors further down the line. When I submit a job using the Web UI, it fails because the JobManager can't talk to the TaskManager on port 35033. I presume this is the taskmanager.data.port, which needs to be set to a range and this range exposed when I create the service?
>
> Are there any other ports that I need to open at service creation time?
>
> Connecting the channel failed: Connecting to remote task manager + '/{{IP_ADDRESS_OF_MANAGER}}:35033' has failed. This might indicate that the remote task manager has been lost.
>     at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory$ConnectingChannel.waitForChannel(PartitionRequestClientFactory.java:196)
>     at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory$ConnectingChannel.access$000(PartitionRequestClientFactory.java:131)
>     at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory.createPartitionRequestClient(PartitionRequestClientFactory.java:83)
>     at org.apache.flink.runtime.io.network.netty.NettyConnectionManager.createPartitionRequestClient(NettyConnectionManager.java:59)
>     at org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.requestSubpartition(RemoteInputChannel.java:112)
>     at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.requestPartitions(SingleInputGate.java:433)
>     at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:455)
>     at org.apache.flink.streaming.runtime.io.BarrierTracker.getNextNonBlocked(BarrierTracker.java:91)
>     at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:213)
>     at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:69)
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:263)
>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:702)
>     at java.lang.Thread.run(Thread.java:748)
>
> ------------------------------
> *From:* Piotr Nowojski <piotr@data-artisans.com>
> *Sent:* 02 November 2017 14:26:32
> *To:* Vergilio, Thalita
> *Cc:* user@flink.apache.org
> *Subject:* Re: Docker-Flink Project: TaskManagers can't talk to JobManager if they are on different nodes
>
> Did you try to expose required ports that are listed in the README when starting the containers?
>
> https://github.com/apache/flink/tree/master/flink-contrib/docker-flink
>
> Ports:
> • The Web Client is on port 48081
> • JobManager RPC port 6123 (default, not exposed to host)
> • TaskManagers RPC port 6122 (default, not exposed to host)
> • TaskManagers Data port 6121 (default, not exposed to host)
>
> Piotrek
>
> On 2 Nov 2017, at 14:44, javalass <t.vergilio4822@student.leedsbeckett.ac.uk> wrote:
>
> I am using the Docker-Flink project in:
> https://github.com/apache/flink/tree/master/flink-contrib/docker-flink
>
> I am creating the services with the following commands:
> docker network create -d overlay overlay
> docker service create --name jobmanager --env JOB_MANAGER_RPC_ADDRESS=jobmanager -p 8081:8081 --network overlay --constraint 'node.hostname == ubuntu-swarm-manager' flink jobmanager
> docker service create --name taskmanager --env JOB_MANAGER_RPC_ADDRESS=jobmanager --network overlay --constraint 'node.hostname != ubuntu-swarm-manager' flink taskmanager
>
> I wonder if there's any configuration I'm missing. This is the error I get:
> - Trying to register at JobManager akka.tcp://flink@jobmanager:6123/user/jobmanager (attempt 4, timeout: 4000 milliseconds)
>
> --
> Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>
> To view the terms under which this email is distributed, please go to:-
> http://disclaimer.leedsbeckett.ac.uk/disclaimer/disclaimer.html
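
The ports discussed in this thread can be pinned in the Flink configuration so that the ports published by Docker match what the processes actually bind, rather than a randomly chosen port such as the 35033 seen in the error above. The fragment below is a minimal sketch, not the docker-flink project's own setup. It assumes a Flink 1.3/1.4-era flink-conf.yaml. Only taskmanager.data.port is named in the thread; the other key names are added here for illustration. The values 6124 and 6125 for the blob and query servers are assumptions taken from the ports published in Thalita's jobmanager command, not confirmed defaults. How the file reaches the container depends on the image and is not covered here.

```yaml
# JobManager side
jobmanager.rpc.port: 6123      # JobManager RPC port, as listed in the README
blob.server.port: 6124         # assumption: fixed Blob server port, matching "-p 6124:6124"
query.server.port: 6125        # assumption: only needed when queryable state is used

# TaskManager side
taskmanager.rpc.port: 6122     # TaskManager RPC port, as listed in the README
taskmanager.data.port: 6121    # fixed data port, so that "-p 6121:6121" covers it
```

With the data port fixed like this, the `-p 6121:6121 -p 6122:6122` flags in the taskmanager service command would presumably cover TaskManager-to-TaskManager traffic, which is the part Till singles out above.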