flink-dev mailing list archives

From Deepak Jha <dkjhan...@gmail.com>
Subject Re: Flink-1.0.0 JobManager is not running in Docker Container on AWS
Date Mon, 14 Mar 2016 04:42:21 GMT
Hi Stephan & Ufuk,
Thanks for your response.

Yes, there is a way to run Docker (net=host mode) in which the host
machine's network stack is shared with the Docker container.
Unfortunately, it's not supported by AWS ECS.

I do have one more question for you. Can you please explain what happens
when TaskManagers register themselves with the JobManager in HA mode?
Does each TaskManager connect to the JobManager on a separate port? I'm
asking because if I run 2 TaskManagers (in separate Docker containers),
they are able to attach themselves to the JobManager (another Docker
container; Flink HA setup using a remote ZooKeeper cluster), but soon
after that they get disconnected. The logs are not very helpful either.
I suspect that each TaskManager connects on a new port, and since Docker
does not expose all ports by default, this may be the cause. I do not see
this happen when I do not use Docker containers.
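
To test that suspicion, one option is to check from inside the JobManager
container whether the TaskManager's port can be reached at all. Below is a
minimal Python sketch, not anything from Flink itself: the helper name
port_is_reachable is made up for illustration, and the address and port in
the note after it are simply the ones that appear in the log further down.

```python
import socket


def port_is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for illustration; it only checks that the port
    accepts connections, not that an Akka endpoint is behind it.
    """
    try:
        # create_connection raises OSError on refusal, timeout, or no route.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, port_is_reachable("172.17.0.3", 6121) run from the JobManager
container would return False if that port cannot be reached from there. A
True result only shows that the port is open, not that the Akka association
is healthy.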

Here is what I saw in the JobManager log:

2016-03-12 08:55:55,010 PST [INFO]  ec2-54-173-231-120.compute-1.a
[flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager -
Registered TaskManager at 5673db03e679 (akka.tcp://
flink@172.17.0.3:6121/user/taskmanager) as
7eafcfddd6bd084f2ec5a32594603f4f. Current number of registered hosts
is 1. *Current
number of alive task slots is 1.*
2016-03-12 08:57:42,676 PST [INFO]  ec2-54-173-231-120.compute-1.a
[flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager -
Registered TaskManager at 7200a7da4da7 (akka.tcp://
flink@172.17.0.3:6121/user/taskmanager) as
320338e15a7a44ee64dc03a40f04fcd7. Current number of registered hosts
is 2. *Current
number of alive task slots is 2.*
2016-03-12 08:57:48,422 PST [INFO]  ec2-54-173-231-120.compute-1.a
[flink-akka.actor.default-dispatcher-20]
o.a.f.runtime.jobmanager.JobManager - Task manager akka.tcp://
flink@172.17.0.3:6121/user/taskmanager terminated.
2016-03-12 08:57:48,422 PST [INFO]  ec2-54-173-231-120.compute-1.a
[flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager -
*Unregistered task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager.
Number of registered task managers 1. Number of available slots 1.*
2016-03-12 08:58:01,417 PST [WARN]  ec2-54-173-231-120.compute-1.a
[flink-akka.actor.default-dispatcher-20]
a.remote.ReliableDeliverySupervisor - Association with remote system
[akka.tcp://flink@172.17.0.3:6121] has failed, address is now gated for
[5000] ms. Reason is: [Disassociated].
2016-03-12 08:58:01,451 PST [INFO]  ec2-54-173-231-120.compute-1.a
[flink-akka.actor.default-dispatcher-20]
o.a.f.runtime.jobmanager.JobManager - Task manager akka.tcp://
flink@172.17.0.3:6121/user/taskmanager wants to disconnect, because
TaskManager akka://flink/user/taskmanager is disassociating.
2016-03-12 08:58:01,451 PST [INFO]  ec2-54-173-231-120.compute-1.a
[flink-akka.actor.default-dispatcher-20]
o.a.f.r.instance.InstanceManager - *Unregistered
task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager.
Number of registered task managers 0. Number of available slots 0.*
2016-03-12 08:58:01,465 PST [INFO]  ec2-54-173-231-120.compute-1.a
[flink-akka.actor.default-dispatcher-20]
o.a.f.r.instance.InstanceManager - *Registered
TaskManager at 7200a7da4da7
(akka.tcp://flink@172.17.0.3:6121/user/taskmanager) as
b5dbbc829854afa3ec5d8f0b6f9dbd03. Current number of registered hosts is 1.
Current number of alive task slots is 1.*
2016-03-12 08:58:03,383 PST [INFO]  ec2-54-173-231-120.compute-1.a
[flink-akka.actor.default-dispatcher-20]
o.a.f.runtime.jobmanager.JobManager - Task manager akka.tcp://
flink@172.17.0.3:6121/user/taskmanager terminated.
2016-03-12 08:58:03,384 PST [INFO]  ec2-54-173-231-120.compute-1.a
[flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager -
*Unregistered task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager.
Number of registered task managers 0. Number of available slots 0.*
2016-03-12 08:58:04,988 PST [INFO]  ec2-54-173-231-120.compute-1.a
[flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager -
Registering TaskManager at akka.tcp://flink@172.17.0.3:6121/user/taskmanager
which was marked as dead earlier because of a heart-beat timeout.
2016-03-12 08:58:04,988 PST [INFO]  ec2-54-173-231-120.compute-1.a
[flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager -
Registered TaskManager at 7200a7da4da7 (akka.tcp://
flink@172.17.0.3:6121/user/taskmanager) as
eac0ce12e6ec885863d3438d691f4ab2. Current number of registered hosts is 1.
Current number of alive task slots is 1.
2016-03-12 08:58:21,382 PST [WARN]  ec2-54-173-231-120.compute-1.a
[flink-akka.actor.default-dispatcher-20]
a.remote.ReliableDeliverySupervisor - Association with remote system
[akka.tcp://flink@172.17.0.3:6121] has failed, address is now gated for
[5000] ms. Reason is: [Disassociated].
2016-03-12 08:58:21,388 PST [INFO]  ec2-54-173-231-120.compute-1.a
[flink-akka.actor.default-dispatcher-20]
o.a.f.runtime.jobmanager.JobManager - Task manager akka.tcp://
flink@172.17.0.3:6121/user/taskmanager wants to disconnect, because
TaskManager akka://flink/user/taskmanager is disassociating.
2016-03-12 08:58:21,388 PST [INFO]  ec2-54-173-231-120.compute-1.a
[flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager -
Unregistered task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager.
Number of registered task managers 0. Number of available slots 0.
2016-03-12 08:58:21,390 PST [INFO]  ec2-54-173-231-120.compute-1.a
[flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager -
Registered TaskManager at 7200a7da4da7 (akka.tcp://
flink@172.17.0.3:6121/user/taskmanager) as
bda61dbd047d40889aa3868d5d4d86a9. Current number of registered hosts is 1.
Current number of alive task slots is 1.
2016-03-12 08:58:25,433 PST [INFO]  ec2-54-173-231-120.compute-1.a
[flink-akka.actor.default-dispatcher-18]
o.a.f.runtime.jobmanager.JobManager - Task manager akka.tcp://
flink@172.17.0.3:6121/user/taskmanager terminated.
2016-03-12 08:58:25,434 PST [INFO]  ec2-54-173-231-120.compute-1.a
[flink-akka.actor.default-dispatcher-18] o.a.f.r.instance.InstanceManager -
Unregistered task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager.
Number of registered task managers 0. Number of available slots 0.
2016-03-12 08:58:28,947 PST [INFO]  ec2-54-173-231-120.compute-1.a
[flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager -
Registering TaskManager at akka.tcp://flink@172.17.0.3:6121/user/taskmanager
which was marked as dead earlier because of a heart-beat timeout.
2016-03-12 08:58:28,948 PST [INFO]  ec2-54-173-231-120.compute-1.a
[flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager -
Registered TaskManager at 7200a7da4da7 (akka.tcp://
flink@172.17.0.3:6121/user/taskmanager) as
d42ea5c6e0053935a0973d8536f3d8a5. Current number of registered hosts is 1.
Current number of alive task slots is 1.
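
One thing that stands out in the log above is that two different hostnames
(5673db03e679 and 7200a7da4da7) register with the same Akka address,
flink@172.17.0.3:6121. I do not know whether that is related to the
disconnects. A minimal Python sketch for pulling this out of a longer log
follows; the function name hosts_by_address and the regular expression are
my own assumptions based on the log format shown here, not part of Flink.

```python
import re
from collections import defaultdict

# Matches "Registered TaskManager at <host> (akka.tcp://<address>/user/taskmanager".
# Case-sensitive, so "Unregistered task manager" and "Registering" lines are skipped.
REGISTERED = re.compile(
    r"Registered TaskManager at (\S+) \(akka\.tcp://\s*([^/\s]+)/user/taskmanager"
)


def hosts_by_address(log_text):
    """Map each Akka address to the set of hostnames that registered with it."""
    # Collapse whitespace first to undo the line wrapping added by the mail client.
    flat = " ".join(log_text.split())
    seen = defaultdict(set)
    for host, address in REGISTERED.findall(flat):
        seen[address].add(host)
    return dict(seen)


# Two registration messages copied from the log above.
sample = """
Registered TaskManager at 5673db03e679 (akka.tcp://
flink@172.17.0.3:6121/user/taskmanager) as
7eafcfddd6bd084f2ec5a32594603f4f.
Registered TaskManager at 7200a7da4da7 (akka.tcp://
flink@172.17.0.3:6121/user/taskmanager) as
320338e15a7a44ee64dc03a40f04fcd7.
"""

for address, hosts in hosts_by_address(sample).items():
    print(address, sorted(hosts))
```

On the sample this reports a single address, flink@172.17.0.3:6121, with
both hostnames attached to it.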


On Fri, Mar 11, 2016 at 5:23 AM, Stephan Ewen <sewen@apache.org> wrote:

> Hi Deepak!
>
> We currently cannot split the bind address and the advertised address,
> because the Akka library only accepts packets sent explicitly to the bind
> address (not sure why Akka has this artificial limitation, but it is there).
>
> Can you bridge the container IP address to be visible from the outside?
>
> Stephan
>
>
> On Fri, Mar 11, 2016 at 1:03 PM, Ufuk Celebi <uce@apache.org> wrote:
>
> > Hey Deepak!
> >
> > Your description of Flink's behaviour is correct. To summarize:
> >
> > # Host Address
> >
> > If you specify a host address as an argument to the JVM (via
> > jobmanager.sh or the start-cluster.sh scripts) then that one is used.
> > If you don't, it falls back to the value configured in flink-conf.yaml
> > (what you describe).
> >
> > # Ports
> >
> > By default, a random port is used and published via ZooKeeper. You
> > can configure a port range only via recovery.jobmanager.port (what you
> > describe).
> >
> > ---
> >
> > Your proposal would likely solve the issue, but isn't it possible to
> > handle this outside of Flink? I've found this stack overflow question,
> > which should be related:
> >
> >
> > http://stackoverflow.com/questions/26539727/giving-a-docker-container-a-routable-ip-address
> >
> > What's your opinion?
> >
>



-- 
Thanks,
Deepak Jha
