flink-dev mailing list archives

From Deepak Jha <dkjhan...@gmail.com>
Subject Re: Flink-1.0.0 JobManager is not running in Docker Container on AWS
Date Mon, 14 Mar 2016 15:49:26 GMT
Hi Maximilian,
Thanks for your response. I will wait for the update.

On Monday, March 14, 2016, Maximilian Michels <mxm@apache.org> wrote:

> Hi Deepak,
>
> We'll look more into this problem this week. Until now we considered it a
> configuration issue if the bind address was not externally reachable.
> However, one might not always have the possibility to change this network
> configuration.
>
> Looking further, it is actually possible to let the bind address be
> different from the advertised address. From the Akka FAQ at
> http://doc.akka.io/docs/akka/2.4.1/additional/faq.html:
>
> > If you are running an ActorSystem under a NAT or inside a docker container,
> > make sure to set akka.remote.netty.tcp.hostname and
> > akka.remote.netty.tcp.port to the address it is reachable at from other
> > ActorSystems. If you need to bind your network interface to a different
> > address - use akka.remote.netty.tcp.bind-hostname and
> > akka.remote.netty.tcp.bind-port settings. Also make sure your network is
> > configured to translate from the address your ActorSystem is reachable at
> > to the address your ActorSystem network interface is bound to.
> >
>
> It looks like we have to expose this configuration to users who have a
> special network setup.
>
> Best,
> Max
>
> On Mon, Mar 14, 2016 at 5:42 AM, Deepak Jha <dkjhanitt@gmail.com> wrote:
>
> > Hi Stephan & Ufuk,
> > Thanks for your response.
> >
> > Yes, Docker can be run in host networking mode (--net=host), in which the
> > container shares the host machine's network stack. Unfortunately, it's
> > not supported by AWS ECS.
> >
> > I do have one more question. Could you please explain what happens when
> > TaskManagers register themselves with the JobManager in HA mode? Does each
> > TaskManager connect to the JobManager on a separate port? I'm asking
> > because when I run 2 TaskManagers (in separate Docker containers), they are
> > able to attach to the JobManager (in another Docker container; Flink HA
> > setup using a remote ZooKeeper cluster), but soon after that they get
> > disconnected. The logs are not very helpful either. I suspect that each
> > TaskManager connects on a new port, and since Docker does not expose all
> > ports by default, this may be the cause. I do not see this happen when I
> > do not use Docker containers.
> >
> > Here is the log output I saw in the JobManager:
> >
> > 2016-03-12 08:55:55,010 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Registered TaskManager at 5673db03e679 (akka.tcp://flink@172.17.0.3:6121/user/taskmanager) as 7eafcfddd6bd084f2ec5a32594603f4f. Current number of registered hosts is 1. Current number of alive task slots is 1.
> > 2016-03-12 08:57:42,676 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Registered TaskManager at 7200a7da4da7 (akka.tcp://flink@172.17.0.3:6121/user/taskmanager) as 320338e15a7a44ee64dc03a40f04fcd7. Current number of registered hosts is 2. Current number of alive task slots is 2.
> > 2016-03-12 08:57:48,422 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.runtime.jobmanager.JobManager - Task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager terminated.
> > 2016-03-12 08:57:48,422 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Unregistered task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager. Number of registered task managers 1. Number of available slots 1.
> > 2016-03-12 08:58:01,417 PST [WARN]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] a.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@172.17.0.3:6121] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
> > 2016-03-12 08:58:01,451 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.runtime.jobmanager.JobManager - Task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager wants to disconnect, because TaskManager akka://flink/user/taskmanager is disassociating.
> > 2016-03-12 08:58:01,451 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Unregistered task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager. Number of registered task managers 0. Number of available slots 0.
> > 2016-03-12 08:58:01,465 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Registered TaskManager at 7200a7da4da7 (akka.tcp://flink@172.17.0.3:6121/user/taskmanager) as b5dbbc829854afa3ec5d8f0b6f9dbd03. Current number of registered hosts is 1. Current number of alive task slots is 1.
> > 2016-03-12 08:58:03,383 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.runtime.jobmanager.JobManager - Task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager terminated.
> > 2016-03-12 08:58:03,384 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Unregistered task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager. Number of registered task managers 0. Number of available slots 0.
> > 2016-03-12 08:58:04,988 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Registering TaskManager at akka.tcp://flink@172.17.0.3:6121/user/taskmanager which was marked as dead earlier because of a heart-beat timeout.
> > 2016-03-12 08:58:04,988 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Registered TaskManager at 7200a7da4da7 (akka.tcp://flink@172.17.0.3:6121/user/taskmanager) as eac0ce12e6ec885863d3438d691f4ab2. Current number of registered hosts is 1. Current number of alive task slots is 1.
> > 2016-03-12 08:58:21,382 PST [WARN]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] a.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@172.17.0.3:6121] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
> > 2016-03-12 08:58:21,388 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.runtime.jobmanager.JobManager - Task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager wants to disconnect, because TaskManager akka://flink/user/taskmanager is disassociating.
> > 2016-03-12 08:58:21,388 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Unregistered task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager. Number of registered task managers 0. Number of available slots 0.
> > 2016-03-12 08:58:21,390 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Registered TaskManager at 7200a7da4da7 (akka.tcp://flink@172.17.0.3:6121/user/taskmanager) as bda61dbd047d40889aa3868d5d4d86a9. Current number of registered hosts is 1. Current number of alive task slots is 1.
> > 2016-03-12 08:58:25,433 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-18] o.a.f.runtime.jobmanager.JobManager - Task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager terminated.
> > 2016-03-12 08:58:25,434 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-18] o.a.f.r.instance.InstanceManager - Unregistered task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager. Number of registered task managers 0. Number of available slots 0.
> > 2016-03-12 08:58:28,947 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Registering TaskManager at akka.tcp://flink@172.17.0.3:6121/user/taskmanager which was marked as dead earlier because of a heart-beat timeout.
> > 2016-03-12 08:58:28,948 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Registered TaskManager at 7200a7da4da7 (akka.tcp://flink@172.17.0.3:6121/user/taskmanager) as d42ea5c6e0053935a0973d8536f3d8a5. Current number of registered hosts is 1. Current number of alive task slots is 1.
> >
> >
> > On Fri, Mar 11, 2016 at 5:23 AM, Stephan Ewen <sewen@apache.org> wrote:
> >
> > > Hi Deepak!
> > >
> > > We currently cannot split the bind address and the advertised address,
> > > because the Akka library only accepts packets sent explicitly to the bind
> > > address (not sure why Akka has this artificial limitation, but it is there).
> > >
> > > Can you bridge the container IP address to be visible from the outside?
> > >
> > > Stephan
> > >
> > >
> > > On Fri, Mar 11, 2016 at 1:03 PM, Ufuk Celebi <uce@apache.org> wrote:
> > >
> > > > Hey Deepak!
> > > >
> > > > Your description of Flink's behaviour is correct. To summarize:
> > > >
> > > > # Host Address
> > > >
> > > > If you specify a host address as an argument to the JVM (via
> > > > jobmanager.sh or the start-cluster.sh scripts) then that one is used.
> > > > If you don't, it falls back to the value configured in flink-conf.yaml
> > > > (what you describe).
> > > >
> > > > # Ports
> > > >
> > > > By default, a random port is used and published via ZooKeeper. You can
> > > > configure a port range only via recovery.jobmanager.port (what you
> > > > describe).
> > > >
> > > > ---
> > > >
> > > > Your proposal would likely solve the issue, but isn't it possible to
> > > > handle this outside of Flink? I've found this Stack Overflow question,
> > > > which should be related:
> > > >
> > > > http://stackoverflow.com/questions/26539727/giving-a-docker-container-a-routable-ip-address
> > > >
> > > > What's your opinion?
> > > >
> > >
> >
> >
> >
> > --
> > Thanks,
> > Deepak Jha
> >
>
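
For reference, here is a minimal sketch of how the four Akka settings named in the FAQ excerpt quoted above might look in an Akka configuration (HOCON) file. The host name and port values are placeholders chosen for illustration; only the 172.17.0.3 container address comes from the logs in this thread. As Max notes, Flink does not expose these settings yet, and whether the Akka version bundled with Flink 1.0.0 honors bind-hostname and bind-port is an open assumption here (the FAQ is for Akka 2.4.1).

```
# Sketch only: the host name and ports below are hypothetical placeholders.
akka.remote.netty.tcp {
  # Advertised address: where other ActorSystems can reach this one.
  hostname = "jobmanager.example.com"   # hypothetical externally reachable name
  port = 6123                           # hypothetical externally mapped port

  # Bind address: the interface the ActorSystem listens on inside the container.
  bind-hostname = "172.17.0.3"          # container-internal IP, as seen in the logs
  bind-port = 6123                      # hypothetical port inside the container
}
```

With a layout like this, the network (for example the Docker port mapping) would still have to translate traffic from the advertised address to the bind address, as the FAQ points out.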


-- 
Sent from Gmail Mobile
