mesos-user mailing list archives

From David Kesler <DKes...@yodle.com>
Subject Mesos marks framework as registered successfully despite being unreachable
Date Wed, 04 Feb 2015 20:22:29 GMT
I've been playing around with Marathon and Mesos recently and was running into a lot of
weird, inconsistent behavior with Marathon.  It turns out that some overly strict iptables
rules were blocking traffic between the Mesos master and the ephemeral port of the Marathon
framework leader (unless by chance they were on the same box).
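One way to confirm this kind of blockage is to try a plain TCP connection from the master's box to the address and port the scheduler registered with. The following is a minimal sketch, not anything Mesos or Marathon provides; the helper name can_connect and the 3-second timeout are assumptions made for illustration.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper: run it on the Mesos master's box, passing the
    scheduler address and port that appear in the master's log.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, can_connect("10.3.24.57", 58021) run on the master would be expected to return False while a firewall rule drops or rejects that traffic, and True once the rule is relaxed.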

The net result is that Mesos would constantly spam re-registration requests, think that they
succeeded, then disconnect the framework since it couldn't connect.  Mesos would mark the
framework as active in the UI and successfully registered (although the re-registered time
was getting continuously updated).  During this time, the Mesos leader's logs contained tons
of entries of the form:

Feb  4 12:57:04 dev-mesos-master1 mesos-master[12510]: I0204 12:57:04.611101 12534 master.cpp:1573] Re-registering framework 20141111-001826-924320522-5050-26663-0000 (marathon-0.7.6) at scheduler-d79772ac-cfca-4fd5-a503-879f1a1ee190@10.3.24.57:58021
Feb  4 12:57:04 dev-mesos-master1 mesos-master[12510]: I0204 12:57:04.611127 12534 master.cpp:1602] Framework 20141111-001826-924320522-5050-26663-0000 (marathon-0.7.6) at scheduler-d79772ac-cfca-4fd5-a503-879f1a1ee190@10.3.24.57:58021 failed over
Feb  4 12:57:04 dev-mesos-master1 mesos-master[12510]: I0204 12:57:04.611335 12534 hierarchical_allocator_process.hpp:375] Activated framework 20141111-001826-924320522-5050-26663-0000
Feb  4 12:57:04 dev-mesos-master1 mesos-master[12510]: I0204 12:57:04.611882 12534 master.cpp:3843] Sending 4 offers to framework 20141111-001826-924320522-5050-26663-0000 (marathon-0.7.6) at scheduler-d79772ac-cfca-4fd5-a503-879f1a1ee190@10.3.24.57:58021
Feb  4 12:57:04 dev-mesos-master1 mesos-master[12510]: I0204 12:57:04.612428 12529 master.cpp:789] Framework 20141111-001826-924320522-5050-26663-0000 (marathon-0.7.6) at scheduler-d79772ac-cfca-4fd5-a503-879f1a1ee190@10.3.24.57:58021 disconnected
Feb  4 12:57:04 dev-mesos-master1 mesos-master[12510]: I0204 12:57:04.612452 12529 master.cpp:1752] Disconnecting framework 20141111-001826-924320522-5050-26663-0000 (marathon-0.7.6) at scheduler-d79772ac-cfca-4fd5-a503-879f1a1ee190@10.3.24.57:58021
Feb  4 12:57:04 dev-mesos-master1 mesos-master[12510]: I0204 12:57:04.612463 12529 master.cpp:1768] Deactivating framework 20141111-001826-924320522-5050-26663-0000 (marathon-0.7.6) at scheduler-d79772ac-cfca-4fd5-a503-879f1a1ee190@10.3.24.57:58021
Feb  4 12:57:04 dev-mesos-master1 mesos-master[12510]: I0204 12:57:04.612586 12530 hierarchical_allocator_process.hpp:405] Deactivated framework 20141111-001826-924320522-5050-26663-0000

Where 10.3.24.57 was the box hosting the Marathon leader.
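To spot this register/deactivate loop without reading the log by eye, a script can pull the scheduler endpoint out of the master log and count how often the cycle repeats. This is a rough sketch that assumes the log lines look like the ones quoted above; the function name summarize_master_log and the regular expression are illustrative, not part of Mesos.

```python
import re

# Matches the "scheduler-<uuid>@<ip>:<port>" endpoint as it appears in the
# log excerpt above. The pattern is an assumption based on that excerpt only.
ENDPOINT = re.compile(r"scheduler-[0-9a-f-]+@(\d+\.\d+\.\d+\.\d+):(\d+)")


def summarize_master_log(log_text: str):
    """Return (endpoints, reregistrations, deactivations) for a log excerpt."""
    # Collapse all whitespace so entries wrapped across lines still match.
    text = " ".join(log_text.split())
    endpoints = {(ip, int(port)) for ip, port in ENDPOINT.findall(text)}
    reregistrations = text.count("Re-registering framework")
    deactivations = text.count("Deactivated framework")
    return endpoints, reregistrations, deactivations
```

A re-registration count that keeps climbing alongside a matching deactivation count, all for the same endpoint, would be consistent with the master being unable to reach that address and port.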

I've posted this as an issue on Marathon's GitHub (https://github.com/mesosphere/marathon/issues/1140),
but I also wanted to post here, as it may be an issue that Mesos does not seem to handle the
case where it cannot successfully connect to a framework.  (Obviously Mesos handling this
better wouldn't fix the issues that crop up in Marathon, but it'd be nice if Mesos gave some
indication that it's not actually able to communicate with a framework.)


