mesos-user mailing list archives

From Nikolay Borodachev <nbo...@adobe.com>
Subject RE: Marathon change of leader and stalled deployments
Date Tue, 28 Apr 2015 14:52:30 GMT
Does that apply to all 3 processes: master, slave, and Marathon?

Thanks
Nikolay

From: Dario Rexin [mailto:dario@mesosphere.io]
Sent: Tuesday, April 28, 2015 9:47 AM
To: user@mesos.apache.org
Subject: Re: Marathon change of leader and stalled deployments

On each host, you have to set LIBPROCESS_IP to the IP address of the interface that is connected
to the network your cluster is running in.
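
As a rough sketch of that advice in Python: a small launcher helper can refuse to start a
process when the chosen address is loopback. Only the LIBPROCESS_IP variable comes from the
Marathon log message in this thread; the helper names and the address 10.0.0.11 are
hypothetical placeholders.

```python
import ipaddress
import os


def is_routable(ip: str) -> bool:
    """Return True if ip could be reached by other hosts.

    Loopback (127.0.0.0/8, ::1) and unspecified (0.0.0.0, ::) addresses
    are rejected. Invalid strings raise ValueError from ip_address().
    """
    addr = ipaddress.ip_address(ip)
    return not (addr.is_loopback or addr.is_unspecified)


def env_with_libprocess_ip(ip: str, base_env=None) -> dict:
    """Return a copy of base_env (default: os.environ) with LIBPROCESS_IP set."""
    if not is_routable(ip):
        raise ValueError(f"{ip} is not a routable address")
    env = dict(os.environ if base_env is None else base_env)
    env["LIBPROCESS_IP"] = ip
    return env


if __name__ == "__main__":
    # 10.0.0.11 is a placeholder for this host's address on the cluster network.
    env = env_with_libprocess_ip("10.0.0.11")
    print("LIBPROCESS_IP=" + env["LIBPROCESS_IP"])
    # The resulting environment could then be passed to the process being
    # started, e.g. via subprocess.run(..., env=env); the command is site-specific.
```

Each host would need its own address here, since the value has to match the interface on
that particular machine.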


On 28.04.2015, at 16:41, Nikolay Borodachev <nborod@adobe.com> wrote:
Hi Dario,

This could be the reason, but why would it not bind to all network interfaces by default?
To test it, should I set LIBPROCESS_IP to the IP address of the mesos1 server?

Thank you
Nikolay

From: Dario Rexin [mailto:dario@mesosphere.io]
Sent: Tuesday, April 28, 2015 4:31 AM
To: user@mesos.apache.org
Subject: Re: Marathon change of leader and stalled deployments

Hi Nikolay,

could this be the problem?

Apr 27 22:36:00 mesos1 marathon[6289]: **************************************************
Apr 27 22:36:00 mesos1 marathon[6289]: Scheduler driver bound to loopback interface! Cannot communicate with remote master(s). You might want to set 'LIBPROCESS_IP' environment variable to use a routable IP address.
Apr 27 22:36:00 mesos1 marathon[6289]: **************************************************

This would explain why only a certain node (most likely the one that’s running on the same
machine as the current Mesos leader) can start tasks.
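
To check the other nodes for the same symptom, something like the following Python sketch
could be run over each host's Marathon log. It assumes syslog-style lines shaped like the
excerpt above (timestamp, host, `marathon[<pid>]:`, message); the function name and the
regular expression are illustrative and not part of Marathon.

```python
import re

# Matches syslog-style lines such as:
#   Apr 27 22:36:00 mesos1 marathon[6289]: Scheduler driver bound to ...
LINE_RE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+(?P<host>\S+)\s+marathon\[\d+\]:\s+(?P<msg>.*)$"
)
WARNING = "Scheduler driver bound to loopback interface"


def hosts_bound_to_loopback(lines):
    """Return the sorted host names whose Marathon logged the loopback warning."""
    hosts = set()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and WARNING in match.group("msg"):
            hosts.add(match.group("host"))
    return sorted(hosts)


if __name__ == "__main__":
    sample = [
        "Apr 27 22:36:00 mesos1 marathon[6289]: "
        "Scheduler driver bound to loopback interface! Cannot communicate "
        "with remote master(s). You might want to set 'LIBPROCESS_IP' "
        "environment variable to use a routable IP address.",
    ]
    print(hosts_bound_to_loopback(sample))
```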

Cheers,
Dario

On 27 Apr 2015, at 23:49, Nikolay Borodachev <nborod@adobe.com> wrote:

Dario,

The logs are quite lengthy, so I sent them to you directly. Marathon version is 0.8.1.

Thank you
Nikolay

From: Dario Rexin [mailto:dario@mesosphere.io]
Sent: Monday, April 27, 2015 4:01 PM
To: user@mesos.apache.org
Subject: Re: Marathon change of leader and stalled deployments

Hi Nikolay,

this is unexpected behavior. Could you please post the log output from the leading node
around the time you try to scale? Also, what version of Marathon are you running?

Thanks,
Dario


On 27.04.2015, at 20:41, Nikolay Borodachev <nborod@adobe.com> wrote:
Hello All,

I noticed strange behavior in a Marathon cluster. The cluster consists of 3 Mesos/Marathon
masters and 3 slaves.

Once the cluster is freshly started, I can start a process (e.g. httpd) and scale it up and
down without any problems. Everything works as it should.
However, if the Marathon leader goes down or gets restarted, the managed processes can no
longer be scaled. The scaling request gets queued but is not executed by the new Marathon
leader.
I found that even if I recycle the current leader, the scaling request does not move until
the original server becomes the leader again.
Only when the server that was the leader when the tasks were created becomes the leader
again can these tasks be scaled.

Is this a known and expected behavior?

Thanks
Nikolay
