mesos-user mailing list archives

From Ashic Mahtab <as...@live.com>
Subject RE: Lots of master elections
Date Sat, 04 Jul 2015 13:04:34 GMT
Hm... I'll delete everything in /var/lib/mesos (which holds the replicated logs) and retry. I guess
I don't need to delete the Mesos files under /etc, then. Will report back. Checking the logs,
I see that a master is elected but then writes this to the FATAL log:
F0704 12:52:38.078475  5847 master.cpp:1176] Recovery failed: Failed to recover registrar: Failed to perform fetch within 1mins
Then it dies. I guess that's what kicks off the new election.
-Ashic.

From: nikolaos.ballas@nexusgroup.com
To: user@mesos.apache.org
Subject: RE: Lots of master elections
Date: Sat, 4 Jul 2015 12:47:53 +0000

Depending on your configuration, Mesos creates its files under /var, in a directory named mesos.
Go into /var and run this on the command line: find . -name '*mesos*'

-------- Original message --------
From: Ashic Mahtab <ashic@live.com>
Date: 04/07/2015 14:34 (GMT+01:00)
To: Apache Mesos <user@mesos.apache.org>
Subject: RE: Lots of master elections

Thanks for the reply, Nikolaos. Extreme noob question... when you say Mesos files, which are
you referring to? Would I also need to delete the /mesos value in Zookeeper?

From: nikolaos.ballas@nexusgroup.com
To: user@mesos.apache.org
Subject: RE: Lots of master elections
Date: Sat, 4 Jul 2015 12:29:44 +0000

You have to clean out the Mesos files and restart the masters.

-------- Original message --------
From: Ashic Mahtab <ashic@live.com>
Date: 04/07/2015 14:08 (GMT+01:00)
To: user@mesos.apache.org
Subject: Lots of master elections

Hello,
Just getting started with Mesos, and in the process of "graduating" from Vagrant to a cluster
on Azure. Here's what I have:

* 1 Zookeeper node exposing 2181, running as expected.
* 2 Mesos masters - mesos1.x.net, mesos2.x.net. Both exposing 5050. These have private and
public ips. All nodes are on the same network, and have access to each other.

[I'll set up a third master, and add slaves soon.]

It all seems ok, and the web UI works. I can see mesos entries in Zookeeper. However, I'm
seeing a couple of things:

* A node is elected master (say, mesos1.x.net). About a minute later, another election is held.
* If the other node wins, in the UI, I get the message that this is no longer the master and
am redirected.
* Sometimes the redirection is to mesos2.x.net, and all is fine (except another election soon).
* Sometimes the redirection is to the internal ip of mesos2.x.net, which obviously gets a
404.

I should add that all the nodes are the lowest powered crappy Azure instances you can get.

Is this constant re-election "normal"? Should I specify hostnames or public ips in /etc/default/mesos-master?
I tried the latter, but the symptoms remained. Will adding a third master make it work?
(I have quorum set to 2).
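
For reference on the quorum question: the Mesos master quorum is meant to be a strict majority of the masters. The arithmetic can be sketched in a few lines of Python. This is only an illustration; the function names `majority_quorum` and `tolerated_failures` are hypothetical and not part of Mesos.

```python
def majority_quorum(num_masters: int) -> int:
    """Smallest quorum size that is a strict majority of num_masters."""
    if num_masters < 1:
        raise ValueError("need at least one master")
    return num_masters // 2 + 1


def tolerated_failures(num_masters: int) -> int:
    """How many masters can be down while a majority quorum is still reachable."""
    return num_masters - majority_quorum(num_masters)


if __name__ == "__main__":
    # Hypothetical cluster sizes, chosen only for illustration.
    for n in (1, 2, 3, 5):
        print(n, "masters -> quorum", majority_quorum(n),
              "tolerates", tolerated_failures(n), "failure(s)")
```

Under this arithmetic, two masters with quorum 2 is a valid majority but tolerates no unavailable master, while three masters with quorum 2 tolerates one. Whether that explains the repeated elections in this setup is not established by the thread.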



Any help will be greatly appreciated.



Thanks,
Ashic.