mesos-dev mailing list archives

From Marco Massenzio <ma...@mesosphere.io>
Subject Re: Initial leader election
Date Wed, 25 Nov 2015 17:31:12 GMT
A quick glance at the logs doesn't show anything that stands out, apart
from:

--zk_session_timeout="10secs"

which seems to lead to:

Nov 23 16:50:13 node1 mesos-master[17501]: I1123 16:50:13.594151 17521
recover.cpp:111] Unable to finish the recover protocol in 10secs,
retrying

That is the default value, but your setup may need longer than that
(the time it takes for all master nodes to come up and reach quorum
may be the issue).
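
As a rough sketch of how a longer timeout could be tried on a three-master
setup like the one described below: the ZooKeeper hostnames, the work_dir
path and the 30secs value are illustrative assumptions, not values taken
from the logs, and whether the session timeout is really the limiting factor
is still a guess. The script only prints the command so it can be reviewed
before anything is started.

```shell
#!/bin/sh
# Hypothetical values throughout; adjust them for your own cluster.
MASTERS=3
QUORUM=$(( MASTERS / 2 + 1 ))   # strict majority of masters: 2 when MASTERS=3
ZK_URL="zk://node1:2181,node2:2181,node3:2181/mesos"   # assumed hostnames
WORK_DIR="/var/lib/mesos"       # assumed path
TIMEOUT="30secs"                # assumed value, raised from the 10secs default

# Assemble the master command line from the pieces above.
MASTER_CMD="mesos-master --zk=${ZK_URL} --quorum=${QUORUM} --work_dir=${WORK_DIR} --zk_session_timeout=${TIMEOUT}"

# Print the command instead of running it.
echo "${MASTER_CMD}"
```

If the masters are managed by an init system, as in this thread, the same
flag values would go wherever that service reads its options from.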

--
*Marco Massenzio*
Distributed Systems Engineer
http://codetrips.com

On Wed, Nov 25, 2015 at 3:06 AM, Guilherme Moro <guilherme.moro@ammeon.com>
wrote:

> https://issues.apache.org/jira/browse/MESOS-4010
>
> On 24 November 2015 at 13:55, Klaus Ma <klaus1982.cn@gmail.com> wrote:
>
> > I'd suggest opening a JIRA to track the issue; I think you can attach
> > master.log & slave.log for the owner's reference.
> >
> > ----
> > Da (Klaus), Ma (马达) | PMP® | Advisory Software Engineer
> > Platform Symphony/DCOS Development & Support, STG, IBM GCG
> > +86-10-8245 4084 | klaus1982.cn@gmail.com | http://k82.me
> >
> > On Tue, Nov 24, 2015 at 8:45 PM, Guilherme Moro <guilherme.moro@ammeon.com>
> > wrote:
> >
> > > Hi,
> > >
> > > I'm having a problem while trying to create the initial cluster: no
> > > leader is elected.
> > > For a start, let me explain my setup:
> > > 3 nodes
> > > 3 zookeepers
> > > 3 mesos-master services, configured as initctl services and controlled
> > > by puppet; the RPMs installed are from the RHEL repository at mesosphere
> > > (installed through puppet as well), running on RHEL 6.6
> > > Quorum is set to 2, as expected; all the remaining configs were double
> > > checked and appear to be correct.
> > > Most of the time I can get the cluster to bootstrap after rebooting the
> > > nodes (sometimes more than once).
> > > The whole thing resembles a bit
> > > https://issues.apache.org/jira/browse/MESOS-2148 and
> > > https://issues.apache.org/jira/browse/MESOS-2014
> > >
> > > Even when I get the master elected, sometimes another couple of reboots
> > > or restarts of the services are needed to get all the slave nodes added
> > > (they are the same nodes as the masters).
> > >
> > > I can quite easily reproduce this behavior; if someone cares to look at
> > > the logs, tell me exactly what to collect and what logging flags I should
> > > enable.
> > >
> > > So, should I open a bug, or is there a trick to bootstrapping the
> > > cluster that I'm missing here?
> > >
> > > Regards,
> > >
> > > Guilherme Moro
> > >
> > > --
> > > This email and any files transmitted with it are confidential and
> > intended
> > > solely for the use of the individual or entity to whom they are
> > addressed.
> > > If you have received this email in error please notify the system
> > manager.
> > > This message contains confidential information and is intended only for
> > the
> > > individual named. If you are not the named addressee you should not
> > > disseminate, distribute or copy this e-mail.
> > >
> > >
> >
>
