mesos-user mailing list archives

From Jakub Veverka <veverka.k...@gmail.com>
Subject mesos cluster setup
Date Wed, 08 Jul 2015 20:17:21 GMT
Hi Guys,

We have a Mesos stack up and running, and I've started testing what happens
when I shut down one node in the cluster. The cluster took 10-20 minutes to
heal, whereas we were hoping for recovery that is instant or takes at most
1 minute.

Here is a summary of our setup:

We are running 4 CoreOS hosts.
Each host can run every Mesos component, but at most one instance of each
component runs per node.
Every Mesos component runs as a Docker container:
- Each host runs a Mesos slave.
- 3 instances of zookeeper (3.4.6) - managed by exhibitor
- 3 instances of mesos-master (0.22.1)
- 2 instances of marathon (0.8.2)
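
As a rough sanity check on this layout, the sketch below computes the majority quorum for the ZooKeeper ensemble and the Mesos masters and how many instances each can lose. It assumes the usual strict-majority rule (n // 2 + 1); the function names are hypothetical and only the instance counts come from the message.

```python
def majority_quorum(members: int) -> int:
    """Smallest number of members that forms a strict majority."""
    if members < 1:
        raise ValueError("members must be positive")
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be lost while a majority still remains."""
    return members - majority_quorum(members)


# Instance counts taken from the setup described above.
setup = {"zookeeper": 3, "mesos-master": 3}

for name, count in setup.items():
    print(name,
          "quorum:", majority_quorum(count),
          "tolerated failures:", tolerated_failures(count))
```

With three instances each, losing one host should still leave a majority of two. That assumes the masters are started with a quorum value of 2 (the mesos-master `--quorum` flag), which the message does not state.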

The behavior after one node is removed is:
- The Mesos masters start failing. Sometimes a master is elected, but it has
no slaves or tasks, and later this master fails as well.
- Mesos slave: once, a task kept hanging in Marathon even though its slave
had been dead for a long time and the task was unhealthy. This is probably
related to https://github.com/mesosphere/marathon/issues/1279
- The Mesos masters keep failing and re-electing a leader for about 10 minutes.
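
To put a number on the recovery time and compare it with the 1-minute target, a test like this could be timed with a small helper. This is a minimal sketch: `time_until_stable` and its parameters are hypothetical, and `probe` is a caller-supplied function that returns the current leader's identifier or None (how the leader is looked up is deliberately left out).

```python
import time
from typing import Callable, Optional


def time_until_stable(probe: Callable[[], Optional[str]],
                      stable_checks: int = 3,
                      interval: float = 1.0,
                      timeout: float = 1800.0,
                      clock: Callable[[], float] = time.monotonic,
                      sleep: Callable[[float], None] = time.sleep
                      ) -> Optional[float]:
    """Return seconds from the first probe until the same leader has been
    seen `stable_checks` times in a row, or None if `timeout` elapses."""
    start = clock()
    last = None          # leader seen by the previous probe
    streak = 0           # consecutive probes reporting the same leader
    streak_start = start
    while clock() - start <= timeout:
        leader = probe()
        now = clock()
        if leader is not None and leader == last:
            streak += 1
        elif leader is not None:
            # A new leader appeared: start counting from this moment.
            streak = 1
            streak_start = now
        else:
            # No leader at all: any previous streak is broken.
            streak = 0
        last = leader
        if streak >= stable_checks:
            return streak_start - start
        sleep(interval)
    return None
```

Requiring several consecutive identical answers is a design choice meant to ignore a master that is elected briefly and then fails again, as described in the first bullet.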

I've googled for a while, and it seems that the recommended approach is to run
masters and slaves on separate nodes (
http://open.mesosphere.com/getting-started/datacenter/install/).
Would this solve our issue?

I am also attaching mesos-master logs from all hosts running mesos master.

Thanks for any advice,
Jakub
