mesos-user mailing list archives

From Frank Hinek <frank.hi...@gmail.com>
Subject Re: Issue with Multinode Cluster
Date Tue, 26 Aug 2014 01:20:17 GMT
Thanks.  Greatly appreciate the help.  Cluster seems to be behaving properly now.

Good point.  I like the idea of having cluster, ip, quorum, work_dir, etc. files in /etc/mesos/
so it is clear what environment variables are being set.
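That per-file layout can be sketched like this (values are illustrative, taken from this thread; the paths assume the Mesosphere Debian/Ubuntu packaging, where mesos-init-wrapper reads each file under /etc/mesos-master as a flag):

```shell
# One file per flag: the filename is the flag name, the contents are its value.
# (Illustrative values; adjust quorum/work_dir to your cluster.)
echo "MyCluster"      | sudo tee /etc/mesos-master/cluster
echo "10.1.100.116"   | sudo tee /etc/mesos-master/ip
echo "1"              | sudo tee /etc/mesos-master/quorum
echo "/var/lib/mesos" | sudo tee /etc/mesos-master/work_dir
sudo service mesos-master restart
```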



On August 25, 2014 at 8:43:50 PM, Ryan Thomas (r.n.thomas@gmail.com) wrote:

I'm not sure what the best-practice is, but I use the /etc/mesos* method as I find it more
explicit.


On 26 August 2014 10:38, Frank Hinek <frank.hinek@gmail.com> wrote:
Vinod: bingo!  I’ve spent 2 days trying to figure this out.  The only interfaces on the
VMs were eth0 and lo; interesting that it picked the loopback automatically and that the
tutorials didn’t note this.

Ryan: Is it considered better practice to modify /etc/default/mesos-master or write the IP
to /etc/mesos-master/ip ?


On August 25, 2014 at 8:31:42 PM, Ryan Thomas (r.n.thomas@gmail.com) wrote:

If you're using the mesos-init-wrapper you can write the IP to /etc/mesos-master/ip and that
flag will be set. This goes for all the flags, and can be done for the slave as well in /etc/mesos-slave.
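In other words (a sketch, using this thread's addresses; paths assume the Mesosphere packaging):

```shell
# Each file under /etc/mesos-master becomes a --<filename>=<contents> flag,
# and likewise /etc/mesos-slave for the slave.
echo "10.1.100.116" | sudo tee /etc/mesos-master/ip   # master starts with --ip=10.1.100.116
echo "10.1.100.117" | sudo tee /etc/mesos-slave/ip    # slave starts with --ip=10.1.100.117
sudo service mesos-master restart
sudo service mesos-slave restart
```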


On 26 August 2014 10:18, Vinod Kone <vinodkone@gmail.com> wrote:
From the logs, it looks like master is binding to its loopback address (127.0.0.1) and publishing
that to ZK. So the slave is trying to reach the master on its loopback interface, which is
failing.

Start the master with the "--ip" flag set to its visible IP (10.1.100.116). Mesosphere probably
has a file (/etc/default/mesos-master?) to set these flags.
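Two ways to do that (a sketch; the defaults-file path is a guess about the packaging, and the other flag values below are taken from this thread):

```shell
# Option 1: pass the flag directly when starting the master.
mesos-master --ip=10.1.100.116 --zk=zk://10.1.100.116:2181/mesos \
             --quorum=1 --work_dir=/var/lib/mesos

# Option 2: Mesos also picks up MESOS_* environment variables, so the
# packaging's defaults file can set the same thing:
#   export MESOS_IP=10.1.100.116   # e.g. in /etc/default/mesos-master
```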


On Mon, Aug 25, 2014 at 3:26 PM, Frank Hinek <frank.hinek@gmail.com> wrote:
Logs attached from master, slave, and zookeeper after a reboot of both nodes.




On August 25, 2014 at 1:14:07 PM, Vinod Kone (vinodkone@gmail.com) wrote:

what do the master and slave logs say?


On Mon, Aug 25, 2014 at 9:03 AM, Frank Hinek <frank.hinek@gmail.com> wrote:
I was able to get a single-node environment set up on Ubuntu 14.04.1 following this guide: http://mesosphere.io/learn/install_ubuntu_debian/

The single slave registered with the master via the local Zookeeper and I could run basic
commands by posting to Marathon.

I then tried to build a multi-node cluster following this guide: http://mesosphere.io/docs/mesosphere/getting-started/cloud-install/

The guide walks you through using the Mesosphere packages to install Mesos, Marathon, and
Zookeeper on one node that will be the master, and just Mesos on the slave.  You then disable
automatic start of mesos-slave on the master, and of mesos-master and zookeeper on
the slave.  It ends up looking like:

NODE 1 (MASTER):
- IP Address: 10.1.100.116
- mesos-master
- marathon
- zookeeper

NODE 2 (SLAVE):
- IP Address: 10.1.100.117
- mesos-slave

The issue I’m running into is that the slave is rarely able to register with the master
via Zookeeper.  I can never run any jobs from Marathon (just trying a simple sleep
5 command).  Even when the slave does register, the Mesos UI shows 1 “Deactivated” slave;
it never goes active.

Here are the values I have for /etc/mesos/zk:

MASTER: zk://10.1.100.116:2181/mesos
SLAVE: zk://10.1.100.116:2181/mesos

Any ideas of what to troubleshoot?  Would greatly appreciate pointers.
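A couple of quick checks from the slave can narrow this down (a sketch: `ruok` needs ZooKeeper's four-letter commands enabled, and a refused connection on port 5050 would itself point at the master binding to the wrong interface):

```shell
# Is ZooKeeper on the master reachable from the slave?
echo ruok | nc 10.1.100.116 2181   # a healthy server replies "imok"

# What address is the master actually advertising? The "pid" field in
# state.json shows the bound address, e.g. master@127.0.0.1:5050 would
# confirm a loopback binding.
curl -s http://10.1.100.116:5050/master/state.json | grep -o '"pid":"[^"]*"'
```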

Environment details:
- Ubuntu Server 14.04.1 running as VMs on ESXi 5.5U1
- Mesos: 0.20.0
- Marathon 0.6.1

There are no apparent connectivity issues, and I’m not having any problems with other VMs
on the ESXi host.  All VM to VM communication is on the same VLAN and within the same host.

Zookeeper log on master (slave briefly registered so I tried to run a sleep 5 command from
marathon and then the slave disconnected):

2014-08-25 11:50:34,976 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197]
- Accepted socket connection from /10.1.100.117:45778
2014-08-25 11:50:34,977 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@793]
- Connection request from old client /10.1.100.117:45778; will be dropped if server is in
r-o mode
2014-08-25 11:50:34,977 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@839]
- Client attempting to establish new session at /10.1.100.117:45778
2014-08-25 11:50:34,978 - INFO  [SyncThread:0:ZooKeeperServer@595] - Established session
0x1480b22f7f0000c with negotiated timeout 10000 for client /10.1.100.117:45778
2014-08-25 11:51:05,724 - INFO  [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@627]
- Got user-level KeeperException when processing sessionid:0x1480b22f7f00001 type:create cxid:0x53faafa9
zxid:0x49 txntype:-1 reqpath:n/a Error Path:/marathon Error:KeeperErrorCode = NodeExists for
/marathon
2014-08-25 11:51:05,724 - INFO  [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@627]
- Got user-level KeeperException when processing sessionid:0x1480b22f7f00001 type:create cxid:0x53faafaa
zxid:0x4a txntype:-1 reqpath:n/a Error Path:/marathon/state Error:KeeperErrorCode = NodeExists
for /marathon/state
2014-08-25 11:51:09,145 - INFO  [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@627]
- Got user-level KeeperException when processing sessionid:0x1480b22f7f00001 type:create cxid:0x53faafb5
zxid:0x4d txntype:-1 reqpath:n/a Error Path:/marathon Error:KeeperErrorCode = NodeExists for
/marathon
2014-08-25 11:51:09,146 - INFO  [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@627]
- Got user-level KeeperException when processing sessionid:0x1480b22f7f00001 type:create cxid:0x53faafb6
zxid:0x4e txntype:-1 reqpath:n/a Error Path:/marathon/state Error:KeeperErrorCode = NodeExists
for /marathon/state





