mesos-user mailing list archives

From Devin Carlen <devin.car...@gmail.com>
Subject Re: Mesos Master / Slave communications issues
Date Wed, 25 Feb 2015 07:33:20 GMT
Thanks all, figured it out - the env variable for the hostname passed into mesos-master was
being set wrong. Thanks for the input!

Devin




On February 24, 2015 at 2:21:49 PM, Ken Sipe (kensipe@gmail.com) wrote:

It appears your configuration is off, as you suspected. The master registration should
NOT be 127.0.0.1 or 127.0.1.1. For each master, if you configure the IP in a file named
ip under `/etc/mesos-master`, you should be good (after restarting the master).

My configuration under /etc/mesos-master looks like this:
/etc/mesos-master/
├── cluster
├── hostname
├── ip
├── quorum
├── registry
└── work_dir

These are just plain text files. ip has the internal IP of the master, hostname has the FQDN
of the master, cluster is the name of the cluster, etc.
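
For reference, the layout described above could be generated with a short script. This is
only a minimal sketch: it assumes, as described above, that each plain text file under the
config directory supplies the value for the setting it is named after. The helper name
`write_master_config` and the example IP and hostname are hypothetical.

```python
from pathlib import Path


def write_master_config(config_dir, settings):
    """Write one plain text file per setting, named after the setting."""
    directory = Path(config_dir)
    directory.mkdir(parents=True, exist_ok=True)
    for name, value in settings.items():
        # Each file holds only the value, followed by a newline.
        (directory / name).write_text(f"{value}\n")
    # Return the file names now present, for a quick sanity check.
    return sorted(p.name for p in directory.iterdir())


if __name__ == "__main__":
    import tempfile

    # Hypothetical values; a real master would use /etc/mesos-master.
    with tempfile.TemporaryDirectory() as tmp:
        print(write_master_config(
            tmp, {"ip": "10.0.0.5", "hostname": "master1.example.com"}))
```

The master still has to be restarted afterwards for the new values to take effect.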

good luck!
ken

On Feb 24, 2015, at 4:06 PM, Kenneth Su <su.kench@gmail.com> wrote:

Hi Devin,

I am new to Mesos as well; I just configured it and had the same problem as you.

For your reference, my fix was to use the actual master IP instead; the slave then picked
it up and connected. I suspect that when the master registers as 127.0.0.1, the slave uses
that address to connect to itself, which is why it never reaches the master.
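
A quick way to check for this condition on the master host is to see whether the hostname
it advertises resolves to a loopback address. The sketch below is an illustration, not part
of Mesos; `resolves_to_loopback` is a hypothetical helper and it only checks IPv4.

```python
import ipaddress
import socket


def resolves_to_loopback(hostname):
    """Return True if hostname resolves to an IPv4 loopback address (127.0.0.0/8)."""
    address = socket.gethostbyname(hostname)  # IPv4 lookup only
    return ipaddress.ip_address(address).is_loopback


if __name__ == "__main__":
    # Check the name this machine reports for itself.
    name = socket.gethostname()
    print(name, "resolves to loopback:", resolves_to_loopback(name))
```

If this prints True on the master, slaves that read the advertised address from ZooKeeper
will end up trying to reach a master on their own loopback interface. Some distributions
map the machine's hostname to 127.0.1.1 in /etc/hosts, which is one way to end up here.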

Hope it helps!

Kenneth

On Tue, Feb 24, 2015 at 2:50 PM, Devin Carlen <devin.carlen@gmail.com> wrote:
Hello all,

I’m new to Mesos but have recently started trying to stand up a cluster using BOSH.  There
is a BOSH release for it at https://github.com/cf-platform-eng/mesos-boshrelease that is
under active development.

I was able to deploy the cluster successfully; however, the slaves are not communicating with
the master.  Upon investigation I found that the leader election is happening properly with
ZooKeeper.  For this test I have only 1 Mesos master, 3 Mesos slaves, and 1 ZooKeeper instance,
each running on its own VM.  The single master gets elected upon startup:

I0224 21:20:40.716702 12024 contender.cpp:243] New candidate (id='0') has entered the contest
for leadership
I0224 21:20:40.717182 12024 detector.cpp:134] Detected a new leader: (id='0')
I0224 21:20:40.717718 12030 group.cpp:629] Trying to get '/mesos/info_0000000000' in ZooKeeper
I0224 21:20:40.722229 12030 detector.cpp:351] A new leading master (UPID=master@127.0.0.1:80)
is detected
I0224 21:20:40.722367 12030 master.cpp:734] The newly elected leader is master@127.0.0.1:80
I0224 21:20:40.722394 12030 master.cpp:742] Elected as the leading master!

I thought it odd that the IP listed here is 127.0.0.1.  I have not specified localhost anywhere
and I explicitly specify --ip=0.0.0.0 in my mesos-master command.

The slave sees the election happen, but then appears to connect to 127.0.0.1:80:

I0224 21:24:18.892083 17316 detector.cpp:134] Detected a new leader: (id='0')
I0224 21:24:18.892290 17316 group.cpp:629] Trying to get '/mesos/info_0000000000' in ZooKeeper
I0224 21:24:18.894039 17316 detector.cpp:351] A new leading master (UPID=master@127.0.0.1:80)
is detected
I0224 21:24:18.894130 17316 slave.cpp:500] New master detected at master@127.0.0.1:80
I0224 21:24:18.894383 17316 slave.cpp:525] Detecting new master
I0224 21:24:18.894443 17316 status_update_manager.cpp:162] New master detected at master@127.0.0.1:80
I0224 21:24:18.894630 17320 slave.cpp:1957] master@127.0.0.1:80 exited
W0224 21:24:18.894665 17320 slave.cpp:1960] Master disconnected! Waiting for a new master
to be elected

At this point the slave never successfully connects.  Just to verify, I also checked what
ZooKeeper was reporting:

$ /zkCli.sh get /mesos/info_0000000000

201502242120-16777343-80-12000��P"master@127.0.0.1:80
cZxid = 0x20
ctime = Tue Feb 24 21:20:40 UTC 2015
mZxid = 0x20
mtime = Tue Feb 24 21:20:40 UTC 2015
pZxid = 0x20
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x14bbd711b6e0012
dataLength = 60
numChildren = 0

So somehow the IP 127.0.0.1 is written instead of the correct IP.  Any thoughts on how I
can fix this?
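
As a side note on the ZooKeeper output above: the second field of the ID string, 16777343,
appears to be the same loopback address stored as a 32-bit integer. Reading that integer
as four bytes in little-endian order gives 127.0.0.1. That this field is the master's IP is
an assumption based on the surrounding fields (port 80 follows it), not something confirmed
here; `decode_packed_ip` is a hypothetical helper.

```python
import ipaddress


def decode_packed_ip(value):
    """Interpret an integer as an IPv4 address stored in little-endian byte order."""
    return str(ipaddress.IPv4Address(value.to_bytes(4, "little")))


if __name__ == "__main__":
    master_id = "201502242120-16777343-80-12000"
    # Assumed layout: the second dash-separated field is the packed IP.
    ip_field = int(master_id.split("-")[1])
    print(decode_packed_ip(ip_field))  # prints 127.0.0.1
```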

Best,

Devin


