mesos-user mailing list archives

From Arunabha Ghosh <arunabha...@gmail.com>
Subject Re: Mesos slaves keep disconnecting
Date Tue, 16 Dec 2014 03:23:14 GMT
Setting the ip parameter did the trick!

Thanks, Benjamin!!

Arunabha

On Mon, Dec 15, 2014 at 5:28 PM, Benjamin Mahler <benjamin.mahler@gmail.com>
wrote:
>
> Currently, the master needs to be able to create a connection back to the
> slave.
> If you look at the log lines, you'll see the master is seeing the slave
> on 127.0.1.1:
>
> "slave(1)@127.0.1.1:5051 (192.168.48.150)"
>
> The master is going to try to connect to 127.0.1.1:5051, which appears to
> fail immediately (hence the disconnection).
> Can you try setting --ip=192.168.48.150 on the slave? That will ensure the
> slave binds to the IP address you're expecting.
>
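A minimal Python sketch of the check described above. It pulls the address out of the slave PID that the master logs (for example `slave(1)@127.0.1.1:5051`) and flags it if it is a loopback address, because a master dialing that address would only reach its own host. The regex and the helpers `advertised_endpoint` and `is_loopback_address` are hypothetical. The sketch only assumes that the PID appears in the log line as `id@IPv4:port`.

```python
import ipaddress
import re

# Hypothetical pattern for a PID written as "<id>@<IPv4>:<port>",
# e.g. "slave(1)@127.0.1.1:5051" in the master log.
PID_RE = re.compile(r"(?P<id>[\w()]+)@(?P<ip>\d{1,3}(?:\.\d{1,3}){3}):(?P<port>\d+)")


def advertised_endpoint(log_line):
    """Return (ip, port) of the first PID in a log line, or None if there is none."""
    match = PID_RE.search(log_line)
    if match is None:
        return None
    return match.group("ip"), int(match.group("port"))


def is_loopback_address(ip):
    """True for 127.0.0.0/8: the master would be connecting to its own host."""
    return ipaddress.ip_address(ip).is_loopback


line = ("Registered slave 20141215-160321-2435885248-5050-20424-S68 "
        "at slave(1)@127.0.1.1:5051 (192.168.48.150)")
ip, port = advertised_endpoint(line)
print(ip, port, is_loopback_address(ip))  # 127.0.1.1 5051 True
```

Once the slave is started with `--ip=192.168.48.150`, the same check on the new registration line should report a non-loopback address.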
> On Mon, Dec 15, 2014 at 4:33 PM, Tim Chen <tim@mesosphere.io> wrote:
>>
>> Is there anything in the ERROR/WARNING logs?
>>
>> Tim
>>
>> On Mon, Dec 15, 2014 at 4:22 PM, Arunabha Ghosh <arunabha.gh@gmail.com>
>> wrote:
>>>
>>> Hi,
>>>     I've set up a test Mesos cluster on a few VMs running locally. I
>>> have three masters and two slaves:
>>>
>>> masters : 192.168.48.14[5 - 7]
>>> slaves : 192.168.48.15[0 - 1]
>>>
>>> The masters start up correctly and are able to elect a leader. The slaves
>>> can find the master and register, but for some reason they immediately
>>> disconnect.
>>>
>>>
>>> *On the master (mesos-master.INFO)*
>>>
>>> master.cpp:3122] Registered slave
>>> 20141215-160321-2435885248-5050-20424-S68 at slave(1)@127.0.1.1:5051
>>> (192.168.48.150) with cpus(*):1; mem(*):489; disk(*):13901;
>>> ports(*):[31000-32000]
>>> I1215 16:15:51.970082 20448 hierarchical_allocator_process.hpp:442]
>>> Added slave 20141215-160321-2435885248-5050-20424-S68 (192.168.48.150) with
>>> cpus(*):1; mem(*):489; disk(*):13901; ports(*):[31000-32000] (and
>>> cpus(*):1; mem(*):489; disk(*):13901; ports(*):[31000-32000] available)
>>> I1215 16:15:51.970474 20454 master.cpp:839] Slave
>>> 20141215-160321-2435885248-5050-20424-S68 at slave(1)@127.0.1.1:5051
>>> (192.168.48.150) disconnected
>>> I1215 16:15:51.970546 20454 master.cpp:1789] Disconnecting slave
>>> 20141215-160321-2435885248-5050-20424-S68 at slave(1)@127.0.1.1:5051
>>> (192.168.48.150)
>>> I1215 16:15:51.970612 20454 master.cpp:1808] Deactivating slave
>>> 20141215-160321-2435885248-5050-20424-S68 at slave(1)@127.0.1.1:5051
>>> (192.168.48.150)
>>> I1215 16:15:51.970772 20454 hierarchical_allocator_process.hpp:481]
>>> Slave 20141215-160321-2435885248-5050-20424-S68 deactivated
>>> I1215 16:15:51.975980 20453 replica.cpp:655] Replica received learned
>>> notice for position 276
>>> I1215 16:15:51.977501 20453 leveldb.cpp:343] Persisting action (20
>>> bytes) to leveldb took 1.475474ms
>>> I1215 16:15:51.977625 20453 leveldb.cpp:401] Deleting ~2 keys from
>>> leveldb took 50280ns
>>>
>>> *On the slave (mesos-slave.INFO)*
>>>
>>> Dec 15 16:06:09 ubuntu mesos-slave[18118]: 2014-12-15
>>> 16:06:09,209:18118(0x7fa67d700700):ZOO_INFO@check_events@1750: session
>>> establishment complete on server [192.168.48.147:2181],
>>> sessionId=0x34a5067fd9e0001, negotiated timeout=10000
>>> Dec 15 16:06:09 ubuntu mesos-slave[18118]: I1215 16:06:09.210183 18140
>>> group.cpp:313] Group process (group(1)@127.0.1.1:5051) connected to
>>> ZooKeeper
>>> Dec 15 16:06:09 ubuntu mesos-slave[18118]: I1215 16:06:09.210248 18140
>>> group.cpp:790] Syncing group operations: queue size (joins, cancels, datas)
>>> = (0, 0, 0)
>>> Dec 15 16:06:09 ubuntu mesos-slave[18118]: I1215 16:06:09.210270 18140
>>> group.cpp:385] Trying to create path '/mesos' in ZooKeeper
>>> Dec 15 16:06:09 ubuntu mesos-slave[18118]: I1215 16:06:09.213835 18140
>>> detector.cpp:138] Detected a new leader: (id='55')
>>> Dec 15 16:06:09 ubuntu mesos-slave[18118]: I1215 16:06:09.214570 18140
>>> group.cpp:659] Trying to get '/mesos/info_0000000055' in ZooKeeper
>>> Dec 15 16:06:09 ubuntu mesos-slave[18118]: I1215 16:06:09.215833 18141
>>> detector.cpp:433] A new leading master (UPID=master@192.168.48.145:5050)
>>> is detected
>>> Dec 15 16:06:09 ubuntu mesos-slave[18118]: I1215 16:06:09.220592 18141
>>> state.cpp:33] Recovering state from '/home/agh/mesos-work/meta'
>>> Dec 15 16:06:09 ubuntu mesos-slave[18118]: I1215 16:06:09.220757 18141
>>> state.cpp:62] Failed to find the latest slave from
>>> '/home/agh/mesos-work/meta'
>>> Dec 15 16:06:09 ubuntu mesos-slave[18118]: I1215 16:06:09.226416 18136
>>> status_update_manager.cpp:197] Recovering status update manager
>>> Dec 15 16:06:09 ubuntu mesos-slave[18118]: I1215 16:06:09.226963 18134
>>> containerizer.cpp:281] Recovering containerizer
>>> Dec 15 16:06:09 ubuntu mesos-slave[18118]: I1215 16:06:09.228973 18135
>>> slave.cpp:3466] Finished recovery
>>> Dec 15 16:06:09 ubuntu mesos-slave[18118]: I1215 16:06:09.230242 18137
>>> status_update_manager.cpp:171] Pausing sending status updates
>>> Dec 15 16:06:09 ubuntu mesos-slave[18118]: I1215 16:06:09.230450 18135
>>> slave.cpp:602] New master detected at master@192.168.48.145:5050
>>> Dec 15 16:06:09 ubuntu mesos-slave[18118]: I1215 16:06:09.230873 18135
>>> slave.cpp:627] No credentials provided. Attempting to register without
>>> authentication
>>> Dec 15 16:06:09 ubuntu mesos-slave[18118]: I1215 16:06:09.231045 18135
>>> slave.cpp:638] Detecting new master
>>> Dec 15 16:07:09 ubuntu mesos-slave[18118]: I1215 16:07:09.225389 18141
>>> slave.cpp:3321] Current usage 12.01%. Max allowed age: 5.459239732780289days
>>> Dec 15 16:08:09 ubuntu mesos-slave[18118]: I1215 16:08:09.228869 18141
>>> slave.cpp:3321] Current usage 12.01%. Max allowed age: 5.459239732780289days
>>> Dec 15 16:09:09 ubuntu mesos-slave[18118]: I1215 16:09:09.252048 18141
>>> slave.cpp:3321] Current usage 12.01%. Max allowed age: 5.459239732780289days
>>> Dec 15 16:09:27 ubuntu mesos-slave[18118]: I1215 16:09:27.288277 18141
>>> http.cpp:330] HTTP request for '/slave(1)/state.json'
>>> Dec 15 16:10:09 ubuntu mesos-slave[18118]: I1215 16:10:09.271672 18138
>>> slave.cpp:3321] Current usage 12.01%. Max allowed age: 5.459239732780289days
>>>
>>> It does not look like the slave is disconnecting, so why does the master
>>> think the slave keeps disconnecting, and why does it deactivate the slave?
>>>
>>> Thanks,
>>> Arunabha
>>>
>>>
>>>
>>>
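The answer to the question above is that "disconnected" is the master's view: its connection back to the slave fails. This can be checked from a master host with a plain TCP connect against both addresses the slave appears under in the log. Below is a minimal Python sketch in which `can_connect` is a hypothetical helper. It assumes that a refused or timed-out connect is what the master runs into when it calls back to `127.0.1.1:5051`. On the master host, that address is the master's own loopback interface.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Attempt a plain TCP connect; return True if it succeeds, False otherwise."""
    try:
        # create_connection raises OSError (ConnectionRefusedError, a timeout,
        # or "no route to host") if the handshake cannot complete.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Hypothetical usage on a master host such as 192.168.48.145:
#   can_connect("127.0.1.1", 5051)       # the address in the slave's PID
#   can_connect("192.168.48.150", 5051)  # the slave's real address
```

On a master that does not also run a slave, the first call would be expected to return False almost immediately, because nothing listens on 5051 there. The second call should return True whenever the slave is listening on its external address.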
