mesos-dev mailing list archives

From Scott Smith <scott.sm...@gmail.com>
Subject Re: Problem setting up cluster
Date Fri, 20 Apr 2012 04:47:30 GMT
Well, the logs say this:

I0420 04:40:30.870983  8193 master.cpp:814] Attempting to register
slave 201204200437-0-162 at slave@127.0.1.1:51851
I0420 04:40:30.871330  8193 master.cpp:1057] Master now considering a
slave at ip-10-252-94-24.us-west-2.compute.internal:51851 as active
I0420 04:40:30.871415  8193 master.cpp:1588] Adding slave
201204200437-0-162 at ip-10-252-94-24.us-west-2.compute.internal with
cpus=1; mem=1024
I0420 04:40:30.871599  8193 simple_allocator.cpp:71] Added slave
201204200437-0-162 with cpus=1; mem=1024
I0420 04:40:30.871680  8193 master.cpp:1143] Slave 201204200437-0-162
disconnected
I0420 04:40:30.871819  8193 simple_allocator.cpp:83] Removed slave
201204200437-0-162

tcpdump says this:

POST /master/mesos.internal.RegisterSlaveMessage HTTP/1.0
User-Agent: libprocess/slave@127.0.1.1:51851
Connection: Keep-Alive
Transfer-Encoding: chunked

87

..
*ip-10-252-94-24.us-west-2.compute.internal.*ip-10-252-94-24.us-west-2.compute.internal..
.cpus...		.......?..
.mem...		.......@ .?
0



So it looks like it's reporting both a valid hostname and a loopback
address.  Which will the master use?
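
One way to check where the 127.0.1.1 might be coming from is to see what the
slave machine's own hostname resolves to. The sketch below assumes (I have not
confirmed this in the source) that the slave derives its advertised address by
resolving the local hostname. Stock Ubuntu installs commonly map the hostname
to 127.0.1.1 in /etc/hosts, which would produce exactly this kind of address.
The function names are mine, not part of Mesos.

```python
import ipaddress
import socket


def is_loopback(addr):
    """Return True if addr is an IP literal in a loopback range (e.g. 127.0.0.0/8)."""
    return ipaddress.ip_address(addr).is_loopback


def resolve_own_hostname():
    """Resolve this machine's hostname to an IPv4 address via the system resolver."""
    return socket.gethostbyname(socket.gethostname())


def report():
    """Print the hostname, what it resolves to, and whether that is loopback."""
    addr = resolve_own_hostname()
    print("hostname:", socket.gethostname())
    print("resolves to:", addr)
    if is_loopback(addr):
        # A loopback address is only reachable from the machine itself,
        # so a remote master could not connect back to it.
        print("loopback address: not reachable from other machines")
```

Running report() on the slave node would show whether the hostname resolves to
a loopback address; if it does, /etc/hosts is the first place I would look.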

BTW, I have both machines in the same security group, and opened all
inbound TCP from the group to the group.
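
To double-check the security group from the master's side, a plain TCP connect
to the address and port the slave advertises should be enough. This is only a
rough sketch: can_connect is a helper name I made up, and the host and port in
the usage line are simply the ones from the log above.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Example, run on the master (values taken from the log above):
# can_connect("ip-10-252-94-24.us-west-2.compute.internal", 51851)
```

If this returns True for the internal hostname but the master is actually
trying 127.0.1.1:51851, the connect-back would land on the master itself, which
would fit the immediate disconnect.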


On Thu, Apr 19, 2012 at 9:42 PM, Matei Zaharia <matei@eecs.berkeley.edu> wrote:
> What hostname and port does the slave report for itself (i.e. when the master sees it
> connect, what message does it print)? It could be that the master cannot connect back to that
> address. Maybe you need to open up communication among machines in your EC2 security groups.
>
> Matei
>
> On Apr 19, 2012, at 9:10 PM, Scott Smith wrote:
>
>> Direct IP/port.  No zookeeper.
>> On Apr 19, 2012 7:35 PM, "John Sirois" <jsirois@twitter.com> wrote:
>>
>>> How are your slaves connecting to the master?  Via zookeeper or via known
>>> hostname/ip ?
>>>
>>> On Thursday, April 19, 2012, Scott Smith wrote:
>>>
>>>> I'm trying to set up a cluster on ec2, but not using the canned
>>>> scripts/image.  I built the latest svn on Ubuntu 11.10 amd64, and copied
>>>> the build to a second node.  Both are c1.medium instances (not that it
>>>> should matter).  No other software is running (no hdfs, no hadoop, etc).
>>>>
>>>> The problem I have is the slave repeatedly (approx once per second)
>>>> connects, advertises its resources, gets added, and then disconnects.  No
>>>> reason is given for disconnecting.  There are no messages on the slave,
>>>> only 5 or 6 messages on the master.
>>>>
>>>> I'm not sure what the next diagnostic step should be; I was hoping someone
>>>> else ran into the same problem and could point out what I did wrong.  Any
>>>> advice?
>>>>
>>>> Thanks!
>>>>
>>>
>>>
>>> --
>>> John Sirois
>>> 303-512-3301
>>>
>



-- 
        Scott
