mesos-dev mailing list archives

From Benjamin Mahler <benjamin.mah...@gmail.com>
Subject Re: Slave Removed
Date Mon, 15 Apr 2013 18:34:50 GMT
Filed: https://issues.apache.org/jira/browse/MESOS-435


On Mon, Apr 15, 2013 at 11:27 AM, Benjamin Mahler <benjamin.mahler@gmail.com> wrote:

> Seems like we need to fix this as it's a common hurdle.
>
> Has anyone looked at the root of this problem?
>
>
> On Mon, Apr 15, 2013 at 11:25 AM, Eduardo Alfaia <eduardocalfaia@gmail.com> wrote:
>
>> Hi Guys,
>>
>> Using the private IP instead of the FQDN works; however, I had to change
>> /etc/hosts.
>>
>> thanks
>>
>>
>> 2013/4/15 Benjamin Mahler <benjamin.mahler@gmail.com>
>>
>> > Can you try using the private IP instead? You can find it using
>> > ifconfig.
>> >
>> >
>> > On Mon, Apr 15, 2013 at 10:33 AM, Eduardo Alfaia <eduardocalfaia@gmail.com> wrote:
>> >
>> > > Hi Vinod, thanks for your fast reply.
>> > >
>> > > I'm not using EC2, but I am using the server's name, for example
>> > > blockmon1.ing.unibs.it. Could this be the cause?
>> > >
>> > > I'm using 3 nodes (1 master and 2 slaves).
>> > >
>> > > Regards
>> > >
>> > >
>> > > 2013/4/15 Vinod Kone <vinodkone@gmail.com>
>> > >
>> > > > Hi Eduardo,
>> > > >
>> > > > This looks like a networking issue. What is your cluster setup like?
>> > > >
>> > > > Are you running on Amazon EC2? We have seen similar behavior before when
>> > > > users were running Mesos on EC2. If I remember correctly, the fix was to
>> > > > use private IP addresses for master and slaves, instead of "localhost" or
>> > > > "public ip".
>> > > >
>> > > > @vinodkone
>> > > >
>> > > >
>> > > > On Mon, Apr 15, 2013 at 10:13 AM, Eduardo Alfaia <eduardocalfaia@gmail.com> wrote:
>> > > >
>> > > > > Hi Guys,
>> > > > > I am new to Mesos and I am having some problems when running the Mesos
>> > > > > launch scripts below. Why does the master remove the slave? I have seen
>> > > > > something about checkpointing.
>> > > > >
>> > > > > MASTER
>> > > > > root@blockmon1:/opt/mesos-trunk/build/bin# ./mesos-master.sh
>> > > > > I0415 18:00:47.543422 17720 main.cpp:116] Build: 2013-04-14 23:48:51 by root
>> > > > > I0415 18:00:47.543926 17720 main.cpp:117] Starting Mesos master
>> > > > > I0415 18:00:47.545109 17720 master.cpp:309] Master started on 127.0.1.1:5050
>> > > > > I0415 18:00:47.545351 17720 master.cpp:324] Master ID: 201304151800-16842879-5050-17720
>> > > > > I0415 18:00:47.545819 17720 master.cpp:603] Elected as master!
>> > > > > W0415 18:00:47.546039 17737 master.cpp:81] No whitelist given. Advertising offers for all slaves
>> > > > > W0415 18:00:52.547684 17736 master.cpp:81] No whitelist given. Advertising offers for all slaves
>> > > > > W0415 18:00:57.550519 17736 master.cpp:81] No whitelist given. Advertising offers for all slaves
>> > > > >
>> > > > > se it is not checkpointing!
>> > > > > I0415 18:01:59.379076 17735 hierarchical_allocator_process.hpp:423] Removed slave 201304151800-16842879-5050-17720-28
>> > > > > I0415 18:02:00.379822 17737 master.cpp:968] Attempting to register slave on blockmon2 at slave(1)@127.0.1.1:36820
>> > > > > I0415 18:02:00.380177 17737 master.cpp:1224] Master now considering a slave at blockmon2:36820 as active
>> > > > > I0415 18:02:00.380561 17737 master.cpp:1862] Adding slave 201304151800-16842879-5050-17720-29 at blockmon2 with cpus=1; mem=979; ports=[31000-32000]; disk=2801
>> > > > > I0415 18:02:00.380813 17737 hierarchical_allocator_process.hpp:395] Added slave 201304151800-16842879-5050-17720-29 (blockmon2) with cpus=1; mem=979; ports=[31000-32000]; disk=2801 (and cpus=1; mem=979; ports=[31000-32000]; disk=2801 available)
>> > > > > I0415 18:02:00.381255 17734 master.cpp:537] Slave 201304151800-16842879-5050-17720-29(blockmon2) disconnected
>> > > > > I0415 18:02:00.381474 17734 master.cpp:542] Removing disconnected slave 201304151800-16842879-5050-17720-29(blockmon2) because it is not checkpointing!
>> > > > > I0415 18:02:00.381882 17735 hierarchical_allocator_process.hpp:423] Removed slave 201304151800-16842879-5050-17720-29
>> > > > >
>> > > > > Thanks Guys
>> > > > >
>> > > > > --
>> > > > > MSc Eduardo Costa Alfaia
>> > > > > PhD Student
>> > > > > Università degli Studi di Brescia
>> > > > >
>> > > >
>> > > >
>> > > >
>> > > > -- Vinod
>> > > >
>> > >
>> > >
>> > >
>> > > --
>> > > MSc Eduardo Costa Alfaia
>> > > PhD Student
>> > > Università degli Studi di Brescia
>> > >
>> >
>>
>>
>>
>> --
>> MSc Eduardo Costa Alfaia
>> PhD Student
>> Università degli Studi di Brescia
>>
>
>
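
Note on the thread above: the quoted log shows the master bound to 127.0.1.1:5050 and the slave registering from slave(1)@127.0.1.1:36820. One plausible reading, consistent with the fix reported in the thread (private IPs plus an /etc/hosts change), is that each machine's hostname resolved to a loopback address, so the master could not reach the slave at the address it advertised. Debian-style installs commonly map the hostname to 127.0.1.1 in /etc/hosts. The sketch below is only an illustration of how such entries could be spotted: the function name `loopback_hostnames`, the sample file contents, and the 192.168.0.11 address are assumptions, not taken from the reporter's machines.

```python
import ipaddress

# Names that are expected to point at loopback and should not be flagged.
EXPECTED_LOOPBACK_NAMES = {
    "localhost",
    "localhost.localdomain",
    "ip6-localhost",
    "ip6-loopback",
}

# Hypothetical /etc/hosts contents; the 127.0.1.1 line is an assumption
# inferred from "Master started on 127.0.1.1:5050" in the log, and the
# 192.168.0.11 private address is made up for illustration.
SAMPLE_HOSTS = """\
127.0.0.1    localhost
127.0.1.1    blockmon1.ing.unibs.it blockmon1
192.168.0.11 blockmon2
"""


def loopback_hostnames(hosts_text):
    """Return {hostname: address} for non-localhost names mapped to loopback."""
    flagged = {}
    for raw_line in hosts_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        try:
            ip = ipaddress.ip_address(address)
        except ValueError:
            continue  # skip entries whose first field is not an IP address
        if ip.is_loopback:
            for name in names:
                if name not in EXPECTED_LOOPBACK_NAMES:
                    flagged[name] = address
    return flagged


if __name__ == "__main__":
    for name, address in loopback_hostnames(SAMPLE_HOSTS).items():
        print(f"{name} resolves to loopback address {address}")
```

To check a real machine, the same function could be given the contents of its own hosts file, for example `loopback_hostnames(open("/etc/hosts").read())`. Any hostname it reports is a candidate for remapping to the private IP found with ifconfig, as suggested in the thread.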
