mesos-user mailing list archives

From "Johnas, Nalini" <njoh...@ebay.com>
Subject RE: Mesos slave not starting up
Date Sat, 27 Jul 2013 16:39:44 GMT

I downloaded and installed 0.12.0 and did not come across this issue. The install went smoothly
on Red Hat.

-Nalini

From: Johnas, Nalini [mailto:njohnas@ebay.com]
Sent: Friday, July 26, 2013 8:13 PM
To: <user@mesos.apache.org>
Subject: RE: Mesos slave not starting up

Hi Vinod,

I still couldn’t find out why the slave is not registering with the master, so I rebuilt from
scratch. Now I am stuck at the “make check” step with the following test failures, which all
look related to cgroups.

[==========] 230 tests from 44 test cases ran. (75199 ms total)
[  PASSED  ] 207 tests.
[  FAILED  ] 23 tests, listed below:
[  FAILED  ] CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_Busy
[  FAILED  ] CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_SubsystemsHierarchy
[  FAILED  ] CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_MountedSubsystems
[  FAILED  ] CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_CreateRemove
[  FAILED  ] CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_Listen
[  FAILED  ] CgroupsAnyHierarchyWithCpuAcctMemoryTest.ROOT_CGROUPS_Stat
[  FAILED  ] CgroupsAnyHierarchyWithCpuMemoryFreezerTest.ROOT_CGROUPS_Freeze
[  FAILED  ] CgroupsAnyHierarchyWithCpuMemoryFreezerTest.ROOT_CGROUPS_Kill
[  FAILED  ] CgroupsAnyHierarchyWithCpuMemoryFreezerTest.ROOT_CGROUPS_Destroy
[  FAILED  ] CgroupsIsolatorTest.ROOT_CGROUPS_BalloonFramework
[  FAILED  ] IsolatorTest/1.Usage, where TypeParam = mesos::internal::slave::CgroupsIsolator
[  FAILED  ] SlaveRecoveryTest/1.RecoverSlaveState, where TypeParam = mesos::internal::slave::CgroupsIsolator
[  FAILED  ] SlaveRecoveryTest/1.RecoverStatusUpdateManager, where TypeParam = mesos::internal::slave::CgroupsIsolator
[  FAILED  ] SlaveRecoveryTest/1.ReconnectExecutor, where TypeParam = mesos::internal::slave::CgroupsIsolator
[  FAILED  ] SlaveRecoveryTest/1.RecoverUnregisteredExecutor, where TypeParam = mesos::internal::slave::CgroupsIsolator
[  FAILED  ] SlaveRecoveryTest/1.RecoverTerminatedExecutor, where TypeParam = mesos::internal::slave::CgroupsIsolator
[  FAILED  ] SlaveRecoveryTest/1.CleanupExecutor, where TypeParam = mesos::internal::slave::CgroupsIsolator
[  FAILED  ] SlaveRecoveryTest/1.RemoveNonCheckpointingFramework, where TypeParam = mesos::internal::slave::CgroupsIsolator
[  FAILED  ] SlaveRecoveryTest/1.NonCheckpointingFramework, where TypeParam = mesos::internal::slave::CgroupsIsolator
[  FAILED  ] SlaveRecoveryTest/1.NonCheckpointingSlave, where TypeParam = mesos::internal::slave::CgroupsIsolator
[  FAILED  ] SlaveRecoveryTest/1.KillTask, where TypeParam = mesos::internal::slave::CgroupsIsolator
[  FAILED  ] SlaveRecoveryTest/1.GCExecutor, where TypeParam = mesos::internal::slave::CgroupsIsolator
[  FAILED  ] SlaveRecoveryTest/1.ShutdownSlave, where TypeParam = mesos::internal::slave::CgroupsIsolator
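
One thing that may be worth checking on the build machine is whether the kernel reports the cgroup subsystems that appear in the failing test names (cpu, cpuacct, memory, freezer) as enabled. The sketch below is only an assumption about where to look, not a confirmed cause of these failures. It reads /proc/cgroups, whose columns on Linux are subsys_name, hierarchy, num_cgroups and enabled. The function names are made up for illustration.

```shell
#!/bin/sh
# Print the cgroup subsystems marked as enabled in a /proc/cgroups-style file.
# Columns: subsys_name hierarchy num_cgroups enabled (header line starts with '#').
enabled_subsystems() {
  awk '!/^#/ && $4 == 1 { print $1 }' "${1:-/proc/cgroups}"
}

# Print each subsystem named in the failing tests that is NOT enabled.
# The list below is taken from the test names; it is an assumption that
# these are the only subsystems the tests need.
missing_subsystems() {
  enabled=$(enabled_subsystems "$1")
  for subsystem in cpu cpuacct memory freezer; do
    echo "$enabled" | grep -qx "$subsystem" || echo "$subsystem"
  done
}

# Only inspect the live system when the kernel exposes the file.
if [ -r /proc/cgroups ]; then
  missing_subsystems /proc/cgroups
  # Show where (if anywhere) cgroup hierarchies are currently mounted.
  grep cgroup /proc/mounts
fi
```

If the script prints nothing from missing_subsystems, all four subsystems are enabled and the cause is likely elsewhere.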

Thanks
Nalini


From: Johnas, Nalini
Sent: Thursday, July 25, 2013 7:00 PM
To: <user@mesos.apache.org>
Subject: Re: Mesos slave not starting up

Thanks Vinod. Actually I am running the slave and master on the same node for now (playing
with just one VM). I will remove the redirect and see what's going on.

Nalini

Sent from my iPhone

On Jul 25, 2013, at 10:15 AM, "Vinod Kone" <vinodkone@gmail.com> wrote:
Hmm. Looks like the slaves are not registering with the master. Could you just manually ssh
into one slave box and start the slave to see what's going wrong? I would recommend not redirecting
the stdout/stderr to /dev/null when you do this, so that you can catch the error.
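
A minimal sketch of that suggestion, assuming the slave binary sits next to the master in /usr/local/sbin and the master listens on 127.0.0.1:5051 as in the master log quoted further down. The variable names, the function name and the output file path are placeholders, not anything Mesos defines. Only the --master flag from the start script is used.

```shell
#!/bin/sh
# Assumed defaults; override them in the environment to match your install.
MESOS_MASTER="${MESOS_MASTER:-127.0.0.1:5051}"
SLAVE_BIN="${SLAVE_BIN:-/usr/local/sbin/mesos-slave}"
LOG_FILE="${LOG_FILE:-/tmp/mesos-slave.out}"

# Run the slave in the foreground. stdout and stderr stay visible on the
# terminal and are also copied to LOG_FILE instead of going to /dev/null.
start_slave_foreground() {
  "${SLAVE_BIN}" --master="${MESOS_MASTER}" 2>&1 | tee "${LOG_FILE}"
}
```

After ssh-ing into the slave box, source the file and call start_slave_foreground; any startup error should then show up directly in the terminal.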

On Thu, Jul 25, 2013 at 9:03 AM, Johnas, Nalini <njohnas@ebay.com> wrote:
Hi Vinod,

Tried with and without the isolation parameter.

/usr/local/sbin/mesos-start-slaves.sh --isolation=cgroups

Here’s what I modified in the start slave script. I have the same in the other VM as well.
# Launch slaves.
for slave in ${SLAVES}; do
  echo "Starting mesos-slave on ${slave}"
  #echo ssh ${SSH_OPTS} ${slave} "${daemon} mesos-slave </dev/null >/dev/null"
  #ssh ${SSH_OPTS} ${slave} "${daemon} mesos-slave </dev/null >/dev/null" &

  echo ssh ${SSH_OPTS} ${slave} "${daemon} mesos-slave --master=${MESOS_MASTER} </dev/null >/dev/null"
  ssh ${SSH_OPTS} ${slave} "${daemon} mesos-slave --master=${MESOS_MASTER} </dev/null >/dev/null" &

  sleep 0.1
done

wait # Wait for all the ssh's to finish.

Thanks
Nalini

From: Vinod Kone [mailto:vinodkone@gmail.com]
Sent: Thursday, July 25, 2013 8:23 AM
To: user@mesos.apache.org
Cc: user@mesos.apache.org
Subject: Re: Mesos slave not starting up

How did you start the slave?

@vinodkone
Sent from my mobile

On Jul 25, 2013, at 8:13 AM, "Johnas, Nalini" <njohnas@ebay.com> wrote:
Hi,

I recently downloaded Mesos from https://github.com/airbnb/mesos/tree/testing.
Brenden from Twitter pointed me to it; he has Hadoop working on it. I configured
everything and started Mesos. I see the master running but not the slave, and I
don’t see any slave logs to dig into. I have full root permissions, so I installed
and am running everything as the “root” user.

ps -ef | grep mesos
root      1914     1  0 14:43 ?        00:00:00 /usr/local/sbin/mesos-master --port=5051

I am not facing this issue with Mesos 0.10.0 downloaded from http://mesos.apache.org/,
which I have running on another VM. (I haven’t tried the latest version, 0.12.0.)

Can someone help me understand what the potential root cause of this behavior could be?
Below is the log from the master.

Log file created at: 2013/07/25 00:51:22
Running on machine: xxxx (masked the server name on purpose)
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I0725 00:51:22.906127  2077 main.cpp:114] Build: 2013-07-25 00:18:50 by root
I0725 00:51:22.906353  2077 main.cpp:115] Starting Mesos master
I0725 00:51:22.906492  2079 master.cpp:232] Master started on 127.0.0.1:5051
I0725 00:51:22.906543  2079 master.cpp:247] Master ID: 201307250051-16777343-5051-2077
W0725 00:51:22.906720  2078 master.cpp:84] No whitelist given. Advertising offers for all slaves
I0725 00:51:22.907521  2079 master.cpp:587] Elected as master!
W0725 00:51:27.907698  2079 master.cpp:84] No whitelist given. Advertising offers for all slaves
W0725 00:51:32.908694  2079 master.cpp:84] No whitelist given. Advertising offers for all slaves
W0725 00:51:37.909572  2079 master.cpp:84] No whitelist given. Advertising offers for all slaves
W0725 00:51:42.910531  2079 master.cpp:84] No whitelist given. Advertising offers for all slaves
W0725 00:51:47.911334  2079 master.cpp:84] No whitelist given. Advertising offers for all slaves
W0725 00:51:52.912281  2080 master.cpp:84] No whitelist given. Advertising offers for all slaves
W0725 00:51:57.913521  2080 master.cpp:84] No whitelist given. Advertising offers for all slaves
W0725 00:52:02.914510  2078 master.cpp:84] No whitelist given. Advertising offers for all slaves
W0725 00:52:07.915556  2078 master.cpp:84] No whitelist given. Advertising offers for all slaves
W0725 00:52:12.916482  2078 master.cpp:84] No whitelist given. Advertising offers for all slaves
W0725 00:52:17.916887  2078 master.cpp:84] No whitelist given. Advertising offers for all slaves
W0725 00:52:22.917218  2078 master.cpp:84] No whitelist given. Advertising offers for all slaves
W0725 00:52:27.918201  2078 master.cpp:84] No whitelist given. Advertising offers for all slaves
W0725 00:52:32.919304  2078 master.cpp:84] No whitelist given. Advertising offers for all slaves
W0725 00:52:37.920503  2078 master.cpp:84] No whitelist given. Advertising offers for all slaves
W0725 00:52:42.921346  2078 master.cpp:84] No whitelist given. Advertising offers for all slaves
W0725 00:52:47.922140  2078 master.cpp:84] No whitelist given. Advertising offers for all slaves
W0725 00:52:52.923137  2079 master.cpp:84] No whitelist given. Advertising offers for all slaves
W0725 00:52:57.924008  2079 master.cpp:84] No whitelist given. Advertising offers for all slaves
W0725 00:53:02.925042  2079 master.cpp:84] No whitelist given. Advertising offers for all slaves
W0725 00:53:07.926026  2079 master.cpp:84] No whitelist given. Advertising offers for all slaves
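
Since almost every line above is the same periodic whitelist warning, it can help to filter that warning out and look at whatever remains. A small sketch, assuming the master log has been saved to a file; the function name and the example path are placeholders.

```shell
#!/bin/sh
# Print every line of the given master log except the repeated
# "No whitelist given" warning, so anything else stands out.
interesting_lines() {
  grep -v 'No whitelist given' "$1"
}
```

For example: interesting_lines /path/to/master.log (the path is hypothetical). In the excerpt above this leaves only the startup lines, with nothing from any slave.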

Thanks
Nalini


