mesos-user mailing list archives

From "Johnas, Nalini" <njoh...@ebay.com>
Subject FW: Mesos slave not starting up
Date Sun, 04 Aug 2013 17:34:13 GMT
Hi Vinod,

No executor error log, but this is what I see.

>>ls  -lrt
total 8
-rw-r--r-- 1 njohnas_dev hadoop_dev   0 Aug  4 01:52 stdout
-rw-r--r-- 1 njohnas_dev hadoop_dev   0 Aug  4 01:52 stderr
-rw-r--r-- 1 njohnas_dev hadoop_dev 828 Aug  4 01:52 syslog
-rw-r--r-- 1 njohnas_dev hadoop_dev 153 Aug  4 01:52 log.index


>>cat syslog
2013-08-04 01:52:39,719 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2013-08-04 01:52:40,022 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0
2013-08-04 01:52:40,038 INFO org.apache.hadoop.mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@44676e3f
2013-08-04 01:52:40,119 INFO org.apache.hadoop.mapred.Task: Task:attempt_201308040151_0002_m_000004_0 is done. And is in the process of commiting
2013-08-04 01:52:43,020 INFO org.apache.hadoop.mapred.Task: Task 'attempt_201308040151_0002_m_000004_0' done.
2013-08-04 01:52:43,050 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
[njohnas_dev@mshddp-18887 attempt_201308040151_0002_m_000004_0]$


Based on the slave log below, the task is launched successfully, but it gets lost somewhere in the process_based_isolation step.
The slave keeps repeating this cycle of staging and losing the task.

I0804 01:53:47.136688 11058 slave.cpp:475] Got assigned task Task_Tracker_115 for framework 201308040150-3892119818-5051-11035-0000
I0804 01:53:47.138695 11058 paths.hpp:235] Created executor directory '/tmp/mesos/slaves/201308040150-3892119818-5051-11035-0/frameworks/201308040150-3892119818-5051-11035-0000/executors/executor_Task_Tracker_115/runs/d9094b15-540e-4370-a5b5-042b8c5ae6fa'
I0804 01:53:47.139754 11055 slave.cpp:361] Successfully attached file '/tmp/mesos/slaves/201308040150-3892119818-5051-11035-0/frameworks/201308040150-3892119818-5051-11035-0000/executors/executor_Task_Tracker_115/runs/d9094b15-540e-4370-a5b5-042b8c5ae6fa'
I0804 01:53:47.140209 11058 process_based_isolation_module.cpp:108] Launching executor_Task_Tracker_115 (cd hadoop && ./bin/mesos-executor) in /tmp/mesos/slaves/201308040150-3892119818-5051-11035-0/frameworks/201308040150-3892119818-5051-11035-0000/executors/executor_Task_Tracker_115/runs/d9094b15-540e-4370-a5b5-042b8c5ae6fa with resources cpus=1; mem=1280' for framework 201308040150-3892119818-5051-11035-0000
I0804 01:53:47.141726 11058 process_based_isolation_module.cpp:153] Forked executor at 16852
I0804 01:53:47.629719 11056 process_based_isolation_module.cpp:344] Telling slave of lost executor executor_Task_Tracker_115 of framework 201308040150-3892119818-5051-11035-0000
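The timestamps alone support the reading that the task is lost in the isolation step. The slave logs through glog, whose line prefix is a severity letter, an MMDD date, a time with microseconds, a thread id, and the source `file:line`. Below is a minimal Python 3 sketch that parses two of the lines above. The helper `parse_glog` and its `year` default are hypothetical, and the year has to be supplied because glog omits it. The executor is reported lost about half a second after it is forked, which suggests it dies during startup rather than while running the task.

```python
import re
from datetime import datetime

# glog line prefix: severity letter, MMDD, HH:MM:SS.micros, thread id, "file:line] ".
GLOG_LINE = re.compile(
    r"^(?P<sev>[IWEF])(?P<mmdd>\d{4}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<tid>\d+)\s+(?P<src>[^\]]+)\] (?P<msg>.*)$"
)


def parse_glog(line, year=2013):
    """Split one glog line into fields; glog omits the year, so the caller supplies it."""
    m = GLOG_LINE.match(line)
    if m is None:
        return None
    stamp = datetime.strptime(
        "%d%s %s" % (year, m.group("mmdd"), m.group("time")), "%Y%m%d %H:%M:%S.%f"
    )
    return {
        "severity": m.group("sev"),
        "time": stamp,
        "thread": int(m.group("tid")),
        "source": m.group("src"),
        "message": m.group("msg"),
    }


forked = parse_glog(
    "I0804 01:53:47.141726 11058 process_based_isolation_module.cpp:153] "
    "Forked executor at 16852"
)
lost = parse_glog(
    "I0804 01:53:47.629719 11056 process_based_isolation_module.cpp:344] "
    "Telling slave of lost executor executor_Task_Tracker_115 of framework "
    "201308040150-3892119818-5051-11035-0000"
)
delta = (lost["time"] - forked["time"]).total_seconds()
print("executor lost %.3fs after fork" % delta)  # prints: executor lost 0.488s after fork
```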


-Nalini

From: Vinod Kone [mailto:vinodkone@gmail.com]
Sent: Saturday, August 03, 2013 2:48 PM
To: user@mesos.apache.org
Cc: user@mesos.apache.org; Johnas, Nalini
Subject: Re: Mesos slave not starting up

What do the executor logs say? You can find them in their sandbox.
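The slave's "Created executor directory" log line names each run's sandbox. Its layout is `<work_dir>/slaves/<slave_id>/frameworks/<framework_id>/executors/<executor_id>/runs/<run_id>`. Below is a minimal Python 3 sketch that builds that path and reads the executor's `stdout` and `stderr`. It assumes that layout and the `/tmp/mesos` work directory seen in these logs, and the helper names are hypothetical. The slave also schedules finished sandboxes for removal (the gc.cpp "Scheduling ... for removal" line), so the files may already be gone.

```python
import os


def executor_sandbox(work_dir, slave_id, framework_id, executor_id, run_id):
    """Build a run directory path using the layout from the slave's log lines."""
    return os.path.join(
        work_dir, "slaves", slave_id, "frameworks", framework_id,
        "executors", executor_id, "runs", run_id,
    )


def read_executor_logs(sandbox):
    """Return {'stdout': text, 'stderr': text}, using None for files that do not exist."""
    logs = {}
    for name in ("stdout", "stderr"):
        path = os.path.join(sandbox, name)
        if os.path.isfile(path):
            with open(path, errors="replace") as f:
                logs[name] = f.read()
        else:
            logs[name] = None
    return logs


sandbox = executor_sandbox(
    "/tmp/mesos",
    "201308040150-3892119818-5051-11035-0",
    "201308040150-3892119818-5051-11035-0000",
    "executor_Task_Tracker_115",
    "d9094b15-540e-4370-a5b5-042b8c5ae6fa",
)
print(sandbox)
print(read_executor_logs(sandbox)["stderr"])
```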

@vinodkone
Sent from my mobile

On Aug 3, 2013, at 2:14 PM, "Johnas, Nalini" <njohnas@ebay.com> wrote:
Vinod,

I have Mesos 0.12.0 and Hadoop up and running. When I run the task, it gets loaded, but then
it goes into the LOST state right after the error highlighted below.

W0803 20:55:01.061257 20259 monitor.cpp:212] Failed to collect resource usage for executor 'default' of framework '201308010609-3892119818-5051-20233-0001': Unknown executor

The slave logs are below. Any input on how I can fix this?

-Nalini

I0803 20:55:00.654008 20257 slave.cpp:475] Got assigned task Task_Tracker_120 for framework 201308010609-3892119818-5051-20233-0007
I0803 20:55:00.656186 20257 paths.hpp:235] Created executor directory '/tmp/mesos/slaves/201308010609-3892119818-5051-20233-0/frameworks/201308010609-3892119818-5051-20233-0007/executors/executor_Task_Tracker_120/runs/5a72375e-0938-4eaa-a009-281b178fc19b'
I0803 20:55:00.656919 20257 process_based_isolation_module.cpp:108] Launching executor_Task_Tracker_120 (cd hadoop && ./bin/mesos-executor) in /tmp/mesos/slaves/201308010609-3892119818-5051-20233-0/frameworks/201308010609-3892119818-5051-20233-0007/executors/executor_Task_Tracker_120/runs/5a72375e-0938-4eaa-a009-281b178fc19b with resources cpus=1; mem=1280' for framework 201308010609-3892119818-5051-20233-0007
I0803 20:55:00.659153 20257 process_based_isolation_module.cpp:153] Forked executor at 9360
I0803 20:55:00.662657 20254 slave.cpp:361] Successfully attached file '/tmp/mesos/slaves/201308010609-3892119818-5051-20233-0/frameworks/201308010609-3892119818-5051-20233-0007/executors/executor_Task_Tracker_120/runs/5a72375e-0938-4eaa-a009-281b178fc19b'
W0803 20:55:01.061257 20259 monitor.cpp:212] Failed to collect resource usage for executor 'default' of framework '201308010609-3892119818-5051-20233-0001': Unknown executor
I0803 20:55:01.562062 20258 process_based_isolation_module.cpp:344] Telling slave of lost executor executor_Task_Tracker_120 of framework 201308010609-3892119818-5051-20233-0007
I0803 20:55:01.564221 20261 slave.cpp:1053] Executor 'executor_Task_Tracker_120' of framework 201308010609-3892119818-5051-20233-0007 has terminated with signal Aborted
I0803 20:55:01.567446 20261 slave.cpp:830] Status update: task Task_Tracker_120 of framework 201308010609-3892119818-5051-20233-0007 is now in state TASK_LOST
I0803 20:55:01.569561 20261 gc.cpp:97] Scheduling /tmp/mesos/slaves/201308010609-3892119818-5051-20233-0/frameworks/201308010609-3892119818-5051-20233-0007/executors/executor_Task_Tracker_120/runs/5a72375e-0938-4eaa-a009-281b178fc19b for removal
I0803 20:55:01.570945 20254 slave.cpp:727] Got acknowledgement of status update for task Task_Tracker_120 of framework 201308010609-3892119818-5051-20233-0007
I0803 20:55:01.562294 20258 process_utils.hpp:64] Stopping ... 9360
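The slave reports that the executor "has terminated with signal Aborted", which is SIGABRT. A process usually raises this on itself by calling abort(), for example after a failed assertion. The sketch below is a generic POSIX illustration and not Mesos code. A throwaway Python child stands in for an executor that aborts right after being forked, and the sketch shows how the parent observes that death: Python's subprocess module reports it as a negative return code. `signal.strsignal` requires Python 3.8 or later. On Linux it should describe SIGABRT as "Aborted", which matches the slave's wording.

```python
import signal
import subprocess
import sys

# Stand-in for an executor that aborts right after being forked.
child = subprocess.run([sys.executable, "-c", "import os; os.abort()"])

# On POSIX, subprocess reports "killed by signal N" as returncode == -N.
if child.returncode < 0:
    sig = signal.Signals(-child.returncode)
    print("terminated with signal %s (%s)" % (sig.name, signal.strsignal(sig)))
else:
    print("exited with status", child.returncode)
```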

From: Johnas, Nalini [mailto:njohnas@ebay.com]
Sent: Wednesday, July 31, 2013 9:31 PM
To: user@mesos.apache.org
Subject: RE: Mesos slave not starting up


Yes, I tried to install 0.9.0 a while back, but I removed everything.

-Nalini

From: vinod@twitter.com [mailto:vinod@twitter.com] On Behalf Of Vinod Kone
Sent: Wednesday, July 31, 2013 9:14 PM
To: user@mesos.apache.org
Subject: Re: Mesos slave not starting up

It is strange that it complains about ExecutorInfo not having a 'name' field. This field was
introduced in 0.12.0. Do you have any old Mesos (pre-0.12.0) artifacts lying around that are
improperly getting linked in?
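One way to check for stale artifacts is to look for every copy of the generated protobuf module on the Python path. Below is a minimal sketch. It assumes the 0.12.0 Python bindings import that module as `mesos_pb2`, and the helper name `find_module_copies` is hypothetical. It only inspects plain directories on `sys.path`, so it does not find modules packed inside zipped `.egg` files. More than one hit, or a hit under an old install prefix, would suggest that a pre-0.12.0 ExecutorInfo (without `name`) is shadowing the new one.

```python
import os
import sys


def find_module_copies(module_name, search_path=None):
    """Return every plain .py/.pyc file on the search path importable as module_name."""
    search_path = sys.path if search_path is None else search_path
    hits = []
    for entry in search_path:
        directory = entry or os.getcwd()  # '' on sys.path means the current directory
        if not os.path.isdir(directory):  # skips zipped eggs and missing entries
            continue
        for suffix in (".py", ".pyc"):
            path = os.path.join(directory, module_name + suffix)
            if os.path.isfile(path):
                hits.append(path)
    return hits


# 'mesos_pb2' is an assumption: the generated protobuf module the 0.12.0 bindings import.
for path in find_module_copies("mesos_pb2"):
    print(path)
```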

On Wed, Jul 31, 2013 at 7:20 AM, Johnas, Nalini <njohnas@ebay.com> wrote:
Hi Vinod,

Here’s the error it throws. I have Python 2.6.6.

Note: Google Test filter = *Python*
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from ExamplesTest
[ RUN      ] ExamplesTest.PythonFramework
Using temporary directory '/tmp/ExamplesTest_PythonFramework_URC9FN'
Traceback (most recent call last):
  File "/root/mesos-0.12.0/build/../src/examples/python/test_framework.py", line 126, in <module>
    executor.name = "Test Executor (Python)"
AttributeError: 'ExecutorInfo' object has no attribute 'name'
../../src/tests/script.cpp:74: Failure
Failed
python_framework_test.sh exited with status 1
[  FAILED  ] ExamplesTest.PythonFramework (96 ms)
[----------] 1 test from ExamplesTest (96 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (96 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] ExamplesTest.PythonFramework

1 FAILED TEST
  YOU HAVE 1 DISABLED TEST

-Nalini

From: vinod@twitter.com [mailto:vinod@twitter.com] On Behalf Of Vinod Kone
Sent: Tuesday, July 30, 2013 9:29 PM
To: user@mesos.apache.org
Subject: Re: Mesos slave not starting up

Do you know why the Python test is failing? To get more verbose output, you can run:

./bin/mesos-tests.sh --gtest_filter="*Python*" --verbose

I wouldn't recommend 0.13.0 for production because it has not been tested. In fact, it's not
even released! That said, if you are feeling brave enough, you can check out the 0.13.0-rc4
tag (https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=5c19b044a6d8d7b0f24c219965b3b7ab8674d7ba).

On Tue, Jul 30, 2013 at 6:38 PM, Johnas, Nalini <njohnas@ebay.com> wrote:

Thanks, Vinod. Yes, this is the first Hadoop task, and this is a VM with 4 CPUs and 32 GB of RAM.

I'm now installing it on a bigger grid, but I'm running into a weird problem (a make check error
in the Python test framework) that I didn't hit during the previous installation of the same
version, 0.12.0.

The same problem is mentioned in the mailing-list thread below. Is there somewhere I can download
0.13.0 to try it out?

http://mail-archives.apache.org/mod_mbox/incubator-mesos-dev/201306.mbox/%3CCA+5QmYCiWgEA44nbrb+26EuRnd1aArnpqv2z79Q5Y5C_cmb2=Q@mail.gmail.com%3E

-Nalini


From: vinod@twitter.com [mailto:vinod@twitter.com] On Behalf Of Vinod Kone
Sent: Monday, July 29, 2013 11:58 AM
To: user@mesos.apache.org
Subject: Re: Mesos slave not starting up


On Sun, Jul 28, 2013 at 11:29 AM, Johnas, Nalini <njohnas@ebay.com> wrote:
13/07/28 09:19:48 INFO mapred.MesosScheduler: Declining offer with insufficient resources for a TaskTracker:
  cpus: offered 4.0 needed 5.0
  mem : offered 31084.0 needed 9472.0
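The decline comes from a per-resource comparison. The offer has more than enough memory (31084 offered, 9472 needed) but is one CPU short (4 offered, 5 needed). Below is a minimal sketch of that check using the figures from the log above. The helper `insufficient` is hypothetical and is not the MesosScheduler's actual code.

```python
def insufficient(offered, needed):
    """Map each resource the offer cannot cover to its (offered, needed) pair."""
    return {
        name: (offered.get(name, 0.0), amount)
        for name, amount in needed.items()
        if offered.get(name, 0.0) < amount
    }


# Figures copied from the MesosScheduler log line above.
offered = {"cpus": 4.0, "mem": 31084.0}
needed = {"cpus": 5.0, "mem": 9472.0}
print(insufficient(offered, needed))  # prints: {'cpus': (4.0, 5.0)}
```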


It looks like the CPUs offered (4) are not enough to run the Hadoop task, which needs 5. Is this
the first Hadoop task that was launched? By the way, you can manually specify the slave's resources
on the command line with the --resources flag (e.g., "--resources=cpus:14;mem:21913;ports:[31000-32000];disk:400000").
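The example value is a ';'-separated list of `name:value` items. Scalars such as `cpus:14` appear alongside ranges such as `ports:[31000-32000]`. On a shell command line the whole flag needs quoting, as in the example, because an unquoted `;` ends the command. Below is a simplified Python sketch that parses only the two shapes in this example. The helper `parse_resources` is hypothetical, and Mesos' own parser is more general.

```python
def parse_resources(spec):
    """Parse 'name:scalar' and 'name:[lo-hi,...]' items separated by ';'."""
    resources = {}
    for item in spec.split(";"):
        name, _, value = item.strip().partition(":")
        if value.startswith("[") and value.endswith("]"):
            # Range resource such as ports:[31000-32000]
            ranges = []
            for part in value[1:-1].split(","):
                lo, hi = part.strip().split("-")
                ranges.append((int(lo), int(hi)))
            resources[name] = ranges
        else:
            # Scalar resource such as cpus:14
            resources[name] = float(value)
    return resources


res = parse_resources("cpus:14;mem:21913;ports:[31000-32000];disk:400000")
print(res)
```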

