mesos-dev mailing list archives

From Vinod Kone <vi...@twitter.com>
Subject Re: Tasks always lost when running hadoop test!
Date Wed, 15 May 2013 15:14:54 GMT
logs? Also what version of mesos?

@vinodkone
Sent from my mobile 

On May 15, 2013, at 12:00 AM, 王瑜 <wangyu@nfs.iscas.ac.cn> wrote:

> Hi Ben,
> 
> I think the problem is that Mesos found the executor at hdfs://master/user/mesos/hadoop.tar.gz but did not download it, and so never used it.
> Because Mesos found the executor, it did not report an error and just updated the task status to lost; but because it never used the executor, the executor directory is empty!
> 
> I am not very familiar with the source code, so I do not know why Mesos cannot use the executor, and I am not sure whether my analysis is right. Thanks very much for your help!
> 
> 
> 
> 
> Wang Yu
> 
> From: 王瑜
> Sent: 2013-05-15 11:04
> To: mesos-dev
> Cc: Benjamin Mahler
> Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060
> Hi, Ben,
> 
> I have rerun the test and checked the log directory again; it is still empty, as shown below.
> I think the problem is with my executor, but I do not know how to get the executor to work. The logs are below.
> The slave logs "Asked to update resources for an unknown/killed executor". Why does it always kill the executor?
> 
> 1. I opened all of the executor directories, but all of them are empty. I do not know what happened to them...
> [root@slave1 logs]# cd /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_4/runs/8a4dd631-1ec0-4946-a1bc-0644a7238e3c
> [root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]# ls
> [root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]# ls -l
> total 0
> [root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]# ls -a
> .  ..
> [root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]#
> 2. I added "--isolation=cgroups" to the slaves, but it still does not work. Tasks are always lost. There is no error any more, though, so I still do not know what happened to the executor... The log from one slave follows. Please help me, thanks very much!
> 
> mesos-slave.INFO
> Log file created at: 2013/05/13 09:12:54
> Running on machine: slave1
> Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
> I0513 09:12:54.170383 24183 main.cpp:124] Creating "cgroups" isolator
> I0513 09:12:54.171617 24183 main.cpp:132] Build: 2013-04-10 16:07:43 by root
> I0513 09:12:54.171656 24183 main.cpp:133] Starting Mesos slave
> I0513 09:12:54.173495 24197 slave.cpp:203] Slave started on 1)@192.168.0.3:36668
> I0513 09:12:54.173578 24197 slave.cpp:204] Slave resources: cpus=24; mem=63356; ports=[31000-32000]; disk=29143
> I0513 09:12:54.174486 24192 cgroups_isolator.cpp:242] Using /cgroup as cgroups hierarchy root
> I0513 09:12:54.179914 24197 slave.cpp:453] New master detected at master@192.168.0.2:5050
> I0513 09:12:54.180809 24197 slave.cpp:436] Successfully attached file '/home/mesos/build/logs/mesos-slave.INFO'
> I0513 09:12:54.180817 24207 status_update_manager.cpp:132] New master detected at master@192.168.0.2:5050
> I0513 09:12:54.194345 24192 cgroups_isolator.cpp:730] Recovering isolator
> I0513 09:12:54.195453 24189 slave.cpp:377] Finished recovery
> I0513 09:12:54.197798 24206 slave.cpp:487] Registered with master; given slave ID 201305130913-33597632-5050-3893-0
> I0513 09:12:54.198086 24201 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201305081719-33597632-5050-4050-1' for removal
> I0513 09:12:54.198329 24201 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201305100938-33597632-5050-19520-1' for removal
> I0513 09:12:54.198490 24201 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201305081625-33597632-5050-2991-1' for removal
> I0513 09:12:54.198593 24201 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201305081746-33597632-5050-12378-1' for removal
> I0513 09:12:54.198874 24201 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201305090914-33597632-5050-5072-1' for removal
> I0513 09:12:54.199028 24201 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201305081730-33597632-5050-8558-1' for removal
> I0513 09:12:54.199149 24201 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201304131144-33597632-5050-4949-2' for removal
> I0513 09:13:54.176460 24204 slave.cpp:1811] Current disk usage 26.93%. Max allowed age: 5.11days
> I0513 09:14:54.178444 24203 slave.cpp:1811] Current disk usage 26.93%. Max allowed age: 5.11days
> I0513 09:15:54.180680 24203 slave.cpp:1811] Current disk usage 26.93%. Max allowed age: 5.11days
> I0513 09:16:23.051203 24200 slave.cpp:587] Got assigned task Task_Tracker_0 for framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:23.054324 24200 paths.hpp:302] Created executor directory '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_0/runs/6522748a-9d43-41b7-8f88-cd537a502495'
> I0513 09:16:23.055605 24188 slave.cpp:436] Successfully attached file '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_0/runs/6522748a-9d43-41b7-8f88-cd537a502495'
> I0513 09:16:23.056043 24190 cgroups_isolator.cpp:525] Launching executor_Task_Tracker_0 (cd hadoop && ./bin/mesos-executor) in /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_0/runs/6522748a-9d43-41b7-8f88-cd537a502495 with resources cpus=1; mem=1280 for framework 201305130913-33597632-5050-3893-0000 in cgroup mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
> I0513 09:16:23.059368 24190 cgroups_isolator.cpp:670] Changing cgroup controls for executor executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000 with resources cpus=1; mem=1280
> I0513 09:16:23.060478 24190 cgroups_isolator.cpp:841] Updated 'cpu.shares' to 1024 for executor executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:23.061101 24190 cgroups_isolator.cpp:979] Updated 'memory.limit_in_bytes' to 1342177280 for executor executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:23.061807 24190 cgroups_isolator.cpp:1005] Started listening for OOM events for executor executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:23.063297 24190 cgroups_isolator.cpp:555] Forked executor at = 24552
> I0513 09:16:29.055598 24190 slave.cpp:587] Got assigned task Task_Tracker_1 for framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:29.058297 24190 paths.hpp:302] Created executor directory '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_1/runs/38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b'
> I0513 09:16:29.059012 24203 slave.cpp:436] Successfully attached file '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_1/runs/38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b'
> I0513 09:16:29.059865 24200 cgroups_isolator.cpp:525] Launching executor_Task_Tracker_1 (cd hadoop && ./bin/mesos-executor) in /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_1/runs/38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b with resources cpus=1; mem=1280 for framework 201305130913-33597632-5050-3893-0000 in cgroup mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_1_tag_38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
> I0513 09:16:29.061282 24200 cgroups_isolator.cpp:670] Changing cgroup controls for executor executor_Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000 with resources cpus=1; mem=1280
> I0513 09:16:29.062208 24200 cgroups_isolator.cpp:841] Updated 'cpu.shares' to 1024 for executor executor_Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:29.062940 24200 cgroups_isolator.cpp:979] Updated 'memory.limit_in_bytes' to 1342177280 for executor executor_Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:29.063705 24200 cgroups_isolator.cpp:1005] Started listening for OOM events for executor executor_Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:29.065239 24200 cgroups_isolator.cpp:555] Forked executor at = 24628
> I0513 09:16:34.457746 24188 cgroups_isolator.cpp:806] Executor executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000 terminated with status 256
> I0513 09:16:34.457909 24188 cgroups_isolator.cpp:635] Killing executor executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:34.459873 24188 cgroups_isolator.cpp:1025] OOM notifier is triggered for executor executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000 with uuid 6522748a-9d43-41b7-8f88-cd537a502495
> I0513 09:16:34.460028 24188 cgroups_isolator.cpp:1030] Discarded OOM notifier for executor executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000 with uuid 6522748a-9d43-41b7-8f88-cd537a502495
> I0513 09:16:34.461314 24190 cgroups.cpp:1175] Trying to freeze cgroup /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
> I0513 09:16:34.461675 24190 cgroups.cpp:1214] Successfully froze cgroup /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495 after 1 attempts
> I0513 09:16:34.464400 24197 cgroups.cpp:1190] Trying to thaw cgroup /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
> I0513 09:16:34.464659 24197 cgroups.cpp:1298] Successfully thawed /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
> I0513 09:16:34.477118 24199 cgroups_isolator.cpp:1144] Successfully destroyed cgroup mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
> I0513 09:16:34.477439 24190 slave.cpp:1479] Executor 'executor_Task_Tracker_0' of framework 201305130913-33597632-5050-3893-0000 has exited with status 1
> I0513 09:16:34.479852 24190 slave.cpp:1232] Handling status update TASK_LOST from task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:34.480123 24190 slave.cpp:1280] Forwarding status update TASK_LOST from task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000 to the status update manager
> I0513 09:16:34.480136 24199 cgroups_isolator.cpp:666] Asked to update resources for an unknown/killed executor
> I0513 09:16:34.480480 24185 status_update_manager.cpp:254] Received status update TASK_LOST from task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:34.480716 24185 status_update_manager.cpp:403] Creating StatusUpdate stream for task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:34.480927 24185 status_update_manager.hpp:314] Handling UPDATE for status update TASK_LOST from task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:34.481107 24185 status_update_manager.cpp:289] Forwarding status update TASK_LOST from task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000 to the master at master@192.168.0.2:5050
> I0513 09:16:34.487007 24194 slave.cpp:979] Got acknowledgement of status update for task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:34.487257 24185 status_update_manager.cpp:314] Received status update acknowledgement for task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:34.487412 24185 status_update_manager.hpp:314] Handling ACK for status update TASK_LOST from task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:34.487547 24185 status_upda
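
A note on reading the numbers in the quoted log: the isolator reports "terminated with status 256" while the slave reports "exited with status 1" for the same executor. These are probably the same event seen two ways, since a raw wait status of 256 decodes to exit code 1 under the conventional Linux layout. Likewise, mem=1280 matches the memory.limit_in_bytes value of 1342177280 if mem is taken to be in MiB. The sketch below checks both pieces of arithmetic. The helper names are hypothetical, made up for illustration, and are not part of Mesos; the bit layout and the MiB unit are assumptions.

```python
# Sketch only: decode two numbers from the quoted mesos-slave log.
# Assumptions (not confirmed by the thread): the isolator logs a raw
# wait(2)-style status word with the usual Linux layout, and "mem" is in MiB.


def decode_wait_status(status: int) -> str:
    """Describe a raw wait status. Stopped/continued states are not handled."""
    signal_number = status & 0x7F  # low 7 bits: terminating signal, 0 if none
    if signal_number == 0:
        exit_code = (status >> 8) & 0xFF  # next 8 bits: the exit code
        return f"exited with code {exit_code}"
    return f"killed by signal {signal_number}"


def mem_mib_to_limit_bytes(mem_mib: int) -> int:
    """Convert a MiB figure to bytes (1 MiB = 1024 * 1024 bytes)."""
    return mem_mib * 1024 * 1024


if __name__ == "__main__":
    print(decode_wait_status(256))       # status logged by cgroups_isolator.cpp
    print(mem_mib_to_limit_bytes(1280))  # compare with memory.limit_in_bytes
```

If this reading is right, the executor command (cd hadoop && ./bin/mesos-executor) returned a non-zero exit code on its own rather than being killed by a signal, so the executor's stdout and stderr, if any were written, would be the next place to look.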
