hadoop-mapreduce-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From baran cakici <barancak...@gmail.com>
Subject Lost Task Tracker
Date Tue, 05 Apr 2011 15:21:56 GMT
 Hi,

First of all, I use Hadoop-0.20.2 on Windows XP Pro with Eclipse Plug-In. I
have a Cluster with 1 Master (Jobtracker and Namenode) and 4 Slaves(Datanode
and TaskTracker).

I have some problems about my Hadoop-Cluster last few weeks. When I start a
job with big Input(4GB - it`s may be not to big, but algorithm require some
time), then i lose one of my Task Trackers in several minutes or seconds,
but Datanode still alive. That means, i have still commucation with
this slave. In addition, it is not always happen. I can say 50 procent of my
Jobs...

I examined my Log files and I saw that I lose my Task Tracker after this
Line in tasktracker.log : "JVM Runner jvm_201103311133_0005_m_850277002
spawned."

one slice of my tasktracker.log until Lost Task Tracker problem:

2011-03-31 16:10:03,522 INFO org.apache.hadoop.mapred.TaskTracker: Task
attempt_201103311133_0021_m_000132_0 is done.
2011-03-31 16:10:03,522 INFO org.apache.hadoop.mapred.TaskTracker: reported
output size for attempt_201103311133_0021_m_000132_0  was 0
2011-03-31 16:10:03,538 INFO org.apache.hadoop.mapred.TaskTracker:
addFreeSlot : current free slots : 2
2011-03-31 16:10:03,741 INFO org.apache.hadoop.mapred.JvmManager: JVM :
jvm_201103311133_0021_m_-1306155067 exited. Number of tasks it ran: 1
2011-03-31 16:10:03,741 INFO org.apache.hadoop.mapred.JvmManager: JVM :
jvm_201103311133_0021_m_-470981146 exited. Number of tasks it ran: 1
2011-03-31 16:10:04,069 INFO org.apache.hadoop.mapred.TaskTracker:
LaunchTaskAction (registerTask): attempt_201103311133_0021_m_000138_0 task's
state:UNASSIGNED
2011-03-31 16:10:04,069 INFO org.apache.hadoop.mapred.TaskTracker: Trying to
launch : attempt_201103311133_0021_m_000138_0
2011-03-31 16:10:04,069 INFO org.apache.hadoop.mapred.TaskTracker: In
TaskLauncher, current free slots : 2 and trying to launch
attempt_201103311133_0021_m_000138_0

   - 2011-03-31 16:10:05,882 INFO org.apache.hadoop.mapred.JvmManager: In
   JvmRunner constructed JVM ID: jvm_201103311133_0021_m_34147226
   - 2011-03-31 16:10:05,882 INFO org.apache.hadoop.mapred.JvmManager: JVM
   Runner jvm_201103311133_0021_m_34147226 spawned.
   - 2011-03-31 16:10:07,100 INFO org.apache.hadoop.mapred.TaskTracker:
   LaunchTaskAction (registerTask): attempt_201103311133_0021_m_000140_0 task's
   state:UNASSIGNED
   - 2011-03-31 16:10:07,100 INFO org.apache.hadoop.mapred.TaskTracker:
   Trying to launch : attempt_201103311133_0021_m_000140_0
   - 2011-03-31 16:10:07,100 INFO org.apache.hadoop.mapred.TaskTracker: In
   TaskLauncher, current free slots : 1 and trying to launch
   attempt_201103311133_0021_m_000140_0
   - 2011-03-31 16:10:07,460 INFO org.apache.hadoop.mapred.JvmManager: In
   JvmRunner constructed JVM ID: jvm_201103311133_0021_m_1797481322
   - 2011-03-31 16:10:07,460 INFO org.apache.hadoop.mapred.JvmManager: JVM
   Runner jvm_201103311133_0021_m_1797481322 spawned.
   - 2011-03-31 16:10:10,116 INFO org.apache.hadoop.mapred.TaskTracker: JVM
   with ID: jvm_201103311133_0021_m_34147226 given task:
   attempt_201103311133_0021_m_000138_0
   - 2011-03-31 16:10:14,007 INFO org.apache.hadoop.mapred.TaskTracker: JVM
   with ID: jvm_201103311133_0021_m_1797481322 given task:
   attempt_201103311133_0021_m_000140_0
   - 2011-03-31 16:10:18,694 INFO org.apache.hadoop.mapred.TaskTracker:
   attempt_201103311133_0021_m_000138_0 0.0%

2011-03-31 16:10:20,803 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201103311133_0021_m_000140_0 0.0%
2011-03-31 16:10:21,538 INFO org.apache.hadoop.mapred.TaskTracker: Sent out
4685696 bytes for reduce: 0 from map: attempt_201103311133_0021_m_000130_0
given 4685696/4685692
2011-03-31 16:10:21,538 INFO
org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 157.163.194.181:50060,
dest: 157.163.200.75:2223, bytes: 4685696, op: MAPRED_SHUFFLE, cliID:
attempt_201103311133_0021_m_000130_0
2011-03-31 16:10:21,710 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201103311133_0021_m_000138_0 0.050699152%
2011-03-31 16:10:23,803 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201103311133_0021_m_000140_0 0.0%
2011-03-31 16:10:24,725 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201103311133_0021_m_000138_0 0.21863972%
2011-03-31 16:10:26,803 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201103311133_0021_m_000140_0 0.022273919%
2011-03-31 16:10:27,757 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201103311133_0021_m_000138_0 0.3422284%
2011-03-31 16:10:29,803 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201103311133_0021_m_000140_0 0.119258955%
2011-03-31 16:10:30,850 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201103311133_0021_m_000138_0 0.44045892%
2011-03-31 16:10:31,178 INFO org.apache.hadoop.mapred.TaskTracker: Sent out
4618152 bytes for reduce: 0 from map: attempt_201103311133_0021_m_000132_0
given 4618152/4618148
2011-03-31 16:10:31,178 INFO
org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 157.163.194.181:50060,
dest: 157.163.200.75:2227, bytes: 4618152, op: MAPRED_SHUFFLE, cliID:
attempt_201103311133_0021_m_000132_0
2011-03-31 16:10:32,944 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201103311133_0021_m_000140_0 0.20685153%
2011-03-31 16:10:33,882 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201103311133_0021_m_000138_0 0.55659634%
2011-03-31 16:10:36,022 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201103311133_0021_m_000140_0 0.28004178%
2011-03-31 16:10:36,897 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201103311133_0021_m_000138_0 0.6907835%
2011-03-31 16:10:39,132 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201103311133_0021_m_000140_0 0.36914998%
2011-03-31 16:10:39,944 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201103311133_0021_m_000138_0 0.77318066%
2011-03-31 16:10:42,225 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201103311133_0021_m_000140_0 0.4423412%
2011-03-31 16:10:42,991 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201103311133_0021_m_000138_0 0.85240847%
2011-03-31 16:10:45,319 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201103311133_0021_m_000140_0 0.5346344%
2011-03-31 16:10:46,038 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201103311133_0021_m_000138_0 0.94430083%
2011-03-31 16:10:48,319 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201103311133_0021_m_000140_0 0.636471%
2011-03-31 16:10:49,053 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201103311133_0021_m_000138_0 1.0%
2011-03-31 16:10:51,319 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201103311133_0021_m_000140_0 0.7351242%
2011-03-31 16:10:52,085 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201103311133_0021_m_000138_0 1.0%
2011-03-31 16:10:54,335 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201103311133_0021_m_000140_0 0.8974207%
2011-03-31 16:10:57,335 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201103311133_0021_m_000140_0 1.0%
2011-03-31 16:10:58,944 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201103311133_0021_m_000138_0 1.0%
2011-03-31 16:10:58,944 INFO org.apache.hadoop.mapred.TaskTracker: Task
attempt_201103311133_0021_m_000138_0 is done.
2011-03-31 16:10:58,944 INFO org.apache.hadoop.mapred.TaskTracker: reported
output size for attempt_201103311133_0021_m_000138_0  was 0
2011-03-31 16:10:58,944 INFO org.apache.hadoop.mapred.TaskTracker:
addFreeSlot : current free slots : 1
2011-03-31 16:10:59,007 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201103311133_0021_m_000140_0 1.0%
2011-03-31 16:10:59,007 INFO org.apache.hadoop.mapred.TaskTracker: Task
attempt_201103311133_0021_m_000140_0 is done.
2011-03-31 16:10:59,007 INFO org.apache.hadoop.mapred.TaskTracker: reported
output size for attempt_201103311133_0021_m_000140_0  was 0
2011-03-31 16:10:59,007 INFO org.apache.hadoop.mapred.TaskTracker:
addFreeSlot : current free slots : 2
2011-03-31 16:10:59,210 INFO org.apache.hadoop.mapred.JvmManager: JVM :
jvm_201103311133_0021_m_34147226 exited. Number of tasks it ran: 1
2011-03-31 16:10:59,319 INFO org.apache.hadoop.mapred.JvmManager: JVM :
jvm_201103311133_0021_m_1797481322 exited. Number of tasks it ran: 1

   - 2011-03-31 16:11:03,210 INFO org.apache.hadoop.mapred.TaskTracker:
   LaunchTaskAction (registerTask): attempt_201103311133_0021_m_000145_0 task's
   state:UNASSIGNED
   - 2011-03-31 16:11:03,210 INFO org.apache.hadoop.mapred.TaskTracker:
   Trying to launch : attempt_201103311133_0021_m_000145_0
   - 2011-03-31 16:11:03,210 INFO org.apache.hadoop.mapred.TaskTracker: In
   TaskLauncher, current free slots : 2 and trying to launch
   attempt_201103311133_0021_m_000145_0
   - 2011-03-31 16:11:03,210 INFO org.apache.hadoop.mapred.TaskTracker:
   LaunchTaskAction (registerTask): attempt_201103311133_0021_m_000146_0 task's
   state:UNASSIGNED
   - 2011-03-31 16:11:04,975 INFO org.apache.hadoop.mapred.TaskTracker:
   Trying to launch : attempt_201103311133_0021_m_000146_0
   - 2011-03-31 16:11:04,975 INFO org.apache.hadoop.mapred.TaskTracker: In
   TaskLauncher, current free slots : 1 and trying to launch
   attempt_201103311133_0021_m_000146_0
   - 2011-03-31 16:11:05,022 INFO org.apache.hadoop.mapred.JvmManager: In
   JvmRunner constructed JVM ID: jvm_201103311133_0021_m_129451324
   - 2011-03-31 16:11:05,022 INFO org.apache.hadoop.mapred.JvmManager: JVM
   Runner jvm_201103311133_0021_m_129451324 spawned.


I await after last line "JVM with ID....", but after this line within 600
seconds I lose my Task Tracker.

I want to know, what happend after "JVM Runner spawned" in Hadoop??? If I
know that, I can maybe find a solution.

and one slice of my jobtracker.log until Lost Task Tracker problem:

2011-03-31 16:13:55,049 INFO org.apache.hadoop.mapred.JobInProgress:
Choosing data-local task task_201103311133_0021_m_000162
2011-03-31 16:13:55,049 INFO org.apache.hadoop.mapred.JobTracker: Adding
task 'attempt_201103311133_0021_m_000163_0' to tip
task_201103311133_0021_m_000163, for tracker
'tracker_eh410pvc.ww005.siemens.net:localhost/127.0.0.1:3264'
2011-03-31 16:13:55,049 INFO org.apache.hadoop.mapred.JobInProgress:
Choosing data-local task task_201103311133_0021_m_000163
2011-03-31 16:14:07,798 INFO org.apache.hadoop.mapred.JobTracker:
attempt_201103311133_0021_m_000145_0 is 184557 ms debug.
2011-03-31 16:14:07,798 INFO org.apache.hadoop.mapred.JobTracker: Launching
task attempt_201103311133_0021_m_000145_0 timed out.
2011-03-31 16:14:07,798 INFO org.apache.hadoop.mapred.TaskInProgress: Error
from attempt_201103311133_0021_m_000145_0: Error launching task
2011-03-31 16:14:07,798 INFO org.apache.hadoop.mapred.JobTracker:
attempt_201103311133_0021_m_000146_0 is 184557 ms debug.
2011-03-31 16:14:07,798 INFO org.apache.hadoop.mapred.JobTracker: Launching
task attempt_201103311133_0021_m_000146_0 timed out.
2011-03-31 16:14:07,798 INFO org.apache.hadoop.mapred.TaskInProgress: Error
from attempt_201103311133_0021_m_000146_0: Error launching task
2011-03-31 16:14:07,798 INFO org.apache.hadoop.mapred.JobTracker: Lost
tracker 'tracker_eh40z1cc.ww005.siemens.net:localhost/127.0.0.1:2861'
2011-03-31 16:14:07,798 INFO org.apache.hadoop.mapred.TaskInProgress: Error
from attempt_201103311133_0021_m_000000_0: Lost task tracker:
tracker_eh40z1cc.ww005.siemens.net:localhost/127.0.0.1:2861
2011-03-31 16:14:07,798 INFO org.apache.hadoop.mapred.TaskInProgress: Error
from attempt_201103311133_0021_m_000001_0: Lost task tracker:
tracker_eh40z1cc.ww005.siemens.net:localhost/127.0.0.1:2861
2011-03-31 16:14:07,798 INFO org.apache.hadoop.mapred.TaskInProgress: Error
from attempt_201103311133_0021_m_000014_0: Lost task tracker:
tracker_eh40z1cc.ww005.siemens.net:localhost/127.0.0.1:2861


It can't run the task, and then "Error launching task". Why? or it can be
ok... but , why am I lose my Task Tracker??

Thanks,

Regards,

Baran

PS: I opened a new topic, because it looks like an another problem, that I
previously one.

Mime
View raw message