hadoop-common-user mailing list archives

From Vadim Zaliva <kroko...@gmail.com>
Subject lost TaskTrackers
Date Mon, 09 Feb 2009 03:27:36 GMT
Hi!

I am observing a strange situation in my Hadoop cluster. While running
a task, the cluster eventually gets into a strange mode where:

1. The JobTracker reports 0 task trackers.

2. The task tracker processes are alive, but their log files are full of
repeating messages like this:

2009-02-08 19:16:47,761 INFO org.apache.hadoop.mapred.TaskRunner: attempt_200902081049_0001_m_017698_0 done; removing files.
2009-02-08 19:16:47,761 INFO org.apache.hadoop.mapred.IndexCache: Map ID attempt_200902081049_0001_m_017698_0 not found in cache
2009-02-08 19:16:47,761 INFO org.apache.hadoop.mapred.TaskRunner: attempt_200902081049_0001_m_021212_0 done; removing files.
2009-02-08 19:16:47,762 INFO org.apache.hadoop.mapred.IndexCache: Map ID attempt_200902081049_0001_m_021212_0 not found in cache
2009-02-08 19:16:47,762 INFO org.apache.hadoop.mapred.TaskRunner: attempt_200902081049_0001_m_022133_0 done; removing files.

with a new one appearing every couple of seconds.

In the task tracker log, the last 2 exceptions before these repeating messages are:

2009-02-08 17:46:51,482 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_200902081049_0001_m_075408_3
2009-02-08 17:46:51,482 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_200902081049_0001_m_075408_3
2009-02-08 17:46:51,482 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 8 and trying to launch attempt_200902081049_0001_m_075408_3
2009-02-08 17:46:51,483 WARN org.apache.hadoop.mapred.TaskTracker: Error initializing attempt_200902081049_0001_m_075408_3:
java.lang.NullPointerException
        at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:459)
        at org.apache.hadoop.ipc.Client.call(Client.java:686)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
        at $Proxy5.getFileInfo(Unknown Source)
        at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at $Proxy5.getFileInfo(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:578)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:390)
        at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:699)
        at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1636)
        at org.apache.hadoop.mapred.TaskTracker.access$1200(TaskTracker.java:102)
        at org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1602)

2009-02-08 17:46:51,483 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 8
2009-02-08 17:46:51,483 INFO org.apache.hadoop.mapred.TaskTracker: Error cleaning up task runner: java.lang.NullPointerException
        at org.apache.hadoop.mapred.TaskTracker$TaskInProgress.cleanup(TaskTracker.java:2298)
        at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1648)
        at org.apache.hadoop.mapred.TaskTracker.access$1200(TaskTracker.java:102)
        at org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1602)

2009-02-08 17:46:55,622 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_200902081049_0001
2009-02-08 17:46:55,622 INFO org.apache.hadoop.mapred.TaskRunner: attempt_200902081049_0001_m_005647_0 done; removing files.
2009-02-08 17:46:59,270 INFO org.apache.hadoop.mapred.IndexCache: Map ID attempt_200902081049_0001_m_005647_0 not found in cache
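
For anyone who wants to pull exception entries like the ones above out of a
large task tracker log, here is a minimal sketch in Python. It is not part of
Hadoop: the function name find_exceptions is hypothetical, and the line layout
("timestamp LEVEL logger: message") is only assumed from the excerpt in this
message. Continuation lines, such as stack frames or wrapped text, are
attached to the preceding entry.

```python
import re

# Assumed entry header, based on the excerpt in this message:
# "YYYY-MM-DD HH:MM:SS,mmm LEVEL logger.Name: message"
LOG_LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) (\w+) ([\w.$]+): ?(.*)$"
)


def find_exceptions(lines):
    """Return (timestamp, level, message) for entries that mention an exception.

    Lines that do not start a new entry (stack frames, wrapped text) are
    appended to the message of the entry that precedes them.
    """
    entries = []
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        match = LOG_LINE.match(line)
        if match:
            # A new entry starts; store the one collected so far.
            if current is not None:
                entries.append(current)
            current = [match.group(1), match.group(2), match.group(4)]
        elif current is not None and line.strip():
            current[2] += " " + line.strip()
    if current is not None:
        entries.append(current)
    return [tuple(entry) for entry in entries if "Exception" in entry[2]]
```

It could be run over a log with something like
find_exceptions(open(path)), where path is wherever the task tracker log
lives on the node.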

Any suggestions on where I should look for the cause of this problem?

Sincerely,
Vadim

P.S. I am using hadoop-0.19.0 on Linux. Java:

java version "1.6.0_12"
Java(TM) SE Runtime Environment (build 1.6.0_12-b04)
Java HotSpot(TM) 64-Bit Server VM (build 11.2-b01, mixed mode)
