hadoop-common-dev mailing list archives

From "Johan Oskarson (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-707) Final map task gets stuck
Date Sat, 18 Nov 2006 12:46:39 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-707?page=comments#action_12451007 ] 
            
Johan Oskarson commented on HADOOP-707:
---------------------------------------

The jobtracker UI shows the task as still running, and so does the tasktracker web UI.
However, on that node only a datanode and a tasktracker are running.

In the tasktracker log on that node:

2006-11-18 01:39:53,722 INFO org.apache.hadoop.mapred.TaskRunner: task_0197_m_000137_0 06/11/18 01:39:53 WARN mapred.TaskRunner: java.net.SocketTimeoutException: timed out waiting for rpc response
2006-11-18 01:39:53,722 INFO org.apache.hadoop.mapred.TaskRunner: task_0197_m_000137_0  at org.apache.hadoop.ipc.Client.call(Client.java:460)
2006-11-18 01:39:53,722 INFO org.apache.hadoop.mapred.TaskRunner: task_0197_m_000137_0  at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:164)
2006-11-18 01:39:53,722 INFO org.apache.hadoop.mapred.TaskRunner: task_0197_m_000137_0  at org.apache.hadoop.mapred.$Proxy0.progress(Unknown Source)
2006-11-18 01:39:53,723 INFO org.apache.hadoop.mapred.TaskRunner: task_0197_m_000137_0  at org.apache.hadoop.mapred.Task.reportProgress(Task.java:173)
2006-11-18 01:39:53,723 INFO org.apache.hadoop.mapred.TaskRunner: task_0197_m_000137_0  at org.apache.hadoop.mapred.Task.reportProgress(Task.java:162)
2006-11-18 01:39:53,723 INFO org.apache.hadoop.mapred.TaskRunner: task_0197_m_000137_0  at org.apache.hadoop.mapred.MapTask$3.next(MapTask.java:200)
2006-11-18 01:39:53,723 INFO org.apache.hadoop.mapred.TaskRunner: task_0197_m_000137_0  at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
2006-11-18 01:39:53,723 INFO org.apache.hadoop.mapred.TaskRunner: task_0197_m_000137_0  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:215)
2006-11-18 01:39:53,723 INFO org.apache.hadoop.mapred.TaskRunner: task_0197_m_000137_0  at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1247)
2006-11-18 01:39:53,723 INFO org.apache.hadoop.mapred.TaskRunner: task_0197_m_000137_0
2006-11-18 01:39:54,166 INFO org.apache.hadoop.mapred.TaskTracker: task_0197_m_000137_0 0.8654954% /user/hadoop/data/submissions/1160000000/1163000000/1163730000/1163730000:30169435+6033887
2006-11-18 01:39:54,166 INFO org.apache.hadoop.mapred.TaskRunner: task_0194_m_001875_0 done; removing files.
2006-11-18 01:39:54,894 INFO org.apache.hadoop.mapred.TaskTracker: task_0197_m_000137_0 1.0% /user/hadoop/data/submissions/1160000000/1163000000/1163730000/1163730000:30169435+6033887
2006-11-18 01:39:54,894 INFO org.apache.hadoop.mapred.TaskRunner: task_0194_m_001888_0 done; removing files.
2006-11-18 01:39:55,268 INFO org.apache.hadoop.mapred.TaskTracker: Task task_0197_m_000137_0 is done.

And a bit further down:

2006-11-18 01:39:56,257 WARN org.apache.hadoop.ipc.Server: handler output error
java.nio.channels.ClosedChannelException
        at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:125)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:294)
        at org.apache.hadoop.ipc.SocketChannelOutputStream.flushBuffer(SocketChannelOutputStream.java:108)
        at org.apache.hadoop.ipc.SocketChannelOutputStream.write(SocketChannelOutputStream.java:89)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
        at java.io.DataOutputStream.flush(DataOutputStream.java:106)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:532)
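
To check other tasktracker logs for the same pattern (a progress RPC that times out on a task the tasktracker later logs as done), a rough sketch along these lines may help. It is not part of Hadoop: the function name `summarize_tasks` and the task id regex are assumptions made for illustration, and it expects one log record per line as in the log file itself, not wrapped as in a mail.

```python
import re
from collections import defaultdict

# Assumed pattern for task attempt ids such as task_0197_m_000137_0.
TASK_ID = re.compile(r"task_\d+_[mr]_\d+_\d+")


def summarize_tasks(lines):
    """Count RPC timeouts and record 'is done' messages per task id."""
    summary = defaultdict(lambda: {"timeouts": 0, "done": False})
    for line in lines:
        match = TASK_ID.search(line)
        if not match:
            continue  # line does not mention a task
        entry = summary[match.group(0)]
        if "SocketTimeoutException" in line:
            entry["timeouts"] += 1
        # The tasktracker logs "Task <id> is done." when it considers the task finished.
        if "TaskTracker: Task" in line and "is done" in line:
            entry["done"] = True
    return dict(summary)


sample = [
    "2006-11-18 01:39:53,722 INFO org.apache.hadoop.mapred.TaskRunner: "
    "task_0197_m_000137_0 06/11/18 01:39:53 WARN mapred.TaskRunner: "
    "java.net.SocketTimeoutException: timed out waiting for rpc response",
    "2006-11-18 01:39:55,268 INFO org.apache.hadoop.mapred.TaskTracker: "
    "Task task_0197_m_000137_0 is done.",
]

for task_id, info in summarize_tasks(sample).items():
    print(task_id, info)
```

A task that shows both a timeout and a done message, while the jobtracker still lists it as running, would match what is described above.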

> Final map task gets stuck
> -------------------------
>
>                 Key: HADOOP-707
>                 URL: http://issues.apache.org/jira/browse/HADOOP-707
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.8.0
>         Environment: using latest trunk
>            Reporter: Johan Oskarson
>            Priority: Critical
>
> I've seen numerous jobs lately where the final map task gets stuck, never finishing.
> The jobtracker doesn't reassign the task. A restart of the tasktracker solves the issue
> and the job can finish.
> In the web interface it turns up as:
> task_0028_m_000534_0 node17.herd1 RUNNING 0.00%    10-Nov-2006 12:21:12 10-Nov-2006 12:22:19 (1mins, 6sec)
> Task failed to report status for 604 seconds. Killing.
> Only exception I find in that tasktracker log is this (a few times):
> java.nio.channels.ClosedChannelException
>         at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:125)
>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:294)
>         at org.apache.hadoop.ipc.SocketChannelOutputStream.flushBuffer(SocketChannelOutputStream.java:108)
>         at org.apache.hadoop.ipc.SocketChannelOutputStream.write(SocketChannelOutputStream.java:89)
>         at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>         at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
>         at java.io.DataOutputStream.flush(DataOutputStream.java:106)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:532)

