hadoop-common-dev mailing list archives

From "Amareshwari Sri Ramadasu (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-2167) Reduce tips complete 100%, but job does not complete saying reduces still running.
Date Wed, 07 Nov 2007 12:17:50 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sri Ramadasu updated HADOOP-2167:
---------------------------------------------

    Fix Version/s: 0.16.0

> Reduce tips complete 100%, but job does not complete saying reduces still running.
> ----------------------------------------------------------------------------------
>
>                 Key: HADOOP-2167
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2167
>             Project: Hadoop
>          Issue Type: Bug
>            Reporter: Amareshwari Sri Ramadasu
>            Assignee: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.16.0
>
>
> The job's reduces are stuck at 99.43% progress: 2 reduces remain in the running state and the job does not complete.
> However, the reduce task list on the JobTracker (JT) shows them as 100% complete and marked SUCCEEDED, and a finish time is shown in jobtasks.jsp and in the job history as well.
> With ipc.client.timeout = 600000, the TaskTrackers (TTs) running these reduces log the exceptions below.
> On one of the TTs, the logs show the following:
> 2007-11-07 08:34:16,092 INFO org.apache.hadoop.mapred.TaskTracker: Task task_200711070637_0001_r_000150_0 is done.
> 2007-11-07 08:35:34,013 INFO org.apache.hadoop.mapred.TaskTracker: Task task_200711070637_0001_r_000156_0 is done.
> 2007-11-07 08:42:44,751 ERROR org.apache.hadoop.mapred.TaskTracker: Caught exception: java.net.SocketTimeoutException: timedout waiting for rpc response
>         at org.apache.hadoop.ipc.Client.call(Client.java:484)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:184)
>         at org.apache.hadoop.mapred.$Proxy0.heartbeat(Unknown Source)
>         at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:897)
>         at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:799)
>         at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1193)
>         at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2055)
> 2007-11-07 08:42:44,767 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to .................
> On the other TT,
> 2007-11-07 08:40:30,484 INFO org.apache.hadoop.mapred.TaskTracker: Task task_200711070637_0001_r_000160_0 is done.
> 2007-11-07 08:42:45,508 ERROR org.apache.hadoop.mapred.TaskTracker: Caught exception: java.net.SocketTimeoutException: timedout waiting for rpc response
>         at org.apache.hadoop.ipc.Client.call(Client.java:484)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:184)
>         at org.apache.hadoop.mapred.$Proxy0.heartbeat(Unknown Source)
>         at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:897)
>         at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:799)
>         at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1193)
>         at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2055)
> 2007-11-07 08:42:45,508 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to ..........
> In the JT logs, the reduce tasks complete successfully:
> 2007-11-07 06:39:09,151 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_200711070637_0001_r_000160_0' to tip tip_200711070637_0001_r_000160, for tracker 'x'
> 2007-11-07 08:42:45,708 INFO org.apache.hadoop.mapred.TaskRunner: Saved output of task 'task_200711070637_0001_r_000160_0' to 'y'
> 2007-11-07 08:42:45,708 INFO org.apache.hadoop.mapred.JobInProgress: Task 'task_200711070637_0001_r_000160_0' has completed tip_200711070637_0001_r_000160 successfully.
> This would suggest that, when tasks finish before the heartbeat RPC times out, the problem occurs in the progress update. The behaviour is not consistent, however: other reduce tasks in the same situation complete successfully.
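
One way to check this pattern against a full TaskTracker log is sketched below. It is a minimal, hypothetical diagnostic, not part of Hadoop: the function name `tasks_done_before_timeout` and the 600-second default window are assumptions (the window mirrors ipc.client.timeout = 600000, assuming that value is in milliseconds), and the regular expressions only match the two log line shapes quoted in this report, with each log record on a single line.

```python
import re
from datetime import datetime

# Line shapes copied from the TaskTracker excerpts in this report.
DONE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} INFO "
    r"org\.apache\.hadoop\.mapred\.TaskTracker: Task (\S+) is done\."
)
TIMEOUT_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} ERROR "
    r"org\.apache\.hadoop\.mapred\.TaskTracker: Caught exception: "
    r"java\.net\.SocketTimeoutException"
)


def parse_ts(text):
    """Parse the timestamp prefix of a log line (milliseconds dropped)."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


def tasks_done_before_timeout(lines, window_seconds=600):
    """Return (task_id, gap_seconds) pairs for tasks reported done at most
    window_seconds before a heartbeat SocketTimeoutException.

    window_seconds is an assumed default, chosen to match the RPC timeout.
    """
    done = []  # (timestamp, task_id) for every "is done" line seen so far
    hits = []
    for line in lines:
        line = line.lstrip("> ").rstrip()  # tolerate email quote prefixes
        m = DONE_RE.match(line)
        if m:
            done.append((parse_ts(m.group(1)), m.group(2)))
            continue
        m = TIMEOUT_RE.match(line)
        if m:
            t_err = parse_ts(m.group(1))
            for t_done, task in done:
                gap = (t_err - t_done).total_seconds()
                if 0 <= gap <= window_seconds:
                    hits.append((task, int(gap)))
    return hits
```

Calling `tasks_done_before_timeout(open(path))` on a TaskTracker log (where `path` is a placeholder for the log file) lists the tasks whose completion fell inside the window of a timed-out heartbeat. Those are the candidates to compare against the JobTracker's view of running reduces.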

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

