hadoop-common-dev mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1201) Progress reporting can be improved for both Map/Reduce tasks
Date Thu, 31 May 2007 21:03:15 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500489 ]

Owen O'Malley commented on HADOOP-1201:

But once you have made progress reporting a separate thread, the ping provides little value.
The interface will look like:

boolean updateState(String taskid, int progressCount, float progress, String state,
                    TaskStatus.Phase phase, Counters count);

I don't see the point of having two threads that both call up to the task tracker every
second, especially since the ping thread is so trivial.
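
To make the single-thread idea concrete, here is a minimal sketch, not the actual Hadoop code. The names ProgressSketch, Umbilical, and ProgressReporter are hypothetical, the updateState signature is simplified (the phase and Counters arguments are dropped), and the meaning of the boolean return value (false = the tracker no longer knows the task) and of progressCount (number of progress updates made so far) are assumptions. The task only updates fields inline; one background thread sends them, and that same call serves as the ping.

```java
/** Hypothetical sketch: one reporting thread that also acts as the ping. */
class ProgressSketch {

    /** Stand-in for the task-to-TaskTracker RPC interface (simplified). */
    interface Umbilical {
        // Assumption: returns false when the tracker no longer knows the task.
        boolean updateState(String taskId, int progressCount, float progress, String state);
    }

    static class ProgressReporter implements Runnable {
        private final Umbilical umbilical;
        private final String taskId;
        private final long intervalMillis;
        private final java.util.concurrent.atomic.AtomicInteger progressCount =
                new java.util.concurrent.atomic.AtomicInteger();
        private volatile float progress;
        private volatile String state = "";
        private volatile boolean running = true;

        ProgressReporter(Umbilical umbilical, String taskId, long intervalMillis) {
            this.umbilical = umbilical;
            this.taskId = taskId;
            this.intervalMillis = intervalMillis;
        }

        /** Called inline by the map or reduce code: updates fields only, no RPC. */
        void setProgress(float newProgress, String newState) {
            this.progress = newProgress;
            this.state = newState;
            progressCount.incrementAndGet();
        }

        /** Sends one status update; this call doubles as the liveness ping. */
        boolean reportOnce() {
            return umbilical.updateState(taskId, progressCount.get(), progress, state);
        }

        void stop() {
            running = false;
        }

        @Override
        public void run() {
            while (running) {
                if (!reportOnce()) {
                    running = false; // tracker rejected the update, stop reporting
                    break;
                }
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }
    }
}
```

In this sketch the reporter would be started once per task, for example with new Thread(reporter).start() and an interval of 1000 ms, so no second ping thread is needed. The progress and state fields are written separately, so a report may briefly pair a new progress value with the previous state string; a real implementation might want to guard both with a lock.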

> Progress reporting can be improved for both Map/Reduce tasks
> ------------------------------------------------------------
>                 Key: HADOOP-1201
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1201
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Devaraj Das
> Both the map and reduce tasks do progress reporting in separate threads. However, in
> the ReduceTask, after the sort phase, the progress reporting happens inline with the reducer
> invocations. This slows down the Reduce phase since RPC is involved for every progress report.
> The better thing to do would be to do progress reporting for all phases in separate threads
> and have the tasks just update the progress fields.
> One proposal is to extract out the reporting stuff that is there in MapTask/ReduceTask
> and put it in the Task superclass as a new class, and have methods in the new class that control
> what/when progress is reported. Thoughts?

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
