hadoop-mapreduce-issues mailing list archives

From "Shrinivas Joshi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4381) Make PROGRESS_INTERVAL of org.apache.hadoop.mapred.Task a tunable
Date Thu, 05 Jul 2012 21:00:36 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407480#comment-13407480 ]

Shrinivas Joshi commented on MAPREDUCE-4381:
--------------------------------------------

I have attached a revised version of this patch that addresses some of Steve's code review
comments. Specifically, the changes in this version are:
* Renamed the patch file to match the official contribution patch naming conventions
* Used the appropriate naming style for the PROGRESS_INTERVAL variable
* Used a more appropriate name for the progress interval property
* The new property is now read using the existing JobConf.getInt method instead of introducing
a new query method
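
As a rough illustration of the last point, here is a minimal sketch of reading the interval
through a getInt(name, defaultValue) style lookup with the old hard-coded 3000 msec as the
fallback. The property key and the SimpleConf class are hypothetical stand-ins so that the
example runs without Hadoop on the classpath; they are not the names used in the attached patch.

```java
import java.util.HashMap;
import java.util.Map;

public class ProgressIntervalSketch {
    // Hypothetical property key, for illustration only; the key used in
    // the attached patch is not stated in this message.
    static final String PROGRESS_INTERVAL_KEY = "example.task.progress.interval.ms";

    // The previously hard-coded value, kept as the default.
    static final int DEFAULT_PROGRESS_INTERVAL_MS = 3000;

    /** Minimal stand-in for a configuration object with a getInt(name, default) lookup. */
    static final class SimpleConf {
        private final Map<String, String> props = new HashMap<>();

        void set(String name, String value) {
            props.put(name, value);
        }

        int getInt(String name, int defaultValue) {
            String value = props.get(name);
            return (value == null) ? defaultValue : Integer.parseInt(value.trim());
        }
    }

    /** Returns the configured interval, or the default when the key is unset. */
    static int progressIntervalMs(SimpleConf conf) {
        return conf.getInt(PROGRESS_INTERVAL_KEY, DEFAULT_PROGRESS_INTERVAL_MS);
    }

    public static void main(String[] args) {
        SimpleConf conf = new SimpleConf();
        System.out.println(progressIntervalMs(conf)); // key unset, so the default applies

        conf.set(PROGRESS_INTERVAL_KEY, "1000");
        System.out.println(progressIntervalMs(conf)); // the job-level override applies
    }
}
```

Reusing an existing integer lookup with a default keeps jobs that do not set the property on
the old 3000 msec behavior.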

I have not thought this through completely, but would it be possible to implement a test case
similar to TestCombineOutputCollector for this feature?
                
> Make PROGRESS_INTERVAL of org.apache.hadoop.mapred.Task a tunable
> -----------------------------------------------------------------
>
>                 Key: MAPREDUCE-4381
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4381
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: task, tasktracker
>            Reporter: Shrinivas Joshi
>            Priority: Minor
>         Attachments: MAPREDUCE-4381-branch-1.patch, progress_interval.patch
>
>
> Currently PROGRESS_INTERVAL is a hard-coded value and is set to 3000 msec. We tried making
> it a tunable and experimented with different values. In some cases setting it to a smaller
> value like 1000 msec helps significantly improve performance of short running jobs such as
> piEstimator. This is because the task threads do not end up blocking for as many as 3 seconds
> for their last progress update event. We also noticed close to 14% improvement on Mahout KMeans
> iteration jobs which take more than 5 minutes on the test cluster that we are using. Please
> let me know if this seems to be a good idea. I have an initial patch that I have attached
> here. This is based on branch-1 tree. It may need some rework on MRv2 based branches I think.
> Also note that I have not changed the variable naming style for PROGRESS_INTERVAL even though
> it is not a public static final anymore. I can revise the patch if there are no objections
> to this idea.
> Thanks.


        
