hadoop-common-dev mailing list archives

From "Amar Kamat (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2790) TaskInProgress.hasSpeculativeTask is very inefficient
Date Thu, 07 Feb 2008 06:59:08 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566474#action_12566474 ]

Amar Kamat commented on HADOOP-2790:

bq. Also, another obvious optimization is to check whether the speculative execution flag
is true up front.

I noticed that too a few days ago, but I thought HADOOP-2141 might fix it.
With HADOOP-2119, the number of calls to {{hasSpeculativeTask()}} should drop, since we are
optimizing the look-ups for higher-priority runnable tasks and skipping speculative ones
entirely in those look-ups. The check for speculative tasks will then run only when there is
nothing else to run. Still, +1 to doing better than making every check on every call.
The following conditions are used to decide {{TaskInProgress.hasSpeculativeTask()}}:
- activeTasks.size() <= MAX_TASK_EXECS _[seems ok]_
- runSpeculative _[should be done earlier, but ok]_
- averageProgress - progress >= SPECULATIVE_GAP _[seems ok]_
- System.currentTimeMillis() - startTime >= SPECULATIVE_LAG :
    This could be checked once in {{TaskInProgress.recomputeProgress()}}, so that
{{hasSpeculativeTask()}} repeats the check only if the earlier one returned {{false}}. We can
probably still do better, but my guess is that we can't totally avoid {{System.currentTimeMillis()}}
in {{TaskInProgress.hasSpeculativeTask()}}, no?
- completes == 0 _[ok]_
- !isOnlyCommitPending() :
    Maybe a Map of _COMMIT_PENDING_ tasks can be maintained in _TaskInProgress_, so that the
only check made is {{commitPendingStatuses.size() > 0 && commitPendingStatuses.containsKey(taskId)}}.
The space requirement stays the same; the re-arrangement would be done in {{TaskInProgress.recomputeProgress()}}.
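
A minimal sketch of how the checks listed above could be ordered, cheapest first. This is not the actual Hadoop code: the class {{SpeculationSketch}}, the constant values, the {{updateStatus()}} helper and passing the current time in as a parameter are all assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical, simplified stand-in for TaskInProgress; not the real Hadoop class. */
class SpeculationSketch {
    // Constant names come from the list above; the values are assumed.
    static final int MAX_TASK_EXECS = 1;
    static final double SPECULATIVE_GAP = 0.2;
    static final long SPECULATIVE_LAG = 60_000L;

    final boolean runSpeculative;
    final long startTime;
    int activeTasks = 1;
    int completes = 0;
    double progress = 0.0;

    // Once the lag has elapsed it stays elapsed, so the result can be cached.
    private boolean lagElapsed = false;
    // Attempts currently in COMMIT_PENDING, maintained as status updates arrive.
    private final Set<String> commitPending = new HashSet<>();

    SpeculationSketch(boolean runSpeculative, long startTime) {
        this.runSpeculative = runSpeculative;
        this.startTime = startTime;
    }

    /** Hypothetical hook: record whether an attempt is in COMMIT_PENDING. */
    void updateStatus(String taskId, boolean isCommitPending) {
        if (isCommitPending) {
            commitPending.add(taskId);
        } else {
            commitPending.remove(taskId);
        }
    }

    /** Constant-time replacement for iterating over all attempt statuses. */
    boolean isOnlyCommitPending() {
        return !commitPending.isEmpty();
    }

    /** The caller supplies currentTime, so the clock is not read per task. */
    boolean hasSpeculativeTask(long currentTime, double averageProgress) {
        if (!runSpeculative) {
            return false; // cheapest check first
        }
        if (completes != 0 || activeTasks > MAX_TASK_EXECS) {
            return false;
        }
        if (isOnlyCommitPending()) {
            return false;
        }
        if (!lagElapsed) {
            lagElapsed = currentTime - startTime >= SPECULATIVE_LAG;
            if (!lagElapsed) {
                return false;
            }
        }
        return averageProgress - progress >= SPECULATIVE_GAP;
    }
}
```

With this ordering, a job that has speculative execution turned off returns after a single boolean test, and the lag comparison stops being evaluated once it has succeeded.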

> TaskInProgress.hasSpeculativeTask is very inefficient
> -----------------------------------------------------
>                 Key: HADOOP-2790
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2790
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Owen O'Malley
>             Fix For: 0.16.1
> Each call to JobInProgress.findNewTask can call TaskInProgress.hasSpeculativeTask once
per task. Each call to hasSpeculativeTask calls System.currentTimeMillis, which can result
in hundreds of thousands of calls to currentTimeMillis. Additionally, it calls TaskInProgress.isOnlyCommitPending,
which calls .values() on the map from task id to host name and iterates through them to see
if any of the tasks are in commit pending. It would be better to have a commit pending boolean
flag in the TaskInProgress. It also looks like there are other opportunities here, but those
jumped out at me. We should also look at this method in the profiler.
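
The caller side of the description above can be sketched as follows: the clock is read once per scan and the value is handed to every task, and each task carries a boolean commit-pending flag. The names {{FindNewTaskSketch}}, {{Tip}} and the {{lag}} parameter are hypothetical and do not reflect the real JobInProgress code.

```java
import java.util.List;

/** Hypothetical caller-side sketch; not the real JobInProgress. */
class FindNewTaskSketch {

    /** Minimal task record with the boolean flag proposed in the description. */
    static class Tip {
        final long startTime;
        boolean commitPending;

        Tip(long startTime) {
            this.startTime = startTime;
        }

        boolean hasSpeculativeTask(long now, long lag) {
            return !commitPending && now - startTime >= lag;
        }
    }

    /** Returns the index of the first task eligible for speculation, or -1. */
    static int findNewTask(List<Tip> tips, long lag) {
        long now = System.currentTimeMillis(); // one clock read per scan, not per task
        for (int i = 0; i < tips.size(); i++) {
            if (tips.get(i).hasSpeculativeTask(now, lag)) {
                return i;
            }
        }
        return -1;
    }
}
```

However many tasks the job has, each scan reads the clock once, and the commit-pending test is a field read rather than an iteration over a map's values.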

