hadoop-common-dev mailing list archives

From "Amar Kamat (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2790) TaskInProgress.hasSpeculativeTask is very inefficient
Date Thu, 07 Feb 2008 08:35:08 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566496#action_12566496 ]

Amar Kamat commented on HADOOP-2790:

!isOnlyCommitPending() :
Maybe a Map for COMMIT_PENDING tasks can be maintained in TaskInProgress, so that the only check
made is commitPendingStatuses.size() > 0 && commitPendingStatuses.containsKey(taskId).
The space requirement stays the same; the statuses only need to be re-arranged in TaskInProgress.recomputeProgress().
In practice the list of task statuses will be pretty small, so we can either keep what is currently
done or maintain a boolean flag as Owen mentioned (+1).
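
To make the two options above concrete, here is a minimal sketch, not the actual Hadoop code. The class {{TipCommitPendingSketch}} and its methods {{updateStatus}}, {{isCommitPending}} and {{anyCommitPending}} are hypothetical stand-ins for the bookkeeping in TaskInProgress; a Set of task ids is used here because only membership is needed.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for TaskInProgress; NOT the real Hadoop class.
class TipCommitPendingSketch {
    // Ids of task attempts currently reported as COMMIT_PENDING (assumed bookkeeping).
    private final Set<String> commitPendingStatuses = new HashSet<>();

    // Assumed to be called whenever a status update arrives for an attempt.
    void updateStatus(String taskId, boolean commitPending) {
        if (commitPending) {
            commitPendingStatuses.add(taskId);
        } else {
            commitPendingStatuses.remove(taskId);
        }
    }

    // Constant-time check for one attempt, instead of iterating over all statuses.
    boolean isCommitPending(String taskId) {
        return commitPendingStatuses.contains(taskId);
    }

    // Equivalent of the boolean-flag variant: is any attempt commit pending?
    boolean anyCommitPending() {
        return !commitPendingStatuses.isEmpty();
    }
}

class TipCommitPendingDemo {
    public static void main(String[] args) {
        TipCommitPendingSketch tip = new TipCommitPendingSketch();
        tip.updateStatus("attempt_0", true);
        System.out.println(tip.anyCommitPending());
        tip.updateStatus("attempt_0", false);
        System.out.println(tip.anyCommitPending());
    }
}
```

If only the "any attempt is commit pending" answer is needed, the set could be replaced by a single boolean or a counter updated at the same point.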
System.currentTimeMillis() - startTime >= SPECULATIVE_LAG :
As suggested by Devaraj, the time can be computed once in {{JobInProgress.findNewTask()}} and
passed to {{TaskInProgress.hasSpeculativeTask()}}. Compared with using the time recorded in
{{TaskInProgress.recomputeProgress()}}, the chance of skipping a TIP that should be speculated will be extremely low.
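
A rough sketch of reading the clock once per scan follows. The classes {{SpeculativeTipSketch}} and {{FindNewTaskSketch}}, the method {{findSpeculativeCandidate}}, and the 60 second value of SPECULATIVE_LAG are assumptions made for illustration; the real hasSpeculativeTask check involves more conditions than the lag test shown here.

```java
// Hypothetical stand-in for TaskInProgress; NOT the real Hadoop class.
class SpeculativeTipSketch {
    // Assumed value for illustration only.
    static final long SPECULATIVE_LAG = 60 * 1000L;

    private final long startTime;

    SpeculativeTipSketch(long startTime) {
        this.startTime = startTime;
    }

    // The caller supplies the current time, so no clock read happens per TIP.
    boolean hasSpeculativeTask(long currentTime) {
        return currentTime - startTime >= SPECULATIVE_LAG;
    }
}

// Hypothetical stand-in for the scan done in JobInProgress.findNewTask().
class FindNewTaskSketch {
    // Returns the index of the first TIP eligible for speculation, or -1.
    static int findSpeculativeCandidate(SpeculativeTipSketch[] tips, long now) {
        for (int i = 0; i < tips.length; i++) {
            if (tips[i].hasSpeculativeTask(now)) {
                return i;
            }
        }
        return -1;
    }

    static int findNewTask(SpeculativeTipSketch[] tips) {
        // One clock read for the whole scan instead of one per TIP.
        long now = System.currentTimeMillis();
        return findSpeculativeCandidate(tips, now);
    }
}
```

The timestamp is at most one scan old, which is why a TIP that has just crossed the lag threshold is very unlikely to be missed.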

> TaskInProgress.hasSpeculativeTask is very inefficient
> -----------------------------------------------------
>                 Key: HADOOP-2790
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2790
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Owen O'Malley
>             Fix For: 0.16.1
> Each call to JobInProgress.findNewTask can call TaskInProgress.hasSpeculativeTask once
> per task. Each call to hasSpeculativeTask calls System.currentTimeMillis, which can result
> in hundreds of thousands of calls to currentTimeMillis. Additionally, it calls TaskInProgress.isOnlyCommitPending,
> which calls .values() on the map from task id to host name and iterates through them to see
> if any of the tasks are in commit pending. It would be better to have a commit pending boolean
> flag in the TaskInProgress. It also looks like there are other opportunities here, but those
> jumped out at me. We should also look at this method in the profiler.

