hadoop-common-dev mailing list archives

From "Runping Qi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4623) Running tasks are not maintained by JobInProgress if speculation is off
Date Thu, 20 Nov 2008 18:40:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649443#action_12649443 ]

Runping Qi commented on HADOOP-4623:

Currently, the runningMapCache data structure is logically a map from Node to Collection<TaskInProgress>.
Whenever a task is scheduled, its tip (TaskInProgress) is added to this structure; whenever the task
completes, the tip is removed from it.
The structure is currently implemented as a LinkedHashMap, so each add or remove involves link
manipulation and object creation.
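For illustration, here is a minimal sketch of a linked structure of this kind, assuming a LinkedHashMap from each node to a LinkedHashSet of tips. The class name LinkedRunningCache and its methods are hypothetical, and the type parameters N and T stand in for Hadoop's Node and TaskInProgress; this is not the actual JobInProgress code. It only marks where the add path allocates objects.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;

// Hypothetical sketch: N stands in for Node, T for TaskInProgress.
class LinkedRunningCache<N, T> {
  private final Map<N, Collection<T>> cache = new LinkedHashMap<>();

  // Called when a task is scheduled.
  public void add(N node, T tip) {
    Collection<T> tips = cache.get(node);
    if (tips == null) {
      tips = new LinkedHashSet<>();  // allocates a new set for a previously unseen node
      cache.put(node, tips);         // allocates a linked map entry for the node
    }
    tips.add(tip);                   // allocates a linked entry for each new tip
  }

  // Called when a task completes.
  public void remove(N node, T tip) {
    Collection<T> tips = cache.get(node);
    if (tips != null) {
      tips.remove(tip);              // unlinks the tip's entry from the set
    }
  }

  // Read-only view of the tips currently recorded for a node.
  public Collection<T> running(N node) {
    Collection<T> tips = cache.get(node);
    return tips == null
        ? Collections.<T>emptySet()
        : Collections.unmodifiableCollection(tips);
  }
}
```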

I suspect that performance would improve if a more efficient data structure were used.
Here is an idea:
use a HashMap that maps each node to a fixed-size array of tips, where the array size is the number
of slots per node.
With this simple data structure, each array needs to be initialized only once. Every add or delete
operation then simply sets a reference in a fixed-size array.
No object creation is involved, so the overhead is lower and more predictable.
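A minimal sketch of the proposed fixed-size slot arrays, assuming each key is a host whose concurrently running tasks never exceed its slot count (a rack-level key would need a larger array). FixedSlotRunningCache and its methods are hypothetical names, and N and T stand in for Node and TaskInProgress.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: each node maps to an array whose length is the slot count.
class FixedSlotRunningCache<N, T> {
  private final int slotsPerNode;
  private final Map<N, Object[]> slots = new HashMap<>();

  FixedSlotRunningCache(int slotsPerNode) {
    if (slotsPerNode <= 0) {
      throw new IllegalArgumentException("slotsPerNode must be positive");
    }
    this.slotsPerNode = slotsPerNode;
  }

  // Records a scheduled task; returns false if every slot on the node is taken.
  public boolean add(N node, T tip) {
    Object[] arr = slots.get(node);
    if (arr == null) {
      // Allocated once per node, the first time the node is seen.
      arr = new Object[slotsPerNode];
      slots.put(node, arr);
    }
    for (int i = 0; i < arr.length; i++) {
      if (arr[i] == null) {
        arr[i] = tip;  // add = store one reference, no allocation
        return true;
      }
    }
    return false;
  }

  // Clears a completed task (matched by reference identity).
  public boolean remove(N node, T tip) {
    Object[] arr = slots.get(node);
    if (arr == null) {
      return false;
    }
    for (int i = 0; i < arr.length; i++) {
      if (arr[i] == tip) {
        arr[i] = null;  // delete = clear one reference
        return true;
      }
    }
    return false;
  }

  // Lists the tasks running on a node; this read path does allocate a list.
  @SuppressWarnings("unchecked")
  public List<T> running(N node) {
    List<T> result = new ArrayList<>();
    Object[] arr = slots.get(node);
    if (arr != null) {
      for (Object o : arr) {
        if (o != null) {
          result.add((T) o);
        }
      }
    }
    return result;
  }
}
```

The linear scans stay cheap because each array is only as long as the slot count. The trade-off is that add() must report a full node (here by returning false) rather than growing the array.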


> Running tasks are not maintained by JobInProgress if speculation is off
> -----------------------------------------------------------------------
>                 Key: HADOOP-4623
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4623
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amar Kamat
>            Assignee: Amar Kamat
>         Attachments: HADOOP-4623-v1.1.patch, HADOOP-4623-v1.2.patch
> {{JobInProgress}} doesn't maintain any structure for running tasks if speculation is turned
> _off_. {{getRunningMapCache()}} in {{JobInProgress}} exposes the running map cache. This
> API returns an empty {{Map}} if speculation is turned off.
> _Usage_:
> {{CapacityScheduler}} requires a list of running tasks for both speculated and non-speculated
> jobs. See HADOOP-4558 for how this issue affects {{CapacityScheduler}}.

This message is automatically generated by JIRA.
