hadoop-common-dev mailing list archives

From "Amar Kamat (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2119) JobTracker becomes non-responsive if the task trackers finish task too fast
Date Fri, 21 Mar 2008 06:25:24 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12581009#action_12581009 ]

Amar Kamat commented on HADOOP-2119:
------------------------------------

bq. siblingSetAtLevel seems really arcane.
Node is also used elsewhere, so changing Node might prove risky. For now we plan to keep a
JobTracker-level mapping and address this as a separate issue.
bq. why is there yet another map from hostname to Node?
There is no extra mapping in the JobTracker; the variable has just been renamed. Earlier the
map went from tracker name to tracker node. Now the datanodes are mapped as well.
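To illustrate the single-map point: a minimal sketch, assuming hypothetical `TopoNode` and `HostRegistry` classes (not Hadoop's actual `Node` API), of one hostname-to-node map that holds tasktracker hosts and datanode hosts alike, so hosts on the same rack share one parent node.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a topology node: a host or a rack, with a parent link.
class TopoNode {
    final String name;
    final TopoNode parent; // null for a top-level (rack) node

    TopoNode(String name, TopoNode parent) {
        this.name = name;
        this.parent = parent;
    }
}

// Hypothetical JobTracker-side registry: ONE map from host name to host node,
// used for tasktracker hosts and datanode hosts alike.
class HostRegistry {
    private final Map<String, TopoNode> hostnameToNode = new HashMap<>();
    private final Map<String, TopoNode> racks = new HashMap<>();

    // Resolve (or create) the node for a host in the given rack.
    // A host that is already registered keeps its existing node.
    TopoNode addHost(String host, String rack) {
        TopoNode rackNode = racks.computeIfAbsent(rack, r -> new TopoNode(r, null));
        return hostnameToNode.computeIfAbsent(host, h -> new TopoNode(h, rackNode));
    }

    // Returns null for a host that was never registered, e.g. a tracker that just joined.
    TopoNode getNode(String host) {
        return hostnameToNode.get(host);
    }
}
```

Because datanode hosts are resolved through the same map, a split's location and a tracker's location end up as the same `TopoNode` object whenever they name the same host.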
bq. I'm really concerned that we are adding 5 new fields holding collections to the JobInProgress
This will be fine once we remove the arrays (maps/reduces).
bq. I'm bothered by all of the checks for null Nodes .... 
If nodes are null (anywhere in the cache topology), then there won't be any cache: the cache
(as per trunk) is created only if the configuration is correct. The only place a node can
be null is when a tracker has just joined. In that case we iterate over all the parent nodes
and schedule a task. I agree that there should be sufficient logging.
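A minimal sketch of that fallback, assuming a hypothetical `LocalityCache` in which each TIP is registered at its host node and at every ancestor, and reading "all the parent nodes" as the top-level nodes of the cache (an assumption). A known tracker node is searched from the host upward; a `null` node logs a warning and scans every top-level node. `CacheNode`, `LocalityCache`, and the String TIP ids are illustrative, not the patch's code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical topology node with a parent link (null parent = top level).
class CacheNode {
    final String name;
    final CacheNode parent;

    CacheNode(String name, CacheNode parent) {
        this.name = name;
        this.parent = parent;
    }
}

// Hypothetical locality cache: node -> runnable map TIP ids at that node.
class LocalityCache {
    final Map<CacheNode, List<String>> nodeToTips = new HashMap<>();
    final List<CacheNode> topLevelNodes = new ArrayList<>();

    // Register a TIP at its host node and at every ancestor (host, then rack, ...).
    void add(CacheNode host, String tip) {
        for (CacheNode n = host; n != null; n = n.parent) {
            nodeToTips.computeIfAbsent(n, k -> new ArrayList<>()).add(tip);
            if (n.parent == null && !topLevelNodes.contains(n)) {
                topLevelNodes.add(n);
            }
        }
    }

    // Pick a TIP for a tracker without removing it.
    String pick(CacheNode trackerNode) {
        if (trackerNode == null) {
            // Stands in for real logging: the tracker is not in the topology yet.
            System.err.println("WARN: tracker has no topology node; scanning top-level nodes");
            for (CacheNode top : topLevelNodes) {
                String tip = first(top);
                if (tip != null) return tip;
            }
            return null;
        }
        // Known tracker: host-local first, then each ancestor level.
        for (CacheNode n = trackerNode; n != null; n = n.parent) {
            String tip = first(n);
            if (tip != null) return tip;
        }
        return null;
    }

    private String first(CacheNode n) {
        List<String> tips = nodeToTips.get(n);
        return (tips == null || tips.isEmpty()) ? null : tips.get(0);
    }
}
```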
bq. Shouldn't we remove the node from the nodesToMaps regardless of the level?
We can't remove it from the runnable cache, since that would be a costly operation: it's a list!
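One way to sidestep per-list removal is lazy deletion, a swapped-in technique shown here for illustration rather than necessarily what the patch does. Each node keeps a queue of runnable TIP ids, one shared set records which TIPs have already been scheduled, and a stale entry is simply discarded when it reaches the head of another node's queue instead of being searched for in every list. `LazyRunnableList` is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;

// Hypothetical per-node queue of runnable TIP ids with lazy removal.
class LazyRunnableList {
    private final Deque<String> tips = new ArrayDeque<>();
    private final Set<String> scheduled; // shared by the lists of ALL nodes

    LazyRunnableList(Set<String> scheduled) {
        this.scheduled = scheduled;
    }

    void add(String tip) {
        tips.addLast(tip);
    }

    // Hand out the first TIP that has not been scheduled via any node.
    String obtain() {
        String tip;
        while ((tip = tips.pollFirst()) != null) { // O(1) removal at the head
            if (scheduled.add(tip)) {
                return tip; // first time this TIP is handed out
            }
            // Already scheduled through another node's list: drop the stale entry.
        }
        return null;
    }
}
```

The trade-off is that a stale entry occupies memory until it is polled, but scheduling never pays for a linear search through every list the TIP appears in.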
bq. nodesToMaps being null should be a fatal error
With the latest patch there will be a NullPointerException.
bq. nodesToMaps should be a Map<Node,Set<Tip>>  ...
It can be a LinkedHashSet, whose iteration order is the insertion order, which is exactly what
we want. But then we would not be able to add failed TIPs at the front. We can maintain a
separate cache for failed TIPs.
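A sketch of the LinkedHashSet variant with a separate failed-TIP cache, assuming TIPs are identified by String ids and that failed TIPs should be retried before fresh ones; `NodeTipCache` and its methods are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical per-node TIP cache: insertion-ordered runnable TIPs
// plus a separate queue of failed TIPs that are retried first.
class NodeTipCache {
    private final Set<String> runnable = new LinkedHashSet<>();
    private final Deque<String> failed = new ArrayDeque<>();

    void addRunnable(String tip) {
        runnable.add(tip);
    }

    // A LinkedHashSet cannot insert at the front, so failed TIPs get their own queue.
    void addFailed(String tip) {
        failed.addLast(tip);
    }

    // O(1) removal, e.g. when the TIP was scheduled through another node.
    boolean removeRunnable(String tip) {
        return runnable.remove(tip);
    }

    // Failed TIPs first, then runnable TIPs in insertion order.
    String next() {
        String tip = failed.pollFirst();
        if (tip != null) return tip;
        Iterator<String> it = runnable.iterator();
        if (!it.hasNext()) return null;
        tip = it.next();
        it.remove();
        return tip;
    }
}
```

A LinkedHashSet gives both the insertion order and constant-time `remove`, which a plain list lacks; the only thing it cannot do is prepend, hence the extra `ArrayDeque`.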

> JobTracker becomes non-responsive if the task trackers finish task too fast
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-2119
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2119
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.16.0
>            Reporter: Runping Qi
>            Assignee: Amar Kamat
>            Priority: Critical
>             Fix For: 0.17.0
>
>         Attachments: HADOOP-2119-v4.1.patch, HADOOP-2119-v5.1.patch, HADOOP-2119-v5.1.patch,
>                      hadoop-2119.patch, hadoop-jobtracker-thread-dump.txt
>
>
> I ran a job with 0 reducers on a cluster with 390 nodes.
> The mappers ran very fast.
> The jobtracker lagged behind in committing completed mapper tasks.
> The number of running mappers displayed on the web UI kept growing.
> The job tracker eventually stopped responding to the web UI.
> No progress was reported afterwards.
> The job tracker ran on a separate node.
> The job tracker process consumed 100% CPU, with a VM size of 1.01 GB (reaching the heap space limit).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

