hadoop-common-dev mailing list archives

From "Devaraj Das (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (HADOOP-5632) Jobtracker leaves tasktrackers underutilized
Date Sun, 19 Apr 2009 06:40:47 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700550#action_12700550 ]

Devaraj Das edited comment on HADOOP-5632 at 4/18/09 11:40 PM:
---------------------------------------------------------------

If we go the route of lightweight/heavyweight heartbeats, I'd suggest that we explicitly call
those out as separate RPCs. Tasktrackers make certain assumptions about a successful heartbeat,
and since tasktrackers always send a regular (heavyweight) heartbeat, there is a problem
with status reporting for KILLED/FAILED tasks. Assume that, at a certain TaskTracker node,
some task(s) fail just before the heartbeat is sent. The tasktracker sends the status of
those tasks, and the JobTracker processes this heartbeat as a lightweight one (and therefore
doesn't process the status updates). The tasktracker removes these tasks from the runningTasks
map after getting the heartbeat response, and won't report their statuses again.
The JobTracker will therefore never learn of such task failures.
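
To make the lost-status scenario concrete, here is a minimal, self-contained Java sketch.
It is not Hadoop code: HeartbeatSketch, Master, Tracker and processHeartbeat are hypothetical
names made up for illustration, and only the runningTasks map is taken from the discussion
above. It assumes the tracker treats any heartbeat response as proof that its statuses were
processed.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the scenario above; these are not the real Hadoop classes.
class HeartbeatSketch {
    enum State { RUNNING, FAILED, KILLED }

    // Stand-in for the JobTracker side.
    static class Master {
        final Set<String> knownFailures = new HashSet<>();

        // Always answers "success". When the heartbeat is handled as lightweight,
        // the status updates it carries are skipped entirely.
        boolean processHeartbeat(Map<String, State> statuses, boolean lightweight) {
            if (!lightweight) {
                for (Map.Entry<String, State> e : statuses.entrySet()) {
                    if (e.getValue() != State.RUNNING) {
                        knownFailures.add(e.getKey());
                    }
                }
            }
            return true;
        }
    }

    // Stand-in for the TaskTracker side.
    static class Tracker {
        final Map<String, State> runningTasks = new HashMap<>();

        void heartbeat(Master master, boolean treatedAsLightweight) {
            // The tracker always sends its full status report.
            boolean ok = master.processHeartbeat(new HashMap<>(runningTasks), treatedAsLightweight);
            if (ok) {
                // Assumption under test: a successful response means the statuses
                // were processed, so finished tasks can be forgotten.
                runningTasks.values().removeIf(s -> s != State.RUNNING);
            }
        }
    }

    // Runs the scenario and reports whether the master ever learns that task_2 failed.
    static boolean failureReachesMaster(boolean treatedAsLightweight) {
        Master master = new Master();
        Tracker tracker = new Tracker();
        tracker.runningTasks.put("task_1", State.RUNNING);
        tracker.runningTasks.put("task_2", State.FAILED);
        tracker.heartbeat(master, treatedAsLightweight);
        // A later heavyweight heartbeat cannot recover the status: the entry is gone.
        tracker.heartbeat(master, false);
        return master.knownFailures.contains("task_2");
    }

    public static void main(String[] args) {
        System.out.println("first heartbeat heavyweight: " + failureReachesMaster(false));
        System.out.println("first heartbeat lightweight: " + failureReachesMaster(true));
    }
}
```

With separate RPCs, the tracker side could tell which kind of response it received and keep
the FAILED/KILLED entries until a heavyweight heartbeat has actually carried them.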

Also, maybe we should process the failed/killed tasks' statuses in the lightweight heartbeat
as well. The logic is that failed/killed tasks should be given the same treatment as fresh
(never-run) tasks. It actually makes sense to give higher priority to failed tasks during task
assignment: if there is a deterministic failure on every attempt, the job would fail fast
(after a certain number of attempts of the same task), leading to better cluster utilization.
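
The fail-fast argument can be illustrated with a toy scheduler, sketched below under stated
assumptions. FailFastSketch and slotsUntilJobFails are hypothetical names, the single-slot
queue is a deliberate simplification rather than the JobTracker's real assignment logic, and
the attempt limit is just a parameter. It counts how many task slots a doomed job consumes
when retries of a deterministically failing task go to the front of the queue versus the back.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical toy scheduler; not the JobTracker's actual task assignment code.
class FailFastSketch {

    // Simulates a job of numTasks tasks in which badTask fails on every attempt.
    // Returns the number of slots consumed before the job is declared failed,
    // which happens once badTask has failed maxAttempts times.
    static int slotsUntilJobFails(int numTasks, int badTask, int maxAttempts, boolean retryFirst) {
        Deque<Integer> pending = new ArrayDeque<>();
        for (int t = 0; t < numTasks; t++) {
            pending.addLast(t);
        }
        int attempts = 0;
        int slotsUsed = 0;
        while (!pending.isEmpty()) {
            int task = pending.pollFirst();
            slotsUsed++;
            if (task != badTask) {
                continue; // healthy task: it runs once and succeeds
            }
            attempts++;
            if (attempts >= maxAttempts) {
                return slotsUsed; // job fails here
            }
            if (retryFirst) {
                pending.addFirst(task); // failed task gets priority
            } else {
                pending.addLast(task);  // failed task waits behind fresh tasks
            }
        }
        return slotsUsed; // only reached if badTask is not part of the job
    }

    public static void main(String[] args) {
        System.out.println("retries first: " + slotsUntilJobFails(10, 0, 4, true));
        System.out.println("retries last:  " + slotsUntilJobFails(10, 0, 4, false));
    }
}
```

In this toy setup, with 10 tasks and a limit of 4 attempts, prioritizing the retry fails the
job after 4 slots, while queueing it behind fresh tasks burns 13 slots on work that is thrown
away.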

      was (Author: devaraj):
    If we go the route of lightweight/heavyweight heartbeats, I'd suggest that we explicitly
call those out as separate RPCs. Tasktrackers make certain assumptions about a successful
heartbeat, and since tasktrackers always send a regular (heavyweight) heartbeat, there is
a problem with status reporting for KILLED/FAILED tasks. Assume that, at a certain TaskTracker
node, some task(s) fail just before the heartbeat is sent. The tasktracker sends the status
of those tasks. The tasktracker removes these tasks from the runningTasks map after getting the
heartbeat response, and won't report their statuses again. The JobTracker will
be unaware of such task failures.

Also, maybe we should process the failed/killed tasks' statuses in the lightweight heartbeat
as well. The logic is that failed/killed tasks should be given the same treatment as fresh
(never-run) tasks. It actually makes sense to give higher priority to failed tasks during task
assignment: if there is a deterministic failure on every attempt, the job would fail fast
(after a certain number of attempts of the same task), leading to better cluster utilization.
  
> Jobtracker leaves tasktrackers underutilized
> --------------------------------------------
>
>                 Key: HADOOP-5632
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5632
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.18.0, 0.18.1, 0.18.2, 0.18.3, 0.19.0, 0.19.1, 0.20.0
>         Environment: 2x HT 2.8GHz Intel Xeon, 3GB RAM, 4x 250GB HD linux boxes, 100 node cluster
>            Reporter: Khaled Elmeleegy
>         Attachments: hadoop-khaled-tasktracker.10s.uncompress.timeline.pdf, hadoop-khaled-tasktracker.150ms.uncompress.timeline.pdf, jobtracker.patch, jobtracker20.patch
>
>
> For some workloads, the jobtracker doesn't keep all the slots utilized even under heavy load.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

