hadoop-common-dev mailing list archives

From "Devaraj Das (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1900) the heartbeat and task event queries interval should be set dynamically by the JobTracker
Date Thu, 18 Oct 2007 11:41:51 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12535894 ]

Devaraj Das commented on HADOOP-1900:
-------------------------------------

bq. So, one way to take this into account might be to maintain an average time-to-complete
for all tasks in the system (of current jobs) and factor that into the scaling of the intervals.

The TaskTracker currently pings the JobTracker asking for a task as soon as it finishes executing
a task. I think we should keep that behavior, since it keeps the utilization of the tasktrackers
optimal (of course, in general we could do better by sending it a bunch of tasks every time it
asks for a new task, but that's the subject of another jira).

bq. Also, while we are at this, I say we should start to consider busy-ness of JobTracker
too, along with the cluster-size. So, for e.g., if the individual tasks are taking in the
order of minutes, then it might not matter much if we send one every 20s or so, in some cases
it might. I know that the sort's map tasks take around 40s each... 

I propose a change to the status message in the heartbeat: the tasktracker compares the
current task status with the previous one, and if the two are the same, it doesn't send the
complete status object to the JobTracker, just a flag saying the status is a duplicate (or
something to that effect). That will considerably reduce the data per RPC for long-running
tasks whose statuses don't change frequently, and also reduce the processing load on the JobTracker.
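
To make the idea concrete, here is a rough sketch in Java of the tasktracker-side comparison. It is not the actual Hadoop code: TaskStatus, HeartbeatPayload and StatusReporter are simplified, hypothetical stand-ins invented for illustration, and the task id shown is made up.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class HeartbeatDedupSketch {

    /** Hypothetical, simplified task status (not Hadoop's real class). */
    static final class TaskStatus {
        final String taskId;
        final float progress;
        final String state;

        TaskStatus(String taskId, float progress, String state) {
            this.taskId = taskId;
            this.progress = progress;
            this.state = state;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof TaskStatus)) return false;
            TaskStatus other = (TaskStatus) o;
            return taskId.equals(other.taskId)
                    && Float.compare(progress, other.progress) == 0
                    && state.equals(other.state);
        }

        @Override
        public int hashCode() {
            return Objects.hash(taskId, progress, state);
        }
    }

    /** Hypothetical heartbeat entry: a full status, or only a duplicate flag. */
    static final class HeartbeatPayload {
        final String taskId;
        final boolean duplicate;
        final TaskStatus status; // null when duplicate is true

        HeartbeatPayload(String taskId, boolean duplicate, TaskStatus status) {
            this.taskId = taskId;
            this.duplicate = duplicate;
            this.status = status;
        }
    }

    /** Remembers the last full status sent per task and suppresses repeats. */
    static final class StatusReporter {
        private final Map<String, TaskStatus> lastSent = new HashMap<>();

        HeartbeatPayload prepare(TaskStatus current) {
            TaskStatus previous = lastSent.get(current.taskId);
            if (current.equals(previous)) {
                // Unchanged since the last heartbeat: send only the flag.
                return new HeartbeatPayload(current.taskId, true, null);
            }
            lastSent.put(current.taskId, current);
            return new HeartbeatPayload(current.taskId, false, current);
        }
    }

    public static void main(String[] args) {
        StatusReporter reporter = new StatusReporter();
        String id = "task_0001_m_000003"; // made-up id for illustration

        // First report for a task is always sent in full.
        System.out.println(reporter.prepare(new TaskStatus(id, 0.40f, "RUNNING")).duplicate);
        // An identical status is reduced to the duplicate flag.
        System.out.println(reporter.prepare(new TaskStatus(id, 0.40f, "RUNNING")).duplicate);
        // Any change in the status causes a full resend.
        System.out.println(reporter.prepare(new TaskStatus(id, 0.55f, "RUNNING")).duplicate);
    }
}
```

In a scheme like this the JobTracker would have to keep the last full status it received for each task, so that a duplicate flag can be resolved against it.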

Thoughts?

> the heartbeat and task event queries interval should be set dynamically by the JobTracker
> -----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1900
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1900
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Owen O'Malley
>            Assignee: Amareshwari Sri Ramadasu
>
> The JobTracker should scale the intervals that the TaskTrackers use to contact it dynamically,
> based on how busy it is and the size of the cluster.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

