hadoop-mapreduce-issues mailing list archives

From "Zhaoning Zhang (JIRA)" <j...@apache.org>
Subject [jira] Updated: (MAPREDUCE-1226) Granularity Variable Task Pre-Scheduler in Heterogeneous Environment
Date Tue, 01 Dec 2009 08:23:30 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhaoning Zhang updated MAPREDUCE-1226:
--------------------------------------

    Component/s: tasktracker
                 task
       Priority: Minor  (was: Major)

> Granularity Variable Task Pre-Scheduler in Heterogeneous Environment 
> ---------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1226
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1226
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker, task, tasktracker
>         Environment: Heterogeneous Cluster
>            Reporter: Zhaoning Zhang
>            Priority: Minor
>
> When we deployed the LATE scheduler from the OSDI '08 paper on some of our cluster environments, we found that some slow nodes were repeatedly assigned tasks that ran slowly, were speculatively re-executed elsewhere, and were then killed, so these nodes contribute almost nothing and waste the task slots assigned to them.
> Under the LATE mechanism some tasks are re-executed, so the same task runs on two or more different nodes, which wastes computing resources.
> We could simply remove these nodes from the cluster, or split the cluster into two or more parts. But I think it is useful and worthwhile to design a mechanism that lets low-utility nodes still contribute effectively.
>  
> We want to pre-schedule tasks according to a per-node utility derived from historical logs, assigning larger tasks to faster nodes. Hadoop's task scheduler assigns map tasks over input splits of 64 MB by default; some deployments use 128 MB splits, but within a job all splits share the same granularity. I want to change this into a variable-granularity mechanism (a sizing sketch follows below).
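
A minimal sketch of the split-sizing idea (not part of the issue; the class name, speed factors, and size bounds are illustrative assumptions). Given a relative speed factor per node taken from the history logs, the split size is scaled so that faster nodes receive proportionally larger map inputs:

    // Sketch: choose a per-node map split size from a relative speed factor.
    // A factor of 1.0 means "baseline node"; how factors are derived from the
    // history logs is shown in a later sketch.
    import java.util.LinkedHashMap;
    import java.util.Map;

    public class VariableSplitSizer {
        private static final long BASE_SPLIT_BYTES = 64L * 1024 * 1024; // default 64 MB split

        /** Split size for one node: proportional to its speed, clamped to sane bounds. */
        static long splitSizeFor(double speedFactor) {
            long size = (long) (BASE_SPLIT_BYTES * speedFactor);
            long min = 16L * 1024 * 1024;   // never smaller than 16 MB
            long max = 256L * 1024 * 1024;  // never larger than 256 MB
            return Math.max(min, Math.min(max, size));
        }

        public static void main(String[] args) {
            // Hypothetical speed factors measured from historical task logs.
            Map<String, Double> speed = new LinkedHashMap<>();
            speed.put("fast-node-1", 2.0);
            speed.put("baseline-node-2", 1.0);
            speed.put("slow-node-3", 0.5);

            speed.forEach((node, s) -> System.out.printf(
                "%s -> split of %d MB%n", node, splitSizeFor(s) / (1024 * 1024)));
            // fast-node-1 -> 128 MB, baseline-node-2 -> 64 MB, slow-node-3 -> 32 MB
        }
    }
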
> As we know, map task granularity is determined by how the input file in DFS is split, while reduce task granularity is determined by the Partitioner that divides the intermediate results. So I think a variable-granularity mechanism is feasible (see the partitioner sketch below).
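
A rough sketch of a capacity-weighted Partitioner for the reduce side (the class name and weights are hypothetical, and the idea only helps if the pre-scheduler also pins each reduce task to a node of known speed, which stock Hadoop does not do). Instead of hashing keys uniformly, it routes a larger share of the key space to the reduce partitions placed on faster nodes:

    // Sketch: a weighted partitioner that gives some reduce partitions a larger
    // share of the key space. WEIGHTS is illustrative; a real implementation
    // would read per-reducer capacities from the job configuration.
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    public class WeightedPartitioner extends Partitioner<Text, Text> {
        // One relative capacity per reduce task (length should equal the reducer count).
        private static final double[] WEIGHTS = {2.0, 1.0, 1.0, 0.5};

        @Override
        public int getPartition(Text key, Text value, int numPartitions) {
            double total = 0;
            for (double w : WEIGHTS) total += w;

            // Map the key's hash onto [0, 1), then walk the cumulative weights.
            double point = (key.hashCode() & Integer.MAX_VALUE) / (double) Integer.MAX_VALUE;
            double cumulative = 0;
            for (int i = 0; i < WEIGHTS.length && i < numPartitions; i++) {
                cumulative += WEIGHTS[i] / total;
                if (point < cumulative) return i;
            }
            return numPartitions - 1; // fall back to the last partition
        }
    }

In a real job the class would be installed with job.setPartitionerClass(WeightedPartitioner.class); the map-side equivalent would be a custom InputFormat whose getSplits() produces splits of the sizes computed in the previous sketch.
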
> If we use this pre-scheduling model, we can expect all tasks to start at nearly the same time and finish at nearly the same time, so that the job fills a specific time slot (a worked example follows below).
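
A one-line justification of the equal-finish-time expectation (my notation, not from the issue). If node i processes data at rate s_i and the total input is D, then giving it a share proportional to its rate equalizes completion times:

    d_i = D \cdot \frac{s_i}{\sum_j s_j}
    \quad\Longrightarrow\quad
    t_i = \frac{d_i}{s_i} = \frac{D}{\sum_j s_j} \quad \text{for every node } i

so, ignoring startup and shuffle overhead, every node finishes its map work at the same moment.
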
> History-Log-Based Node Utility Description
> This is the fundamental description of nodes used by the pre-scheduler. In a heterogeneous environment, the cluster can be partitioned into sub-clusters such that nodes within a sub-cluster are homogeneous while nodes in different sub-clusters are heterogeneous (a sketch follows below).
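
A minimal sketch of how the utility and the sub-clusters might be derived from the history logs (the TaskRecord layout and the rounding rule are assumptions, not an existing Hadoop API). Each node's utility is its mean historical throughput, normalized to the cluster median and rounded into coarse speed classes:

    // Sketch: derive a relative utility per node from historical task records and
    // group nodes with similar utility into sub-clusters.
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.TreeMap;

    public class NodeUtility {
        /** Assumed shape of one history-log entry: node name, bytes processed, runtime. */
        record TaskRecord(String node, long bytes, long millis) {}

        /** node -> mean throughput in bytes per millisecond. */
        static Map<String, Double> throughputByNode(List<TaskRecord> history) {
            Map<String, double[]> acc = new HashMap<>(); // {total bytes, total millis}
            for (TaskRecord r : history) {
                double[] a = acc.computeIfAbsent(r.node(), k -> new double[2]);
                a[0] += r.bytes();
                a[1] += r.millis();
            }
            Map<String, Double> utility = new HashMap<>();
            acc.forEach((node, a) -> utility.put(node, a[0] / a[1]));
            return utility;
        }

        /** Group nodes into sub-clusters by utility relative to the median, in steps of 0.5. */
        static Map<Double, List<String>> subClusters(Map<String, Double> utility) {
            double[] sorted = utility.values().stream()
                    .mapToDouble(Double::doubleValue).sorted().toArray();
            double median = sorted[sorted.length / 2];
            Map<Double, List<String>> clusters = new TreeMap<>();
            utility.forEach((node, u) -> {
                double speedClass = Math.round((u / median) * 2) / 2.0; // 0.5, 1.0, 1.5, ...
                clusters.computeIfAbsent(speedClass, k -> new ArrayList<>()).add(node);
            });
            return clusters;
        }
    }
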
> Node Utility Stability
> We think this is important because the pre-scheduler depends on the stability of each node's utility. We could single out the unstable nodes and treat them differently, but we do not yet have a good method for this (a simple metric is sketched below).
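
One simple metric that could fill this gap (my suggestion, not the reporter's): treat a node as unstable when the coefficient of variation of its per-task throughput samples exceeds a threshold, and schedule such nodes conservatively.

    // Sketch: flag a node as unstable when the relative spread of its throughput
    // samples is too large; the 0.25 threshold is an arbitrary assumption.
    public class StabilityCheck {
        static boolean isUnstable(double[] throughputSamples, double maxCoefficientOfVariation) {
            double mean = 0;
            for (double t : throughputSamples) mean += t;
            mean /= throughputSamples.length;

            double variance = 0;
            for (double t : throughputSamples) variance += (t - mean) * (t - mean);
            variance /= throughputSamples.length;

            return Math.sqrt(variance) / mean > maxCoefficientOfVariation; // e.g. 0.25
        }
    }
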
> Error Tolerance
> I think the original scheduler for homogeneous clusters is designed to handle failing nodes: if a task hits an exception, the JobTracker re-executes it and handles the failure dynamically.
> So if we use the pre-scheduler, we must still face the problem of such failures.
> I propose that when a task fails, we split it into several parts and execute them on several different nodes; the expected finish time of the re-execution is then shortened, so the total job response time does not grow too much (a rough estimate follows below).
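
A rough estimate of why re-splitting helps (my notation; the overhead term is an assumption). If the failed task still has W units of work left and a replacement node processes s units per second, then

    t_{\text{retry}} \approx \frac{W}{s},
    \qquad
    t_{\text{split}} \approx \frac{W}{k s} + c_{\text{sched}},

where the failed work is split across k similar nodes and c_sched is the extra per-piece scheduling and startup cost; as long as that overhead stays small compared with W/s, the re-execution finishes roughly k times sooner.
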
> Job Priorities
> If we use this pre-scheduler, a single job will fill its time slot, and if higher-priority jobs arrive in the meantime they will have to wait. I do not yet have an effective method to solve this.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

