hadoop-mapreduce-issues mailing list archives

From Jordà Polo (JIRA) <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1380) Adaptive Scheduler
Date Thu, 04 Feb 2010 16:28:28 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829639#action_12829639 ]

Jordà Polo commented on MAPREDUCE-1380:

> Not sure I'd put the VM request policy in the scheduler. Better to give it some way of notifying
> something that there aren't enough resources, include data on user and data, and give that
> other thing the ability to add machines if it so chooses. There may be other concerns like
> per-user quota, overall costs, etc., as well as the security issue of giving your scheduler
> the credentials to work with the infrastructure.

Good point. The current description doesn't explain much, but that's exactly what we have
in mind: a multi-tiered system in which the Hadoop scheduler just provides information to
the resource manager/provider.
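A minimal sketch of that split follows. It assumes a hypothetical listener interface; `ShortageReport`, `ResourceShortageListener`, and `QuotaAwareProvider` are illustrative names, not part of Hadoop or of the attached patch. The scheduler only emits a report naming the user, the job, and how many extra slots it wants, while the provider side owns the per-user quota (and, in a real deployment, the infrastructure credentials) and decides how much to grant.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical report emitted by the scheduler: it carries data, never credentials.
final class ShortageReport {
    final String user;
    final String jobId;
    final int extraSlotsWanted;

    ShortageReport(String user, String jobId, int extraSlotsWanted) {
        this.user = user;
        this.jobId = jobId;
        this.extraSlotsWanted = extraSlotsWanted;
    }
}

// Hypothetical hook: the scheduler only notifies; it never provisions machines itself.
interface ResourceShortageListener {
    void onShortage(ShortageReport report);
}

// Hypothetical provider-side policy that enforces a per-user slot quota.
final class QuotaAwareProvider implements ResourceShortageListener {
    private final Map<String, Integer> remainingQuota = new HashMap<>();
    private int slotsGranted = 0;

    QuotaAwareProvider(Map<String, Integer> quotas) {
        remainingQuota.putAll(quotas);
    }

    @Override
    public void onShortage(ShortageReport report) {
        int left = remainingQuota.getOrDefault(report.user, 0);
        int grant = Math.min(left, report.extraSlotsWanted);
        if (grant > 0) {
            remainingQuota.put(report.user, left - grant);
            slotsGranted += grant;
            // In practice the provider, not the scheduler, would hold the cloud
            // credentials and call its provisioning API at this point.
        }
    }

    int slotsGranted() {
        return slotsGranted;
    }
}
```

With a quota of 4 slots for `alice`, two requests for 3 slots each are granted 3 and then 1, and a request from a user with no quota is refused; cost or security policies could be layered into the provider without touching the scheduler.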

> Adaptive Scheduler
> ------------------
>                 Key: MAPREDUCE-1380
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1380
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Jordà Polo
>            Priority: Minor
>         Attachments: MAPREDUCE-1380_0.1.patch
>
> The Adaptive Scheduler is a pluggable Hadoop scheduler that automatically adjusts the
> amount of resources it uses depending on the performance of jobs and on user-defined high-level
> business goals.
> Existing Hadoop schedulers are focused on managing large, static clusters in which nodes
> are added or removed manually. In contrast, the goal of this scheduler is to improve
> the integration of Hadoop, and of the applications that run on top of it, with environments
> that allow more dynamic provisioning of resources.
> The current implementation is quite straightforward. Users specify a deadline at job
> submission time, and the scheduler adjusts the resources to meet that deadline (at the moment,
> the scheduler can be configured to either minimize or maximize the amount of resources). If
> multiple jobs are run simultaneously, the scheduler prioritizes them by deadline. Note that
> the current approach to estimate the completion time of jobs is quite simplistic: it is based
> on the time it takes to finish each task, so it works well with regular jobs, but there is
> still room for improvement for unpredictable jobs.
> The idea is to further integrate it with cloud-like and virtual environments (such as
> Amazon EC2, Emotive, etc.) so that if, for instance, a job isn't able to meet its deadline,
> the scheduler automatically requests more resources.
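The deadline-driven behaviour described in the issue can be sketched roughly as follows. This is not the code in MAPREDUCE-1380_0.1.patch: `JobSnapshot`, `Allocation`, and `DeadlineAllocator` are hypothetical names, and the estimate assumes the remaining work (remaining tasks times mean task duration) divides evenly across slots, a simplification in the spirit of the per-task estimate mentioned above. Under `MINIMIZE` a job gets just enough slots to meet its deadline; under `MAXIMIZE` it is offered one slot per remaining task (an assumed reading of "maximize"). Jobs are served earliest-deadline-first, and unmet demand is reported as a shortfall that an external resource provider could act on.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical snapshot of a running job; field names are illustrative only.
final class JobSnapshot {
    final String jobId;
    final long deadlineMs;     // absolute deadline supplied at submission time
    final int remainingTasks;  // tasks not yet finished
    final double meanTaskMs;   // mean duration of the tasks finished so far

    JobSnapshot(String jobId, long deadlineMs, int remainingTasks, double meanTaskMs) {
        this.jobId = jobId;
        this.deadlineMs = deadlineMs;
        this.remainingTasks = remainingTasks;
        this.meanTaskMs = meanTaskMs;
    }
}

// Result of one allocation round.
final class Allocation {
    final Map<String, Integer> granted = new LinkedHashMap<>(); // jobId -> slots, in grant order
    int shortfall = 0; // slots wanted but not available: a candidate for a provider request
}

final class DeadlineAllocator {
    enum Policy { MINIMIZE, MAXIMIZE }

    // Estimated finish time if the job keeps `slots` slots busy,
    // treating the remaining work as evenly divisible across slots.
    static double estimatedFinishMs(JobSnapshot job, long nowMs, int slots) {
        return nowMs + job.remainingTasks * job.meanTaskMs / Math.max(1, slots);
    }

    // Fewest slots whose estimated finish time meets the deadline,
    // capped at one slot per remaining task.
    static int slotsNeeded(JobSnapshot job, long nowMs) {
        if (job.remainingTasks == 0) return 0;
        long timeLeftMs = job.deadlineMs - nowMs;
        if (timeLeftMs <= 0) return job.remainingTasks; // already late: ask for the most that can help
        int needed = (int) Math.ceil(job.remainingTasks * job.meanTaskMs / timeLeftMs);
        return Math.min(job.remainingTasks, needed);
    }

    // Serves jobs earliest-deadline-first from a pool of free slots.
    static Allocation allocate(List<JobSnapshot> jobs, long nowMs, int freeSlots, Policy policy) {
        List<JobSnapshot> byDeadline = new ArrayList<>(jobs);
        byDeadline.sort(Comparator.comparingLong((JobSnapshot j) -> j.deadlineMs));
        Allocation result = new Allocation();
        int free = freeSlots;
        for (JobSnapshot job : byDeadline) {
            // MINIMIZE: just enough to meet the deadline; MAXIMIZE: as many as the job can use.
            int want = policy == Policy.MINIMIZE ? slotsNeeded(job, nowMs) : job.remainingTasks;
            int grant = Math.min(want, free);
            free -= grant;
            result.granted.put(job.jobId, grant);
            result.shortfall += want - grant;
        }
        return result;
    }
}
```

For example, with two jobs due at 500 ms and 1000 ms that need 3 and 2 slots respectively, and only four free slots, `MINIMIZE` gives the earlier job its 3 slots and the later job the single remaining one, leaving a shortfall of 1. Reporting that shortfall, rather than provisioning machines directly, keeps the scheduler on the information-only side of the multi-tiered design discussed in the comment.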

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
