hadoop-common-dev mailing list archives

From "Hemanth Yamijala (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5884) Capacity scheduler should account high memory jobs as using more capacity of the queue
Date Mon, 01 Jun 2009 15:46:07 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715131#action_12715131 ]

Hemanth Yamijala commented on HADOOP-5884:

Some comments:

- TaskSchedulingInfo.toString() - displaying the actual values caused problems with exactness
and with mismatches between the cluster info and the state we kept. That's why we shifted to
percentages. It may be a good idea to retain that model. The same argument applies to running
tasks and numSlotsOccupiedByThisUser.
- "Occupied slots" seems too technical. Call it 'Used capacity'? Likewise, instead of '% of total
slots occupied by all users', call it '% of used capacity'?
- TaskSchedulingMgr.isUserOverLimit() - we add 1 if we're using more than the queue capacity.
It could be more than 1, depending on the task we are assigning (if it's part of a high RAM job).
- MapSchedulingMgr constructor: typo: schedulr - should be scheduler. Similarly for Reduce...
- Minor nit: use format instead of the complicated StringBuffer.append()... kind of code.
The current code makes it really hard to see what's happening.
- updateQSIObjects: the log statement prints numMapSlotsForThisJob instead of numMapsRunningForThisJob.
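
To make the isUserOverLimit() and formatting points above concrete, here is a rough sketch. It is not the code from the patch: the class UserLimitSketch, all parameter names, and the exact limit formula are assumptions made for illustration. The only point it tries to show is that, once the queue is at or over capacity, the amount added should be the number of slots the next task needs rather than a fixed 1, and that a single format call reads more easily than a chain of appends.

```java
import java.util.Locale;

// Hypothetical sketch; names and the limit formula are assumptions,
// not the actual TaskSchedulingMgr code.
final class UserLimitSketch {
    private UserLimitSketch() {}

    // Integer ceiling of a / b for positive b.
    private static int ceilDiv(int a, int b) {
        return (a + b - 1) / b;
    }

    /**
     * Returns true if the user already holds at least their share of the queue.
     * slotsForNextTask is 1 for a normal task and larger for a high RAM task.
     */
    static boolean isUserOverLimit(int userSlots, int queueOccupiedSlots,
                                   int queueCapacitySlots, int activeUsers,
                                   int minUserLimitPercent, int slotsForNextTask) {
        // Below capacity: share out the configured capacity.
        // At or above capacity: share out what is used plus what the next task needs.
        int currentCapacity = queueOccupiedSlots < queueCapacitySlots
                ? queueCapacitySlots
                : queueOccupiedSlots + slotsForNextTask;
        int equalShare = ceilDiv(currentCapacity, activeUsers);
        int minShare = ceilDiv(currentCapacity * minUserLimitPercent, 100);
        int limit = Math.max(equalShare, minShare);
        return userSlots >= limit;
    }

    /** Reports usage as a percentage with one format call instead of chained appends. */
    static String describeUsage(int occupiedSlots, int capacitySlots) {
        double pct = capacitySlots == 0 ? 0.0 : 100.0 * occupiedSlots / capacitySlots;
        return String.format(Locale.ROOT, "Used capacity: %.1f%%", pct);
    }
}
```

With a queue of 10 slots that is fully occupied, two active users, and a user holding 6 slots, this sketch reports the user as over the limit when the next task needs 1 slot but not when it needs 3, which is the difference the comment on isUserOverLimit() is pointing at.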

> Capacity scheduler should account high memory jobs as using more capacity of the queue
> --------------------------------------------------------------------------------------
>                 Key: HADOOP-5884
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5884
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>            Reporter: Hemanth Yamijala
>            Assignee: Vinod K V
>         Attachments: HADOOP-5884-20090529.1.txt
>
> Currently, when a high memory job is scheduled by the capacity scheduler, each task scheduled
> counts only once in the capacity of the queue, though it may actually be preventing other
> jobs from using spare slots on that node because of its higher memory requirements. In order
> to be fair, the capacity scheduler should proportionally (with respect to default memory)
> account high memory jobs as using a larger capacity of the queue.
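
The proportional accounting described in the issue could look roughly like the following. This is a minimal sketch under stated assumptions: the class HighMemoryAccounting and its method and parameter names are hypothetical, and rounding up to whole slots is an assumption, since the issue only says the accounting should be proportional to the default memory.

```java
// Hypothetical sketch; names are illustrative and not taken from the patch.
final class HighMemoryAccounting {
    private HighMemoryAccounting() {}

    /**
     * Slots charged to the queue for one task of a job.
     * Assumption: round up, so a task needing 2.5x the default memory counts as 3 slots.
     */
    static int slotsPerTask(long taskMemoryMb, long defaultSlotMemoryMb) {
        if (defaultSlotMemoryMb <= 0) {
            throw new IllegalArgumentException("defaultSlotMemoryMb must be positive");
        }
        if (taskMemoryMb <= defaultSlotMemoryMb) {
            return 1; // a normal job is charged one slot per task
        }
        return (int) ((taskMemoryMb + defaultSlotMemoryMb - 1) / defaultSlotMemoryMb);
    }

    /** Queue capacity consumed by all running tasks of one job. */
    static int slotsOccupied(int runningTasks, long taskMemoryMb, long defaultSlotMemoryMb) {
        return runningTasks * slotsPerTask(taskMemoryMb, defaultSlotMemoryMb);
    }
}
```

Under these assumptions, a job whose tasks each need 2048 MB on a cluster with a 1024 MB default would be charged 20 slots for 10 running tasks instead of 10.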

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
