hadoop-mapreduce-issues mailing list archives

From "Hemanth Yamijala (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-532) Option to limit max tasks started per job/queue
Date Thu, 02 Jul 2009 11:43:47 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726439#action_12726439 ]

Hemanth Yamijala commented on MAPREDUCE-532:

This is looking ok. Some comments (mostly minor):

- I would prefer the name LIMIT instead of 'CAP' everywhere.
- The formatting in CapacityTaskScheduler.start() where the new QueueSchedulingInfo is being
created seems to be spread across too many lines. Can we fold them?
- Methods introduced in TaskSchedulingInfo need not be public.
- areTasksInQueueOverCap - the getTSI call is repeated often enough that it is worth calling
it once and caching the result.
- Since capacity is in terms of slots, I think we should compare against numSlotsOccupied
as opposed to numRunningTasks. This also includes reserved tasktrackers in case we are dealing
with high memory jobs.
- Just to be safe, I would recommend this check be for >=, rather than ==.
- Documentation of the maxTaskCap variable in capacity scheduler refers to 'map' slots, whereas
it could be either.
- Currently we display the current # of slots in a queue in the UI. This could be less than
the % of the cluster capacity configured if the limit parameter is defined and is lower. I
think that might be confusing to the user.
- In the display, can we shorten the name, e.g. "Map Tasks Limit" instead of "Maximum
map tasks in a queue at a time:"? I also think it may be OK to not have a line separator
for the limits, but to club them with the Queue configuration section.
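
Taken together, the getTSI-caching and slot-comparison suggestions above might look like the
sketch below. All names here (TaskSchedulingInfo fields, areTasksInQueueOverLimit, getTSI) are
illustrative assumptions about the patch, not the actual code:

```java
// Illustrative sketch only; field and method names are assumptions,
// not the actual MAPREDUCE-532 patch.
class TaskSchedulingInfo {
  int numSlotsOccupied;    // includes slots on reserved tasktrackers
                           // for high-memory jobs
  int maxTaskLimit = -1;   // -1 means "no limit configured"
}

class QueueSchedulingInfo {
  private final TaskSchedulingInfo tsi = new TaskSchedulingInfo();

  TaskSchedulingInfo getTSI() {
    return tsi;
  }

  /**
   * True when the queue has reached its configured limit.
   * getTSI() is called once and cached in a local, and the check
   * compares occupied slots with >= rather than running tasks with ==.
   */
  boolean areTasksInQueueOverLimit() {
    TaskSchedulingInfo info = getTSI();  // call once, reuse below
    if (info.maxTaskLimit < 0) {
      return false;                      // no limit configured
    }
    return info.numSlotsOccupied >= info.maxTaskLimit;
  }
}
```

Using >= rather than == keeps the check safe even if slot accounting ever overshoots the
limit by more than one in a single scheduling pass.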

I have yet to look at the test cases.

> Option to limit max tasks started per job/queue 
> ------------------------------------------------
>                 Key: MAPREDUCE-532
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-532
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Rajiv Chittajallu
>         Attachments: MAPREDUCE-532-1.patch, MAPREDUCE-532-2.patch, MAPREDUCE-532-3.patch
> For jobs which call external services, (eg: distcp, crawlers) user/admin should be able
> to control max parallel tasks spawned. There should be a mechanism to cap the capacity available
> for a queue/job.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
