hadoop-common-dev mailing list archives

From "Amar Kamat (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4558) Scheduler fails to reclaim capacity if Jobs are submitted to queue one after the other
Date Thu, 06 Nov 2008 05:45:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12645402#action_12645402 ]

Amar Kamat commented on HADOOP-4558:
------------------------------------

Looks like the structure that tracks running tasks is maintained only if speculation is _ON_. Hence,
with speculation turned off, we don't see any tasks getting killed. We have 3 options here:
1. Maintain a list of running tasks per job in the capacity scheduler and use that to kill tasks
instead. The drawbacks of this approach are:
   - The scheduler will do the same bookkeeping as done by JIP (JobInProgress).
   - The scheduler now needs to know about task completions.

2. Maintain the list of running tasks irrespective of speculation. The only drawback of this
approach is that it modifies the (framework) code path for jobs with speculation turned
_OFF_ and hence will require benchmarking.

3. For jobs with speculation turned _OFF_, walk over the map structure, find the least-progressed
maps, and kill them. The benefit of this approach is that the framework code remains
unchanged and there is no code duplication. The drawback is that this approach does a linear
scan every time.
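
To make option 3 concrete, here is a minimal sketch of the linear scan. It is written in
Python for brevity and is not Hadoop code: MapTask, its fields, and least_progressed_maps
are hypothetical names used only for illustration. It assumes each map exposes a progress
fraction and a running flag.

```python
import heapq
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class MapTask:
    """Hypothetical stand-in for one entry in a job's map structure (not a Hadoop class)."""
    task_id: str
    progress: float  # fraction complete, 0.0 to 1.0
    running: bool


def least_progressed_maps(maps: Iterable[MapTask], num_to_kill: int) -> List[MapTask]:
    """Scan every map once and return up to num_to_kill running maps with the least progress."""
    if num_to_kill <= 0:
        return []
    # Only running maps are candidates; pending or finished maps are skipped.
    running = (t for t in maps if t.running)
    # nsmallest keeps a bounded heap, so the cost stays linear in the number of maps
    # when num_to_kill is small.
    return heapq.nsmallest(num_to_kill, running, key=lambda t: t.progress)
```

Every reclaim pass pays for a scan over all maps of the job, which is the drawback noted
above. Option 2 would avoid the scan at the cost of touching the framework code path.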

Thoughts?

I am still investigating why the reclaim didn't happen as expected.

> Scheduler fails to reclaim capacity if Jobs are submitted to queue one after the other
> --------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4558
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4558
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>    Affects Versions: 0.19.0
>         Environment: Cluster capacity: Maps = Reduces = 210 each
> Two queues:
> Q1: default, GC (%) = 40, GC = 84 (maps and reduces each). Reclaim time = 3 mins.
> Q2: test_q1, GC (%) = 60, GC = 126 (maps and reduces each). Reclaim time = 2 mins.
>            Reporter: Karam Singh
>         Attachments: 4558.1.patch
>
>
> Scheduler fails to reclaim capacity if Jobs are submitted to queue one after the other.
> The first job is submitted with tasks equal to the cluster's M/R capacity.
> The second is submitted to a different queue while all tasks of the first job are running;
> the scheduler fails to reclaim capacity for the second job.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

