hadoop-common-dev mailing list archives

From "Vivek Ratan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4558) Scheduler fails to reclaim capacity if Jobs are submitted to queue one after the other
Date Mon, 10 Nov 2008 04:52:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646169#action_12646169 ]

Vivek Ratan commented on HADOOP-4558:
-------------------------------------

I'd go with #2 (yes, you need to make sure that no code relies on the data structures for
running tasks being empty when speculative execution is turned off). Granted, you're keeping
extra state for jobs with speculative execution turned off, but the number of running tasks
cannot exceed the cluster capacity, so that state is bounded. Option #1 duplicates code between
the Capacity Scheduler & JobInProgress, and Option #3 is expensive, though we only do a linear
scan when killing tasks, which shouldn't happen very often.
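To make the trade-off between #2 and #3 concrete, here is a minimal Java sketch. It assumes a simplified model in which a job owns a list of tasks and the scheduler needs to pick running tasks to kill when reclaiming capacity. `TaskRecord` and `RunningTaskIndex` are hypothetical names, not Hadoop's `JobInProgress` or Capacity Scheduler code, and option #1 is not modeled.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified task record (not Hadoop's TaskInProgress).
class TaskRecord {
    final String id;
    boolean running;

    TaskRecord(String id) { this.id = id; }
}

// Hypothetical per-job index contrasting option #2 with option #3.
class RunningTaskIndex {
    private final List<TaskRecord> allTasks = new ArrayList<>();
    // Option #2: maintained whether or not speculative execution is on.
    // Its size is bounded by the number of slots in the cluster.
    private final Set<String> running = new HashSet<>();

    void addTask(TaskRecord t) { allTasks.add(t); }

    void markRunning(TaskRecord t) { t.running = true; running.add(t.id); }

    void markFinished(TaskRecord t) { t.running = false; running.remove(t.id); }

    // Option #2: choose up to n tasks to kill directly from the running set.
    List<String> victimsFromIndex(int n) {
        List<String> out = new ArrayList<>();
        for (String id : running) {
            if (out.size() == n) break;
            out.add(id);
        }
        return out;
    }

    // Option #3: keep no extra state; scan every task of the job instead.
    List<String> victimsByScan(int n) {
        List<String> out = new ArrayList<>();
        for (TaskRecord t : allTasks) {
            if (out.size() == n) break;
            if (t.running) out.add(t.id);
        }
        return out;
    }
}
```

Both methods find the same running tasks; the difference is that option #2 pays a small bookkeeping cost on every task start and finish, while option #3 pays a scan over all of the job's tasks each time tasks must be killed.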

> Scheduler fails to reclaim capacity if Jobs are submitted to queue one after the other
> --------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4558
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4558
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>    Affects Versions: 0.19.0
>         Environment: Cluster capacity: Maps = Reduces = 210 each
> Two Queues:
> Q1: default, GC (%) = 40, GC = 84 (Maps and Reduces each). Reclaim time = 3 mins.
> Q2: test_q1, GC (%) = 60, GC = 126 (Maps and Reduces each). Reclaim time = 2 mins.
>            Reporter: Karam Singh
>         Attachments: 4558.1.patch
>
>
> Scheduler fails to reclaim capacity if Jobs are submitted to queue one after the other.
> The first job is submitted with tasks equal to the cluster's M/R capacity.
> The second job is submitted to a different queue while all tasks of the first job are running;
> the scheduler fails to reclaim capacity for the second job.
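The capacity figures in the Environment field follow from the percentages: 40% of 210 slots is 84 and 60% is 126. The Java sketch below reproduces that arithmetic and estimates how many slots the second queue is owed in the reported scenario. The class and method names are hypothetical, rounding down is an assumption, and the sketch assumes the cluster has no free slots because the first job (assumed here to be in `default`, since the report does not say) holds all 210.

```java
// Hypothetical helpers; not the Capacity Scheduler's actual API.
class QueueCapacity {
    // Guaranteed capacity in slots, rounded down (assumption).
    static int guaranteedSlots(int clusterSlots, int percent) {
        return clusterSlots * percent / 100;
    }

    // Slots a queue is owed: its unmet guarantee, capped by its pending demand.
    static int slotsToReclaim(int guaranteed, int runningInQueue, int pendingTasks) {
        int shortfall = Math.max(0, guaranteed - runningInQueue);
        return Math.min(shortfall, pendingTasks);
    }
}
```

For example, `slotsToReclaim(guaranteedSlots(210, 60), 0, 210)` gives 126: with the cluster full, test_q1 can only get its guarantee if that many tasks of the first job are killed within its reclaim time.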

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

