hadoop-common-dev mailing list archives

From "Amar Kamat (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4658) User limit is not expanding back properly.
Date Fri, 14 Nov 2008 09:14:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647552#action_12647552 ]

Amar Kamat commented on HADOOP-4658:
------------------------------------

Looking at the logs, there seem to be two problems:
- When job3 finishes (pending = 0), it takes some time to exit the scheduler. Although
job3 from user3 is actually done (it doesn't require any more scheduling cycles), user3 is
still counted as a valid user and thus affects the _limit_ computation.
- Since the limit computation is slow to catch up, job1 always has the advantage and schedules
more tasks. The problem is that it sometimes goes ahead and schedules speculative tasks even
when job2 has genuine tasks to run.

The limit computation currently works as follows:
{code}
cap = min (running_tasks + 1, guaranteed_cap)
limit = max( cap/num_users, cap*ulimit)
{code}
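
To make the effect concrete, here is a minimal Python sketch of the formula above. It is not the scheduler's actual code: the function name, parameter names, and floating-point division are assumptions for illustration, and the numbers are borrowed from the issue's environment (capacity 208, user-limit 25%).

```python
def current_limit(running_tasks, guaranteed_cap, num_users, ulimit):
    """Per-user limit following the formula quoted above (illustrative only)."""
    cap = min(running_tasks + 1, guaranteed_cap)
    # Float division is an assumption; the real scheduler may round differently.
    return max(cap / num_users, cap * ulimit)


# user3's job is done, but user3 is still counted, so num_users stays at 3.
stale_limit = current_limit(running_tasks=208, guaranteed_cap=208,
                            num_users=3, ulimit=0.25)

# What the limit would be if only the two users that still need slots were counted.
expected_limit = current_limit(running_tasks=208, guaranteed_cap=208,
                               num_users=2, ulimit=0.25)

print(stale_limit, expected_limit)
```

With three users counted, each user is held to roughly a third of the capacity, although only two users can actually use it.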

I think whatever is extra should always be given back equally to all the contenders. This
can be achieved if we update _limits_ immediately based on how many users actually require
slots, rather than waiting for the user to be removed from the scheduler. We should also make
sure that speculative tasks are run last; otherwise we will end up wasting resources.

Proposed new limit computation:
{code}
cap = min (running_tasks, guaranteed_cap)
num_actual_users = users with slot requirements // excludes users whose jobs are done with their scheduling
limit = max( cap/num_actual_users, cap*ulimit)
{code}
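
The proposed computation could be sketched in Python as follows. This is only an illustration under stated assumptions: the `pending_by_user` mapping, the function name, and the handling of the case with no active users are hypothetical and not taken from the scheduler code.

```python
def proposed_limit(running_tasks, guaranteed_cap, pending_by_user, ulimit):
    """Per-user limit that counts only users which still need slots.

    pending_by_user is a hypothetical mapping from user name to the number
    of tasks that user still has waiting to be scheduled.
    """
    cap = min(running_tasks, guaranteed_cap)
    num_actual_users = sum(1 for pending in pending_by_user.values() if pending > 0)
    if num_actual_users == 0:
        # Assumption: with nobody waiting for slots there is nothing to limit.
        return 0.0
    return max(cap / num_actual_users, cap * ulimit)


# user3's job has pending = 0, so it no longer shrinks the share of the others.
pending = {"user1": 40, "user2": 25, "user3": 0}
new_limit = proposed_limit(running_tasks=208, guaranteed_cap=208,
                           pending_by_user=pending, ulimit=0.25)
print(new_limit)  # 104.0
```

Because user3 is dropped from the count as soon as its pending work reaches zero, the limit for user1 and user2 expands without waiting for job3 to leave the scheduler.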

> User limit is not expanding back properly.
> ------------------------------------------
>
>                 Key: HADOOP-4658
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4658
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>    Affects Versions: 0.19.0
>         Environment: GC=100% nodes=104, map_capacity=208, reduce_capacity=208, user-limit=25%;
>            Reporter: Karam Singh
>            Assignee: Amar Kamat
>
> User limit is not expanding back properly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

