hadoop-common-dev mailing list archives

From "Hemanth Yamijala (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4211) Capacity Scheduler does not divide queue resources properly among users, when jobs are submitted one after other.
Date Thu, 25 Sep 2008 08:01:46 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12634403#action_12634403 ]

Hemanth Yamijala commented on HADOOP-4211:
------------------------------------------

The scenario in which this problem occurs is slightly different from what is described. The
actual scenario is as follows:
- Suppose we have n slots in the system, and the minimum user limit is 25%.
- When we submit 3 jobs as 3 different users one after the other, each user gets n/3 slots
in steady state.
- Let these 3 jobs complete.
- Now, submit 2 more jobs as 2 different users.
- The expectation is that each user gets n/2 slots in steady state. However, the first user
gets 2n/3 slots and the other user gets n/3 slots.

The reason for this behavior is directly related to HADOOP-4053. Currently, the schedulers
are not notified when a job completes.

In the {{CapacityTaskScheduler}}, the limit is computed as follows:
{code}
limit = Math.max((int)(Math.ceil((double)currentCapacity/
          (double)qsi.numJobsByUser.size())), 
          (int)(Math.ceil((double)(qsi.ulMin*currentCapacity)/100.0)));
{code}

A user is added to the map {{numJobsByUser}} when a job is added. The intent was that the
user is removed from this map upon job completion. However, since this event is not yet raised,
the number of users is never updated when jobs finish. As a result, the limit is still computed
as n/3 instead of n/2. Currently, when all users have hit the limit, the first user with
running jobs is given any remaining slots, which explains the observed behavior.
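
To make the arithmetic concrete, here is a minimal sketch that evaluates the same limit
expression for the reported environment (204 map slots, 25% user limit). The class name,
method name and parameter names are illustrative assumptions, not the scheduler's actual
code; only the expression itself is taken from the snippet above.

```java
public class UserLimitSketch {
    // Same expression as the quoted limit computation; parameter names are
    // illustrative stand-ins for currentCapacity, numJobsByUser.size() and ulMin.
    static int computeLimit(int currentCapacity, int numUsers, int ulMin) {
        return Math.max(
            (int) Math.ceil((double) currentCapacity / (double) numUsers),
            (int) Math.ceil((double) (ulMin * currentCapacity) / 100.0));
    }

    public static void main(String[] args) {
        int n = 204;     // map slots in the reported environment
        int ulMin = 25;  // minimum user limit, in percent

        int staleLimit = computeLimit(n, 3, ulMin);    // map still holds 3 users
        int correctLimit = computeLimit(n, 2, ulMin);  // only the 2 active users

        System.out.println("limit with stale user count:   " + staleLimit);
        System.out.println("limit with correct user count: " + correctLimit);
        // With the stale limit, both users reach it and the first user with
        // running jobs is handed the slots that are left over.
        System.out.println("first user's share when stale: " + (n - staleLimit));
    }
}
```

With 3 users in the map the limit works out to n/3, so the second user stops there and the
first user ends up with the remaining 2n/3.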

In summary, if HADOOP-4053 is fixed, this issue will automatically get fixed. In fact, I applied
the patch currently available on HADOOP-4053 and verified the behavior is correct now. That
is, the limit is recomputed correctly.
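
As a rough illustration of the bookkeeping this depends on (this is not the HADOOP-4053
patch), the sketch below keeps a per-user job count and drops a user once their last job
completes. The class and the {{jobAdded}}/{{jobCompleted}} method names are hypothetical;
only the idea of a {{numJobsByUser}} map comes from the scheduler.

```java
import java.util.HashMap;
import java.util.Map;

public class UserCountSketch {
    // Stand-in for the per-queue numJobsByUser map: user name -> job count.
    private final Map<String, Integer> numJobsByUser = new HashMap<>();

    // Called when a job is submitted: add the user or bump their count.
    void jobAdded(String user) {
        numJobsByUser.merge(user, 1, Integer::sum);
    }

    // Hypothetical completion callback: decrement the count and remove the
    // user entirely when no jobs remain (returning null removes the entry).
    void jobCompleted(String user) {
        numJobsByUser.computeIfPresent(user, (u, count) -> count > 1 ? count - 1 : null);
    }

    // Number of users the limit computation would divide capacity by.
    int numUsers() {
        return numJobsByUser.size();
    }

    public static void main(String[] args) {
        UserCountSketch queue = new UserCountSketch();
        for (String u : new String[] {"user1", "user2", "user3"}) {
            queue.jobAdded(u);
        }
        for (String u : new String[] {"user1", "user2", "user3"}) {
            queue.jobCompleted(u);
        }
        queue.jobAdded("user1");
        queue.jobAdded("user2");
        System.out.println("users counted: " + queue.numUsers());
    }
}
```

Without the completion callback the map would still hold all three users at the end, which
is the stale count that produces the n/3 limit.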

I discussed this with Karam, and we agree that the observations are correct. I'll mark HADOOP-4053
as a blocker for this bug. Once that is committed, Karam can retest and close this bug.

> Capacity Scheduler does not divide queue resources properly among users, when jobs are submitted one after other.
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4211
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4211
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>    Affects Versions: 0.19.0
>         Environment: Mapred Cluster capacity with 204 Maps and 204 Reduces. User limit = 25% and only one queue.
>            Reporter: Karam Singh
>            Assignee: Hemanth Yamijala
>            Priority: Blocker
>             Fix For: 0.19.0
>
>
> Capacity Scheduler does not divide queue resources properly among users when jobs are submitted one after the other. E.g., user limit = 25%. Say user1's job is running. Then user2 submits a job. Then user1's job uses 75% and user2's job uses 25% (the user limit).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

