hadoop-yarn-issues mailing list archives

From "Min Shen (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (YARN-5520) [Capacity Scheduler] Change the logic for when to trigger user/group mappings to queues
Date Mon, 15 Aug 2016 21:15:21 GMT

    [ https://issues.apache.org/jira/browse/YARN-5520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15421682#comment-15421682 ]

Min Shen edited comment on YARN-5520 at 8/15/16 9:15 PM:
---------------------------------------------------------

[~venkateshrin],

Thanks for providing feedback on this ticket.

To answer your questions, I'd like to use the following example to make the explanation
clearer.
Assume we have 4 queues configured: root.orgA, root.orgB, root.default, and root.public.
root.orgA's capacity and max capacity are configured as 45% and 45%, respectively. The same
goes for root.orgB.
root.default is configured as 0% and 0%, while root.public is configured as 5% and 30%.
Preemption is also enabled. Thus, root.orgA and root.orgB each have 45% of guaranteed resources,
while root.public has access to some elastic resources without much of a guarantee.

Also, assume we have 3 users: userA, who belongs to orgA; userB, who belongs to orgB; and
userC, who also belongs to orgB.
Admins want to route users to their corresponding organization queues, so they have configured
the following in capacity-scheduler.xml:
{noformat}
u:userA:orgA, u:userB:orgB, u:userC:orgB
{noformat}
In addition, YarnConfiguration.DEFAULT_QUEUE_NAME is set to "default".
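
To make the mapping format above concrete, here is a minimal Java sketch that turns the configured string into a user-to-queue lookup. It is not the CapacityScheduler parser. The class and method names are hypothetical, group ("g:") entries are skipped, and keeping the first entry for a user is an assumption of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only; hypothetical names, not the CapacityScheduler implementation. */
final class QueueMappingSketch {

    /** Parses comma-separated "u:user:queue" entries into a user -> queue map. */
    static Map<String, String> parseUserMappings(String value) {
        Map<String, String> userToQueue = new LinkedHashMap<>();
        for (String entry : value.split(",")) {
            String[] parts = entry.trim().split(":");
            if (parts.length != 3) {
                throw new IllegalArgumentException("Malformed mapping: " + entry);
            }
            if (parts[0].equals("u")) {
                // Assumption: the first mapping listed for a user is the one kept.
                userToQueue.putIfAbsent(parts[1], parts[2]);
            }
            // Entries of any other type (e.g. "g:" group mappings) are ignored here.
        }
        return userToQueue;
    }

    public static void main(String[] args) {
        // Prints {userA=orgA, userB=orgB, userC=orgB}
        System.out.println(parseUserMappings("u:userA:orgA, u:userB:orgB, u:userC:orgB"));
    }
}
```

The sketch uses leaf queue names (orgA, orgB), as the mapping string does, rather than the full root.* paths.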

With my proposed change, when {{yarn.scheduler.capacity.queue-mappings-override.enable}} is
set to false, a user's application will always be submitted to whichever queue the user requests,
or to root.default if the user does not specify a queue.

When {{yarn.scheduler.capacity.queue-mappings-override.enable}} is set to true, we have the
following possible scenarios:
# userA/userB/userC submit jobs that do not specify a queue. My proposed change will override
the application's queue with the one specified in the queue mappings configuration.
# userA/userB/userC submit jobs to root.default, root.orgA, or root.orgB. The application's
queue will still be overridden with the one specified in the queue mappings configuration.
# userA/userB/userC submit jobs to root.public. The application will be submitted to root.public.
This is useful in the following case: userB has consumed all available resources in root.orgB
while userA is not using resources in root.orgA. In the meantime, userC wants to launch a
job. If we enforce queue overriding for all queues, then userC has to wait for userB's job
to release resources. However, if we disable queue overriding for root.public, userC can use
root.public to get resources much more quickly.

In this way, the admin can override queues for applications submitted to a certain subset
of queues in the cluster, while still allowing users to use the "adhoc" queues. It also distinguishes
clearly between the cases where queue overriding applies and where it does not, since users have
to explicitly request an "adhoc" queue in order to bypass queue overriding. As a
result, users should understand that bypassing queue overriding comes at the cost of weaker
resource guarantees (because of preemption).

We can introduce an additional parameter {{yarn.scheduler.capacity.queue-mappings.disabled.queues}}
to control the list of queues where queue overriding is disabled when {{yarn.scheduler.capacity.queue-mappings-override.enable}}
is set to true.
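
The proposed behavior described above can be sketched as a single decision function. This is only a rough illustration under stated assumptions, not a patch: the class, method, and parameter names are hypothetical, queues are referred to by leaf name, and leaving an unmapped user on the requested queue is my assumption rather than something settled in this ticket.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Illustrative sketch of the proposed logic; all names here are hypothetical. */
final class ProposedQueueOverrideSketch {

    // Stand-in for YarnConfiguration.DEFAULT_QUEUE_NAME.
    static final String DEFAULT_QUEUE = "default";

    static String resolveQueue(String user, String requestedQueue, boolean overrideEnabled,
                               Map<String, String> userToQueue, Set<String> overrideDisabledQueues) {
        // No queue requested: fall back to the default queue name.
        String requested = (requestedQueue == null || requestedQueue.isEmpty())
                ? DEFAULT_QUEUE : requestedQueue;
        if (!overrideEnabled) {
            return requested; // override disabled: always honor the request
        }
        if (overrideDisabledQueues.contains(requested)) {
            return requested; // "adhoc" queue explicitly requested: no override
        }
        String mapped = userToQueue.get(user);
        // Assumption: a user without a mapping stays on the requested queue.
        return mapped != null ? mapped : requested;
    }

    public static void main(String[] args) {
        Map<String, String> mappings = new HashMap<>();
        mappings.put("userA", "orgA");
        mappings.put("userB", "orgB");
        mappings.put("userC", "orgB");
        Set<String> adhoc = Collections.singleton("public");

        System.out.println(resolveQueue("userC", null, false, mappings, adhoc));     // default
        System.out.println(resolveQueue("userC", null, true, mappings, adhoc));      // orgB
        System.out.println(resolveQueue("userC", "orgA", true, mappings, adhoc));    // orgB
        System.out.println(resolveQueue("userC", "public", true, mappings, adhoc));  // public
    }
}
```

The four calls in main correspond to the override-disabled case and to the three scenarios listed for the override-enabled case.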


> [Capacity Scheduler] Change the logic for when to trigger user/group mappings to queues
> ---------------------------------------------------------------------------------------
>
>                 Key: YARN-5520
>                 URL: https://issues.apache.org/jira/browse/YARN-5520
>             Project: Hadoop YARN
>          Issue Type: Improvement
>    Affects Versions: 2.6.0, 2.7.0, 2.6.1
>            Reporter: Min Shen
>
> YARN-2411 introduced the Capacity Scheduler feature that supports user/group-based mappings
to queues.
> In the original implementation, the configuration key {{yarn.scheduler.capacity.queue-mappings-override.enable}}
was added to control when to override user-requested queues.
> However, even if this configuration is set to false, queue overriding can still happen
if the user didn't request a specific queue or chose to simply submit the job to the "default"
queue, according to the following if condition, which triggers queue overriding:
> {code}
> if (queueName.equals(YarnConfiguration.DEFAULT_QUEUE_NAME)
>               || overrideWithQueueMappings)
> {code}
> This logic does not seem reasonable, as there's no way to fully disable queue overriding
when mappings are configured in capacity-scheduler.xml.
> In addition, in our environment, we have set up a few organization-dedicated queues as
well as some "adhoc" queues. The organization-dedicated queues have better resource guarantees,
and we want to be able to route users to the corresponding organization queues. On the other
hand, the "adhoc" queues have weaker resource guarantees, but everyone can use them to get some
opportunistic resources when the cluster is free.
> The current logic also prevents this type of use case: when you enable queue overriding,
users cannot use these "adhoc" queues any more. They will always be routed to the dedicated
organization queues.
> To address the above 2 issues, I propose changing the implementation so that:
> * Admins can fully disable queue overriding even if mappings are already configured.
> * Admins have finer-grained control so that queue overriding works with the above-mentioned
organization/adhoc queue setups.
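
The if condition quoted in the issue description can be exercised in isolation to show why setting the flag to false does not fully disable overriding. This is a minimal sketch: the class and method names are hypothetical, and DEFAULT_QUEUE_NAME is a local stand-in for YarnConfiguration.DEFAULT_QUEUE_NAME so the snippet compiles without Hadoop on the classpath.

```java
/** Illustrative only; isolates the quoted condition from the surrounding scheduler code. */
final class CurrentOverrideConditionSketch {

    // Local stand-in for YarnConfiguration.DEFAULT_QUEUE_NAME.
    static final String DEFAULT_QUEUE_NAME = "default";

    /** Mirrors the quoted condition: true means the queue mappings are applied. */
    static boolean triggersOverride(String queueName, boolean overrideWithQueueMappings) {
        return queueName.equals(DEFAULT_QUEUE_NAME) || overrideWithQueueMappings;
    }

    public static void main(String[] args) {
        // Flag is false, yet a submission to "default" is still remapped.
        System.out.println(triggersOverride("default", false)); // true
        // Flag is false and a non-default queue is requested: no override.
        System.out.println(triggersOverride("public", false));  // false
        // Flag is true: every queue is remapped, including "adhoc" ones.
        System.out.println(triggersOverride("public", true));   // true
    }
}
```

The first and third calls correspond to the two issues raised in the description: overriding cannot be fully turned off, and "adhoc" queues become unreachable once it is turned on.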



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

