hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Gergo Repas (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-8191) Fair scheduler: queue deletion without RM restart
Date Wed, 09 May 2018 12:08:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-8191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16468755#comment-16468755

Gergo Repas commented on YARN-8191:

Regarding the TestRMWebServicesReservation failures: it will be resolved by YARN-8268.

[~wilfreds] - Thanks for the review. Please find my replies to your comments below.
# In removeEmptyDynamicQueues why are we not using getLeafQueues instead of getQueues?
The queue cleanup should start with the leafs and it removes the need to check if the queue
has children.
** This is to address the following scenario: there is an empty parent=“true” childless
queue in the allocation config, which gets removed from the config (and hence converted to
a dynamic queue in updateAllocationConfiguration). The next removeEmptyDynamicQueues() should
clean this queue up.
# In removeEmptyDynamicQueues instead of adding to parentQueuesToCheck without checks should
we not only add a parent queue if that queue is dynamic? That also removes the need for the
root queue check while iterating and the only check for child queues is needed.
** Good point, thanks, I addressed it.
# In updateAllocationConfiguration there should be no need to go over all the queues and
set the isDynamic flag. The flag should be set on creation and never updated after that unless
it was an incompatible queue. We should add a new getLeafQueue and getParentQueue method
which takes the isDynamic flag and sets it. It then can call the existing method. The only
place that we would call the new method would be in updateAllocationConfiguration. The old
version of the method would stay as is as the isDynamic flag set to true by default. That
should handle all placement rules etc.
** The dynamic flag can change, I can add a previously dynamic queue (e.g. root.username)
to the config, which becomes a static queue. Because of this, it’s easier just to set all
queues present in the config to static queue (instead of branching: if the queue exists, then
call setDynamic, else call the new getLeafQueue and getParentQueue methods).
# To clean up any removed configured, non dynamic, queues we need to be smart and we do not
want to walk over all queues. In onReload we have the old and new configured queues and
we can easily build a list of .. The changeToDynamic just needs to walk over the set and
set the flag. The standard cleanup code for the dynamic queues will then take care of the
cleanup without needing lots of iterations. Building this diff of queues can happen outside
of locking in the FS.
** Good point, I got rid of that loop. However instead of relying on the diff the be passed
in to the QueueManager, the QueueManager keeps track of the set of static queues, and only
sets the ones to dynamic, that are not present in the allocation config anymore.
# The logic behind getDynamicQueueNames is not correct: the configuredQueues set as stored
in the AllocationConfiguration only has queues that are defined in the configuration. There
can never be a dynamic queue in that list. Dynamic queues can only be created via the placement
rules. (see previous comments)
** This method got removed as part of the previous change.
# As a side node: getLeafQueues in the QueueManager should return an ImmutableList of the
leaf queues like we do in getQueues
** Yes, that would be clearer, however returning a CopyOnWriteArrayList should be also fine
in terms of thread-safety. Since it’s not related to my change, I will not touch this piece
of code.

> Fair scheduler: queue deletion without RM restart
> -------------------------------------------------
>                 Key: YARN-8191
>                 URL: https://issues.apache.org/jira/browse/YARN-8191
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: fairscheduler
>    Affects Versions: 3.0.1
>            Reporter: Gergo Repas
>            Assignee: Gergo Repas
>            Priority: Major
>         Attachments: Queue Deletion in Fair Scheduler.pdf, YARN-8191.000.patch, YARN-8191.001.patch,
YARN-8191.002.patch, YARN-8191.003.patch, YARN-8191.004.patch, YARN-8191.005.patch, YARN-8191.006.patch,
YARN-8191.007.patch, YARN-8191.008.patch
> The Fair Scheduler never cleans up queues even if they are deleted in the allocation
file, or were dynamically created and are never going to be used again. Queues always remain
in memory which leads to two following issues.
>  # Steady fairshares aren’t calculated correctly due to remaining queues
>  # WebUI shows deleted queues, which is confusing for users (YARN-4022).
> We want to support proper queue deletion without restarting the Resource Manager:
>  # Static queues without any entries that are removed from fair-scheduler.xml should
be deleted from memory.
>  # Dynamic queues without any entries should be deleted.
>  # RM Web UI should only show the queues defined in the scheduler at that point in time.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

View raw message