hadoop-yarn-issues mailing list archives

From "Haibo Chen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-8191) Fair scheduler: queue deletion without RM restart
Date Tue, 22 May 2018 20:10:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-8191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16484491#comment-16484491 ]

Haibo Chen commented on YARN-8191:

{quote}The point here is that the set of removed queues can be gathered in {{AllocationReloadListener.onReload()}}
outside of the writeLock{quote}
I see. That does make sense.
{quote}what do you mean by "What about the other case where some dynamic queues are not added
as static in the new allocation file?"{quote}
In this patch, getRemovedStaticQueues() and QueueManager.setQueuesToDynamic() together identify
the queues that were defined in the previous allocation file but are removed in the new
allocation file, and then mark them as dynamic. What I mean is: is it possible that some
queues were created dynamically but are now included in the new allocation file? If
so, we need to mark them as static.
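
To make the two directions concrete, here is a minimal sketch using plain sets of queue names. The class and method names are hypothetical and are not taken from the patch; it only assumes that the old and new allocation files each yield a set of statically configured queue names, and that the scheduler knows which existing queues are dynamic.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper for illustration only; not part of the YARN-8191 patch. */
final class QueueTypeReconciler {
  private QueueTypeReconciler() { }

  /** Queues defined in the previous allocation file but absent from the new one (static -> dynamic). */
  static Set<String> removedStaticQueues(Set<String> previousStatic, Set<String> newStatic) {
    Set<String> removed = new HashSet<>(previousStatic);
    removed.removeAll(newStatic);
    return removed;
  }

  /** Queues that exist as dynamic today but are defined in the new allocation file (dynamic -> static). */
  static Set<String> promotedToStatic(Set<String> currentDynamic, Set<String> newStatic) {
    Set<String> promoted = new HashSet<>(currentDynamic);
    promoted.retainAll(newStatic);
    return promoted;
  }
}
```

For example, with previous static queues {root.a, root.b}, new static queues {root.b, root.c} and dynamic queues {root.c, root.d}, the first method yields {root.a} and the second yields {root.c}. The second set is the case I am asking about.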

The behavior of QueueManager.removeLeafQueue() is still changed by the refactoring.
Previously it returned true if no incompatible queue was found, but it now returns
false. We should also return true if removeEmptyIncompatibleQueues(name, FSQueueType.PARENT)
returns null. Similarly, in IncompatibleQueueRemovalTask.execute(), the task should be removed
if `removed == null`.
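
A rough sketch of the null handling I have in mind is below. It assumes that a null result means "no incompatible queue was found", which is my reading of the patch, and the class and method names are made up for illustration.

```java
/** Hypothetical illustration of the tri-state result; names are not from the patch. */
final class RemovalResults {
  private RemovalResults() { }

  /**
   * Assumed meaning of the Boolean result:
   *   null  - no incompatible queue was found, so there is nothing left to do
   *   TRUE  - an incompatible queue was found and removed
   *   FALSE - an incompatible queue was found but could not be removed yet
   * The caller can report success, or drop its retry task, in the first two cases.
   */
  static boolean isResolved(Boolean removed) {
    return removed == null || removed;
  }
}
```

With a check like this, removeLeafQueue() would keep returning true when nothing incompatible exists, and the removal task would only stay scheduled while a queue is actually blocking.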
{quote}{{updateAllocationConfiguration()}} is only called when the configuration file has
been modified, so if for example there's only one configuration modification during the lifetime
of the RM, incompatible queues would not be cleaned up until a restart{quote}
I see.

Let's add Javadoc to the newly added public QueueManager methods.

`reloadListener.onCheck();` makes me worried about what happens if the listener is not set. Looking closely
at the code, setReloadListener() is always called right after the AllocationFileLoaderService
constructor, so I think we can make reloadListener a constructor argument, so that we
never have to worry about the listener being null.
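
Something along these lines, as a simplified sketch rather than the real class: the interface and class below are stand-ins with hypothetical names, and the actual AllocationFileLoaderService constructor takes other arguments that are omitted here.

```java
import java.util.Objects;

/** Simplified stand-in for the reload listener; only onCheck() is modeled. */
interface ReloadListenerSketch {
  void onCheck();
}

/** Simplified stand-in for the loader service, with the listener injected up front. */
final class LoaderServiceSketch {
  private final ReloadListenerSketch reloadListener;

  LoaderServiceSketch(ReloadListenerSketch reloadListener) {
    // Fail fast at construction time instead of risking an NPE later in the reload thread.
    this.reloadListener = Objects.requireNonNull(reloadListener, "reloadListener");
  }

  void check() {
    // Safe without a null check: the field is final and validated in the constructor.
    reloadListener.onCheck();
  }
}
```

This also lets the field be final, and the separate setter can go away if nothing else uses it.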


> Fair scheduler: queue deletion without RM restart
> -------------------------------------------------
>                 Key: YARN-8191
>                 URL: https://issues.apache.org/jira/browse/YARN-8191
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: fairscheduler
>    Affects Versions: 3.0.1
>            Reporter: Gergo Repas
>            Assignee: Gergo Repas
>            Priority: Major
>         Attachments: Queue Deletion in Fair Scheduler.pdf, YARN-8191.000.patch, YARN-8191.001.patch,
YARN-8191.002.patch, YARN-8191.003.patch, YARN-8191.004.patch, YARN-8191.005.patch, YARN-8191.006.patch,
YARN-8191.007.patch, YARN-8191.008.patch, YARN-8191.009.patch, YARN-8191.010.patch, YARN-8191.011.patch,
YARN-8191.012.patch, YARN-8191.013.patch, YARN-8191.014.patch
> The Fair Scheduler never cleans up queues even if they are deleted in the allocation
file, or were dynamically created and are never going to be used again. Queues always remain
in memory, which leads to the following two issues:
>  # Steady fairshares aren’t calculated correctly due to remaining queues
>  # WebUI shows deleted queues, which is confusing for users (YARN-4022).
> We want to support proper queue deletion without restarting the Resource Manager:
>  # Static queues without any entries that are removed from fair-scheduler.xml should
be deleted from memory.
>  # Dynamic queues without any entries should be deleted.
>  # RM Web UI should only show the queues defined in the scheduler at that point in time.

This message was sent by Atlassian JIRA

