hadoop-yarn-issues mailing list archives

From "Wilfred Spiegelenburg (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-8191) Fair scheduler: queue deletion without RM restart
Date Fri, 11 May 2018 04:18:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-8191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16471471#comment-16471471
] 

Wilfred Spiegelenburg commented on YARN-8191:
---------------------------------------------

We're almost there.

First the simple one:
{quote}testRemovalOfDynamicParentQueue needs to cover a dynamic parent queue without a leaf
is removal.
 * How can I create a dynamic parent queue without a leaf? I thought the only way to have
a parent queue without a leaf is to add it to the allocation config with parent="true", but
in this case it'd be a static queue.{quote}
It is almost covered in the new test {{testRemovalOfChildlessParentQueue}} but I want to mimic
what happens in the {{onReload}}.
 BTW: I think we need to swap the two calls around in {{onReload}}: first mark removed queues
as dynamic, then update, so that the queue manager removes the queues immediately if possible.
{quote}We already check all queues defined in the configuration on each reload for existence
in the updateAllocationConfiguration via the call to removeEmptyIncompatibleQueues. If the
queue of the correct type exists we currently just return. The only thing we should do now
before we return is unset the isDynamic flag there and there is no need for a separate loop.
removeEmptyIncompatibleQueues is called for each configured queue with each reload.
 * Sorry, I do not understand what you're suggesting here. Could you please elaborate a bit
more?{quote}
Sure I can:
 When we walk over the list of queues loaded from the config in {{updateAllocationConfiguration}},
we call {{removeEmptyIncompatibleQueues}} for every configured queue. That method checks
whether a queue with that name already exists. If it does, the method also checks whether
the queue is of the correct type (parent or leaf). If the type is correct we currently
just return:
{code:java}
    FSQueue queue = queues.get(queueToCreate);
    // Queue exists already.
    if (queue != null) {
      if (queue instanceof FSLeafQueue) {
        if (queueType == FSQueueType.LEAF) {
          // if queue is already a leaf then return true
          return true;
        }
        // remove incompatibility since queue is a leaf currently
        // needs to change to a parent.
        return removeQueueIfEmpty(queue);
      } else {
        if (queueType == FSQueueType.PARENT) {
          return true;
        }
        // If it's an existing parent queue and needs to change to leaf, 
        // remove it if it's empty.
        return removeQueueIfEmpty(queue);
      }
    }
{code}
What I am proposing is to add one line of code per queue type. If the queue exists and it
is the correct type then we should make sure the dynamic flag is set to false. That will have
no effect if the queue was already defined in the config. However, if the queue was created
as a dynamic queue, it will turn that queue into a queue defined in the configuration.
In all other cases we remove the queue and create a new one later on, which will have the
correct dynamic flag set.
 So in the above code we would get two extra lines; this is the one for the LEAF queues:
{code:java}
        if (queueType == FSQueueType.LEAF) {
          queue.setDynamic(false);
          // if queue is already a leaf then return true
          return true;
        }
{code}
With that change we do not have to do anything in {{updateAllocationConfiguration}} for
the static/dynamic change.
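
To make the proposal concrete, here is a minimal, self-contained sketch of the check with both extra lines in place (LEAF and PARENT). The {{Sketch*}} classes are hypothetical stand-ins for {{FSQueueType}}, {{FSQueue}} and the queue manager, not the real Hadoop classes. The {{empty}} flag and the final {{return true}} for a missing queue are assumptions made only so the example runs on its own.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for FSQueueType / FSQueue; not the real Hadoop classes.
enum SketchQueueType { LEAF, PARENT }

class SketchQueue {
  private final SketchQueueType type;
  private final boolean empty;   // assumption: stands in for "no apps, no children"
  private boolean dynamic;

  SketchQueue(SketchQueueType type, boolean empty, boolean dynamic) {
    this.type = type;
    this.empty = empty;
    this.dynamic = dynamic;
  }

  SketchQueueType getType() { return type; }
  boolean isEmpty() { return empty; }
  boolean isDynamic() { return dynamic; }
  void setDynamic(boolean dynamic) { this.dynamic = dynamic; }
}

class SketchQueueManager {
  private final Map<String, SketchQueue> queues = new HashMap<>();

  void addQueue(String name, SketchQueue queue) { queues.put(name, queue); }
  SketchQueue getQueue(String name) { return queues.get(name); }

  // Same shape as the snippet quoted above, with the proposed setDynamic(false) calls.
  boolean removeEmptyIncompatibleQueues(String queueToCreate,
      SketchQueueType queueType) {
    SketchQueue queue = queues.get(queueToCreate);
    if (queue != null) {
      if (queue.getType() == SketchQueueType.LEAF) {
        if (queueType == SketchQueueType.LEAF) {
          // Queue is now defined in the config: it is no longer dynamic.
          queue.setDynamic(false);
          return true;
        }
        // Existing leaf needs to become a parent: remove it if it is empty.
        return removeQueueIfEmpty(queueToCreate, queue);
      } else {
        if (queueType == SketchQueueType.PARENT) {
          queue.setDynamic(false);
          return true;
        }
        // Existing parent needs to become a leaf: remove it if it is empty.
        return removeQueueIfEmpty(queueToCreate, queue);
      }
    }
    // Assumption: the sketch ignores whatever the real method does after this point.
    return true;
  }

  private boolean removeQueueIfEmpty(String name, SketchQueue queue) {
    if (!queue.isEmpty()) {
      return false;
    }
    queues.remove(name);
    return true;
  }
}
```

In this sketch a dynamically created leaf that later shows up in the config as a leaf keeps its place in the map and only loses its dynamic flag. A queue of the wrong type is removed if it is empty, so that it can be recreated with the right type and flag.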

Does that make sense?

I just noticed two bugs in that code, neither of which is newly introduced:
 # If I try to create a LEAF from the config and it is already a LEAF queue, we should
return false, not true. Same for the PARENT check. The return value triggers the queue creation,
which is not needed because the queue already exists with the right type.
 # If the queue exists with the wrong type we try to remove it via {{removeQueueIfEmpty}}.
The result is passed back without checks and we do not follow up. If the remove failed I have
a queue with the wrong type in the system. That leaves the system in an inconsistent state:
whatever I have in the queue manager is now not what is in the configuration. This should
at least be logged, and should probably even throw.

Bug 1 we can fix now: just change the return value. Bug 2 probably needs a follow-up Jira
as it is more complex.
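
As a rough illustration of what both fixes could look like together, here is a self-contained sketch. Everything in it is hypothetical: the {{Demo*}} classes, the method name {{needsCreation}}, the choice of {{IllegalStateException}}, and the assumption that a missing queue means the caller still has to create it. It is not the patch, only one way to read the two points above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins; not the real Fair Scheduler classes.
enum DemoQueueType { LEAF, PARENT }

class DemoQueue {
  final DemoQueueType type;
  final boolean empty;   // assumption: stands in for "no apps, no children"

  DemoQueue(DemoQueueType type, boolean empty) {
    this.type = type;
    this.empty = empty;
  }
}

class DemoQueueManager {
  private final Map<String, DemoQueue> queues = new HashMap<>();

  void put(String name, DemoQueueType type, boolean empty) {
    queues.put(name, new DemoQueue(type, empty));
  }

  boolean exists(String name) { return queues.containsKey(name); }

  // Returns true only when the caller still has to create the queue.
  boolean needsCreation(String queueToCreate, DemoQueueType wanted) {
    DemoQueue queue = queues.get(queueToCreate);
    if (queue == null) {
      // Assumption: nothing in the way, so the queue must be created.
      return true;
    }
    if (queue.type == wanted) {
      // Point 1: the queue exists with the right type, no creation needed.
      return false;
    }
    if (!removeQueueIfEmpty(queueToCreate, queue)) {
      // Point 2: do not silently hand back the failure; surface the mismatch.
      throw new IllegalStateException("Queue " + queueToCreate
          + " exists as " + queue.type + " but is configured as " + wanted
          + " and could not be removed");
    }
    return true;
  }

  private boolean removeQueueIfEmpty(String name, DemoQueue queue) {
    if (!queue.empty) {
      return false;
    }
    queues.remove(name);
    return true;
  }
}
```

Whether a failed removal should throw or only log is exactly the open question for the follow-up; the exception here just makes the inconsistent state visible in the sketch.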

> Fair scheduler: queue deletion without RM restart
> -------------------------------------------------
>
>                 Key: YARN-8191
>                 URL: https://issues.apache.org/jira/browse/YARN-8191
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: fairscheduler
>    Affects Versions: 3.0.1
>            Reporter: Gergo Repas
>            Assignee: Gergo Repas
>            Priority: Major
>         Attachments: Queue Deletion in Fair Scheduler.pdf, YARN-8191.000.patch, YARN-8191.001.patch,
YARN-8191.002.patch, YARN-8191.003.patch, YARN-8191.004.patch, YARN-8191.005.patch, YARN-8191.006.patch,
YARN-8191.007.patch, YARN-8191.008.patch, YARN-8191.009.patch, YARN-8191.010.patch
>
>
> The Fair Scheduler never cleans up queues even if they are deleted in the allocation
file, or were dynamically created and are never going to be used again. Queues always remain
in memory, which leads to the following two issues.
>  # Steady fairshares aren’t calculated correctly due to remaining queues
>  # WebUI shows deleted queues, which is confusing for users (YARN-4022).
> We want to support proper queue deletion without restarting the Resource Manager:
>  # Static queues without any entries that are removed from fair-scheduler.xml should
be deleted from memory.
>  # Dynamic queues without any entries should be deleted.
>  # RM Web UI should only show the queues defined in the scheduler at that point in time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
