hadoop-yarn-issues mailing list archives

From "Thomas Graves (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-460) CS user left in list of active users for the queue even when application finished
Date Fri, 08 Mar 2013 15:44:13 GMT

    [ https://issues.apache.org/jira/browse/YARN-460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13597226#comment-13597226 ]

Thomas Graves commented on YARN-460:
------------------------------------

We think this might be a race between application removal (LeafQueue.removeApplication)
and a possibly in-flight allocate call from the AM.  When LeafQueue.removeApplication
is called, it removes the user from the list of active users.  If a CapacityScheduler.allocate
call comes in before the application is removed from the application data structures in
CapacityScheduler.doneApplication, it could add the user back to the list of active users,
because allocate() only checks that the application isn't null.  We need to either make the
check in allocate() stricter or prevent the race between finishing the application and removing it.

The relevant code in CapacityScheduler.doneApplication is basically:
    } else {
      queue.finishApplication(application, queue.getQueueName());
    }
    
    // Remove from our data-structure
    applications.remove(applicationAttemptId); 
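
To make the suspected interleaving concrete, here is a minimal single-threaded sketch that replays the steps in the problematic order. It is a toy model, not the actual Hadoop code: the class ToyScheduler, its fields, and all of its methods are hypothetical stand-ins, and it assumes one application per user. The "finishing" marker is just one possible way to tighten the allocate() check, not a proposed patch.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: none of these names are real Hadoop classes or methods.
class ToyScheduler {
    // Stand-in for the scheduler's application map (attempt id -> user).
    final Map<String, String> applications = new HashMap<>();
    // Stand-in for the queue's list of active users (one app per user assumed).
    final Set<String> activeUsers = new HashSet<>();
    // Hypothetical guard: attempts whose teardown has already started.
    final Set<String> finishing = new HashSet<>();

    void submit(String attemptId, String user) {
        applications.put(attemptId, user);
        activeUsers.add(user);
    }

    // Teardown step 1: the queue drops the user (models the queue-side removal).
    void finishInQueue(String attemptId) {
        finishing.add(attemptId);
        activeUsers.remove(applications.get(attemptId));
    }

    // Teardown step 2: the scheduler forgets the application.
    void removeFromScheduler(String attemptId) {
        applications.remove(attemptId);
        finishing.remove(attemptId);
    }

    // Models an allocate() that only checks that the application isn't null.
    void allocateUnchecked(String attemptId) {
        String user = applications.get(attemptId);
        if (user == null) {
            return;
        }
        activeUsers.add(user); // re-activates a user whose app is finishing
    }

    // Same call, but it also refuses attempts that are already finishing.
    void allocateGuarded(String attemptId) {
        String user = applications.get(attemptId);
        if (user == null || finishing.contains(attemptId)) {
            return;
        }
        activeUsers.add(user);
    }

    public static void main(String[] args) {
        ToyScheduler leaky = new ToyScheduler();
        leaky.submit("attempt_1", "alice");
        leaky.finishInQueue("attempt_1");       // user removed from active list
        leaky.allocateUnchecked("attempt_1");   // in-flight allocate lands here
        leaky.removeFromScheduler("attempt_1");
        System.out.println("unchecked: " + leaky.activeUsers); // prints "unchecked: [alice]"

        ToyScheduler guarded = new ToyScheduler();
        guarded.submit("attempt_1", "alice");
        guarded.finishInQueue("attempt_1");
        guarded.allocateGuarded("attempt_1");
        guarded.removeFromScheduler("attempt_1");
        System.out.println("guarded: " + guarded.activeUsers); // prints "guarded: []"
    }
}
```

In the unchecked run the user stays in the active set after the application is gone, which matches the symptom reported here. The other option mentioned above, preventing the race, would correspond to making the two teardown steps atomic with respect to allocate().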
                
> CS user left in list of active users for the queue even when application finished
> ---------------------------------------------------------------------------------
>
>                 Key: YARN-460
>                 URL: https://issues.apache.org/jira/browse/YARN-460
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 0.23.7, 2.0.4-alpha
>            Reporter: Thomas Graves
>            Assignee: Thomas Graves
>            Priority: Critical
>
> We have seen a user get left in the queue's list of active users even though the application
> was removed. This can cause everyone else in the queue to get fewer resources if using the
> minimum user limit percent config.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
