hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Wangda Tan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2753) Fix potential issues and code clean up for *NodeLabelsManager
Date Wed, 05 Nov 2014 00:27:34 GMT

    [ https://issues.apache.org/jira/browse/YARN-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14197206#comment-14197206
] 

Wangda Tan commented on YARN-2753:
----------------------------------


Hi Zhihai,
I just review your patch, some comments:

1) Would you please add a comment above

{code}
+      if (this.labelCollections.get(label) == null) {
{code}
To indicate we shouldn't overwrite it

{code}
+      if (labels.isEmpty()) {
{code}
To indicate the labels will never be null

{code}
+      if (originalLabels == null || !originalLabels.containsAll(labels)) {
{code}
To indicate why is it possible originalLabels can be null

which will help with other's review and read the code in the future :)

2) Would you please remove the changes to addToClusterNodeLabels in this patch, another patch
I'm working on (YARN-2800) need to print a warning here, I think it's better to address the
case there since no conflict no matter YARN-2800 or this JIRA go first.

3) In TestRMNodeLabelsManager,
It's better to change the comment
{code}
+    // add label "p1" again, then check whether resource for "p1" is changed.
{code}
To explicitly say to check add labels multiple times shouldn't overwrite original attributes
on labels like resource

4) Would you mind to add one more fix from my side in CommonNodeLabelsManager
Fix is in CommonNodeLabelsManager#ForwardingEventHandler:
Remove {{if (isInState(STATE.STARTED))}} checking, do handleStoreEvent() directly.
The reason is: in RMAdminCLI, we will create a CommonNodeLabelsManager when we need access
local node labels store directly. And we will call stop() of CommonNodeLabelsManager when
method invoke returned (like calling addToClusterNodelabels).
What CommonNodeLabelsManager do is async add the actual node labels store event to a blockingQueue
via AsyncDispatcher, and will be finally handled by CommonNodeLabelsManager#ForwardingEventHandler.
A race condition is, even if we called setDrainEventsOnStop when initialize the dispatcher,
but the state of AbstractService (her wis CommonNodeLabelsManager) will become “stopped”
as soon as we called the stop() of CommonNodeLabelsManager. So even though AbstractDispatcher
can get these event delivery to ForwardingEventHandler.handle, but it is still possible ignored
by ForwardingEventHandler since the service is not in “started” state any longer. 
The behavior of this issue is, sometimes we called methods like addToClusterNodeLabels, and
it successfully returned, but the CLI program exit before edit log persisted.
I don’t have a good idea about how to test this without too much effort, but I think this
fix should be simple for review.

Thanks,
Wangda

> Fix potential issues and code clean up for *NodeLabelsManager
> -------------------------------------------------------------
>
>                 Key: YARN-2753
>                 URL: https://issues.apache.org/jira/browse/YARN-2753
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>         Attachments: YARN-2753.000.patch, YARN-2753.001.patch, YARN-2753.002.patch, YARN-2753.003.patch,
YARN-2753.004.patch, YARN-2753.005.patch
>
>
> Issues include:
> * CommonNodeLabelsManager#addToCluserNodeLabels should not change the value in labelCollections
if the key already exists otherwise the Label.resource will be changed(reset).
> * potential NPE(NullPointerException) in checkRemoveLabelsFromNode of CommonNodeLabelsManager.
> ** because when a Node is created, Node.labels can be null.
> ** In this case, nm.labels; may be null. So we need check originalLabels not null before
use it(originalLabels.containsAll).
> * addToCluserNodeLabels should be protected by writeLock in RMNodeLabelsManager.java.
because we should protect labelCollections in RMNodeLabelsManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message