hadoop-yarn-issues mailing list archives

From "sandflee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4673) race condition in ResourceTrackerService#nodeHeartBeat while processing deduplicated msg
Date Thu, 25 Feb 2016 08:36:18 GMT

    [ https://issues.apache.org/jira/browse/YARN-4673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15166923#comment-15166923 ]

sandflee commented on YARN-4673:
--------------------------------

Hi, [~ozawa], in ResourceTrackerService we may concurrently process nodeHeartBeat() calls with
the same nodeId and responseId. Both may pass the lastResponseId check, which would cause an
RM message to be lost. With a node lock, we could process them one by one, and the above
exception could be caught.
{code}
      if (remoteNodeStatus.getResponseId() + 1 == lastNodeHeartbeatResponse
          .getResponseId()) {
        LOG.info("Received duplicate heartbeat from node " +
            rmNode.getNodeAddress() + " responseId=" +
            remoteNodeStatus.getResponseId());
        return lastNodeHeartbeatResponse;
      }
{code}

Actually, I have not encountered a bug caused by this, but it may be a risk.
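
A minimal sketch of the locking idea, not the attached patch: the class, the NodeState holder,
and the counters below are hypothetical names for illustration only, and the real
ResourceTrackerService does much more (resync handling, node status updates). It only shows
that when the duplicate check and the responseId update run under one per-node lock, two
concurrent heartbeats with the same responseId cannot both be treated as new.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class NodeHeartbeatLockSketch {
    /** Hypothetical per-node state; stands in for the last heartbeat response kept by the RM. */
    static final class NodeState {
        int lastResponseId = 0;
    }

    private final ConcurrentMap<String, NodeState> nodes = new ConcurrentHashMap<>();
    final AtomicInteger processed = new AtomicInteger();
    final AtomicInteger duplicates = new AtomicInteger();

    /** Returns the responseId that would be sent back to the node. */
    int nodeHeartbeat(String nodeId, int remoteResponseId) {
        NodeState state = nodes.computeIfAbsent(nodeId, id -> new NodeState());
        // Per-node lock: the check and the update below happen atomically for one node,
        // while heartbeats from different nodes still run in parallel.
        synchronized (state) {
            if (remoteResponseId + 1 == state.lastResponseId) {
                // Same condition as the duplicate check quoted above.
                duplicates.incrementAndGet();
                return state.lastResponseId;
            }
            // New heartbeat: do the real work once, then advance the responseId.
            processed.incrementAndGet();
            state.lastResponseId = remoteResponseId + 1;
            return state.lastResponseId;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        NodeHeartbeatLockSketch tracker = new NodeHeartbeatLockSketch();
        CountDownLatch start = new CountDownLatch(1);
        Runnable heartbeat = () -> {
            try {
                start.await(); // release both threads at the same moment
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            tracker.nodeHeartbeat("node-1", 0);
        };
        Thread t1 = new Thread(heartbeat);
        Thread t2 = new Thread(heartbeat);
        t1.start();
        t2.start();
        start.countDown();
        t1.join();
        t2.join();
        // With the lock, exactly one call is processed and the other is a duplicate.
        System.out.println("processed=" + tracker.processed.get()
            + " duplicates=" + tracker.duplicates.get());
    }
}
```

Without the synchronized block, both threads could read lastResponseId before either writes
it, so both would take the "new heartbeat" path; that is the race described above.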

> race condition in ResourceTrackerService#nodeHeartBeat while processing deduplicated msg
> ----------------------------------------------------------------------------------------
>
>                 Key: YARN-4673
>                 URL: https://issues.apache.org/jira/browse/YARN-4673
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: sandflee
>            Assignee: sandflee
>         Attachments: YARN-4673.01.patch
>
>
> we could add a lock like ApplicationMasterService#allocate



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
