hadoop-yarn-issues mailing list archives

From "Zhijie Shen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1776) renewDelegationToken should survive RM failover
Date Thu, 20 Mar 2014 17:20:44 GMT

    [ https://issues.apache.org/jira/browse/YARN-1776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941988#comment-13941988 ]

Zhijie Shen commented on YARN-1776:
-----------------------------------

[~kkambatl], sure, please go ahead. [~ozawa], thanks for your input. I was thinking about
the temp file approach, but I didn't think it could completely resolve the issue, and it would make
loading the DT state much more complex in the failure case. If I understand correctly, the FileSystem
interface methods do not guarantee atomicity (the exception being rename, which we previously considered
atomic). Therefore, the RM can fail during and between each of the 4 steps (IMO, steps 1 and 4 are
not necessary, and after step 3 we need to rename the new DT file to the old file name), and loading the
DT state needs to handle all of these cases. Another issue shows up if you look at the current FileSystemRMStateStore:
{code}
    writeFile(nodeCreatePath, os.toByteArray());
    fsOut.close();

    // store sequence number
    Path latestSequenceNumberPath = getNodePath(rmDTSecretManagerRoot,
          DELEGATION_TOKEN_SEQUENCE_NUMBER_PREFIX + latestSequenceNumber);
    LOG.info("Storing " + DELEGATION_TOKEN_SEQUENCE_NUMBER_PREFIX
        + latestSequenceNumber);
{code}
Storing a DT requires accessing two files. Even if we can ensure that accessing the DT file is atomic,
the method can still fail at the comment's place (// store sequence number), so that the DT file is
updated but dtSequenceNumberPath isn't. Also, see updateApplicationStateInternal and
updateApplicationAttemptStateInternal. They call updateFile:
{code}
  protected void updateFile(Path outputPath, byte[] data) throws Exception {
    if (fs.exists(outputPath)) {
      deleteFile(outputPath);
    }
    writeFile(outputPath, data);
  }
{code}
The RM can fail after deleting the file and before writing the new one, in which case no copy of the state is left.
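
To make the delete-then-write window concrete, here is a minimal sketch of the alternative ordering: write a temp file, then rename it over the old one. It is not the FileSystemRMStateStore code. It uses java.nio on a local file system rather than the Hadoop FileSystem interface, the class name, the ".tmp" suffix, and the loadFile/runDemo helpers are hypothetical, and it assumes the underlying file system really offers an atomic, replacing rename, which is the very point in question above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical sketch on a local file system; not part of FileSystemRMStateStore. */
class AtomicUpdateSketch {

  /** Hypothetical naming convention for the in-progress copy of a state file. */
  static Path tmpPathFor(Path outputPath) {
    return outputPath.resolveSibling(outputPath.getFileName() + ".tmp");
  }

  /** Updates outputPath without deleting the old copy first. */
  static void updateFile(Path outputPath, byte[] data) throws IOException {
    Path tmp = tmpPathFor(outputPath);
    // A crash during or after this write leaves the old file untouched.
    Files.write(tmp, data);
    // Assumption: the file system supports an atomic rename that replaces the
    // target. Per the Files.move Javadoc, replacing an existing target under
    // ATOMIC_MOVE is implementation specific, and the call throws
    // AtomicMoveNotSupportedException if an atomic move is not possible.
    Files.move(tmp, outputPath, StandardCopyOption.ATOMIC_MOVE);
  }

  /** Load-time handling: a leftover temp file means an update never completed. */
  static byte[] loadFile(Path outputPath) throws IOException {
    Files.deleteIfExists(tmpPathFor(outputPath));
    return Files.readAllBytes(outputPath);
  }

  /** Runs two updates, simulates a crash before the rename, then loads. */
  static String runDemo() {
    try {
      Path dir = Files.createTempDirectory("rmstate");
      Path state = dir.resolve("app_state");
      updateFile(state, "v1".getBytes(StandardCharsets.UTF_8));
      updateFile(state, "v2".getBytes(StandardCharsets.UTF_8));
      // Simulated crash: the temp file was written but never renamed.
      Files.write(tmpPathFor(state), "v3-partial".getBytes(StandardCharsets.UTF_8));
      return new String(loadFile(state), StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(runDemo());
  }
}
```

In this sketch the loader only has to discard a stale temp file, because the old copy is never removed before the new one is in place. Whether a given Hadoop FileSystem implementation gives the same rename guarantee would need to be checked separately.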

I haven't closely followed the HA feature, but if RM failover relies on FSRMStateStore, we may
expect some problems due to this non-atomic behavior. Thoughts?
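
For the two-file case above (DT file plus sequence-number file), one possible way for loading to tolerate a crash between the two writes is to reconcile the sequence number against the DT files that are actually present. This is only a sketch under stated assumptions: the class name and the file-name prefixes are hypothetical, and it assumes each DT file name carries the token's sequence number, which the code quoted above does not show.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; prefixes and naming scheme are illustrative assumptions. */
class SequenceNumberRecoverySketch {

  // Hypothetical file-name prefixes, chosen for illustration only.
  static final String TOKEN_PREFIX = "Token_";
  static final String SEQ_PREFIX = "SequenceNumber_";

  /**
   * Returns the largest sequence number found in either kind of file name,
   * so a missing sequence-number update cannot move the counter backwards.
   */
  static int recoverSequenceNumber(List<String> fileNames) {
    int recovered = 0;
    for (String name : fileNames) {
      if (name.startsWith(TOKEN_PREFIX)) {
        int seq = Integer.parseInt(name.substring(TOKEN_PREFIX.length()));
        recovered = Math.max(recovered, seq);
      } else if (name.startsWith(SEQ_PREFIX)) {
        int seq = Integer.parseInt(name.substring(SEQ_PREFIX.length()));
        recovered = Math.max(recovered, seq);
      }
    }
    return recovered;
  }

  public static void main(String[] args) {
    // Token 7 was stored, but the crash happened before the sequence-number
    // file was advanced from 6 to 7.
    List<String> names =
        Arrays.asList("Token_5", "Token_6", "Token_7", "SequenceNumber_6");
    System.out.println(recoverSequenceNumber(names));
  }
}
```

This only covers the case where the DT file is written and the sequence number is not. It does nothing for a DT file that is itself partially written.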

> renewDelegationToken should survive RM failover
> -----------------------------------------------
>
>                 Key: YARN-1776
>                 URL: https://issues.apache.org/jira/browse/YARN-1776
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Zhijie Shen
>            Assignee: Zhijie Shen
>         Attachments: YARN-1776.1.patch
>
>
> When a delegation token is renewed, two RMStateStore operations will happen: 1) removing the old
> DT, and 2) storing the new DT. If the RM fails in between, there would be a problem.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
