hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2958) RMStateStore seems to unnecessarily and wronly store sequence number separately
Date Sat, 27 Dec 2014 02:29:14 GMT

    [ https://issues.apache.org/jira/browse/YARN-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14259288#comment-14259288
] 

Hadoop QA commented on YARN-2958:
---------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12689205/YARN-2958.003.patch
  against trunk revision 40ee4bf.

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 2 new or modified
test files.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of
javac compiler warnings.

    {color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

    {color:green}+1 eclipse:eclipse{color}.  The patch built with eclipse:eclipse.

    {color:red}-1 findbugs{color}.  The patch appears to introduce 15 new Findbugs (version
2.0.3) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number
of release audit warnings.

    {color:green}+1 core tests{color}.  The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6195//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6195//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6195//console

This message is automatically generated.

> RMStateStore seems to unnecessarily and wronly store sequence number separately
> -------------------------------------------------------------------------------
>
>                 Key: YARN-2958
>                 URL: https://issues.apache.org/jira/browse/YARN-2958
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Zhijie Shen
>            Assignee: Varun Saxena
>            Priority: Blocker
>         Attachments: YARN-2958.001.patch, YARN-2958.002.patch, YARN-2958.003.patch
>
>
> It seems that RMStateStore updates last sequence number when storing or updating each
individual DT, to recover the latest sequence number when RM restarting.
> First, the current logic seems to be problematic:
> {code}
>   public synchronized void updateRMDelegationTokenAndSequenceNumber(
>       RMDelegationTokenIdentifier rmDTIdentifier, Long renewDate,
>       int latestSequenceNumber) {
>     if(isFencedState()) {
>       LOG.info("State store is in Fenced state. Can't update RM Delegation Token.");
>       return;
>     }
>     try {
>       updateRMDelegationTokenAndSequenceNumberInternal(rmDTIdentifier, renewDate,
>           latestSequenceNumber);
>     } catch (Exception e) {
>       notifyStoreOperationFailed(e);
>     }
>   }
> {code}
> {code}
>   @Override
>   protected void updateStoredToken(RMDelegationTokenIdentifier id,
>       long renewDate) {
>     try {
>       LOG.info("updating RMDelegation token with sequence number: "
>           + id.getSequenceNumber());
>       rmContext.getStateStore().updateRMDelegationTokenAndSequenceNumber(id,
>         renewDate, id.getSequenceNumber());
>     } catch (Exception e) {
>       LOG.error("Error in updating persisted RMDelegationToken with sequence number:
"
>             + id.getSequenceNumber());
>       ExitUtil.terminate(1, e);
>     }
>   }
> {code}
> According to code above, even when renewing a DT, the last sequence number is updated
in the store, which is wrong. For example, we have the following sequence:
> 1. Get DT 1 (seq = 1)
> 2. Get DT 2( seq = 2)
> 3. Renew DT 1 (seq = 1)
> 4. Restart RM
> The stored and then recovered last sequence number is 1. It makes the next created DT
after RM restarting will conflict with DT 2 on sequence num.
> Second, the aforementioned bug doesn't happen actually, because the recovered last sequence
num has been overwritten at by the correctly one.
> {code}
>   public void recover(RMState rmState) throws Exception {
>     LOG.info("recovering RMDelegationTokenSecretManager.");
>     // recover RMDTMasterKeys
>     for (DelegationKey dtKey : rmState.getRMDTSecretManagerState()
>       .getMasterKeyState()) {
>       addKey(dtKey);
>     }
>     // recover RMDelegationTokens
>     Map<RMDelegationTokenIdentifier, Long> rmDelegationTokens =
>         rmState.getRMDTSecretManagerState().getTokenState();
>     this.delegationTokenSequenceNumber =
>         rmState.getRMDTSecretManagerState().getDTSequenceNumber();
>     for (Map.Entry<RMDelegationTokenIdentifier, Long> entry : rmDelegationTokens
>       .entrySet()) {
>       addPersistedDelegationToken(entry.getKey(), entry.getValue());
>     }
>   }
> {code}
> The code above recovers delegationTokenSequenceNumber by reading the last sequence number
in the store. It could be wrong. Fortunately, delegationTokenSequenceNumber updates it to
the right number.
> {code}
>     if (identifier.getSequenceNumber() > getDelegationTokenSeqNum()) {
>       setDelegationTokenSeqNum(identifier.getSequenceNumber());
>     }
> {code}
> All the stored identifiers will be gone through, and delegationTokenSequenceNumber will
be set to the largest sequence number among these identifiers. Therefore, new DT will be assigned
a sequence number which is always larger than that of all the recovered DT.
> To sum up, two negatives make a positive, but it's good to fix the issue. Please let
me know if I've missed something here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message