hadoop-yarn-issues mailing list archives

From "Zhijie Shen (JIRA)" <j...@apache.org>
Subject [jira] [Created] (YARN-2958) RMStateStore seems to unnecessarily and wrongly store sequence number separately
Date Sat, 13 Dec 2014 00:19:13 GMT
Zhijie Shen created YARN-2958:
---------------------------------

             Summary: RMStateStore seems to unnecessarily and wrongly store sequence number separately
                 Key: YARN-2958
                 URL: https://issues.apache.org/jira/browse/YARN-2958
             Project: Hadoop YARN
          Issue Type: Bug
          Components: resourcemanager
            Reporter: Zhijie Shen


It seems that RMStateStore updates the last sequence number whenever it stores or updates an
individual delegation token (DT), so that the latest sequence number can be recovered when the
RM restarts.

First, the current logic seems to be problematic:
{code}
  public synchronized void updateRMDelegationTokenAndSequenceNumber(
      RMDelegationTokenIdentifier rmDTIdentifier, Long renewDate,
      int latestSequenceNumber) {
    if(isFencedState()) {
      LOG.info("State store is in Fenced state. Can't update RM Delegation Token.");
      return;
    }
    try {
      updateRMDelegationTokenAndSequenceNumberInternal(rmDTIdentifier, renewDate,
          latestSequenceNumber);
    } catch (Exception e) {
      notifyStoreOperationFailed(e);
    }
  }
{code}
{code}
  @Override
  protected void updateStoredToken(RMDelegationTokenIdentifier id,
      long renewDate) {
    try {
      LOG.info("updating RMDelegation token with sequence number: "
          + id.getSequenceNumber());
      rmContext.getStateStore().updateRMDelegationTokenAndSequenceNumber(id,
        renewDate, id.getSequenceNumber());
    } catch (Exception e) {
      LOG.error("Error in updating persisted RMDelegationToken with sequence number: "
            + id.getSequenceNumber());
      ExitUtil.terminate(1, e);
    }
  }
{code}
According to the code above, the last sequence number in the store is updated even when a DT is
merely renewed, which is wrong. For example, consider the following sequence:
1. Get DT 1 (seq = 1)
2. Get DT 2 (seq = 2)
3. Renew DT 1 (seq = 1)
4. Restart RM
The stored, and then recovered, last sequence number is 1. The next DT created after the RM
restarts would therefore be assigned seq = 2 and conflict with DT 2.
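
The four steps above can be replayed with a minimal sketch. SeqStoreSketch and its methods are hypothetical stand-ins, not the real RMStateStore API; the only behavior taken from the report is that both storing and renewing a token overwrite the persisted last sequence number.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the state store; NOT the real RMStateStore API.
class SeqStoreSketch {
    // Persisted tokens: sequence number -> renew date.
    final Map<Integer, Long> tokens = new HashMap<>();
    // Persisted "last" sequence number, overwritten on every write.
    int lastSequenceNumber = 0;

    // Storing a new token records its sequence number as the last one.
    void storeToken(int seq, long renewDate) {
        tokens.put(seq, renewDate);
        lastSequenceNumber = seq;
    }

    // Mirrors the reported behavior: a renewal also overwrites the stored number.
    void updateToken(int seq, long renewDate) {
        tokens.put(seq, renewDate);
        lastSequenceNumber = seq;
    }

    // Replays steps 1-3 and returns what a restarted RM would read back.
    static int replayScenario() {
        SeqStoreSketch store = new SeqStoreSketch();
        store.storeToken(1, 100L);   // 1. Get DT 1 (seq = 1)
        store.storeToken(2, 100L);   // 2. Get DT 2 (seq = 2)
        store.updateToken(1, 200L);  // 3. Renew DT 1 (seq = 1)
        return store.lastSequenceNumber;
    }

    public static void main(String[] args) {
        int recovered = replayScenario();
        System.out.println("stored last sequence number = " + recovered);           // 1
        System.out.println("next sequence number would be " + (recovered + 1));     // 2
    }
}
```

In this sketch the value read back is 1 even though a token with seq = 2 is still stored, which is the stale state the example describes.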

Second, the aforementioned bug doesn't actually manifest, because the recovered last sequence
number is subsequently overwritten by the correct one.
{code}
  public void recover(RMState rmState) throws Exception {

    LOG.info("recovering RMDelegationTokenSecretManager.");
    // recover RMDTMasterKeys
    for (DelegationKey dtKey : rmState.getRMDTSecretManagerState()
      .getMasterKeyState()) {
      addKey(dtKey);
    }

    // recover RMDelegationTokens
    Map<RMDelegationTokenIdentifier, Long> rmDelegationTokens =
        rmState.getRMDTSecretManagerState().getTokenState();
    this.delegationTokenSequenceNumber =
        rmState.getRMDTSecretManagerState().getDTSequenceNumber();
    for (Map.Entry<RMDelegationTokenIdentifier, Long> entry : rmDelegationTokens
      .entrySet()) {
      addPersistedDelegationToken(entry.getKey(), entry.getValue());
    }
  }
{code}
The code above recovers delegationTokenSequenceNumber by reading the last sequence number
from the store, which could be wrong. Fortunately, each addPersistedDelegationToken call then
raises it to the right number:
{code}
    if (identifier.getSequenceNumber() > getDelegationTokenSeqNum()) {
      setDelegationTokenSeqNum(identifier.getSequenceNumber());
    }
{code}
Every stored identifier is visited, so delegationTokenSequenceNumber ends up at least as large
as the largest sequence number among them. Therefore, a new DT is always assigned a sequence
number larger than that of every recovered DT.
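
As a rough illustration of why recovery self-corrects, the sketch below seeds the counter with a possibly stale stored value and then applies the same "keep the larger value" check to every recovered token. RecoverySketch, addPersistedToken and recover are hypothetical names; tokens are reduced to their sequence numbers and renew dates.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the secret manager's recovery path; not Hadoop code.
class RecoverySketch {
    private int delegationTokenSequenceNumber;

    // Same check as in the snippet above: only ever raise the counter.
    void addPersistedToken(int tokenSeq) {
        if (tokenSeq > delegationTokenSequenceNumber) {
            delegationTokenSequenceNumber = tokenSeq;
        }
    }

    // Seeds the counter from the store, then walks all recovered tokens.
    int recover(int storedLastSeq, Map<Integer, Long> storedTokens) {
        delegationTokenSequenceNumber = storedLastSeq;
        for (int seq : storedTokens.keySet()) {
            addPersistedToken(seq);
        }
        return delegationTokenSequenceNumber;
    }

    public static void main(String[] args) {
        Map<Integer, Long> storedTokens = new LinkedHashMap<>();
        storedTokens.put(1, 200L); // DT 1, renewed
        storedTokens.put(2, 100L); // DT 2
        // The stored last sequence number is the stale value 1.
        int recovered = new RecoverySketch().recover(1, storedTokens);
        System.out.println("recovered sequence number = " + recovered); // 2
    }
}
```

Under these assumptions the separately stored number does not change the result whenever it is stale, since the maximum over the recovered tokens overrides it.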

To sum up, two negatives make a positive here, but it would still be good to fix the issue.
Please let me know if I've missed something.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)