hadoop-yarn-issues mailing list archives

From "zhihai xu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
Date Tue, 03 Mar 2015 21:17:07 GMT

    [ https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345747#comment-14345747 ]

zhihai xu commented on YARN-2893:

I found another possible cause of this exception on non-secure clusters:
the JobClient may corrupt the tokens buffer.
The RM currently validates the tokens buffer in RMAppManager#submitApplication only when security is enabled:
    if (UserGroupInformation.isSecurityEnabled()) {
      try {
        // ... (elided in this excerpt: the submitted credentials are
        // parsed here via parseCredentials(...)) ...
      } catch (Exception e) {
        LOG.warn("Unable to parse credentials.", e);
        // Sending APP_REJECTED is fine, since we assume that the
        // RMApp is in NEW state and thus we haven't yet informed the
        // scheduler about the existence of the application
        assert application.getState() == RMAppState.NEW;
        this.rmContext.getDispatcher().getEventHandler()
          .handle(new RMAppRejectedEvent(applicationId, e.getMessage()));
        throw RPCUtil.getRemoteException(e);
      }
    }

  protected Credentials parseCredentials(
      ApplicationSubmissionContext application) throws IOException {
    Credentials credentials = new Credentials();
    DataInputByteBuffer dibb = new DataInputByteBuffer();
    ByteBuffer tokens = application.getAMContainerSpec().getTokens();
    if (tokens != null) {
      // A truncated or corrupted buffer surfaces here as an EOFException
      dibb.reset(tokens);
      credentials.readTokenStorageStream(dibb);
      tokens.rewind();
    }
    return credentials;
  }
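I think we should do the same check for non-secure clusters, so that a corrupted tokens buffer fails the application
at submission time instead of later, when the AM is launched. That would avoid a lot of confusion.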
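A minimal, self-contained sketch of the proposed ungated check, under loud assumptions: the Hadoop classes are replaced by plain JDK code, the token layout (a 4-byte header, an int count, then length-prefixed entries) is a hypothetical stand-in rather than Hadoop's real Credentials wire format, and the names TokenBufferCheck, write, parse and validateOnSubmit are illustrative only. It shows a truncated buffer being rejected with an EOFException at submission time, with no isSecurityEnabled() gate around the check.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TokenBufferCheck {
  // Hypothetical header; this whole layout is a simplified stand-in,
  // not Hadoop's real Credentials serialization format.
  static final byte[] MAGIC = "HDTS".getBytes(StandardCharsets.US_ASCII);

  // Serializes tokens as: MAGIC, int count, then (int length, bytes) per token.
  static ByteBuffer write(List<byte[]> tokens) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bos);
    out.write(MAGIC);
    out.writeInt(tokens.size());
    for (byte[] t : tokens) {
      out.writeInt(t.length);
      out.write(t);
    }
    out.flush();
    return ByteBuffer.wrap(bos.toByteArray());
  }

  // Parses the buffer; truncated or corrupt input raises EOFException/IOException.
  static List<byte[]> parse(ByteBuffer tokens) throws IOException {
    // Read through a duplicate so the caller's buffer position is not moved.
    ByteBuffer view = tokens.duplicate();
    byte[] bytes = new byte[view.remaining()];
    view.get(bytes);
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));

    byte[] magic = new byte[MAGIC.length];
    in.readFully(magic); // EOFException if fewer than 4 bytes remain
    if (!Arrays.equals(magic, MAGIC)) {
      throw new IOException("Bad token storage header");
    }
    int count = in.readInt();
    if (count < 0) {
      throw new IOException("Negative token count: " + count);
    }
    List<byte[]> result = new ArrayList<>();
    for (int i = 0; i < count; i++) {
      int len = in.readInt();
      // Reject lengths that run past the end of the buffer before allocating.
      if (len < 0 || len > in.available()) {
        throw new EOFException("Token " + i + " claims " + len + " bytes, "
            + in.available() + " remain");
      }
      byte[] t = new byte[len];
      in.readFully(t);
      result.add(t);
    }
    return result;
  }

  // Hypothetical stand-in for the proposed submission-time check: it runs
  // unconditionally, with no isSecurityEnabled() gate around it.
  static void validateOnSubmit(ByteBuffer tokens) throws IOException {
    if (tokens != null) {
      parse(tokens);
    }
  }

  public static void main(String[] args) throws IOException {
    ByteBuffer good = write(List.of("tok-1".getBytes(StandardCharsets.UTF_8)));
    validateOnSubmit(good);
    System.out.println("valid buffer accepted, position=" + good.position());

    // Simulate client-side corruption by dropping the last 3 bytes.
    ByteBuffer truncated =
        ByteBuffer.wrap(Arrays.copyOf(good.array(), good.limit() - 3));
    try {
      validateOnSubmit(truncated);
      System.out.println("unexpected: truncated buffer accepted");
    } catch (EOFException e) {
      System.out.println("rejected at submission: " + e.getMessage());
    }
  }
}
```

Reading through tokens.duplicate() leaves the caller's buffer position unchanged, so whatever consumes the buffer next still sees all of it. Rewinding the original buffer after reading is an alternative, but it only restores the right state if reading started at position 0.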

I also found a Cascading patch that fixes the credentials corruption in the JobClient.

I will update the patch to also check the tokens buffer for non-secure clusters in RMAppManager#submitApplication.

> AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
> ------------------------------------------------------------------------------
>                 Key: YARN-2893
>                 URL: https://issues.apache.org/jira/browse/YARN-2893
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.4.0
>            Reporter: Gera Shegalov
>            Assignee: zhihai xu
>         Attachments: YARN-2893.000.patch
> MapReduce jobs on our clusters experience sporadic failures due to corrupt tokens in the AM launch context.

This message was sent by Atlassian JIRA
