hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-24) Nodemanager fails to start if log aggregation enabled and namenode unavailable
Date Tue, 21 Aug 2012 16:38:38 GMT

    [ https://issues.apache.org/jira/browse/YARN-24?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13438840#comment-13438840 ]

Jason Lowe commented on YARN-24:
--------------------------------

One thing we could consider is marking the node as UNHEALTHY if it encounters issues trying
to create the initial app log directory or when it encounters issues trying to aggregate for
a particular app.  That way we won't pile up more apps on a node that's already having issues
trying to aggregate, and we're at least reporting on the cluster status page that the node
needs someone to take a look at what's going on.

As for notifying an app that the log aggregation isn't quite complete, I'm not sure how best
to handle that.  Since currently log aggregation is asynchronous from app execution, the app
will often have exited before the aggregation completes even when there isn't an issue accessing
the aggregation filesystem.  If we need to provide a way for apps to know for certain that
all of their container logs have been aggregated then we'd need to have log aggregation support
a notification service or, at a minimum, a way for AMs to query nodes to see if an aggregation
of a container has completed.
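
To make the "query nodes" idea concrete, here is a minimal sketch of a per-container status table that a node could keep and expose. Everything in it is hypothetical: the class name AggregationStatusTracker, the Status values, and the method names are invented for illustration and are not existing YARN APIs.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch only; not an existing YARN class. */
public class AggregationStatusTracker {
    /** Assumed set of states; a real design might need more. */
    public enum Status { NOT_STARTED, RUNNING, SUCCEEDED, FAILED }

    // Keyed by container ID string; thread-safe because aggregation
    // runs asynchronously from the code answering queries.
    private final Map<String, Status> statusByContainer = new ConcurrentHashMap<>();

    /** Called by the (hypothetical) aggregation code as it makes progress. */
    public void update(String containerId, Status status) {
        statusByContainer.put(containerId, status);
    }

    /** Called on behalf of an AM; unknown containers report NOT_STARTED. */
    public Status query(String containerId) {
        return statusByContainer.getOrDefault(containerId, Status.NOT_STARTED);
    }

    /** True once aggregation has reached a terminal state, good or bad. */
    public boolean isComplete(String containerId) {
        Status s = query(containerId);
        return s == Status.SUCCEEDED || s == Status.FAILED;
    }
}
```

A notification service would push these transitions to interested parties instead of waiting to be polled; the table above only covers the simpler query case.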

Does it make sense to split this into two parts?  We can use this JIRA to have NMs become
UNHEALTHY while they are having issues accessing the aggregation filesystem (and add retries
in such cases), and file a separate JIRA to add the log aggregation status/notification feature.
The former would still be useful to have without the latter.
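
A rough sketch of the first part, assuming a bounded retry loop with a fixed delay, could look like the following. The names LogDirHealthSketch, FsOperation, and runWithRetries are made up for this example and do not correspond to actual NodeManager code; the real health-reporting path would differ.

```java
import java.io.IOException;

/** Hypothetical sketch only; not actual NodeManager code. */
public class LogDirHealthSketch {
    /** Stand-in for any call against the aggregation filesystem. */
    public interface FsOperation {
        void run() throws IOException;
    }

    private volatile boolean healthy = true;
    private volatile String healthReport = "";

    /**
     * Runs op up to maxAttempts times. The node is flagged unhealthy
     * while attempts are failing and healthy again once one succeeds.
     * Returns true on success, false if every attempt failed.
     */
    public boolean runWithRetries(FsOperation op, int maxAttempts, long delayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                op.run();
                healthy = true;
                healthReport = "";
                return true;
            } catch (IOException e) {
                // Stop accepting blame-free work: report the problem
                // so it shows up wherever node health is displayed.
                healthy = false;
                healthReport = "Cannot access aggregation filesystem: " + e.getMessage();
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return false;
                }
            }
        }
        return false;
    }

    public boolean isHealthy() { return healthy; }

    public String getHealthReport() { return healthReport; }
}
```

With something like this, a namenode outage at startup would leave the node running but reported as unhealthy, rather than aborting the nodemanager.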
                
> Nodemanager fails to start if log aggregation enabled and namenode unavailable
> ------------------------------------------------------------------------------
>
>                 Key: YARN-24
>                 URL: https://issues.apache.org/jira/browse/YARN-24
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.1.0-alpha, 0.23.3
>            Reporter: Jason Lowe
>
> If log aggregation is enabled and the namenode is currently unavailable, the nodemanager
> fails to start up.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
