hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4783) Log aggregation failure for application when Nodemanager is restarted
Date Thu, 10 Mar 2016 15:31:40 GMT

    [ https://issues.apache.org/jira/browse/YARN-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15189434#comment-15189434
] 

Jason Lowe commented on YARN-4783:
----------------------------------

Thanks for posting the details from the logs!  The problem is as I suspected -- the RM cancelled
the delegation token before log aggregation had started from the nodemanager.  In this case
it was well before the nodemanager had a chance to aggregate, as the nodemanager wasn't recovered
until 13.5 hours after the application completed.

I'm not sure what YARN can do to fix this scenario.  It's a security risk to leave the delegation
token around too long after the application completed, and in the general case we can't leave
it around forever because it will eventually expire on its own.  Therefore we can't support
arbitrary delays between the application completing and the log aggregation starting.
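The timing constraint described above can be illustrated with a toy model. This is only a sketch, not YARN code: the function name, the 10-minute cancellation delay, the 7-day token expiry, and the timestamps are hypothetical values chosen for illustration. Only the 13.5-hour recovery gap comes from this issue.

```python
from datetime import datetime, timedelta


def token_usable_until(app_finished, cancel_delay, token_expires):
    """Latest moment the delegation token can still be used.

    The token dies at whichever comes first: the RM cancelling it some
    time after the application completes, or the token's own expiry.
    """
    return min(app_finished + cancel_delay, token_expires)


def can_aggregate(nm_recovered, app_finished, cancel_delay, token_expires):
    """True if the recovered NM could still start aggregation with a live token."""
    return nm_recovered < token_usable_until(app_finished, cancel_delay, token_expires)


# Hypothetical timeline; only the 13.5 hour gap is taken from the issue.
app_finished = datetime(2016, 3, 9, 12, 0)
cancel_delay = timedelta(minutes=10)               # assumed RM cancellation delay
token_expires = app_finished + timedelta(days=7)   # assumed token expiry

# NM recovered 13.5 hours after the application completed.
late_recovery = can_aggregate(
    app_finished + timedelta(hours=13, minutes=30),
    app_finished, cancel_delay, token_expires,
)

# NM recovered 2 minutes after the application completed.
prompt_recovery = can_aggregate(
    app_finished + timedelta(minutes=2),
    app_finished, cancel_delay, token_expires,
)

print(late_recovery, prompt_recovery)
```

Under these assumptions, lengthening the cancellation delay only moves the cutoff. It cannot remove it, because the token's own expiry still bounds the window.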

> Log aggregation failure for application when Nodemanager is restarted 
> ----------------------------------------------------------------------
>
>                 Key: YARN-4783
>                 URL: https://issues.apache.org/jira/browse/YARN-4783
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.7.1
>            Reporter: Surendra Singh Lilhore
>
> Scenario:
> =========
> 1. Start the NM as user dsperf:hadoop.
> 2. Configure the linux-execute user as dsperf.
> 3. Submit an application as the yarn user.
> 4. Wait until a few containers are allocated to NM 1.
> 5. Stop Nodemanager 1 (wait for expiry).
> 6. Start the nodemanager after the application has completed.
> 7. Check whether log aggregation happens for the container logs in the NM local directory.
> Expected Output:
> ================
> Log aggregation should be successful.
> Actual Output:
> ==============
> Log aggregation is not successful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
