hadoop-mapreduce-issues mailing list archives

From "Daryn Sharp (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4302) NM goes down if error encountered during log aggregation
Date Fri, 01 Jun 2012 17:42:23 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287558#comment-13287558 ]

Daryn Sharp commented on MAPREDUCE-4302:
----------------------------------------

For a little background, the problem was detected due to a NN token issue.  The NMs all went
down because log aggregation init failed to connect to the NN to create its log dirs.  The
NMs were started up again, and they all went down again because the AMs were retrying the
tasks.  The problem was also induced by restricting permissions on the log dir and stopping
the NN.
                
> NM goes down if error encountered during log aggregation
> --------------------------------------------------------
>
>                 Key: MAPREDUCE-4302
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4302
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 0.23.0, 2.0.0-alpha, trunk
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>            Priority: Critical
>         Attachments: MAPREDUCE-4302.patch
>
>
> When a container launch request is sent to the NM, if _any_ exception occurs during the
> init of log aggregation then the NM goes down.  The problem can be induced by situations
> including, but certainly not limited to: transient RPC connection issues, missing tokens,
> expired tokens, permissions, full/quota exceeded DFS, etc.  The problem may occur with
> and without security enabled.
> The ramification is that an entire cluster can be rather easily brought down, whether
> maliciously, accidentally, or via a submission bug.
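
To illustrate the failure mode described above, here is a minimal sketch of containing an
init failure to the affected application instead of letting it propagate and stop the whole
daemon. It is not the attached patch and not the real NodeManager code: the names
LogAggregationInitSketch, LogDirInitializer, LogAggregationService, InitResult and initApp
are all hypothetical stand-ins invented for this example.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class LogAggregationInitSketch {

    /** Hypothetical stand-in for the step that creates an app's remote log dir. */
    interface LogDirInitializer {
        void init(String appId) throws IOException;
    }

    /** Hypothetical per-application outcome of log aggregation init. */
    enum InitResult { STARTED, APP_FAILED }

    /** Hypothetical stand-in for the NM-side service; not the real Hadoop class. */
    static final class LogAggregationService {
        private final LogDirInitializer initializer;
        private final List<String> failedApps = new ArrayList<>();

        LogAggregationService(LogDirInitializer initializer) {
            this.initializer = initializer;
        }

        /** Initializes log aggregation for one app; never throws to the caller. */
        InitResult initApp(String appId) {
            try {
                initializer.init(appId);
                return InitResult.STARTED;
            } catch (Exception e) {
                // Contain the failure: record that this one app failed and report
                // it, rather than rethrowing and taking the whole daemon down.
                failedApps.add(appId);
                return InitResult.APP_FAILED;
            }
        }

        List<String> failedApps() {
            return failedApps;
        }
    }

    public static void main(String[] args) {
        // Simulate an unreachable NN: every init attempt fails with an IOException.
        LogAggregationService service = new LogAggregationService(appId -> {
            throw new IOException("cannot create log dir for " + appId);
        });
        InitResult result = service.initApp("application_0001");
        System.out.println(result + " " + service.failedApps());
    }
}
```

The design choice in this sketch is that a failure while setting up logs for one application
is treated as that application's problem only, so the same service instance keeps accepting
later requests.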

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
