hadoop-yarn-issues mailing list archives

From "Xuan Gong (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-7697) NM goes down with OOM due to leak in log-aggregation
Date Wed, 03 Jan 2018 19:33:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-7697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16310156#comment-16310156 ]

Xuan Gong commented on YARN-7697:
---------------------------------

The issue happens after the file truncate process. It looks like the truncate API returns false instead
of throwing an exception, so we still read the corrupted aggregated log.
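For context, Hadoop's `FileSystem.truncate(Path, long)` reports its outcome through a boolean: `false` means the length of the last block is still being adjusted in the background, so the file is not yet in its final state. A minimal sketch of a caller that refuses to proceed in that case, assuming the fix is to stop reading rather than to wait. `Truncator` and `truncateOrFail` are hypothetical names standing in for the real file-system call so the example runs without Hadoop on the classpath.

```java
import java.io.IOException;

final class TruncateGuard {
    /** Hypothetical stand-in for a truncate call that signals its outcome via a boolean. */
    @FunctionalInterface
    interface Truncator {
        boolean truncate(String path, long newLength) throws IOException;
    }

    /**
     * Calls the (hypothetical) truncator and turns a false result into an exception,
     * so the caller cannot go on to read a file whose truncation was not confirmed.
     */
    static void truncateOrFail(Truncator truncator, String path, long newLength)
            throws IOException {
        if (!truncator.truncate(path, newLength)) {
            throw new IOException("Truncate of " + path + " to " + newLength
                    + " bytes was not confirmed; refusing to read it");
        }
    }
}
```

Turning the silent `false` into an exception makes the failure visible at the point of truncation instead of later, when a half-adjusted file is parsed.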

While reading the logs, we allocate a byte array:
{code}
byte[] array = new byte[offset]; // this line throws OOM
fsDataIStream.seek(
          fileLength - offset - Integer.SIZE/ Byte.SIZE - UUID_LENGTH);
{code}
So the offset is incorrect, probably an invalid, very large value, and allocating a byte array of that size can cause an OOM in the NM.
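The snippet above trusts the offset read from the file. One way to avoid the OOM is to bound that offset by the file length before allocating. A minimal sketch, assuming the trailer layout implied by the seek arithmetic ([metadata of `offset` bytes][4-byte int offset][UUID_LENGTH bytes]), i.e. that the offset itself is stored as a big-endian int right before the UUID. It works on an in-memory copy of the file instead of an `FSDataInputStream`; `TrailerGuard` and `readMeta` are hypothetical names, not Hadoop APIs.

```java
import java.io.IOException;
import java.nio.ByteBuffer;

final class TrailerGuard {
    static final int INT_BYTES = Integer.SIZE / Byte.SIZE; // 4

    /**
     * Reads the metadata block from a file assumed to be laid out as
     * [ ... | meta (offset bytes) | offset (4-byte int) | uuid (uuidLength bytes) ].
     * An offset that cannot fit in the file is rejected instead of being allocated.
     */
    static byte[] readMeta(byte[] file, int uuidLength) throws IOException {
        int offsetPos = file.length - INT_BYTES - uuidLength;
        if (offsetPos < 0) {
            throw new IOException("File too short for trailer: " + file.length);
        }
        // ByteBuffer reads big-endian by default, like DataInputStream.readInt().
        int offset = ByteBuffer.wrap(file, offsetPos, INT_BYTES).getInt();
        // The metadata must lie between the file start and the offset field;
        // a non-positive size is treated as corrupt (assumption).
        if (offset <= 0 || offset > offsetPos) {
            throw new IOException("Corrupted trailer: offset=" + offset
                    + ", fileLength=" + file.length);
        }
        byte[] meta = new byte[offset]; // safe: offset is bounded by the file size
        System.arraycopy(file, offsetPos - offset, meta, 0, offset);
        return meta;
    }
}
```

With the check in place, a corrupted file produces an `IOException` that the log-aggregation thread can handle, rather than an `OutOfMemoryError` that brings down the NM.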

> NM goes down with OOM due to leak in log-aggregation
> ----------------------------------------------------
>
>                 Key: YARN-7697
>                 URL: https://issues.apache.org/jira/browse/YARN-7697
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Santhosh B Gowda
>            Assignee: Xuan Gong
>
> 2017-12-29 01:43:50,601 FATAL yarn.YarnUncaughtExceptionHandler (YarnUncaughtExceptionHandler.java:uncaughtException(51)) - Thread Thread[LogAggregationService #0,5,main] threw an Error.  Shutting down now...
> java.lang.OutOfMemoryError: Java heap space
>         at org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.loadIndexedLogsMeta(LogAggregationIndexedFileController.java:823)
>         at org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.loadIndexedLogsMeta(LogAggregationIndexedFileController.java:840)
>         at org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriterInRolling(LogAggregationIndexedFileController.java:293)
>         at org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.access$600(LogAggregationIndexedFileController.java:98)
>         at org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController$1.run(LogAggregationIndexedFileController.java:216)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
>         at org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriter(LogAggregationIndexedFileController.java:197)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainers(AppLogAggregatorImpl.java:205)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:312)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.run(AppLogAggregatorImpl.java:284)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$1.run(LogAggregationService.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> 2017-12-29 01:43:50,601 INFO  application.ApplicationImpl (ApplicationImpl.java:handle(464)) - Application ap



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
