hadoop-yarn-issues mailing list archives

From "Junping Du (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4581) thread leak makes RM crash while RM is recovering
Date Tue, 12 Jan 2016 14:10:39 GMT

    [ https://issues.apache.org/jira/browse/YARN-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15093925#comment-15093925 ]

Junping Du commented on YARN-4581:
----------------------------------

Hi [~sandflee], thanks for reporting the issue and delivering the patch.
As Naga mentioned above, AHS is already deprecated in the community, and ATS (Application
Timeline Service) has been its replacement since 2.6.0. Do you plan to migrate from AHS
to ATS?

> thread leak makes RM crash while RM is recovering
> -------------------------------------------------
>
>                 Key: YARN-4581
>                 URL: https://issues.apache.org/jira/browse/YARN-4581
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: sandflee
>            Assignee: sandflee
>         Attachments: YARN-4581.01.patch
>
>
> We enabled ApplicationHistoryWriter and found thousands of errors:
> {quote}
> 2016-01-08 03:13:03,441 ERROR org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore: Error when openning history file of application application_1451878591907_0197
> java.io.IOException: Output file not at zero offset.
>         at org.apache.hadoop.io.file.tfile.BCFile$Writer.<init>(BCFile.java:288)
>         at org.apache.hadoop.io.file.tfile.TFile$Writer.<init>(TFile.java:288)
>         at org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore$HistoryFileWriter.<init>(FileSystemApplicationHistoryStore.java:728)
>         at org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.applicationStarted(FileSystemApplicationHistoryStore.java:418)
>         at org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter.handleWritingApplicationHistoryEvent(RMApplicationHistoryWriter.java:140)
>         at org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler.handle(RMApplicationHistoryWriter.java:297)
>         at org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler.handle(RMApplicationHistoryWriter.java:292)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:191)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:124)
>         at java.lang.Thread.run(Thread.java:745)
> {quote}
> and this leads to the RM crashing:
> {quote}
> 2016-01-08 03:13:08,335 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
> java.lang.OutOfMemoryError: unable to create new native thread
>         at java.lang.Thread.start0(Native Method)
>         at java.lang.Thread.start(Thread.java:714)
>         at org.apache.hadoop.hdfs.DFSOutputStream.start(DFSOutputStream.java:2033)
>         at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForAppend(DFSOutputStream.java:1652)
>         at org.apache.hadoop.hdfs.DFSClient.callAppend(DFSClient.java:1573)
>         at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1603)
>         at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1591)
>         at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:328)
>         at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:324)
>         at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:324)
>         at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1161)
>         at org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore$HistoryFileWriter.<init>(FileSystemApplicationHistoryStore.java:723)
>         at org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.applicationStarted(FileSystemApplicationHistoryStore.java:418)
>         at org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter.handleWritingApplicationHistoryEvent(RMApplicationHistoryWriter.java:140)
>         at org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler.handle(RMApplicationHistoryWriter.java:297)
>         at org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler.handle(RMApplicationHistoryWriter.java:292)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:191)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:124)
>         at java.lang.Thread.run(Thread.java:745)
> {quote}
> After several failovers the RM finishes recovering, but thousands of HDFS client threads
> are leaked in the RM.
> {quote}
> "Thread-22723" #22893 daemon prio=5 os_prio=0 tid=0x00007f75f0346000 nid=0x132e in Object.wait() [0x00007f74ea7ca000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:502)
>         - locked <0x0000000745f88b98> (a java.util.LinkedList)
> {quote}
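
The two traces above suggest a pattern: the append call at FileSystemApplicationHistoryStore.java:723 starts a DataStreamer thread, the TFile writer constructor at line 728 then throws, and nothing appears to close the stream that was just opened. Below is a minimal, Hadoop-free sketch of that pattern and of one way to avoid it. It is not the attached patch; StreamWithThread, FormatWriter, openLeaky and openSafely are hypothetical stand-ins invented for illustration.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

public class WriterLeakSketch {
    /** Number of background threads still running (stand-in for DataStreamer threads). */
    static final AtomicInteger liveStreamers = new AtomicInteger();

    /** Hypothetical stand-in for an append stream that owns a background thread. */
    static class StreamWithThread implements Closeable {
        private final Thread streamer;
        private volatile boolean closed;

        StreamWithThread() {
            liveStreamers.incrementAndGet();
            // The thread waits on this object's monitor until close() is called.
            streamer = new Thread(() -> {
                synchronized (this) {
                    while (!closed) {
                        try {
                            wait(100);
                        } catch (InterruptedException e) {
                            return;
                        }
                    }
                }
            });
            streamer.setDaemon(true);
            streamer.start();
        }

        @Override
        public void close() {
            synchronized (this) {
                if (closed) {
                    return;
                }
                closed = true;
                notifyAll();
            }
            try {
                streamer.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            liveStreamers.decrementAndGet();
        }
    }

    /** Hypothetical stand-in for a writer whose constructor rejects a non-empty file. */
    static class FormatWriter {
        FormatWriter(StreamWithThread out, long offset) throws IOException {
            if (offset != 0) {
                throw new IOException("Output file not at zero offset.");
            }
        }
    }

    /** Leaky: if the writer constructor throws, the stream and its thread are abandoned. */
    static FormatWriter openLeaky(long offset) throws IOException {
        StreamWithThread out = new StreamWithThread();
        return new FormatWriter(out, offset);
    }

    /** Safer: close the stream before propagating the constructor failure. */
    static FormatWriter openSafely(long offset) throws IOException {
        StreamWithThread out = new StreamWithThread();
        try {
            return new FormatWriter(out, offset);
        } catch (IOException e) {
            out.close();
            throw e;
        }
    }

    public static void main(String[] args) {
        try {
            openLeaky(42);
        } catch (IOException expected) {
            // The stream opened inside openLeaky is never closed.
        }
        System.out.println("live threads after leaky open: " + liveStreamers.get());
        try {
            openSafely(42);
        } catch (IOException expected) {
            // openSafely closed its own stream before rethrowing.
        }
        System.out.println("live threads after safe open: " + liveStreamers.get());
    }
}
```

In this sketch each failed openLeaky call leaves one thread behind, which would match thousands of failed opens during recovery adding up to thousands of leaked threads; openSafely leaves the count unchanged.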



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
