hadoop-hdfs-issues mailing list archives

From "Kihwal Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9684) DataNode stopped sending heartbeat after getting OutOfMemoryError from DataTransfer thread.
Date Tue, 26 Jan 2016 15:32:39 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15117387#comment-15117387 ]

Kihwal Lee commented on HDFS-9684:
----------------------------------

If we are to make the datanode recoverable from such conditions, we need to take care of the other
essential services running in the datanode. E.g. I've seen DU threads silently terminate, causing
storage reports to become stale. Sometimes crippled datanodes keep heartbeating, so clients are
still sent to them and hit more failures. It feels like we need a self-health-check in the datanode
along with a recovery mechanism.
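To make the self-health-check idea above concrete, here is a minimal Java sketch under stated assumptions. `ServiceHealthMonitor`, `register`, `reportProgress`, `isHealthy` and `recover` are hypothetical names, not existing Hadoop APIs. The sketch assumes each essential service (e.g. a heartbeat actor or a DU refresher) is built by a thread factory and reports progress after each unit of work. A dead thread is restarted; a thread that is alive but stuck is only reported, since Java offers no safe way to kill it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/**
 * Hypothetical self-health-check for a daemon's essential service threads
 * (e.g. heartbeat actor, DU refresher). Not part of the Hadoop API.
 */
class ServiceHealthMonitor {

  /** One essential service: a factory that builds its thread, plus liveness state. */
  private static final class Service {
    final Supplier<Thread> factory;
    volatile Thread thread;            // null if the last start attempt failed
    volatile long lastProgressMillis;  // updated by the service via reportProgress

    Service(Supplier<Thread> factory) {
      this.factory = factory;
    }
  }

  private final Map<String, Service> services = new ConcurrentHashMap<>();
  private final long staleAfterMillis;

  ServiceHealthMonitor(long staleAfterMillis) {
    this.staleAfterMillis = staleAfterMillis;
  }

  /** Registers an essential service and starts its first thread. */
  void register(String name, Supplier<Thread> factory) {
    Service s = new Service(factory);
    services.put(name, s);
    start(s);
  }

  /** Called by a service after each unit of work, so "alive but stuck" is detectable. */
  void reportProgress(String name) {
    Service s = services.get(name);
    if (s != null) {
      s.lastProgressMillis = System.currentTimeMillis();
    }
  }

  /** Names of services whose thread is gone or has not reported progress recently. */
  List<String> unhealthyServices(long nowMillis) {
    List<String> unhealthy = new ArrayList<>();
    services.forEach((name, s) -> {
      Thread t = s.thread;
      boolean dead = t == null || !t.isAlive();
      boolean stale = nowMillis - s.lastProgressMillis > staleAfterMillis;
      if (dead || stale) {
        unhealthy.add(name);
      }
    });
    return unhealthy;
  }

  /** True only if every registered service is alive and making progress. */
  boolean isHealthy(long nowMillis) {
    return unhealthyServices(nowMillis).isEmpty();
  }

  /** Recovery step: tries to restart every service whose thread has died; returns the attempt count. */
  int recover() {
    int attempts = 0;
    for (Service s : services.values()) {
      Thread t = s.thread;
      if (t == null || !t.isAlive()) {
        start(s);
        attempts++;
      }
    }
    return attempts;
  }

  private void start(Service s) {
    s.lastProgressMillis = System.currentTimeMillis(); // grace period for a fresh thread
    Thread t = s.factory.get();
    try {
      t.start();
      s.thread = t;
    } catch (OutOfMemoryError e) {
      // "unable to create new native thread": record the failure so the next
      // recover() call retries, instead of letting the error kill the caller's loop.
      s.thread = null;
    }
  }
}
```

One possible wiring (again an assumption, not how the DataNode works today): a small watchdog thread calls `recover()` periodically, and the node stops advertising itself to the NameNode while `isHealthy()` is false, so a crippled datanode is not handed more clients.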

> DataNode stopped sending heartbeat after getting OutOfMemoryError from DataTransfer thread.
> -------------------------------------------------------------------------------------------
>
>                 Key: HDFS-9684
>                 URL: https://issues.apache.org/jira/browse/HDFS-9684
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.7.1
>            Reporter: Surendra Singh Lilhore
>            Assignee: Surendra Singh Lilhore
>            Priority: Blocker
>         Attachments: HDFS-9684.01.patch
>
>
> {noformat}
> java.lang.OutOfMemoryError: unable to create new native thread
> 	at java.lang.Thread.start0(Native Method)
> 	at java.lang.Thread.start(Thread.java:714)
> 	at org.apache.hadoop.hdfs.server.datanode.DataNode.transferBlock(DataNode.java:1999)
> 	at org.apache.hadoop.hdfs.server.datanode.DataNode.transferBlocks(DataNode.java:2008)
> 	at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:657)
> 	at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:615)
> 	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:857)
> 	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:671)
> 	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:823)
> 	at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
