hadoop-hdfs-issues mailing list archives

From "Aaron T. Myers (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-3655) datenode recoverRbw could hang sometime
Date Fri, 13 Jul 2012 22:00:46 GMT

     [ https://issues.apache.org/jira/browse/HDFS-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron T. Myers updated HDFS-3655:

    Target Version/s: 0.22.1
       Fix Version/s:     (was: 0.22.1)

Please only set the "fix version" once the JIRA has been fixed/committed. Until then, please
use the "target version" field to indicate where you intend to fix this.

Also, please note that in order for the automated pre-commit tests to run, you'll need to
upload a patch that applies to trunk and mark this JIRA "Patch Available." If this bug does
exist in trunk/branch-2 as well as in branch-0.22, we'll need to fix it in trunk/branch-2
before we can commit it to branch-0.22.
> datenode recoverRbw could hang sometime
> ---------------------------------------
>                 Key: HDFS-3655
>                 URL: https://issues.apache.org/jira/browse/HDFS-3655
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node
>    Affects Versions: 0.22.0, 1.0.3, 2.0.1-alpha
>            Reporter: Ming Ma
>         Attachments: HDFS-3655-0.22-use-join-instead-of-wait.patch, HDFS-3655-0.22.patch
> This bug seems to apply to 0.22 and Hadoop 2.0. I will shortly upload the initial fix written by
> my colleague Xiaobo Peng (a logistics issue is being worked out so that he can upload patches
> himself later).
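> recoverRbw tries to kill the old writer thread, but it does so while holding the lock (the FSDataset
> monitor object) that the old writer thread is waiting on (for example, in the call to data.getTmpInputStreams).
> The thread below called recoverRbw; it holds the FSDataset lock while stuck in Thread.join() on the old writer: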
> "DataXceiver for client / [Receiving block blk_-3037542385914640638_57111747 client=DFSClient_attempt_201206021424_0001_m_000401_0]" daemon prio=10 tid=0x00007facf8111800 nid=0x6b64 in Object.wait() [0x00007facd1ddb000]
> java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> at java.lang.Thread.join(Thread.java:1186)
> - locked <0x00000007856c1200> (a org.apache.hadoop.util.Daemon)
> at java.lang.Thread.join(Thread.java:1239)
> at org.apache.hadoop.hdfs.server.datanode.ReplicaInPipeline.stopWriter(ReplicaInPipeline.java:158)
> at org.apache.hadoop.hdfs.server.datanode.FSDataset.recoverRbw(FSDataset.java:1347)
> - locked <0x00000007838398c0> (a org.apache.hadoop.hdfs.server.datanode.FSDataset)
> at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:119)
> at org.apache.hadoop.hdfs.server.datanode.DataXceiver.opWriteBlockInternal(DataXceiver.java:391)
> at org.apache.hadoop.hdfs.server.datanode.DataXceiver.opWriteBlock(DataXceiver.java:327)
> at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.opWriteBlock(DataTransferProtocol.java:405)
> at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.processOp(DataTransferProtocol.java:344)
> at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:183)
> at java.lang.Thread.run(Thread.java:662)
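The hang in the stack trace is a lock-ordering cycle: the thread in recoverRbw holds the FSDataset monitor and calls Thread.join() on the old writer, while the old writer is blocked trying to enter that same monitor. Below is a minimal, hypothetical Java sketch of that cycle. It assumes a plain Object as a stand-in for the FSDataset lock. The class name RecoverRbwHangSketch and the timeout-based stopWriter are illustrative assumptions, not the attached patches or the real DataNode code. The sketch shows why an unbounded join cannot return here, and one possible mitigation: a bounded join that fails the recovery instead of hanging.

```java
import java.io.IOException;

/**
 * Hypothetical, simplified model of the HDFS-3655 hang; NOT the real FSDataset code.
 * The "dataset" monitor stands in for the FSDataset lock, and the "old-writer" thread
 * stands in for the previous writer thread that recoverRbw tries to stop.
 */
public class RecoverRbwHangSketch {
    private final Object dataset = new Object(); // stand-in for the FSDataset monitor

    /** Old writer: needs the dataset lock, as a call like getTmpInputStreams would. */
    private Thread newWriter() {
        return new Thread(() -> {
            synchronized (dataset) {
                // the real writer would open the replica's tmp streams here
            }
        }, "old-writer");
    }

    /**
     * Bounded variant of stopWriter (an assumption, not the attached patch):
     * interrupt, then join with a timeout instead of joining forever.
     */
    public static void stopWriter(Thread writer, long timeoutMs) throws IOException {
        if (writer == null || writer == Thread.currentThread() || !writer.isAlive()) {
            return; // nothing to stop
        }
        writer.interrupt(); // does not release a thread blocked on monitor entry
        try {
            writer.join(timeoutMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("Interrupted while stopping " + writer.getName(), e);
        }
        if (writer.isAlive()) {
            throw new IOException("Timed out after " + timeoutMs + " ms stopping " + writer.getName());
        }
    }

    /**
     * recoverRbw-like call: holds the dataset lock while stopping an old writer that is
     * itself blocked on that lock. With an unbounded join() this would never return.
     * Returns true if the writer stopped, false if the bounded join gave up.
     */
    public boolean recoverRbw(long timeoutMs) throws InterruptedException {
        Thread writer = newWriter();
        boolean stopped;
        synchronized (dataset) {
            writer.start();
            while (writer.getState() != Thread.State.BLOCKED) {
                Thread.sleep(1); // wait until the writer is parked on the dataset monitor
            }
            try {
                stopWriter(writer, timeoutMs);
                stopped = true;
            } catch (IOException e) {
                System.out.println("recoverRbw failed: " + e.getMessage());
                stopped = false;
            }
        }
        writer.join(); // lock released, so the old writer can now finish
        return stopped;
    }

    public static void main(String[] args) throws InterruptedException {
        boolean stopped = new RecoverRbwHangSketch().recoverRbw(200);
        System.out.println("writer stopped while lock held: " + stopped);
    }
}
```

Running main prints a timeout message and then reports that the writer did not stop while the lock was held. Replacing writer.join(timeoutMs) in stopWriter with an unbounded writer.join() reproduces the hang instead. A bounded join only turns the hang into a failed recovery. Avoiding the cycle entirely would require not holding the dataset lock while waiting for the writer.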

This message is automatically generated by JIRA.

