hadoop-mapreduce-user mailing list archives

From Andrew Wang <andrew.w...@cloudera.com>
Subject Re: DFSClient got deadlock when close file and failed to renew lease
Date Mon, 19 Oct 2015 20:21:06 GMT
Hi daniedeng,

Please file a JIRA at https://issues.apache.org/jira/browse/HDFS with
details about your issue, and someone will take a look.

Best,
Andrew

On Sun, Oct 18, 2015 at 6:43 PM, daniedeng(邓飞) <daniedeng@tencent.com>
wrote:

>
>
> ------------------------------
> daniedeng(邓飞)
>
>
> *From:* daniedeng(邓飞) <daniedeng@tencent.com>
> *Sent:* 2015-10-16 15:44
> *To:* hdfs-issues <hdfs-issues@hadoop.apache.org>; user@hadoop.apache.org
> *Subject:* DFSClient got deadlock when close file and failed to renew lease
> Hi all,
>     We found a deadlock on our HBase (0.98) cluster (the Hadoop version
> is 2.2.0). It appears to be an HDFS bug; at the time our network was
> unstable. Below is the stack trace:
>
>
> *************************************************************************************************************************************
> Found one Java-level deadlock:
> =============================
> "MemStoreFlusher.1":
>   waiting to lock monitor 0x00007ff27cfa5218 (object 0x00000002fae5ebe0, a org.apache.hadoop.hdfs.LeaseRenewer),
>   which is held by "LeaseRenewer:hbaseadmin@hbase-ns-gdt-sh-marvel"
> "LeaseRenewer:hbaseadmin@hbase-ns-gdt-sh-marvel":
>   waiting to lock monitor 0x00007ff2e67e16a8 (object 0x0000000486ce6620, a org.apache.hadoop.hdfs.DFSOutputStream),
>   which is held by "MemStoreFlusher.0"
> "MemStoreFlusher.0":
>   waiting to lock monitor 0x00007ff27cfa5218 (object 0x00000002fae5ebe0, a org.apache.hadoop.hdfs.LeaseRenewer),
>   which is held by "LeaseRenewer:hbaseadmin@hbase-ns-gdt-sh-marvel"
>
> Java stack information for the threads listed above:
> ===================================================
> "MemStoreFlusher.1":
> at org.apache.hadoop.hdfs.LeaseRenewer.addClient(LeaseRenewer.java:216)
> - waiting to lock <0x00000002fae5ebe0> (a org.apache.hadoop.hdfs.LeaseRenewer)
> at org.apache.hadoop.hdfs.LeaseRenewer.getInstance(LeaseRenewer.java:81)
> at org.apache.hadoop.hdfs.DFSClient.getLeaseRenewer(DFSClient.java:648)
> at org.apache.hadoop.hdfs.DFSClient.endFileLease(DFSClient.java:659)
> at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1882)
> - locked <0x000000055b606cb0> (a org.apache.hadoop.hdfs.DFSOutputStream)
> at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:71)
> at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:104)
> at org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.finishClose(AbstractHFileWriter.java:250)
> at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.close(HFileWriterV2.java:402)
> at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.close(StoreFile.java:974)
> at org.apache.hadoop.hbase.regionserver.StoreFlusher.finalizeWriter(StoreFlusher.java:78)
> at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75)
> - locked <0x000000059869eed8> (a java.lang.Object)
> at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:812)
> at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:1974)
> at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1795)
> at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1678)
> at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1591)
> at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:472)
> at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:211)
> at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$500(MemStoreFlusher.java:66)
> at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:238)
> at java.lang.Thread.run(Thread.java:744)
> "LeaseRenewer:hbaseadmin@hbase-ns-gdt-sh-marvel":
> at org.apache.hadoop.hdfs.DFSOutputStream.abort(DFSOutputStream.java:1822)
> - waiting to lock <0x0000000486ce6620> (a org.apache.hadoop.hdfs.DFSOutputStream)
> at org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:780)
> at org.apache.hadoop.hdfs.DFSClient.abort(DFSClient.java:753)
> at org.apache.hadoop.hdfs.LeaseRenewer.run(LeaseRenewer.java:453)
> - locked <0x00000002fae5ebe0> (a org.apache.hadoop.hdfs.LeaseRenewer)
> at org.apache.hadoop.hdfs.LeaseRenewer.access$700(LeaseRenewer.java:71)
> at org.apache.hadoop.hdfs.LeaseRenewer$1.run(LeaseRenewer.java:298)
> at java.lang.Thread.run(Thread.java:744)
> "MemStoreFlusher.0":
> at org.apache.hadoop.hdfs.LeaseRenewer.addClient(LeaseRenewer.java:216)
> - waiting to lock <0x00000002fae5ebe0> (a org.apache.hadoop.hdfs.LeaseRenewer)
> at org.apache.hadoop.hdfs.LeaseRenewer.getInstance(LeaseRenewer.java:81)
> at org.apache.hadoop.hdfs.DFSClient.getLeaseRenewer(DFSClient.java:648)
> at org.apache.hadoop.hdfs.DFSClient.endFileLease(DFSClient.java:659)
> at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1882)
> - locked <0x0000000486ce6620> (a org.apache.hadoop.hdfs.DFSOutputStream)
> at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:71)
> at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:104)
> at org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.finishClose(AbstractHFileWriter.java:250)
> at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.close(HFileWriterV2.java:402)
> at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.close(StoreFile.java:974)
> at org.apache.hadoop.hbase.regionserver.StoreFlusher.finalizeWriter(StoreFlusher.java:78)
> at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75)
> - locked <0x00000004888f6848> (a java.lang.Object)
> at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:812)
> at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:1974)
> at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1795)
> at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1678)
> at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1591)
> at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:472)
> at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:435)
> at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:66)
> at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:253)
> at java.lang.Thread.run(Thread.java:744)
>
> Found 1 deadlock.
>
> **********************************************************************
>
>
> The thread "MemStoreFlusher.0" is closing an output stream and removing
> its lease. Meanwhile, the daemon thread "LeaseRenewer" failed to connect
> to the active NameNode to renew the lease (it got a SocketTimeoutException
> because the network was unstable), so it tried to abort all open output
> streams. "MemStoreFlusher.0" holds the DFSOutputStream monitor while
> waiting for the LeaseRenewer monitor, and "LeaseRenewer" holds the
> LeaseRenewer monitor while waiting for the DFSOutputStream monitor, so
> the two threads deadlock.
>
> The problem does not seem to be fixed in Hadoop 2.7.1. If confirmed, we
> can fix the issue.
>
>
> daniedeng(邓飞)
>
>
>
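For reference, the lock cycle in the quoted report boils down to two threads taking the same two monitors in opposite order: the close path locks the DFSOutputStream and then calls into the LeaseRenewer, while the renewer thread locks the LeaseRenewer and then tries to abort the stream. A minimal sketch of the pattern and of one standard remedy, a single consistent lock order, is below; the class, method, and lock names are illustrative stand-ins, not actual HDFS code:

```java
// Illustrative sketch of the lock-ordering deadlock described above.
// The names are stand-ins; this is not HDFS code.
public class LockOrderSketch {
    // Stand-ins for the LeaseRenewer and DFSOutputStream monitors.
    static final Object renewerLock = new Object();
    static final Object streamLock = new Object();

    // Deadlock-prone shape from the report:
    //   close path:  lock streamLock  -> then try renewerLock
    //   renew path:  lock renewerLock -> then try streamLock
    // With unlucky timing each thread holds the monitor the other wants.
    //
    // Remedy sketched here: both paths acquire the monitors in the same
    // order (renewer first), so a wait cycle cannot form.
    static String closeStream() {
        synchronized (renewerLock) {
            synchronized (streamLock) {
                return "closed";
            }
        }
    }

    static String abortStream() {
        synchronized (renewerLock) {
            synchronized (streamLock) {
                return "aborted";
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Run both paths concurrently; with a consistent lock order
        // both threads always finish.
        Thread closer = new Thread(() -> closeStream());
        Thread renewer = new Thread(() -> abortStream());
        closer.start();
        renewer.start();
        closer.join();
        renewer.join();
        System.out.println("no deadlock");
    }
}
```

(Consistent ordering is one way to break the cycle; another is to avoid holding one monitor while acquiring the other, e.g. by moving the abort call outside the renewer's synchronized block. Either removes the circular wait.)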
