hadoop-hdfs-issues mailing list archives

From "Brahma Reddy Battula (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-9294) DFSClient deadlock when close file and failed to renew lease
Date Tue, 01 Dec 2015 06:45:11 GMT

     [ https://issues.apache.org/jira/browse/HDFS-9294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brahma Reddy Battula updated HDFS-9294:
---------------------------------------
    Attachment: HDFS-9294.patch

> DFSClient  deadlock when close file and failed to renew lease
> -------------------------------------------------------------
>
>                 Key: HDFS-9294
>                 URL: https://issues.apache.org/jira/browse/HDFS-9294
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>    Affects Versions: 2.2.0, 2.7.1
>         Environment: Hadoop 2.2.0
>            Reporter: 邓飞
>            Assignee: 邓飞
>            Priority: Blocker
>         Attachments: HDFS-9294.patch
>
>
> We found a deadlock in our HBase (0.98) cluster (the Hadoop version is 2.2.0). It appears to be an HDFS bug; our network was unstable at the time.
> The stack trace is below:
> *************************************************************************************************************************************
> Found one Java-level deadlock:
> =============================
> "MemStoreFlusher.1":
>   waiting to lock monitor 0x00007ff27cfa5218 (object 0x00000002fae5ebe0, a org.apache.hadoop.hdfs.LeaseRenewer),
>   which is held by "LeaseRenewer:hbaseadmin@hbase-ns-gdt-sh-marvel"
> "LeaseRenewer:hbaseadmin@hbase-ns-gdt-sh-marvel":
>   waiting to lock monitor 0x00007ff2e67e16a8 (object 0x0000000486ce6620, a org.apache.hadoop.hdfs.DFSOutputStream),
>   which is held by "MemStoreFlusher.0"
> "MemStoreFlusher.0":
>   waiting to lock monitor 0x00007ff27cfa5218 (object 0x00000002fae5ebe0, a org.apache.hadoop.hdfs.LeaseRenewer),
>   which is held by "LeaseRenewer:hbaseadmin@hbase-ns-gdt-sh-marvel"
> Java stack information for the threads listed above:
> ===================================================
> "MemStoreFlusher.1":
> 	at org.apache.hadoop.hdfs.LeaseRenewer.addClient(LeaseRenewer.java:216)
> 	- waiting to lock <0x00000002fae5ebe0> (a org.apache.hadoop.hdfs.LeaseRenewer)
> 	at org.apache.hadoop.hdfs.LeaseRenewer.getInstance(LeaseRenewer.java:81)
> 	at org.apache.hadoop.hdfs.DFSClient.getLeaseRenewer(DFSClient.java:648)
> 	at org.apache.hadoop.hdfs.DFSClient.endFileLease(DFSClient.java:659)
> 	at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1882)
> 	- locked <0x000000055b606cb0> (a org.apache.hadoop.hdfs.DFSOutputStream)
> 	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:71)
> 	at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:104)
> 	at org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.finishClose(AbstractHFileWriter.java:250)
> 	at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.close(HFileWriterV2.java:402)
> 	at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.close(StoreFile.java:974)
> 	at org.apache.hadoop.hbase.regionserver.StoreFlusher.finalizeWriter(StoreFlusher.java:78)
> 	at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75)
> 	- locked <0x000000059869eed8> (a java.lang.Object)
> 	at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:812)
> 	at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:1974)
> 	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1795)
> 	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1678)
> 	at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1591)
> 	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:472)
> 	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:211)
> 	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$500(MemStoreFlusher.java:66)
> 	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:238)
> 	at java.lang.Thread.run(Thread.java:744)
> "LeaseRenewer:hbaseadmin@hbase-ns-gdt-sh-marvel":
> 	at org.apache.hadoop.hdfs.DFSOutputStream.abort(DFSOutputStream.java:1822)
> 	- waiting to lock <0x0000000486ce6620> (a org.apache.hadoop.hdfs.DFSOutputStream)
> 	at org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:780)
> 	at org.apache.hadoop.hdfs.DFSClient.abort(DFSClient.java:753)
> 	at org.apache.hadoop.hdfs.LeaseRenewer.run(LeaseRenewer.java:453)
> 	- locked <0x00000002fae5ebe0> (a org.apache.hadoop.hdfs.LeaseRenewer)
> 	at org.apache.hadoop.hdfs.LeaseRenewer.access$700(LeaseRenewer.java:71)
> 	at org.apache.hadoop.hdfs.LeaseRenewer$1.run(LeaseRenewer.java:298)
> 	at java.lang.Thread.run(Thread.java:744)
> "MemStoreFlusher.0":
> 	at org.apache.hadoop.hdfs.LeaseRenewer.addClient(LeaseRenewer.java:216)
> 	- waiting to lock <0x00000002fae5ebe0> (a org.apache.hadoop.hdfs.LeaseRenewer)
> 	at org.apache.hadoop.hdfs.LeaseRenewer.getInstance(LeaseRenewer.java:81)
> 	at org.apache.hadoop.hdfs.DFSClient.getLeaseRenewer(DFSClient.java:648)
> 	at org.apache.hadoop.hdfs.DFSClient.endFileLease(DFSClient.java:659)
> 	at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1882)
> 	- locked <0x0000000486ce6620> (a org.apache.hadoop.hdfs.DFSOutputStream)
> 	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:71)
> 	at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:104)
> 	at org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.finishClose(AbstractHFileWriter.java:250)
> 	at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.close(HFileWriterV2.java:402)
> 	at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.close(StoreFile.java:974)
> 	at org.apache.hadoop.hbase.regionserver.StoreFlusher.finalizeWriter(StoreFlusher.java:78)
> 	at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75)
> 	- locked <0x00000004888f6848> (a java.lang.Object)
> 	at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:812)
> 	at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:1974)
> 	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1795)
> 	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1678)
> 	at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1591)
> 	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:472)
> 	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:435)
> 	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:66)
> 	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:253)
> 	at java.lang.Thread.run(Thread.java:744)
> Found 1 deadlock. 
> **********************************************************************
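> The thread "MemStoreFlusher.0" is closing its output stream and removing its lease.
> Meanwhile, the daemon thread "LeaseRenewer" failed to connect to the active NameNode to renew the lease. It got a SocketTimeoutException because the network was unstable, so it aborted the output stream.
> The result is a deadlock: "MemStoreFlusher.0" holds the DFSOutputStream monitor and waits for the LeaseRenewer monitor, while "LeaseRenewer" holds its own monitor and waits for the same DFSOutputStream monitor.
> The issue does not appear to be fixed in Hadoop 2.7.1. If this is confirmed, we can fix it.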
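
The lock-order inversion in the trace above can be illustrated with a minimal, self-contained sketch. The classes below (`Renewer`, `OutStream`, `LockOrderSketch`) are hypothetical stand-ins, not the real `LeaseRenewer` or `DFSOutputStream`, and the "safer" ordering shown is one common way to break such a cycle; it is not taken from the attached patch.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LockOrderSketch {

    /** Hypothetical stand-in for LeaseRenewer: its monitor guards the set of open streams. */
    static class Renewer {
        private final Set<OutStream> open = new HashSet<>();

        synchronized void register(OutStream s) { open.add(s); }

        synchronized void unregister(OutStream s) { open.remove(s); }

        /** Like the renewer's failure path: holds the renewer monitor, then takes each stream monitor. */
        synchronized void abortAll() {
            for (OutStream s : open) {
                s.abort();
            }
            open.clear();
        }
    }

    /** Hypothetical stand-in for DFSOutputStream. */
    static class OutStream {
        private final Renewer renewer;
        private boolean closed;

        private OutStream(Renewer renewer) { this.renewer = renewer; }

        static OutStream open(Renewer renewer) {
            OutStream s = new OutStream(renewer);
            renewer.register(s);
            return s;
        }

        synchronized void abort() { closed = true; }

        synchronized boolean isClosed() { return closed; }

        /** Ordering seen in the trace: stream monitor is held while waiting for the renewer monitor. */
        void closeHoldingLock() {
            synchronized (this) {
                closed = true;
                renewer.unregister(this); // can deadlock against abortAll()
            }
        }

        /** Safer ordering: finish stream work, release the stream monitor, then call the renewer. */
        void closeThenUnregister() {
            synchronized (this) {
                closed = true;
            }
            renewer.unregister(this);
        }
    }

    /** Runs a closer thread and an aborting thread concurrently; returns true if both finish. */
    static boolean demo(int count) {
        Renewer renewer = new Renewer();
        List<OutStream> streams = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            streams.add(OutStream.open(renewer));
        }
        Thread closer = new Thread(() -> {
            for (OutStream s : streams) {
                s.closeThenUnregister();
            }
        }, "closer");
        Thread aborter = new Thread(renewer::abortAll, "renewer");
        closer.setDaemon(true);
        aborter.setDaemon(true);
        closer.start();
        aborter.start();
        try {
            closer.join(5000);
            aborter.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !closer.isAlive() && !aborter.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("completed without deadlock: " + demo(1000));
    }
}
```

With `closeThenUnregister()`, no thread ever waits for the renewer monitor while holding a stream monitor, so the cycle reported by the JVM cannot form. `closeHoldingLock()` is included only to show the problematic ordering and is deliberately not exercised by `demo`.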



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
