[ https://issues.apache.org/jira/browse/HDFS-7489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238481#comment-14238481 ]
Hadoop QA commented on HDFS-7489:
---------------------------------
{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12685850/HDFS-7489-v2.patch
against trunk revision 6c5bbd7.
{color:red}-1 patch{color}. Trunk compilation may be broken.
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8955//console
This message is automatically generated.
> Slow FsVolumeList checkDirs can hang datanodes
> ----------------------------------------------
>
> Key: HDFS-7489
> URL: https://issues.apache.org/jira/browse/HDFS-7489
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 2.5.0, 2.6.0
> Reporter: Noah Lorang
> Attachments: HDFS-7489-v1.patch, HDFS-7489-v2.patch
>
>
> Starting after upgrading to 2.5.0 (CDH 5.2.1), we started to see datanodes hanging their heartbeats and requests from clients. After some digging, I identified the culprit as the checkDiskError() call triggered by catching IOExceptions (in our case, SocketExceptions triggered on one datanode by ReplicaAlreadyExistsExceptions on another datanode).
> Thread dumps reveal that the checkDiskErrors() thread is holding a lock on the FsVolumeList:
> {code}
> "Thread-409" daemon prio=10 tid=0x00007f4e50200800 nid=0x5b8e runnable [0x00007f4e2f855000]
> java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.list(Native Method)
> at java.io.File.list(File.java:973)
> at java.io.File.listFiles(File.java:1051)
> at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:89)
> at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:91)
> at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:91)
> at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.checkDirs(BlockPoolSlice.java:257)
> at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.checkDirs(FsVolumeImpl.java:210)
> at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.checkDirs(FsVolumeList.java:180)
> - locked <0x000000063b182ea0> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList)
> at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkDataDir(FsDatasetImpl.java:1396)
> at org.apache.hadoop.hdfs.server.datanode.DataNode$5.run(DataNode.java:2832)
> at java.lang.Thread.run(Thread.java:662)
> {code}
> Other threads would then lock the FsDatasetImpl while waiting for the FsVolumeList, e.g.:
> {code}
> "DataXceiver for client at /10.10.0.52:46643 [Receiving block BP-1573746465-127.0.1.1-1352244533715:blk_1073770670_1099996962574]"
daemon prio=10 tid=0x00007f4e55561000 nid=0x406d waiting for monitor entry [0x00007f4e3106d000]
> java.lang.Thread.State: BLOCKED (on object monitor)
> at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.getNextVolume(FsVolumeList.java:64)
> - waiting to lock <0x000000063b182ea0> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList)
> at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:927)
> - locked <0x000000063b1f9a48> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
> at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:101)
> at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:167)
> at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:604)
> at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:126)
> at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:72)
> at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:225)
> at java.lang.Thread.run(Thread.java:662)
> {code}
> That lock on the FsDatasetImpl then causes other threads to block:
> {code}
> "Thread-127" daemon prio=10 tid=0x00007f4e4c67d800 nid=0x2e02 waiting for monitor entry
[0x00007f4e33390000]
> java.lang.Thread.State: BLOCKED (on object monitor)
> at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:228)
> - waiting to lock <0x000000063b1f9a48> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
> at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.verifyBlock(BlockPoolSliceScanner.java:436)
> at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.verifyFirstBlock(BlockPoolSliceScanner.java:523)
> at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scan(BlockPoolSliceScanner.java:684)
> at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scanBlockPoolSlice(BlockPoolSliceScanner.java:650)
> at org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.run(DataBlockScanner.java:101)
> "DataNode: [[[DISK]file:/data/0/dfs/dn/, [DISK]file:/data/1/dfs/dn/, [DISK]file:/data/2/dfs/dn/,
[DISK]file:/data/3/dfs/dn/, [DISK]file:/data/4/dfs/dn/, [DISK]file:/data/5/dfs/dn/, [DISK]file:/data/6/dfs/dn/,
[DISK]file:/data/7/dfs/dn/, [DISK]file:/data/8/dfs/dn/, [DISK]file:/data/9/dfs/dn/, [DISK]file:/data/10/dfs/dn/,
[DISK]file:/data/11/dfs/dn/, [DISK]file:/data/12/dfs/dn/, [DISK]file:/data/13/dfs/dn/, [DISK]file:/data/14/dfs/dn/,
[DISK]file:/data/15/dfs/dn/, [DISK]file:/data/16/dfs/dn/, [DISK]file:/data/17/dfs/dn/, [DISK]file:/data/18/dfs/dn/,
[DISK]file:/data/19/dfs/dn/, [DISK]file:/data/20/dfs/dn/, [DISK]file:/data/21/dfs/dn/, [DISK]file:/data/22/dfs/dn/,
[DISK]file:/data/23/dfs/dn/, [DISK]file:/data/24/dfs/dn/, [DISK]file:/data/25/dfs/dn/, [DISK]file:/data/26/dfs/dn/,
[DISK]file:/data/27/dfs/dn/, [DISK]file:/data/28/dfs/dn/, [DISK]file:/data/29/dfs/dn/, [DISK]file:/data/30/dfs/dn/,
[DISK]file:/data/31/dfs/dn/, [DISK]file:/data/32/dfs/dn/, [DISK]file:/data/33/dfs/dn/, [DISK]file:/data/34/dfs/dn/,
[DISK]file:/data/35/dfs/dn/]] heartbeating to bigdata-01.sc-chi-int.37signals.com/10.10.0.211:8020"
daemon prio=10 tid=0x00007f4e553a5800 nid=0x2d66 waiting for monitor entry [0x00007f4e361be000]
> java.lang.Thread.State: BLOCKED (on object monitor)
> at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:116)
> - waiting to lock <0x000000063b1f9a48> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
> at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getStorageReports(FsDatasetImpl.java:132)
> - locked <0x000000063b182d80> (a java.lang.Object)
> at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:572)
> at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:677)
> at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:847)
> at java.lang.Thread.run(Thread.java:662)
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1948416574_103 at /10.10.0.169:44229
[Receiving block BP-1573746465-127.0.1.1-1352244533715:blk_1073776794_1099996963072]" daemon
prio=10 tid=0x00007f4e55431000 nid=0x4ab8 waiting for monitor entry [0x00007f4e2a448000]
> java.lang.Thread.State: BLOCKED (on object monitor)
> at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:782)
> - waiting to lock <0x000000063b1f9a48> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
> at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:101)
> at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:171)
> at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:604)
> at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:126)
> at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:72)
> at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:225)
> at java.lang.Thread.run(Thread.java:662)
> {code}
> This occurs, I believe, because FsVolumeList#checkDirs() locks the entire FsVolumeList to prevent multiple simultaneous checkDirs() calls. On a slow system, under heavy I/O, or with many disks, a single checkDirs() call can empirically take dozens of seconds or longer.
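> For reference, the existing pattern (as implied by the stack traces above; a simplified sketch, not the exact FsVolumeList source) is roughly:
> {code}
> // Simplified sketch: the entire disk scan runs while holding the FsVolumeList
> // monitor, so callers such as getNextVolume() block until every volume's
> // checkDirs() has finished walking its block pool directories.
> synchronized void checkDirs() {
>   for (FsVolumeImpl volume : volumes) {
>     try {
>       volume.checkDirs();              // slow: recursive directory listing per volume
>     } catch (DiskErrorException e) {
>       // a failing volume would be removed from the list here
>     }
>   }
> }
> {code}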
> I think this can be improved by holding a separate mutex limited to checkDirs() (much like datanode.checkDiskErrors() does) and only locking the full FsVolumeList when the volume list needs to be updated because disks were removed. I'll attach a patch that does this.
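> Roughly, the approach looks like this (an illustrative sketch only; the attached patch is the authoritative change and may differ in details, and the name checkDirsMutex is made up here):
> {code}
> // Hypothetical sketch, not the patch itself: serialize checkDirs() callers on a
> // dedicated lock, do the slow per-volume disk scan without holding the
> // FsVolumeList monitor, and take that monitor only for the brief update that
> // drops failed volumes.
> private final Object checkDirsMutex = new Object();
>
> void checkDirs() {
>   synchronized (checkDirsMutex) {
>     List<FsVolumeImpl> removed = new ArrayList<FsVolumeImpl>();
>     for (FsVolumeImpl volume : new ArrayList<FsVolumeImpl>(volumes)) {
>       try {
>         volume.checkDirs();            // slow disk I/O, outside the list lock
>       } catch (DiskErrorException e) {
>         removed.add(volume);
>       }
>     }
>     if (!removed.isEmpty()) {
>       synchronized (this) {            // brief: only to update the volume list
>         volumes.removeAll(removed);
>       }
>     }
>   }
> }
> {code}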
> We're running this patch in production and it's working as expected -- a lock is held locally instead of on the entire FsVolumeList while checkDirs() is running:
> {code}
> "Thread-614" daemon prio=10 tid=0x000000004037b000 nid=0x7331 runnable [0x00007f4d45391000]
> java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1157)
> at org.apache.hadoop.util.DiskChecker.mkdirsWithExistsCheck(DiskChecker.java:67)
> at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:104)
> at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:88)
> at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:91)
> at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:91)
> at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.checkDirs(BlockPoolSlice.java:257)
> at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.checkDirs(FsVolumeImpl.java:210)
> at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.checkDirs(FsVolumeList.java:182)
> - locked <0x000000063b24f540> (a java.lang.Object)
> at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkDataDir(FsDatasetImpl.java:1396)
> at org.apache.hadoop.hdfs.server.datanode.DataNode$5.run(DataNode.java:2832)
> at java.lang.Thread.run(Thread.java:662)
> {code}
> Feedback would be appreciated!
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)