hadoop-hdfs-dev mailing list archives

From "Noah Lorang (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-7489) Slow FsVolumeList checkDirs can hang datanodes
Date Mon, 08 Dec 2014 17:19:14 GMT
Noah Lorang created HDFS-7489:
---------------------------------

             Summary: Slow FsVolumeList checkDirs can hang datanodes
                 Key: HDFS-7489
                 URL: https://issues.apache.org/jira/browse/HDFS-7489
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: datanode
    Affects Versions: 2.6.0, 2.5.0
            Reporter: Noah Lorang


Starting after upgrading to 2.5.0 (CDH 5.2.1), we started to see datanodes hang their heartbeats
and their handling of client requests. After some digging, I identified the culprit as the checkDiskError()
triggered by caught IOExceptions (in our case, SocketExceptions raised on one datanode
by ReplicaAlreadyExistsExceptions on another datanode).

Thread dumps reveal that the checkDiskError() thread is holding a lock on the FsVolumeList:
{code}
"Thread-409" daemon prio=10 tid=0x00007f4e50200800 nid=0x5b8e runnable [0x00007f4e2f855000]
   java.lang.Thread.State: RUNNABLE
        at java.io.UnixFileSystem.list(Native Method)
        at java.io.File.list(File.java:973)
        at java.io.File.listFiles(File.java:1051)
        at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:89)
        at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:91)
        at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:91)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.checkDirs(BlockPoolSlice.java:257)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.checkDirs(FsVolumeImpl.java:210)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.checkDirs(FsVolumeList.java:180)
        - locked <0x000000063b182ea0> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkDataDir(FsDatasetImpl.java:1396)
        at org.apache.hadoop.hdfs.server.datanode.DataNode$5.run(DataNode.java:2832)
        at java.lang.Thread.run(Thread.java:662)
{code}

Other threads would then lock the FsDatasetImpl while waiting for the FsVolumeList, e.g.:

{code}
"DataXceiver for client  at /10.10.0.52:46643 [Receiving block BP-1573746465-127.0.1.1-1352244533715:blk_1073770670_1099996962574]"
daemon prio=10 tid=0x00007f4e55561000 nid=0x406d waiting for monitor entry [0x00007f4e3106d000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.getNextVolume(FsVolumeList.java:64)
        - waiting to lock <0x000000063b182ea0> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:927)
        - locked <0x000000063b1f9a48> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:101)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:167)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:604)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:126)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:72)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:225)
        at java.lang.Thread.run(Thread.java:662)
{code}

That lock on the FsDatasetImpl then causes other threads to block:

{code}
"Thread-127" daemon prio=10 tid=0x00007f4e4c67d800 nid=0x2e02 waiting for monitor entry [0x00007f4e33390000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:228)
        - waiting to lock <0x000000063b1f9a48> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
        at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.verifyBlock(BlockPoolSliceScanner.java:436)
        at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.verifyFirstBlock(BlockPoolSliceScanner.java:523)
        at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scan(BlockPoolSliceScanner.java:684)
        at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scanBlockPoolSlice(BlockPoolSliceScanner.java:650)
        at org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.run(DataBlockScanner.java:101)

"DataNode: [[[DISK]file:/data/0/dfs/dn/, [DISK]file:/data/1/dfs/dn/, [DISK]file:/data/2/dfs/dn/,
[DISK]file:/data/3/dfs/dn/, [DISK]file:/data/4/dfs/dn/, [DISK]file:/data/5/dfs/dn/, [DISK]file:/data/6/dfs/dn/,
[DISK]file:/data/7/dfs/dn/, [DISK]file:/data/8/dfs/dn/, [DISK]file:/data/9/dfs/dn/, [DISK]file:/data/10/dfs/dn/,
[DISK]file:/data/11/dfs/dn/, [DISK]file:/data/12/dfs/dn/, [DISK]file:/data/13/dfs/dn/, [DISK]file:/data/14/dfs/dn/,
[DISK]file:/data/15/dfs/dn/, [DISK]file:/data/16/dfs/dn/, [DISK]file:/data/17/dfs/dn/, [DISK]file:/data/18/dfs/dn/,
[DISK]file:/data/19/dfs/dn/, [DISK]file:/data/20/dfs/dn/, [DISK]file:/data/21/dfs/dn/, [DISK]file:/data/22/dfs/dn/,
[DISK]file:/data/23/dfs/dn/, [DISK]file:/data/24/dfs/dn/, [DISK]file:/data/25/dfs/dn/, [DISK]file:/data/26/dfs/dn/,
[DISK]file:/data/27/dfs/dn/, [DISK]file:/data/28/dfs/dn/, [DISK]file:/data/29/dfs/dn/, [DISK]file:/data/30/dfs/dn/,
[DISK]file:/data/31/dfs/dn/, [DISK]file:/data/32/dfs/dn/, [DISK]file:/data/33/dfs/dn/, [DISK]file:/data/34/dfs/dn/,
[DISK]file:/data/35/dfs/dn/]]  heartbeating to bigdata-01.sc-chi-int.37signals.com/10.10.0.211:8020"
daemon prio=10 tid=0x00007f4e553a5800 nid=0x2d66 waiting for monitor entry [0x00007f4e361be000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:116)
        - waiting to lock <0x000000063b1f9a48> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getStorageReports(FsDatasetImpl.java:132)
        - locked <0x000000063b182d80> (a java.lang.Object)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:572)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:677)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:847)
        at java.lang.Thread.run(Thread.java:662)

"DataXceiver for client DFSClient_NONMAPREDUCE_-1948416574_103 at /10.10.0.169:44229 [Receiving
block BP-1573746465-127.0.1.1-1352244533715:blk_1073776794_1099996963072]" daemon prio=10
tid=0x00007f4e55431000 nid=0x4ab8 waiting for monitor entry [0x00007f4e2a448000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:782)
        - waiting to lock <0x000000063b1f9a48> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:101)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:171)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:604)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:126)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:72)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:225)
        at java.lang.Thread.run(Thread.java:662)

{code}


This occurs, I believe, because FsVolumeList#checkDirs() synchronizes on the entire FsVolumeList to
prevent multiple simultaneous checkDirs() calls. On a system under high IO load or with many
disks, a single checkDirs() call can empirically take dozens of seconds or longer.

I think this can be improved by holding a separate mutex limited to checkDirs() (much like DataNode.checkDiskError()
does) and only locking the full FsVolumeList when the volume list needs to be updated because disks
were removed. I'll attach a patch that does this.
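To make the idea concrete, here is a minimal sketch of the locking pattern described above. The class and method names are hypothetical simplifications, not the actual patch: the slow disk scan runs under a private mutex, and the list-wide lock is only taken briefly, if failed volumes actually need to be removed.

{code}
import java.util.ArrayList;
import java.util.List;

// Illustrative simplification of FsVolumeList; names are hypothetical.
class VolumeList {
    // Guards checkDirs() only, so concurrent checkDirs() calls still
    // serialize, but readers of the volume list are not blocked.
    private final Object checkDirsMutex = new Object();
    private volatile List<String> volumes;

    VolumeList(List<String> vols) { volumes = new ArrayList<>(vols); }

    List<String> checkDirs() {
        synchronized (checkDirsMutex) {
            // The slow part: scanning every volume (stand-in for
            // DiskChecker.checkDirs() walking the directory tree).
            List<String> removed = new ArrayList<>();
            for (String v : volumes) {
                if (isFailed(v)) removed.add(v);
            }
            if (!removed.isEmpty()) {
                // Only now take the list-wide lock, and only long enough
                // to publish the updated volume list.
                synchronized (this) {
                    List<String> updated = new ArrayList<>(volumes);
                    updated.removeAll(removed);
                    volumes = updated;
                }
            }
            return removed;
        }
    }

    synchronized String getNextVolume() {
        // Representative short critical section (like the real
        // getNextVolume()) that previously had to wait out the
        // entire checkDirs() scan.
        return volumes.isEmpty() ? null : volumes.get(0);
    }

    // Simulated failure detection for the sketch.
    private boolean isFailed(String v) { return v.endsWith("bad"); }
}
{code}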

We're running this patch in production and it's working as expected -- a lock is held locally
instead of on the entire FsVolumeList while checkDirs() is running:

{code}
"Thread-614" daemon prio=10 tid=0x000000004037b000 nid=0x7331 runnable [0x00007f4d45391000]
   java.lang.Thread.State: RUNNABLE
        at java.io.UnixFileSystem.createDirectory(Native Method)
        at java.io.File.mkdir(File.java:1157)
        at org.apache.hadoop.util.DiskChecker.mkdirsWithExistsCheck(DiskChecker.java:67)
        at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:104)
        at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:88)
        at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:91)
        at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:91)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.checkDirs(BlockPoolSlice.java:257)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.checkDirs(FsVolumeImpl.java:210)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.checkDirs(FsVolumeList.java:182)
        - locked <0x000000063b24f540> (a java.lang.Object)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkDataDir(FsDatasetImpl.java:1396)
        at org.apache.hadoop.hdfs.server.datanode.DataNode$5.run(DataNode.java:2832)
        at java.lang.Thread.run(Thread.java:662)
{code}


Feedback would be appreciated!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
