hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9668) Many long-time BLOCKED threads on FsDatasetImpl in a tiered storage test
Date Mon, 14 Mar 2016 16:36:33 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15193595#comment-15193595 ]

Colin Patrick McCabe commented on HDFS-9668:
--------------------------------------------

Thanks for revising this, [~jingcheng.du@intel.com].  I think that it looks much better now
that it is no longer a separate dataset implementation.  I revoke my -1.

A 10 gigabyte HDFS file that uses 5 MB HDFS blocks seems like an extremely unusual case. 
That would result in just that single file having 2,048 blocks.  I guess perhaps this
is intended to simulate a case where we have many small files leading to small blocks?

One thing that I can see about this code is that there are many cases where we could drop
the lock earlier than we do.  For example, in this function:

{code}
  @Override // FsDatasetSpi
  public synchronized Block getStoredBlock(String bpid, long blkid)
      throws IOException {
    File blockfile = getFile(bpid, blkid, false);
    if (blockfile == null) {
      return null;
    }
    final File metafile = FsDatasetUtil.findMetaFile(blockfile);
    final long gs = FsDatasetUtil.parseGenerationStamp(blockfile, metafile);
    return new Block(blkid, blockfile.length(), gs);
  }
{code}

The only thing that needs to be protected by the lock is the call to {{FsDatasetImpl#getFile}},
since it reads from the {{volumeMap}}.  {{FsDatasetUtil#findMetaFile}} doesn't need protection
since it just lists the block files in the directory, and {{parseGenerationStamp}} just applies
a regular expression to the metadata file name.
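
As a minimal sketch of that narrowing, assuming a simplified stand-in for the dataset (the class {{NarrowLockDataset}}, its {{addBlock}} helper and the map layout are hypothetical, not the real {{FsDatasetImpl}}), only the map lookup holds the monitor and the file-system access runs after it is released:

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a dataset; this is not the real FsDatasetImpl.
class NarrowLockDataset {
  // Stand-in for volumeMap: block id -> block file. Guarded by "this".
  private final Map<Long, File> volumeMap = new HashMap<>();

  // Mutations of the map still take the monitor.
  synchronized void addBlock(long blkid, File blockfile) {
    volumeMap.put(blkid, blockfile);
  }

  // The only step that reads the shared map, so the only one that locks.
  private synchronized File getFile(long blkid) {
    return volumeMap.get(blkid);
  }

  // Not synchronized: the monitor is held only inside getFile().
  // Returns the on-disk length of the block file, or -1 if the block is unknown.
  long getStoredBlockLength(long blkid) {
    File blockfile = getFile(blkid);
    if (blockfile == null) {
      return -1;
    }
    // Disk access happens without holding the dataset lock.
    return blockfile.length();
  }
}
```

With this shape, the file can change between the lookup and the length call, so whether the narrower lock is safe depends on what each caller can tolerate.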

There are a lot of other cases like this.  I think reducing the unnecessary locking would
be better than making the locking more complex.  After all, even with lock striping, we may
find that several "hot" blocks share the same lock stripe, and therefore that we gain no more
concurrency.  I wonder what numbers you get if you just change these functions to drop the
lock except when they really need it to access the {{volumeMap}}?
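
To illustrate the stripe-sharing concern, here is a small sketch that assumes a striping scheme of block id modulo the stripe count (the class {{StripedLocks}} and this mapping are hypothetical; the patch may map blocks to stripes differently):

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical lock striping: one lock per stripe, chosen by block id.
class StripedLocks {
  private final ReentrantLock[] stripes;

  StripedLocks(int stripeCount) {
    stripes = new ReentrantLock[stripeCount];
    for (int i = 0; i < stripeCount; i++) {
      stripes[i] = new ReentrantLock();
    }
  }

  // Assumed mapping: block id modulo the number of stripes.
  int stripeIndex(long blockId) {
    return (int) Math.floorMod(blockId, (long) stripes.length);
  }

  ReentrantLock lockFor(long blockId) {
    return stripes[stripeIndex(blockId)];
  }
}
```

With 16 stripes under this assumed mapping, block id 1073779271 and a made-up block id 1073779287 both land on stripe 7, so writers to those two blocks would still serialize on one lock.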

I notice that this patch adds a reader/writer lock.  While this allows many concurrent readers,
it seems like it could allow starvation of writer threads.  If we are going to use an R/W
lock, I think we should choose a fair R/W lock to avoid this issue.
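
For reference, a fair R/W lock in the JDK is requested by passing {{true}} to the {{ReentrantReadWriteLock}} constructor.  The wrapper below is only a sketch; the class {{FairRwMap}} and its string-valued map are hypothetical and stand in for whatever structure the lock would actually protect:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical map guarded by a fair reader/writer lock.
class FairRwMap {
  // "true" selects the fair ordering policy, so a waiting writer is not
  // indefinitely overtaken by newly arriving readers.
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);
  private final Map<Long, String> replicas = new HashMap<>();

  String get(long blkid) {
    lock.readLock().lock();
    try {
      return replicas.get(blkid);
    } finally {
      lock.readLock().unlock();
    }
  }

  void put(long blkid, String path) {
    lock.writeLock().lock();
    try {
      replicas.put(blkid, path);
    } finally {
      lock.writeLock().unlock();
    }
  }

  boolean isFair() {
    return lock.isFair();
  }
}
```

The trade-off is that fair mode normally gives lower throughput than the default non-fair mode, so it would be worth measuring both.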

> Many long-time BLOCKED threads on FsDatasetImpl in a tiered storage test
> ------------------------------------------------------------------------
>
>                 Key: HDFS-9668
>                 URL: https://issues.apache.org/jira/browse/HDFS-9668
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>            Reporter: Jingcheng Du
>            Assignee: Jingcheng Du
>         Attachments: HDFS-9668-1.patch, HDFS-9668-2.patch, execution_time.png
>
>
> During an HBase test on HDFS tiered storage (the WAL is stored on SSD/RAMDISK, and all other files are stored on HDD), we observed many threads BLOCKED for a long time on FsDatasetImpl in the DataNode. The following is part of the jstack output:
> {noformat}
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at /192.168.50.16:48521
[Receiving block BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread
t@93336
>    java.lang.Thread.State: BLOCKED
> 	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1111)
> 	- waiting to lock <18324c9> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
owned by "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at /192.168.50.16:48520
[Receiving block BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
> 	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
> 	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:183)
> 	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
> 	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
> 	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
> 	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
> 	at java.lang.Thread.run(Thread.java:745)
>    Locked ownable synchronizers:
> 	- None
> 	
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at /192.168.50.16:48520
[Receiving block BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread
t@93335
>    java.lang.Thread.State: RUNNABLE
> 	at java.io.UnixFileSystem.createFileExclusively(Native Method)
> 	at java.io.File.createNewFile(File.java:1012)
> 	at org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
> 	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271)
> 	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286)
> 	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140)
> 	- locked <18324c9> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
> 	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
> 	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:183)
> 	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
> 	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
> 	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
> 	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
> 	at java.lang.Thread.run(Thread.java:745)
>    Locked ownable synchronizers:
> 	- None
> {noformat}
> We measured the execution time of some operations in FsDatasetImpl during the test. The results are shown below.
> !execution_time.png!
> Under heavy load, the finalizeBlock, addBlock and createRbw operations on HDD take a very long time.
> This means that one slow finalizeBlock, addBlock or createRbw operation on slow storage can block all other such operations in the same DataNode, especially with HBase when many WAL/flusher/compactor threads are configured.
> We need a finer-grained locking mechanism in a new FsDatasetImpl implementation, which users can select by configuring "dfs.datanode.fsdataset.factory" in the DataNode.
> The lock can be implemented at either the storage level or the block level.
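
A storage-level lock of the kind proposed in the description could look roughly like the sketch below.  The class {{PerStorageLocks}} and the storage id strings are hypothetical and are not taken from either attached patch:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical registry with one lock per storage (volume).
class PerStorageLocks {
  private final ConcurrentMap<String, ReentrantLock> locks =
      new ConcurrentHashMap<>();

  // Returns the lock for a storage id, creating it on first use.
  ReentrantLock lockFor(String storageId) {
    return locks.computeIfAbsent(storageId, id -> new ReentrantLock());
  }
}
```

A slow operation holding the lock for one HDD storage would then not block operations that take the lock for an SSD storage, although any state shared across storages would still need its own protection.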



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
